The 64-bit era is about to bloom. The small square is an IBM processor without carrier and pins. Building processors with copper, a more-efficient conductor than the aluminum used on most others, has been one recent manufacturing feat for Big Blue.


It's not pretty, but Intel's 64-bit processor, cache, and cartridge will influence for many years how software is written and used. The processor may be released for commercial sale in the first quarter of next year.


Intel offers this graph as a guide to future releases of its processors. The company is also planning buses to complement them. For instance, Xeon chips have been teamed with Intel 840 chips to support up to four processors, a 133-MHz front-side bus, and Rambus memory capable of 3.2 Gbytes/sec. Itaniums will get 460GX chipsets to support four processors on a faster front-side bus, and error correction code (ECC) on memory and data paths.


Modern processors are more than arithmetic units. The Itanium involves a design called Epic for explicitly parallel instruction computing. Speculation functions estimate what data and instructions might be needed to make each clock cycle as productive as possible.


Early next year, Intel is expected to release its first 64-bit processor, the product of a four-year collaboration between itself and Hewlett-Packard Co. Although 64-bit CPUs have come before, Intel's Itanium will eventually find its way onto many more engineering desktop computers than its predecessors. The capability delivered by this processor will ripple through engineering departments for the next 20 years.

Exactly how the new technology will change the way engineers work is not entirely clear from this early perspective, aside from erasing all distinctions between PCs and workstations. What is certain is that the Itanium will crunch through problems faster than earlier processors and handle larger models than 32-bit CPUs can bite into. Thus, assembly analysis could become the norm, not the exception as it is today, because databases can be much larger. Analyses will be more thorough, rendering will complete faster, and CAD model regenerations will occur almost in the blink of an eye. The powerful machines are likely to let several engineers use them at the same time, further bringing down the cost of high-speed computing.

THE CONTENDERS
Processors that work with 64 bits have come mostly from SGI, IBM, HP, Sun, and Compaq (with its Alpha), making Intel and rival chip maker AMD latecomers to the 64-bit party. (Advanced Micro Devices is developing a 64-bit processor, Sledgehammer, but news has been scant.)

Processors preceding Itanium have had less market penetration than Intel's Pentiums because each, except for Alpha, ran a proprietary version of Unix. Consequently, FEA developers, for instance, had to write and maintain programs for each Unix variation: programs that run on IBM's Unix won't run on the others. And a company that might have gotten a good deal on computer hardware had to let it go if the hardware didn't run the company's chosen operating system.

That ball and chain will soon be gone. Linux will be the OS on many of the first 64-bit servers while Windows 2000 for 64-bit computers will dominate desktop and the remaining server units. Companies might then buy one copy of an engineering program and run its floating license on machines from several manufacturers.

For servers, new processors will work with larger data sets and large chunks of program. The coming CPU will handle what's called control-intensive, nonloopy integer code, irregular data-access patterns, and a larger number of users.

The Linux port to 64 bits, called the Trillian Project, is being spearheaded by VA Research Linux Systems, Mountain View, Calif., which says it will be a true 64-bit OS that can run 32-bit software as well. SGI, HP, and other companies are contributing technology to the effort. The kernel is available at www.kernel.org for software developers.

For big jobs, Intel says the Itanium will work with large data sets, compute-intense loopy floating-point code such as FEA, and even tolerate a small number of users or clients. The processor uses an 82-bit format for floating-point operations to give it more precision and range than previous processors. This means calculations incur a smaller rounding error, iterative calculations converge faster, and the processor handles larger numbers than Risc processors without overflow features. Technical applications are often loop intensive. IA-64 software pipelining optimizes loop structures. A pipeline is a processing method that lets the CPU begin the next instruction before finishing the previous one. And data prefetching grabs data before it's needed to reduce the time the processor sits idle.
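The link between a wider floating-point format and smaller rounding error can be illustrated with a rough sketch. The Python below is not Itanium code. It assumes ordinary 32-bit and 64-bit IEEE formats as stand-ins for a narrower and a wider format, and the helper names `to_float32` and `accumulate` are invented for the illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, count: int, narrow: bool) -> float:
    """Add `step` to a running total `count` times.

    When `narrow` is True, the step and every intermediate total are
    rounded to 32 bits, mimicking a format with fewer significand bits.
    """
    if narrow:
        step = to_float32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if narrow:
            total = to_float32(total)
    return total


exact = 1000.0  # 0.1 added 10,000 times in exact arithmetic
error_narrow = abs(accumulate(0.1, 10_000, narrow=True) - exact)
error_wide = abs(accumulate(0.1, 10_000, narrow=False) - exact)

print(f"32-bit accumulation error: {error_narrow:.3e}")
print(f"64-bit accumulation error: {error_wide:.3e}")
```

The same loop run in the wider format drifts far less from the exact answer, which is the effect the paragraph above attributes to extra floating-point bits.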

A BRIEF PROFILE
Itanium will operate initially at 800 MHz and be capable of performing 20 operations/cycle. It will have a 4-Mbyte, high-speed, on-cartridge, level-three cache and will use over 320 million transistors. The CPU will use 25 million of them, and the remainder, 295 million, will be in the level-three cache. Furthermore, a 2.1-Gbyte/sec bus, probably located on the cartridge, will improve communications with assisting chips.
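The figures in this profile imply a peak rate and a transistor split that are easy to work out. The short Python sketch below uses only the numbers quoted above. The variable names are illustrative, and the peak rate is a theoretical ceiling (clock rate times operations per cycle), not a measured benchmark.

```python
# Figures quoted in the profile above
CLOCK_HZ = 800_000_000            # 800 MHz initial clock rate
OPS_PER_CYCLE = 20                # peak operations per cycle
TOTAL_TRANSISTORS = 320_000_000   # "over 320 million"
CPU_TRANSISTORS = 25_000_000      # transistors in the CPU itself

# Theoretical ceiling: every cycle issues the maximum number of operations
peak_ops_per_sec = CLOCK_HZ * OPS_PER_CYCLE

# Whatever is not CPU logic sits in the level-three cache
cache_transistors = TOTAL_TRANSISTORS - CPU_TRANSISTORS
cache_share = cache_transistors / TOTAL_TRANSISTORS

print(f"Peak rate: {peak_ops_per_sec / 1e9:.0f} billion operations/sec")
print(f"Cache transistors: {cache_transistors / 1e6:.0f} million "
      f"({cache_share:.0%} of the total)")
```

On these numbers, roughly nine of every ten transistors on the cartridge are cache rather than processing logic.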

Parallelism at the thread and instruction level of the compiler will increase throughput. Properly written programs will become a sequence of parallel-instruction groups.

Massive resources, such as memory, will also assist in speedups. For instance, 32-bit systems could address databases limited to about 4 Gbytes ($2^{32} \approx 4.3 \times 10^{9}$ different addresses). Intel engineers have provided extensions to increase this number. Certainly a 4-billion-node FEA model would be large, but financial databases of this size are more common. A 64-bit system now has a theoretical addressable "limit" of $2^{64} \approx 1.84 \times 10^{19}$. The company says the processor will support 64- and 32-bit pointers, so existing or older software will run on the new chip.
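The addressing limits quoted above follow directly from the pointer width. As a minimal sketch, the Python below computes the number of distinct addresses for each width. The function name `address_space` is made up for the example.

```python
def address_space(pointer_bits: int) -> int:
    """Number of distinct addresses a pointer of the given width can hold."""
    return 2 ** pointer_bits


for bits in (32, 64):
    size = address_space(bits)
    print(f"{bits}-bit pointers: {size:,} addresses (about {size:.2e})")

# How many complete 32-bit address spaces fit inside the 64-bit one
ratio = address_space(64) // address_space(32)
print(f"64-bit space holds {ratio:,} copies of the 32-bit space")
```

The jump from 32 to 64 bits squares the number of addresses: the 64-bit space holds about 4.3 billion copies of the entire 32-bit space.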

Intel has been careful to brief technical software vendors on developments so that new programs will be available at the same time as new computers. "Introducing 64-bit processors will usher in a new age for engineering software," says Robert Williams, development manager with Algor Inc., Pittsburgh. "True 64-bit applications will fully utilize the ever-increasing speed of personal computers. Such applications will allow tremendous performance gains when displaying complex graphics or performing mathematical calculations required with scientific software such as finite-element analysis."

AN ALPHA MICROPROCESSOR ROADMAP

| PROCESSOR NAME | EV67 | EV68 | EV7 | EV8 |
|---|---|---|---|---|
| Ship date | Q3 99 | 2000 | Unannounced | Unannounced |
| Technology, CMOS (µm) | 0.28 | 0.18 | 0.18 | 0.13 |
| Vdd (Volts) | 2 | 1.5 | 1.5 | 1.2 |
| Pins | 587 | 587 | 1,439 | about 1,800 |
| CHARACTERISTICS | | | | |
| Frequency (MHz) | 750 | 1,000 | about 1,250 | about 1,600 |
| Specint95 | 45 | 60 | 75 | about 160 |
| Specfp95 | 85 | 120 | 160 | about 400 |
| Memory bandwidth (Gbytes/sec) | 2 | 2 | 10 | 16 |
| Cache bandwidth (Gbytes/sec) | 7 | 8 | 20 | 100 |

Intel is not the only company with processor plans. Here’s what Compaq has slated for its Alpha 64-bit processor. The company inherited the Alpha design when it purchased Digital Equipment Corp. several years ago.

A brief glossary of recent computer terms
Computer technology is changing so fast that the hot box you bought just three years ago is woefully behind the times. The good news is that the concepts have stayed the same: computers still have processors, memory, and buses. Computer builders today have more variety to choose from, so you might have RDRAM instead of just RAM. Here are several other terms you might run into.

Front-side bus — the bus that connects the CPU with main memory. It transfers data between motherboard and other computer system components.

Epic — Explicitly parallel instruction computing, a design to speed internal parallel operations.

Pipeline — a processing technique in which the processor begins executing a second instruction before finishing the first. Several instructions are handled simultaneously at different stages in the pipeline.

Prefetching — a technique that pulls instructions and data not immediately needed in an effort to keep the processor active.

Rambus — proprietary bus technology from Rambus Inc. (www.rambus.com) that uses multilevel signaling for data transfer rates of 1.6 Gbits/sec.

RDRAM — Rambus DRAM or dynamic random access memory.

SDRAM — Synchronous DRAM. Relatively recent DRAM that runs at higher clock speeds than conventional memory. It synchronizes itself to the bus.

Speculation — a software technique that estimates what instructions or data might soon be needed.


IBM processors sport copper "wiring" and SOI

The cross section of a processor shows the vias, or connections, made with copper. IBM's complementary development, called SOI or silicon on insulator, more effectively insulates pathways from each other and thereby allows thinner processors, or processors with more layers.


Processor developers have not been resting on their laurels just because they have been computing in 64 bits for the last several years. IBM, for example, recently introduced processors with copper "wiring," or interconnects, from transistor to transistor. Copper conducts signals better than aluminum. Until the introduction of copper in IBM microprocessors, chip makers used aluminum connections because it was the best conductor that worked with their chip-manufacturing methods. IBM engineers overcame the difficulty of using copper with a unique process.

Silicon on insulator (SOI), also from IBM, is simply an insulator between microprocessor layers. The two developments go hand in hand to generate greater transistor speed and higher processor throughput without.

"Each generation of photolithography has significantly increased transistor density," says Darryl Solie, distinguished engineer with IBM, Rochester, Minn. "But the speed of the transistors and associated wiring between them is not allowing a faster processing rate."

In fact, says Solie, a limiting factor in CMOS technology or processor design is now the distance between transistors. "Increasing throughput will take more than frequency. The speed at which a processor can run is becoming more limited by transistor-to-transistor wire lengths. Copper and SOI should help solve that problem," he adds.

"We're approaching the practical limits of CMOS technology," says Solie. "There are still a couple processor generations to go before hitting a wall that invalidates Moore's Law." (That is the observation by Intel co-founder Gordon Moore that says transistor density, and hence speed, will double about every 18 months.) "I do think other innovations will take place that will keep Moore's Law valid beyond today's CMOS technology."

Solie cautions against thinking that only the megahertz rating dictates processor speed. "Processor throughput is more a function of how much work gets done in a clock cycle. It's also a function of the number of pipeline stages in a processor." Processors in IBM's AS/400 and RS/6000-based business servers use a five-stage pipeline. So an instruction, such as move data or load register, completes in five cycles. Some processors have 12 to 15 pipeline stages. An instruction for them would take 12 to 15 cycles to get through. "This is doing less work per cycle but you can make it so that each cycle is fast."
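The tradeoff Solie describes between pipeline depth and clock rate can be sketched with an idealized model. The Python below assumes a pipeline that never stalls, so the first instruction takes one cycle per stage and each later instruction finishes one cycle after the one before it. The two clock rates (500 MHz and 1,000 MHz) are hypothetical values chosen for illustration, as are the function names.

```python
def pipeline_cycles(stages: int, instructions: int) -> int:
    """Cycles needed to finish `instructions` on an ideal, stall-free pipeline.

    The first instruction needs `stages` cycles to pass through; after
    that, one instruction completes every cycle.
    """
    if instructions == 0:
        return 0
    return stages + instructions - 1


def run_time_us(stages: int, instructions: int, clock_mhz: float) -> float:
    """Elapsed time in microseconds (cycles divided by cycles per microsecond)."""
    return pipeline_cycles(stages, instructions) / clock_mhz


# Hypothetical designs: a short pipeline at a modest clock versus a
# deep pipeline at a faster clock.
designs = {"5-stage @ 500 MHz": (5, 500.0), "15-stage @ 1,000 MHz": (15, 1000.0)}

for count in (1, 1_000_000):
    for name, (stages, mhz) in designs.items():
        print(f"{count:>9,} instructions, {name}: "
              f"{run_time_us(stages, count, mhz):.3f} microseconds")
```

Under these assumptions the deep pipeline takes longer to finish a single instruction but finishes a long stream sooner, because its cycles are shorter. Real processors stall on branches and memory, so actual gains are smaller than this ideal model suggests.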

Another challenge in processor design deals with getting data to the microprocessor. "It's not hard to see that if a computer has a 600-MHz processor but only a 100-MHz bus, the processor is often waiting for data and not as efficient as it might be," says Solie. "The ideal bus has the same speed as the processor, but it's often only a half, a third, or a fourth or less in some cases of the processor speed. It has been difficult to make faster buses."
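The mismatch Solie describes can be put in numbers. As a rough sketch, the Python below computes how many processor cycles elapse during one bus cycle for the 600-MHz processor in his example. The bus speeds correspond to the full, half, third, fourth, and sixth of processor speed mentioned in the quote, and the function name is illustrative.

```python
def cpu_cycles_per_bus_cycle(cpu_mhz: float, bus_mhz: float) -> float:
    """Processor clock cycles that pass during a single bus clock cycle."""
    return cpu_mhz / bus_mhz


CPU_MHZ = 600  # processor speed from the example in the text

for bus_mhz in (600, 300, 200, 150, 100):
    ratio = cpu_cycles_per_bus_cycle(CPU_MHZ, bus_mhz)
    print(f"{bus_mhz:>3}-MHz bus: {ratio:.0f} processor cycle(s) per bus cycle")
```

With a 100-MHz bus, six processor cycles go by for every bus cycle, which is why the processor spends so much time waiting whenever data is not already in cache.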

One solution is to build a computer on a single piece of silicon, with all transistors close to each other. But if it has to go off chip for data, the electrical signal must be driven some distance, and that takes time. The problem gets worse when several processors share the same bus, a design called multidropping. "The electrical characteristics become slower and it just can't run fast point-to-point," says Solie.

IBM overcomes the multidrop problem with point-to-point buses. "The design takes more pins and more technology, but we can run the bus up to 500 MHz and beyond."


Developing Itanium-based computers

SGI has been experimenting with an eight-processor Itanium computer. The unit runs a 64-bit Linux OS and several engineering programs compiled for 64 bits. Expect processing times to drop as developers learn to tune the hardware and software for best performance.


Although the exact date of the Itanium's introduction is uncertain, the processor will not be a complete surprise to the computer community because Intel has been shipping prototypes or engineering samples to hardware and software vendors. For example, SGI, Mountain View, Calif., recently showed a prototype computer running the processor and using Linux as the OS.

The company has demonstrated the machine running an IA-64 version of Fluent, the CFD software from Fluent Inc., Lebanon, N.H. "Other engineering codes, such as Ansys and LS-Dyna, are being compiled and tested on IA-64," says Mark Kremenetsky, principal scientist with SGI.

The SGI Origin 3000 will accommodate either the company's own Mips processors and Irix (the SGI version of Unix) or, in the near future, IA-64 processors and Linux. "This way there is no system architectural change from one to the other. In addition, parallel scalability looks encouraging, which reflects the good design of both architecture and application."

The computer has a local bus only, meaning a four-CPU node connects to other nodes through routers. The system has no back or midplane bus. The computer's design gives it a two-fold bandwidth advantage over the previous generation making it particularly useful in CFD and large FEA simulations.

"An advantage of IA-64 computers is that software will be ready when systems come online because it has already undergone the migration from vector to scalable Risc computing," says Kremenetsky. "Itanium is a Risc processor so it will not require much tuning for existing applications." Initial releases of MCAE software for IA-64 may be 'compiler optimized' only, meaning they will run satisfactorily and with potential for higher speed. Tuning will take better advantage of the processor's three levels of cache.