Intel's Itanium chip marks a new direction in the world of CPUs. Unlike earlier CISC processors such as the x86 and RISC designs such as the Sun SPARC, the Itanium is a Very Long Instruction Word (VLIW) processor. VLIW processors read instruction strings, or "words," that are composed of multiple instructions. Several specialized single-purpose processors have used the VLIW architecture, but the Itanium chip marks its first use in a general-purpose processor.
Moving the Itanium away from the x86 architecture eliminates the floating-point weaknesses that have long plagued the x86 family. This difference lets the Itanium deliver better performance than the x86, but at the cost of compatibility: the Itanium can run 32-bit x86 programs only in a slow emulation mode. In design, the Itanium more closely resembles a high-end RISC processor than it does an x86 predecessor. However, one big difference between the Itanium and modern RISC processors is the Itanium's use of enhanced parallel-processing techniques. Don't confuse this type of parallel processing with the parallel processing that multiple-processor SMP systems, such as those built on the Xeon, offer. The Itanium's parallelism comes from a single CPU's ability to process more than one instruction at a time, a task most RISC systems do poorly. Intel's name for this new parallel-processing design is Explicitly Parallel Instruction Computing (EPIC). Itanium's EPIC architecture can process up to six instructions in parallel per clock cycle. Because the chip can execute multiple instructions per cycle, traditional speed comparisons based solely on clock speed are misleading for the Itanium processor.
Unlike the Pentium's design, the EPIC architecture removes the need for the CPU to perform complex, out-of-order processing to obtain speed optimizations. Instead, the job of parallelizing the machine instruction stream falls to the compiler. A compiler reads in the program source code and creates executable instructions for the processor to perform. For the Itanium, this means that the compiler must determine the dependencies of each instruction as well as which instructions should run in parallel. This switch of responsibilities promises to make the processors simpler, without the need for an instruction scheduler or hidden registers. However, it also means that the CPU's efficiency largely depends on the compiler's ability to optimize code for parallel processing.
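To make the compiler's job more concrete, here is a minimal sketch in Python of the kind of dependency check described above. It is not Intel's compiler algorithm; the Instr class, the schedule function, and the register names are hypothetical and exist only for illustration. The sketch walks a list of instructions in program order and packs them into groups of up to six, starting a new group whenever an instruction reads or writes a register that an earlier instruction in the same group writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str     # register this instruction writes
    srcs: tuple   # registers this instruction reads
    text: str     # human-readable form, for printing only


def schedule(instrs, width=6):
    """Greedily pack instructions, in program order, into groups whose
    members do not depend on one another (simplified model)."""
    groups = []
    current = []
    written = set()  # registers written by instructions in the current group
    for ins in instrs:
        reads_new_value = any(src in written for src in ins.srcs)
        rewrites_register = ins.dest in written
        if reads_new_value or rewrites_register or len(current) == width:
            # Dependency found or group full: close the group, start another.
            groups.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


# Hypothetical five-instruction program.
program = [
    Instr("r1", ("r10", "r11"), "add r1 = r10, r11"),
    Instr("r2", ("r12", "r13"), "add r2 = r12, r13"),
    Instr("r3", ("r1", "r2"), "add r3 = r1, r2"),
    Instr("r4", ("r14",), "mov r4 = r14"),
    Instr("r5", ("r3", "r4"), "sub r5 = r3, r4"),
]

for number, group in enumerate(schedule(program), start=1):
    print(f"group {number}: " + " ; ".join(ins.text for ins in group))
```

In this example the five instructions fall into three groups of two, two, and one instruction, so the program needs three issue steps rather than five. A real compiler would also reorder instructions to fill each group more completely; this sketch keeps program order to stay short.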
Another Itanium feature that's closely related to parallelism is called predication. Predication is a compiler-based technique for handling code branches without making the processor guess which branch the program will actually use. Today's processors, such as the Pentium, use branch prediction. In branch prediction, the processor guesses which branch will be performed next and starts calculating along that path; when the guess is wrong, it must discard that work and start over on the correct branch. With predication, the compiler marks the instructions on each side of a branch with a true-or-false predicate, the processor runs both sides in parallel, and only the results whose predicate turns out to be true are kept. Removing hard-to-predict branches in this way avoids the cost of wrong guesses and lets the processor operate more efficiently.
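As a rough illustration of how a compiler can remove a branch by predication (guarding each instruction with a true-or-false predicate instead of jumping), the Python sketch below models a tiny predicated machine. The function names, the predicate names p1 and p2, and the register dictionary are assumptions made for this example, not Itanium instructions.

```python
def branchy_abs_diff(a, b):
    # Conventional form: the processor has to guess which way this branch goes.
    if a > b:
        return a - b
    return b - a


def predicated_abs_diff(a, b):
    regs = {"a": a, "b": b}
    # One compare sets a pair of complementary predicates.
    preds = {"p1": a > b, "p2": not (a > b)}
    # Each instruction carries a guard predicate: (guard, destination, operation).
    program = [
        ("p1", "result", lambda r: r["a"] - r["b"]),
        ("p2", "result", lambda r: r["b"] - r["a"]),
    ]
    for guard, dest, operation in program:
        value = operation(regs)  # both sides do their work
        if preds[guard]:         # only the side whose guard is true is kept
            regs[dest] = value
    return regs["result"]


for a, b in [(7, 3), (3, 7), (5, 5)]:
    print(a, b, branchy_abs_diff(a, b), predicated_abs_diff(a, b))
```

Both functions return the same answers. The difference is that the predicated version has no jump for the hardware to guess at; the two guarded instructions do not depend on each other, so in this simplified model they could be issued in the same cycle.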
Although the second-generation Itanium chip runs at a modest (by today's standards) 1GHz, the Itanium design is capable of 6 gigaflops (6 billion floating-point operations per second). The Itanium 2 has 32KB of on-chip Level 1 (L1) cache and 256KB of L2 cache, and it will be able to access up to 3MB of outboard L3 cache. It has 4 integer units, 2 floating-point units, and 328 registers to store numbers and instructions. The processor uses a new Slot M motherboard interface, and the system bus is 128 bits wide and runs at 400MHz. Intel builds the Itanium 2 using a 0.18-micron manufacturing process. (Essentially, the smaller the process, the better the performance.) Because it was designed with an eye toward the high-end supercomputing platform, the Itanium supports up to 512-way SMP servers.
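The figures quoted in this article can be combined into two back-of-the-envelope peak numbers. The short Python calculation below is only a sketch: the constant names are made up for the example, it assumes the bus moves one full-width transfer on every 400MHz tick, and it describes theoretical ceilings rather than measured performance.

```python
CLOCK_HZ = 1_000_000_000          # 1GHz clock, as quoted above
MAX_INSTRUCTIONS_PER_CYCLE = 6    # EPIC peak issue width, as quoted above
BUS_WIDTH_BITS = 128              # system bus width, as quoted above
BUS_RATE_HZ = 400_000_000         # assumes one transfer per 400MHz tick

# Peak instruction rate: clock rate multiplied by instructions per cycle.
peak_instructions_per_second = CLOCK_HZ * MAX_INSTRUCTIONS_PER_CYCLE

# Peak bus bandwidth: bytes per transfer multiplied by transfers per second.
peak_bus_bytes_per_second = (BUS_WIDTH_BITS // 8) * BUS_RATE_HZ

print(f"{peak_instructions_per_second / 1e9:.1f} billion instructions per second")
print(f"{peak_bus_bytes_per_second / 1e9:.1f} GB per second")
```

Under those assumptions the chip could issue at most 6 billion instructions per second and the bus could move at most 6.4GB per second. The first number shows why clock speed alone understates the design: a processor that completed only one instruction per cycle would need a 6GHz clock to match that peak.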