Digital Semiconductor
Alpha 21064A Microprocessor
Product Brief
January 1996

Description

Digital's Alpha 21064A microprocessor is a member of the 21064 family of microprocessors, all of which implement Digital's Alpha architecture. The Alpha 21064A microprocessor is a 0.5-µm, CMOS-based, superscalar, super-pipelined processor that uses dual instruction issue and operates at 200, 233, 275, or 300 MHz.

The Alpha architecture is a 64-bit RISC architecture designed with emphasis on speed, multiple instruction issue, multiple processors, and multiple operating systems.

Features

* 64-bit, fully pipelined, advanced RISC architecture supported by multiple operating systems
  - Microsoft Windows NT
  - Digital UNIX (not supported by 21064A-275-PC)
  - OpenVMS (not supported by 21064A-275-PC)
* Sustained leadership performance
  - 200-, 233-, 275-, and 300-MHz operation
  - Superscalar (dual issue)
  - Peak instruction execution rate of 400, 466, 550, or 600 million instructions per second
  - 0.5-µm CMOS technology
* Pipelined floating-point unit
  - IEEE single- and double-precision, VAX F_floating and G_floating, longword and quadword data types
  - Improved floating-point divide
* Memory management
  - Demand-paged
  - 12-entry instruction-stream translation buffer (TB)
    >Eight entries map 8KB pages
    >Four entries map 4MB pages
  - 32-entry data-stream TB, each entry maps 8KB, 64KB, 512KB, or 4MB pages
* Main memory physically addressed up to 16GB
* Selectable data bus width of 64 bits or 128 bits
* Onchip write buffer with four 32-byte entries
* 3.3-V chip I/O supply voltage with chip interface directly to 5-V logic
* Internal clock generator provides:
  - High-speed CPU clock
  - Pair of programmable system clocks (CPU/2 to CPU/17)
* Software compatible with Alpha 21064 and Alpha 21066 microprocessors
* External cache memory supports:
  - Programmable size (256KB to 16MB)
  - Programmable latency (3 to 16 CPU cycles)
  - Flexible cache coherency
* Privileged architecture library code (PALcode) supports:
  - Optimization for multiple operating systems
  - Flexible memory-management implementations
  - Multi-instruction atomic sequences
* Advanced packaging technology for superior thermal-management performance
* Onchip 16KB, direct-mapped, write-through data cache with parity
* Onchip 16KB, direct-mapped instruction cache with parity
* Improved branch prediction logic
* Serial ROM (SROM) interface
  - Loads instruction cache after reset
  - Allows software-controlled serial port after initialization
* Onchip parity and ECC generators and checkers
* Chip- and module-level test support
* Pin compatible with Alpha 21064

Microarchitecture

The Alpha 21064A microprocessor uses a 7-stage pipeline for integer operate and memory reference instructions, and a 10-stage pipeline for floating-point operate instructions. The instruction fetch and decode unit (IDU) maintains state for all pipeline stages to track outstanding register write transactions.

Figure 1 is the Alpha 21064A block diagram. The Alpha 21064A CPU contains:

Instruction Fetch and Decode Unit (IDU) -- The IDU includes the instruction translation buffers (ITBs) and the branch prediction logic (branch unit). The IDU performs instruction fetch, resource checks, and dual instruction issue to the IEU, LSU, FPU, or branch unit. The IDU also controls pipeline bypasses, stalls, aborts, and restarts.

Integer Execution Unit (IEU) -- The IEU contains a 64-bit, fully pipelined integer execution data path including adder, logic box, barrel shifter, byte extract and mask, and independent integer multiplier. The IEU also contains the 32-entry, 64-bit integer register file (IRF).

Floating-Point Unit (FPU) -- The FPU contains a fully pipelined floating-point unit and independent divider. IEEE single-precision and double-precision floating-point data types are supported. VAX F_floating and G_floating data types are fully supported, with limited support for the VAX D_floating data type.
The FPU also contains a 32-entry, 64-bit floating-point register file (FRF).

Load and Store Unit (LSU) -- The LSU contains four major sections: address translation data path, which includes the data translation buffer (DTB); load silo; write buffer; and Dcache interface. The LSU supports all integer and floating-point load and store instructions, including address calculation and translation, and cache control logic.

Figure 1 Alpha 21064A Block Diagram
[Block diagram not reproduced. Labeled blocks include the instruction cache, branch history table, instruction fetch/decode unit, integer execution unit, floating-point unit, load/store unit, data cache, bus interface unit, and the address and 128-bit data buses.]

Instruction Cache (Icache) -- The Icache contains 16KB with parity protection. It is a direct-mapped, physically addressed cache with 32-byte blocks.

Data Cache (Dcache) -- The Dcache contains 16KB with parity protection. It is a write-through, read-allocate, direct-mapped, physically addressed cache with 32-byte blocks.

External Cache -- The 21064A supports an optional, flexible external cache built from off-the-shelf SRAMs. The external cache is a write-back, read-allocate cache with 32-byte blocks. The programmable interface supports cache sizes from 256KB to 16MB and latencies of 3 to 16 CPU cycles, which allows each implementation to select its own external cache speed and configuration.

Virtual Address Space -- The virtual address is a 64-bit unsigned integer that specifies a byte location in the virtual address space. The microprocessor implements a 43-bit subset of the virtual address space and supports a 34-bit, 16GB physical address space.

Serial ROM (SROM) Interface -- This interface loads the instruction cache after reset.
It has a software-controlled serial port, which is activated after initialization. The SROM supports system-level design debugging.

Alpha Architecture Summary

As implemented in the Alpha 21064A, the Alpha architecture supports:
* A fixed 32-bit instruction size
* Separate integer and floating-point registers
  - Thirty-two 64-bit integer registers
  - Thirty-two 64-bit floating-point registers
* 32-bit (longword) and 64-bit (quadword) integer data types
* 32-bit and 64-bit IEEE and VAX floating-point data types
* Memory accesses using a 64-bit virtual byte address
* Privileged architecture library code (PALcode)

In addition to conventional RISC operation, the instruction set provides scaled add and subtract for quick subscript calculation, 128-bit multiply for multi-precision arithmetic and division by a constant, conditional moves for avoiding branches, and an extensive set of in-register byte manipulation instructions.

Instruction Set

All instructions are 32 bits long and use one of four different instruction formats (see Figure 2). Each format uses a 6-bit opcode and zero to three 5-bit register fields.

Operate Instructions -- Integer operate instructions manipulate full 64-bit values and include a full complement of arithmetic, compare, logical, and shift instructions. There are also three 32-bit integer operations: add, subtract, and multiply.

Floating-point operate instructions include four complete sets of instructions for IEEE single-precision, IEEE double-precision, VAX F_floating, and VAX G_floating arithmetic. In addition to arithmetic instructions, there are instructions for conversions between floating and integer values, including the VAX D_floating data type.

Branch Instructions -- Conditional branch instructions test a register for positive/negative, zero/nonzero, or even/odd conditions, and perform a PC-relative branch. Unconditional branch instructions perform either an absolute or PC-relative jump using an arbitrary 64-bit register value. They can update a destination register with a return value.

Load and Store Instructions -- These instructions can move either 32-bit or 64-bit quantities; 8-bit and 16-bit load and store operations are supported through an extensive set of in-register byte manipulations. The I/O bus directly supports byte and word operations in hardware.

CALL_PAL Instructions -- These instructions vector to a privileged layer of software that atomically performs both privileged and unprivileged functions.

PALcode is a privileged layer of software that atomically performs such functions as the dispatching and servicing of interrupts, exceptions, task switching, and additional privileged and unprivileged user instructions as specified by operating systems using the CALL_PAL instruction. PALcode is the only method of performing some operations on the hardware. In addition to the instructions defined by the Alpha architecture, a set of implementation-specific instructions is provided.

PALcode runs in an environment with privileges enabled, instruction stream mapping disabled, and interrupts disabled. Disabling memory mapping allows PALcode to support functions such as translation buffer miss routines. Disabling interrupts allows the instruction stream to provide multi-instruction sequences as atomic operations.
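As a rough illustration of the instruction formats described under Instruction Set, the following Python sketch pulls the fields out of a 32-bit instruction word. The bit positions (opcode in bits 31-26, Ra in 25-21, Rb in 20-16, Rc in 4-0) follow the labels in Figure 2. The helper names, the sign extension of the displacement fields, and the treatment of bits 15-5 as a single function field are assumptions made for this example, not details stated in this brief.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit word as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit unsigned value as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode(word):
    """Return every field view of one instruction word (hypothetical helper).

    Which views are meaningful depends on the format selected by the opcode;
    this sketch does not map opcodes to formats.
    """
    word &= 0xFFFFFFFF
    return {
        "opcode": bits(word, 31, 26),        # 6-bit opcode, all formats
        "ra": bits(word, 25, 21),            # 5-bit register field
        "rb": bits(word, 20, 16),            # 5-bit register field
        "rc": bits(word, 4, 0),              # 5-bit register field (operate)
        "function": bits(word, 15, 5),       # assumed single field, bits 15-5
        "mem_disp": sign_extend(bits(word, 15, 0), 16),     # load/store
        "branch_disp": sign_extend(bits(word, 20, 0), 21),  # branch
        "pal_function": bits(word, 25, 0),   # CALL_PAL
    }


# Build an operate-format word from its fields and decode it again.
example = (0x10 << 26) | (1 << 21) | (2 << 16) | (0x20 << 5) | 3
fields = decode(example)
print(hex(example), fields["opcode"], fields["ra"], fields["rb"], fields["rc"])
```

Returning all views at once keeps the sketch short; a fuller decoder would first look up the format from the opcode and return only the fields that format defines.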
Figure 2 Alpha Instruction Formats
[Figure not reproduced. Fields labeled in the figure include Opcode, Ra, Rb, Rc, Function, Displacement, and PALcode, with bit boundaries at 31, 26, 25, 21, 20, 16, 15, 5, 4, and 0.]

Memory Management

The memory-management architecture provides:
* A large address space for instructions and data
* Convenient and efficient sharing of instructions and data
* Independent read and write access protection
* Flexibility through programmable PALcode support

Adding the Alpha 21064A to Your Design Strategy

The Alpha 21064A reduces the overall cost of system development, allows for easier system design, and provides the benefits of high performance by bringing the Alpha architecture to systems spanning the range from high-end desktop to enterprise-wide servers. By using the Alpha 21064A, your designs can benefit in the following ways:

Easy Migration of Alpha 21064 Module

The 21064A is pin compatible with the existing family of Alpha 21064 products. This allows current Alpha 21064 customers to improve performance significantly without the costs of redesigning motherboards. The 21064A system clock frequency is the CPU clock frequency divided by a programmable divisor of 2 to 17. A system using the 21064-150 can plug in a 21064A; increase the CPU clock frequency to 200, 233, 275, or 300 MHz; and program the system clock frequency divider to keep the same system timing.

Module Flexibility for Performance Optimization

The 21064A has a programmable external interface supporting a complete range of system sizes and performance levels, while maintaining peak CPU execution speed. The 21064A can address 16GB of physical memory. The bus width is selectable as either 64 bits or 128 bits to provide flexible module designs. The 21064A has the logic to support a board-level cache, including programmable cache size (256KB to 16MB), programmable latency (3 to 16 CPU clock cycles), and multiple cache-coherency schemes. Cost versus performance trade-offs are made by selecting smaller or larger cache sizes, longer or shorter latency, and single-processor or symmetric multiprocessing designs.
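The divider choice described under Easy Migration of Alpha 21064 Module can be sketched as a small search. This is only an illustration under stated assumptions: the function name is hypothetical, the original system is assumed to run a 150-MHz CPU clock with a divisor of 3 (a 50-MHz system clock), the nominal frequencies are treated as exact, and "keeping the same system timing" is read as "do not exceed the original system clock." Only the divisor range of 2 to 17 comes from this brief.

```python
from fractions import Fraction

# CPU/2 through CPU/17, the programmable range given in this brief.
DIVISORS = range(2, 18)


def pick_divisor(cpu_mhz, target_sys_mhz):
    """Return (divisor, system clock in MHz) for the fastest system clock
    that does not exceed target_sys_mhz, or None if no divisor qualifies.

    Hypothetical helper: divisors are tried in ascending order, so the
    first one that fits yields the highest allowable system clock.
    """
    for divisor in DIVISORS:
        sys_mhz = Fraction(cpu_mhz, divisor)
        if sys_mhz <= target_sys_mhz:
            return divisor, sys_mhz
    return None


# Assumed original design: 150-MHz CPU clock divided by 3.
original_sys_mhz = Fraction(150, 3)

for cpu_mhz in (200, 233, 275, 300):
    divisor, sys_mhz = pick_divisor(cpu_mhz, original_sys_mhz)
    print(f"{cpu_mhz} MHz CPU: divide by {divisor} -> {float(sys_mhz):.1f} MHz system clock")
```

Under these assumptions, the 200- and 300-MHz parts reproduce the original 50-MHz system clock exactly (divisors 4 and 6), while the 233- and 275-MHz parts land slightly below it. A real design would also have to check the module's actual timing requirements.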
Flexibility of Design

The 21064A interfaces directly to components operating at a supply voltage of either 3.3 V or 5 V. This means that you can use the standard components that best fit your applications and market demands.

ISO 9002 Certification

All of Digital's semiconductor manufacturing sites adhere to the stringent ISO 9002 standards and are ISO 9002 certified.

Longevity of Design

All Alpha microprocessors are designed to the Alpha architecture. So, whether your design requires the high performance of the Alpha 21064A, the price/performance of the Alpha 21066, or the low power of the Alpha 21068, your designs are assured of compatibility across this family of high-performance Alpha microprocessors.

Technology Leadership

Designing with Digital's Alpha microprocessors puts you in a place of leadership in the industry. Your products benefit from the unbounded capacity of 64-bit technology, the built-in longevity of the Alpha architecture, and Digital's commitment to constant improvement in price/performance through the company's investment in semiconductor design and fabrication.

Characteristics

Characteristic               Specification
Power supply                 Vss 0.0 V
                             Vdd 3.3 V ±5%
Operating temperature        Tj = 90°C
Storage temperature range    -55°C to 125°C
Power dissipation            200 MHz: 24 W
                             233 MHz: 28 W
                             275 MHz: 33 W
                             300 MHz: 36 W
Package                      431-pin ceramic PGA with 0.10-in grid spacing and advanced thermal technology

For More Information

To learn more about the availability of the Alpha 21064A microprocessor, contact your local semiconductor distributor. To learn more about Digital Semiconductor's product portfolio, contact the Digital Semiconductor Information Line: 1-800-332-2717
Outside North America, call: +1-508-628-4760

Ordering Information

To order this 21064 family member...   Frequency   Ask your distributor for...
21064A-200                             200 MHz     21064-AB
21064A-233                             233 MHz     21064-BB
21064A-275                             275 MHz     21064-DB
21064A-275-PC                          275 MHz     21064-P1
21064A-300                             300 MHz     21064-EB

While Digital believes the information in this publication is correct as of the date of publication, it is subject to change without notice.

(c) Digital Equipment Corporation 1993, 1995, 1996. All rights reserved. Printed in U.S.A. EC-QH0RB-TE

AlphaGeneration, Digital, Digital Semiconductor, OpenVMS, VAX, the AlphaGeneration design mark, and the DIGITAL logo are trademarks of Digital Equipment Corporation. Digital Semiconductor is a Digital Equipment Corporation business.

IEEE is a registered trademark of The Institute of Electrical and Electronics Engineers, Inc.; Microsoft is a registered trademark and Windows NT is a trademark of Microsoft Corporation; UNIX is a registered trademark in the United States and other countries licensed exclusively through X/Open Company Limited. All other trademarks and registered trademarks are the property of their respective owners.