WHITE PAPER
Page 1 The AMD Athlon™ MP Processor May 2003
The AMD Athlon™ MP Processor
with 512KB L2 Cache
Technology and Performance Leadership
for x86 Microprocessors
Jack Huynh
AMD
One AMD Place
Sunnyvale, CA 94088
Introduction: Continuing Performance Leadership of
x86 Microprocessors
Founded in 1969, AMD has shipped more than 200 million PC processors
worldwide. AMD processors are the power behind desktop and notebook PCs, and a
new generation of servers and workstations. Since its introduction in 1999, the
award-winning AMD Athlon™ processor has been known as an industry leader,
enabling one of the highest system performance levels in the PC market. Since its
launch in June 2001, the AMD Athlon MP processor and computer systems based on
the AMD Athlon MP processor have won numerous awards worldwide. In all, the
AMD Athlon processor family, and systems based on such processors, have won more
than 100 awards worldwide, including PC World’s World Class Award for overall
Product of the Year in 2000 and 2002. The AMD Athlon processor family has provided
industry-leading processing power to pave the road to new levels of end-user
capability with application areas from productivity to compute-intensive workstation
applications, including digital content creation and computer-aided design. For the
server market, the AMD Athlon MP processor has also provided the reliability,
stability, and performance needed for mission-critical email, exchange, file, print,
and networking applications.
Engineering and technology leadership are key to performance leadership.
AMD’s engineering and technology leadership specific to the seventh-generation
AMD Athlon processor family includes driving innovations such as instruction set
extensions aimed at 3D applications (3DNow!™ Professional technology) at the
processor instruction level, DDR memory at the platform level, and 0.13-micron
process with copper interconnect at the process technology level. With the
introduction of the new AMD Athlon MP processor with 512KB L2 cache on 0.13-
micron process technology, AMD continues its tradition of technology innovation by
enabling high levels of delivered workstation and server performance. The discussion
that follows provides an in-depth look at how the new AMD Athlon MP processor with
512KB L2 cache on 0.13-micron process technology increases the performance
scalability of QuantiSpeed™ architecture. The differentiating features, as well as the
real-world application performance benefits of Smart MP technology (a new
multiprocessing architecture) and QuantiSpeed architecture will also be discussed.
Manufacturing Technology Leadership with Leading Edge
0.13-Micron Process Technology
The new AMD Athlon MP processor, based on the core previously codenamed
“Barton,” is the newest member in the family of seventh-generation AMD Athlon
processors designed to meet the computationally-intensive requirements of software
and data-rich applications running on high-performance workstation and server
systems. The new AMD Athlon MP processor with 512KB L2 cache on 0.13-micron
process technology increases the performance scalability provided by QuantiSpeed
architecture over previous generations by delivering higher clock speeds. The
0.13-micron process technology provides the thermal headroom necessary to scale
frequency within the thermal limits of workstation and server platforms, thus
maximizing overall performance. The new AMD Athlon MP processor with 512KB
L2 cache on 0.13-micron process technology, like all AMD Athlon MP processors, is
pin compatible with AMD’s established Socket A infrastructure.
With the increased frequency scalability resulting from 0.13-micron
technology combined with QuantiSpeed architecture and Smart MP technology, AMD
continues to deliver compelling solutions for compute-intensive applications for
workstations and servers, and delivers superb integer, floating point, and 3D
multimedia performance for applications running on x86 technology-based platforms.
Smart MP Technology for Smarter Multiprocessing
With the AMD Athlon MP processor, AMD offers Smart MP technology—a
multiprocessing architecture enabling exceptionally fast performance and excellent
scalability beyond some traditional multiprocessor system architectures. AMD’s
innovative Smart MP technology is designed to optimize the execution of
multithreaded applications, empowering workstations and servers to achieve
exceptional levels of productivity and performance.
Smart MP technology consists of the following architectural features:
1) Dual point-to-point high-speed system buses
2) Innovative bus-snooping capability
3) Optimized MOESI cache-coherency protocol
Smart MP technology implements dual point-to-point high-speed system
buses that allow two processors to run independently without experiencing the
system bottleneck of sharing a common system bus. Performance delays caused by
bus arbitration and bus ownership transitions are eliminated in this architecture,
allowing each processor to perform as if it has a dedicated channel to system
resources. The split-transaction nature of the AMD Athlon system bus, combined with
its independent data and command channels, delivers a high-speed front-side bus
solution for AMD Athlon MP processors.
Bus snooping is a critical mechanism in maintaining a system’s data
coherency. While one processor is accessing memory, the second processor must
snoop or “monitor” bus activity and determine if the current memory access affects
its memory space. If so, then appropriate measures must be taken to ensure that
all affected processors and bus masters have the most accurate data available.
Smart MP technology implements a performance-oriented snooping mechanism. The
processors leverage the independent processor-to-system, system-to-processor, and
data channels of the AMD Athlon system bus to create a “virtual” snooping channel.
A processor can transfer data while simultaneously receiving snoop information, or a
processor can broadcast snoop information while simultaneously receiving data. In
some non-split-transactioned, shared-bus architectures, snooping activity is
“focused” only on the current access occurring on the shared system bus. Hence,
there may be less opportunity for concurrent data transfers that are independent of
the current snoop activity. This translates into a performance advantage for the
AMD Athlon MP processor-based system using Smart MP technology in that the data
bus is more fully utilized for transferring data as opposed to wasting time handling
snoop requests.
Smart MP technology also implements the MOESI (Modified, Owner,
Exclusive, Shared, Invalid) cache-coherency protocol. The MOESI protocol offers a
potential performance advantage over systems implementing MESI (Modified,
Exclusive, Shared, Invalid) protocol. The additional “Owner” state allows the
processor cache “owning” the data to supply data directly to the second processor
requesting access to the cached block. The requesting processor no longer has to
wait for the owning processor to write the requested data back to main memory
before the data is accessible. Instead, the owning processor supplies the requested
data directly to the requesting processor. This scheme reduces memory traffic, and
allows faster access to cached data.
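The difference between the two protocols can be sketched as a small state table. The following is a simplified textbook model of a two-processor system, not a description of the actual AMD Athlon MP implementation; the function name `snoop_read` and the write-back count it returns are illustrative assumptions.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    OWNED = "O"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def snoop_read(holder_state: State, protocol: str):
    """Model a second processor reading a line that the holder may have cached.

    Returns (new holder state, requester state, memory write-backs needed).
    Simplified: two processors only, protocol is "MESI" or "MOESI".
    """
    if holder_state is State.MODIFIED:
        if protocol == "MOESI":
            # Holder keeps the dirty line as Owner and supplies it directly.
            return State.OWNED, State.SHARED, 0
        # MESI: the dirty line is written back to memory before it is shared.
        return State.SHARED, State.SHARED, 1
    if holder_state is State.OWNED:
        # Only reachable under MOESI: the owner keeps supplying the data.
        return State.OWNED, State.SHARED, 0
    if holder_state in (State.EXCLUSIVE, State.SHARED):
        return State.SHARED, State.SHARED, 0
    # Holder has no copy: the requester loads the line from memory.
    return State.INVALID, State.EXCLUSIVE, 0


print(snoop_read(State.MODIFIED, "MESI"))
print(snoop_read(State.MODIFIED, "MOESI"))
```

In this model, the only case that differs between the two protocols is a read of a Modified line, which is the case the text above describes.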
With Smart MP technology, the AMD Athlon MP processor continues to deliver
breakthrough performance in the multiprocessing server and workstation markets.
QuantiSpeed™ Architecture: A More Optimally Balanced
x86 Microarchitecture for Real-World Application
Performance
The microprocessor is a key component in determining how quickly a computer
system can execute specific tasks. The amount of time required to complete specific
software tasks is referred to as real-world application performance. Application
performance is a function of two elements:
1) Clock speed of the processor, measured in megahertz or gigahertz
2) The amount of work the processor can accomplish in a given clock cycle,
measured in instructions per clock cycle (IPC)
Real-World Application Performance =
[work completed per clock cycle] x [clock speed]
= IPC x Frequency
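As a worked illustration of this formula, the short sketch below compares two hypothetical designs. The IPC and clock values are invented for illustration and are not measurements of any real processor.

```python
def relative_performance(ipc: float, clock_ghz: float) -> float:
    """Instructions completed per nanosecond: IPC x frequency (GHz = cycles/ns)."""
    return ipc * clock_ghz


# Hypothetical designs (assumed numbers, for illustration only).
balanced = relative_performance(ipc=1.5, clock_ghz=2.0)       # shorter pipeline
deep_pipeline = relative_performance(ipc=1.0, clock_ghz=2.8)  # deeper pipeline

print(f"balanced design:      {balanced:.1f} instructions/ns")
print(f"deep-pipeline design: {deep_pipeline:.1f} instructions/ns")
```

Under these assumed numbers, the design with the lower clock speed still completes more work per unit of time, because its higher IPC more than offsets the frequency difference.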
Different approaches can be taken to optimize the processor for application
performance. AMD has worked to maintain a more balanced microarchitecture with a
shorter pipeline designed for higher IPC than competitive PC processors available in
the market. Although other competitive processors enable deeper pipelines with
fewer gates per clock to drive frequency improvements, deeper pipelines alone
translate into less work per clock cycle. This reduced work per clock cycle or reduced
IPC can only be offset by improvements in other areas, such as branch prediction
and cache hit rates. Taken to the extreme, processor performance can actually be
reduced by forcing frequency improvements at the expense of IPC improvements.
This key point is illustrated by office applications, which tend to be
branch-intensive. Each mispredicted branch forces the pipeline to be flushed, and
a deeper pipeline pays a much greater performance penalty for each flush. As
reaffirmed in the Desktop
Performance and Optimization for Intel Pentium® 4 Processor paper
(http://developer.intel.com/design/pentium4/papers/249438.htm), “Integer and
basic office productivity applications, such as Word and spreadsheet processing, tend
to have many branches in the code, thus reducing overall IPC capabilities. As a
result, the associated branch penalties and performance on these applications does
not generally scale as well with frequency and are more resistant to improvements in
micro architectural means, such as deeper pipelines.”
The AMD Athlon MP processor with Smart MP technology and QuantiSpeed™
architecture implemented on 0.13-micron technology continues to exhibit the
AMD Athlon processor family’s balanced combination of improving clock frequency
without compromising the amount of work done per clock cycle and therefore the
IPC. The end result is a processor design that produces a high IPC as well as high
operating frequencies, the optimum combination to deliver a very high level of
workstation and server performance in real-world application environments.
QuantiSpeed architecture consists of four key differentiating features that
enhance the application performance of the AMD Athlon MP processor:
1. Nine-issue, superscalar, fully pipelined microarchitecture
2. Superscalar, fully pipelined floating-point unit (FPU)
3. Hardware data prefetch
4. Enhanced Translation Look-aside Buffers (TLBs)
Figure 1: AMD Athlon™ MP Microarchitecture Block Diagram
QuantiSpeed™ Architecture: Nine-Issue, Superscalar, Fully
Pipelined Microarchitecture with High-Performance Cache
Memory Architecture, and Three Full x86 Instruction
Decoders
At the heart of QuantiSpeed architecture is a fully pipelined, nine-issue,
superscalar processor core. The AMD Athlon MP processor provides a wider execution
bandwidth of nine execution pipes when compared with competitive x86 processors
that have up to six execution pipes. The nine execution engines comprise
three address calculation units, three integer units, and three floating-point units.
[Figure 1 block diagram. Labeled blocks: 2-way, 64KB instruction cache with
24-entry L1 TLB/256-entry L2 TLB; predecode cache; branch prediction table;
fetch/decode control; 3-way x86 instruction decoders; instruction control unit
(72-entry); integer scheduler (18-entry); three IEUs and three AGUs; FPU stack
map/rename; FPU scheduler (36-entry); FPU register file (88-entry); FStore, FADD
(MMX™, 3DNow!™), and FMUL (MMX, 3DNow!) pipelines; load/store queue unit; 2-way,
64KB data cache with 40-entry L1 TLB/256-entry L2 TLB; 16-way L2 cache (labeled
256KB in the figure); bus interface unit; system interface.]
In order to supply such a highly superscalar microarchitecture, the
AMD Athlon MP processor implements a large, on-chip cache architecture particularly
in the L1 cache closest to the core. The AMD Athlon MP processor’s
high-performance, on-chip cache architecture includes a dual-ported 128KB (two
separate 64KB) split-L1 cache with separate snoop ports, and an integrated
full-speed, 16-way set-associative, 512KB L2 cache using a 72-bit (64-bit data +
8-bit ECC) interface. The AMD Athlon MP processor’s large integrated full-speed L1
cache comprises two separate 64KB, two-way set-associative data and instruction
caches, which are much larger than the Intel Xeon processor’s L1 cache (128KB vs.
8KB + 12K µop). By
featuring a larger L1 cache, applications running on the AMD Athlon MP processor
perform exceptionally fast since more instruction and data information is local to the
processor. Applications exploit the larger caches by benefiting from the increased
support of instruction and data set locality. The data cache also has eight banks to
provide maximum parallelism for running multiple applications. It supports
concurrent accesses by two 64-bit loads or stores. The instruction cache contains
predecode data to assist multiple, high-performance instruction decoders. Both
instruction and data caches are dual-ported and contain dedicated snoop ports
designed to keep system coherency traffic, common in systems with many
devices, from interfering with application performance.
The AMD Athlon MP processor also includes an integrated, full-speed, 16-way
set-associative, exclusive 512KB L2 cache. When the processor requests data, it first
searches for the data in its L1 cache. If the processor finds the data in its L1 cache,
the result is what is known as a cache hit and the processor retrieves the data from
the low-latency L1 cache. If the processor cannot retrieve the data from its L1 cache,
the processor attempts to retrieve the data from its L2 cache and once again attempts
to obtain a cache hit. In the event of an L2 cache miss, the processor must then
request this data from the slower system memory. With the additional 256KB L2 cache over
previous AMD Athlon MP processors, the AMD Athlon MP processor with 512KB L2
cache increases the performance of server applications such as email, exchange, file,
print, and networking applications by keeping more frequently accessed instructions
and data close to the CPU. Depending on the environment, larger L2 caches can
greatly benefit server and workstation applications that demand large datasets such
as database and messaging applications. Higher set-associativity increases the hit
rate by reducing data conflicts. This translates into more possible locations in which
important data can reside in the L2 cache memory instead of system memory. With
an exclusive cache architecture, the contents of the L1 caches are not duplicated in
the L2 cache. This enables 512KB of L2 cache and 128KB of L1 cache for a total
usable storage space of 640KB.
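The capacity arithmetic above, and the number of sets implied by the stated associativity, can be checked with a short sketch. The 64-byte cache line size used here is an assumption that this paper does not state, and the function names are illustrative.

```python
KB = 1024
LINE_BYTES = 64  # assumed cache line size; not stated in this paper


def usable_capacity(l1_bytes: int, l2_bytes: int, exclusive: bool) -> int:
    """Unique bytes the L1 + L2 hierarchy can hold.

    Exclusive: L1 contents are not duplicated in L2, so the sizes add.
    Inclusive: L2 duplicates L1 (assumes L2 >= L1), so only L2 counts.
    """
    return l1_bytes + l2_bytes if exclusive else l2_bytes


def num_sets(cache_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets in a set-associative cache."""
    return cache_bytes // (ways * line_bytes)


l1 = 128 * KB  # two 64KB L1 caches
l2 = 512 * KB

print(usable_capacity(l1, l2, exclusive=True) // KB, "KB usable (exclusive)")
print(usable_capacity(l1, l2, exclusive=False) // KB, "KB usable (inclusive)")
print(num_sets(l2, ways=16), "sets in a 16-way 512KB L2 at the assumed line size")
```

Each set offers 16 candidate locations for a given line, which is the sense in which higher set-associativity reduces conflicts.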
The AMD Athlon MP processor cache architecture also supports error
correction code (ECC) protection. With these cache architecture features, the
AMD Athlon MP processor is designed to provide reliable, high-performance
computing.
When executing software, a processor begins by decoding the program’s
instructions and translating them into operations (or Ops) that the microprocessor
can execute. In order to continually feed the execution engines with operations, the
AMD Athlon MP processor includes three x86 instruction decoders. Together, the
decoders are capable of decoding up to three instructions per clock cycle. In
comparison, the Xeon processor is designed to decode only one instruction per clock
cycle with the resource of only one x86 instruction decoder. Thus, the Xeon
processor has only
one-third the maximum theoretical decode bandwidth of the AMD Athlon MP
processor. The decode bandwidth of the AMD Athlon MP processor enables the
processor to advantageously utilize the execution bandwidth capabilities of
QuantiSpeed architecture, thereby improving IPC.
QuantiSpeed™ Architecture: Superscalar, Fully
Pipelined x86 Floating-Point Unit (FPU)
The AMD Athlon MP processor offers one of the most powerful, architecturally
advanced floating-point units (FPU) delivered in an x86 microprocessor. The
AMD Athlon MP processor’s three-issue, superscalar floating-point capability is based
on three pipelined, out-of-order floating-point execution units, each with a one-cycle
throughput. Using a data format and single-instruction multiple-data (SIMD)
operations based on the MMX™ instruction model, the AMD Athlon MP processor can
deliver as many as four 32-bit, single-precision floating-point results per clock cycle.
FPU Microarchitecture
Three separate execution units in the AMD Athlon MP processor’s floating-
point pipeline support x87 floating-point instructions, MMX instructions, and
3DNow!™ Professional technology instructions. The three execution units are:
1) Fstore—This is the floating point load/store pipeline that handles FP
loads, stores, and miscellaneous operations.
2) Fadd—This is the adder pipeline that contains the 3DNow! Professional
technology add, MMX ALU/shifter, and FP add execution units.
3) Fmul—This is the multiplier pipeline that contains an MMX ALU, an MMX
multiplier, a reciprocal unit, an FP and 3DNow! Professional technology
instruction multiplier, and support for FDIV instructions.
In addition to its superscalar design, the AMD Athlon MP processor’s FPU is
super pipelined. This technique supports higher clock frequencies and enables the
FPU to process complex floating-point instructions more quickly and deliver high
overall floating-point instruction throughput. In comparison, the FPU of the Xeon
processor only offers two execution units, one for both Fadd and Fmul and one for
Fstore. Thus, as an example, the AMD Athlon MP processor can do one floating-point
addition AND one multiplication per clock cycle, while the Xeon processor can only do
one multiplication OR one addition per clock cycle. The seventh-generation FPU of
the AMD Athlon MP processor incorporates other features such as a 36-entry
instruction scheduler and an 88-entry register file for independent, superscalar, out-
of-order, speculative execution of floating-point instructions. With three separate
execution units, the AMD Athlon MP processor’s superscalar FPU can boost the
performance of floating point-intensive applications varying from commercial
applications such as 3D modeling and CAD to consumer applications such as digital
video and audio editing for workstations.
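The add-AND-multiply versus add-OR-multiply comparison above can be expressed as a toy issue model. This sketch counts only issue slots for independent operations and ignores instruction latency, data dependencies, and memory effects; the function name is an illustrative assumption.

```python
def issue_cycles(adds: int, muls: int, separate_pipes: bool) -> int:
    """Lower bound on issue cycles for independent FP adds and multiplies.

    separate_pipes=True:  one add AND one multiply can issue each cycle.
    separate_pipes=False: one add OR one multiply can issue each cycle.
    """
    return max(adds, muls) if separate_pipes else adds + muls


# A hypothetical loop body mixing 100 adds with 100 multiplies.
print(issue_cycles(100, 100, separate_pipes=True))
print(issue_cycles(100, 100, separate_pipes=False))
```

The gap is largest for code with an even mix of additions and multiplications and shrinks as the mix becomes one-sided.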
3DNow!™ Professional Technology: FPU Innovation of the
AMD Athlon™ MP Processor Core
The AMD Athlon MP processor with 3DNow! Professional technology adds 51
new instructions to the enhanced 3DNow! technology supported by the original
AMD Athlon processor family. These 51 new instructions, along with the SIMD
integer additions already included in enhanced 3DNow! technology, are compatible
with Intel’s SSE technology. Table 1 provides a breakout of the 3DNow! technology
instruction set evolution.
Table 1: AMD Processor Support of SIMD Instruction Extensions to the x86 Instruction Set
Architecture

AMD Processor           | 3DNow!™ technology version supported | Description of instructions supported
AMD-K6®-2 Processor     | 3DNow!™ technology                   | Original 3DNow! technology extensions
AMD Athlon™ Processor   | Enhanced 3DNow! technology           | 3DNow! technology plus 19 MMX extensions (part of SSE) plus five DSP/communications extensions
AMD Athlon MP Processor | 3DNow! Professional technology       | Enhanced 3DNow! technology plus 51 SSE extensions (completing SSE support)

3DNow! technology and SSE are largely complementary architectural
enhancements. By implementing them in a variety of ways, software developers are
able to determine how they can utilize the advanced architectural capabilities
enabled by SIMD instruction set extensions. Examples of applications most able to
benefit from the use of these instruction set extensions include speech recognition,
video encoding/decoding, and 3D graphics generation.
Many current software applications that are SIMD-optimized use different
code paths to benefit from 3DNow! technology or SSE, depending on the processor
architecture on which these applications are executed. AMD processor architectures
preceding the AMD Athlon MP processor only supported 3DNow! or enhanced 3DNow!
technology, which yielded the following three code path scenarios for developers:
1) Software optimized exclusively for AMD processor architectures with
3DNow! technology use their 3DNow! technology-optimized code path on
AMD processors supporting 3DNow! technology.
2) Software optimized for both AMD processor architectures with 3DNow!
technology, and other x86 industry processor architectures supporting
SSE, use their 3DNow! technology-optimized code path on AMD
processors supporting 3DNow! technology.
3) Software optimized exclusively for other x86 industry processor
architectures supporting SSE use the non-optimized code path on AMD
processor architectures.
With the advent of 3DNow! Professional technology, the AMD Athlon MP
processor can seamlessly allow SIMD-optimized software in the third scenario above
to recognize SSE support and run the optimized code path for increased
performance. The recognition of SSE support in 3DNow! Professional technology is
performed automatically by software applications that use industry standard feature
flags, provided in the CPUID instruction to automatically recognize SSE support and
run the optimized code path. This means that with 3DNow! Professional technology’s
support for both 3DNow! and SSE technologies, the AMD Athlon MP processor is able
to take advantage of the performance gains offered by SIMD-optimized software
applications.
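The code path selection described above can be sketched as follows. The bit positions are the standard CPUID feature flags (SSE is EDX bit 25 of function 1; 3DNow! is EDX bit 31 of extended function 0x80000001). Because Python cannot execute CPUID directly, the register values are passed in as plain integers; the function name and the sample values are hypothetical.

```python
SSE_BIT = 1 << 25        # CPUID function 1, EDX bit 25
THREEDNOW_BIT = 1 << 31  # CPUID function 0x80000001, EDX bit 31


def choose_code_path(std_edx: int, ext_edx: int,
                     has_3dnow_path: bool, has_sse_path: bool) -> str:
    """Pick the code path an application would run, given its CPUID results.

    std_edx / ext_edx are the EDX values returned by CPUID functions 1 and
    0x80000001 (obtained elsewhere, for example from native code).
    """
    if has_3dnow_path and ext_edx & THREEDNOW_BIT:
        return "3dnow"
    if has_sse_path and std_edx & SSE_BIT:
        return "sse"
    return "generic"


# Hypothetical register values, for illustration only.
earlier_amd = {"std_edx": 0, "ext_edx": THREEDNOW_BIT}       # 3DNow! only
athlon_mp = {"std_edx": SSE_BIT, "ext_edx": THREEDNOW_BIT}   # 3DNow! and SSE

# Scenario 3: software optimized only for SSE.
print(choose_code_path(**earlier_amd, has_3dnow_path=False, has_sse_path=True))
print(choose_code_path(**athlon_mp, has_3dnow_path=False, has_sse_path=True))
```

With SSE reported in the feature flags, the SSE-only application in the third scenario moves from the non-optimized path to its optimized path without any change to the application.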
Not only is the AMD Athlon MP processor designed to benefit from existing
software applications supporting 3DNow! and SSE technologies, but in the future,
software developers should have the ability to utilize the strength of both 3DNow!
and SSE technology when optimizing code paths for AMD processor architectures
that support 3DNow! Professional technology. The AMD Athlon MP processor enables
this advanced level of SIMD optimization by allowing 3DNow! and SSE instructions to
be executed in the same code path.
QuantiSpeed™ Architecture: Hardware Data Prefetch
To further enhance processor IPC, and therefore processor performance, the
AMD Athlon MP processor also uses hardware data prefetch technology. This
hardware data prefetch technology observes memory accesses, looks for regular
access patterns, and speculatively fetches the cache line with the data into the
processor’s L2 data cache in advance of the actual data access, therefore reducing
the average latency seen by the processor in accessing memory. In the past, data
prefetch was supported through the instructions introduced in 3DNow! and SSE
technologies. However, for the processor to take advantage of this capability,
software applications had to be specifically optimized with the 3DNow! and SSE
instructions. The AMD Athlon MP processor is designed to automatically optimize
performance on existing software that has not previously been optimized using the
software prefetch instructions supported by 3DNow! Professional technology.
Benefits of the AMD Athlon MP processor’s hardware data prefetching are
observed more in high-end, data-intensive server applications that access larger
arrays of data. Performance also benefits by not occupying processor instruction
execution bandwidth required by software prefetching instructions. The optimization
is most effective when coupled with high-bandwidth system memory transfer
capability, now available to the processor by platforms such as those optimized to
support DDR memory.
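The general idea of observing accesses, detecting a regular pattern, and fetching ahead can be sketched with a simple stride detector. This is a generic illustration of the technique, not the actual AMD Athlon MP prefetch logic; the 64-byte line size, the `confirm` threshold, and the function name are all assumptions.

```python
LINE_BYTES = 64  # assumed cache line size; not stated in this paper


def prefetch_candidates(addresses, confirm=2):
    """Return line-aligned addresses a simple stride detector would prefetch.

    A prefetch is issued once the same non-zero stride has been observed
    `confirm` times in a row (an assumed confirmation policy).
    """
    prefetches = []
    last = None
    stride = None
    streak = 0
    for addr in addresses:
        if last is not None:
            delta = addr - last
            if delta != 0 and delta == stride:
                streak += 1          # same stride seen again
            else:
                stride = delta       # start tracking a new stride
                streak = 1
            if streak >= confirm:
                target = addr + stride
                prefetches.append(target - target % LINE_BYTES)
        last = addr
    return prefetches


# Sequential walk through an array, one cache line at a time.
print(prefetch_candidates([0, 64, 128, 192]))
# Irregular accesses give the detector nothing to lock onto.
print(prefetch_candidates([0, 4096, 8, 700]))
```

A regular walk through a large array triggers prefetches after the pattern is confirmed, while irregular accesses trigger none, which is consistent with the observation that data-intensive applications accessing large arrays benefit most.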
QuantiSpeed™ Architecture: Exclusive and Speculative
Translation Look-aside Buffers (TLBs)
The AMD Athlon MP processor features advanced, two-level Translation Look-
aside Buffer (TLB) structures for both instruction and data address translation. The
AMD Athlon MP processor’s Level 1 (L1) Instruction TLB (I-TLB) holds 24 entries, the
L1 Data TLB (D-TLB) holds 40 entries, and the L2 I-TLB and D-TLB each hold 256
entries.
To reduce the incidence of TLB entry conflicts, the L1 and L2 TLB structures
adopt an exclusive architecture design. With an exclusive TLB architecture, the
entries held in the L1 TLBs are not duplicated in the L2 TLBs, so the L1 TLB and L2
TLB sizes combine into a larger total available entry space on both the instruction
and data TLBs. Holding more TLB entries within the processor reduces the number
of conflicts, which increases performance on high-end,
data-intensive applications that encounter instruction sequences that may no longer
have to wait for TLB entries to be reloaded during execution.
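A toy model of an exclusive two-level TLB shows how the two levels combine into one larger entry space. This is a generic sketch, not the processor's actual design: the replacement policies (least recently used in L1, oldest first in L2) and the class name `ExclusiveTLB` are assumptions.

```python
from collections import OrderedDict


class ExclusiveTLB:
    """Toy two-level TLB in which an entry lives in L1 or L2, never both."""

    def __init__(self, l1_entries: int, l2_entries: int):
        self.l1 = OrderedDict()
        self.l2 = OrderedDict()
        self.l1_entries = l1_entries
        self.l2_entries = l2_entries

    def _insert_l1(self, page, frame):
        self.l1[page] = frame
        if len(self.l1) > self.l1_entries:
            # The evicted L1 entry moves down to L2 instead of being dropped.
            victim_page, victim_frame = self.l1.popitem(last=False)
            self.l2[victim_page] = victim_frame
            if len(self.l2) > self.l2_entries:
                self.l2.popitem(last=False)

    def lookup(self, page, page_table) -> str:
        if page in self.l1:
            self.l1.move_to_end(page)
            return "L1 hit"
        if page in self.l2:
            # Promote to L1 and remove from L2 to keep the levels exclusive.
            self._insert_l1(page, self.l2.pop(page))
            return "L2 hit"
        self._insert_l1(page, page_table[page])  # reload from the page table
        return "miss"

    def capacity(self) -> int:
        return self.l1_entries + self.l2_entries


# Instruction TLB sizes from the text: 24-entry L1, 256-entry L2.
itlb = ExclusiveTLB(24, 256)
page_table = {page: page + 1000 for page in range(280)}  # hypothetical mapping
first_pass = [itlb.lookup(p, page_table) for p in range(280)]
second_pass = [itlb.lookup(p, page_table) for p in range(280)]
print(itlb.capacity(), second_pass.count("miss"))
```

In this model, a working set of 280 pages fits entirely in the 24-entry plus 256-entry instruction TLB, so the second pass needs no reloads. The same arithmetic gives 296 entries for the data TLB.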
The TLB structures of the AMD Athlon MP processor also have the ability to
handle data TLB misses speculatively. The AMD Athlon MP processor allows TLB
entries to be written speculatively before the first instruction is completed, while
preserving proper instruction execution ordering. This removes the serialization
effect and results in improved system performance.
Conclusion: Technology and Performance Leadership of
x86 Microprocessors
With these key differentiating features of the new AMD Athlon MP processor
with QuantiSpeed architecture…
0.13-micron process technology—Provides further thermal
headroom necessary to scale frequency within the thermal limits of
workstation and server platforms for AMD processors, thus maximizing
overall performance
512KB L2 cache—Increases the performance of server applications
such as email, exchange, file, print, and networking applications by
keeping more frequently accessed instructions and data close to the
CPU.
Smart MP Technology:
Dual point-to-point high-speed system buses—Allows two
processors to run independently without the overhead of sharing a
common system bus
Innovative bus-snooping capability—Offers high-speed
communication between processors in a multiprocessing system
Optimized MOESI cache-coherency protocol—Reduces memory
traffic and allows faster access to cached data.
QuantiSpeed™ Architecture:
Nine-issue, superscalar, fully pipelined microarchitecture—Provides
a wide execution bandwidth to improve overall productivity
Superscalar, fully pipelined FPU—Increases performance of
floating point-intensive applications while offering 3DNow! Professional
technology support
Hardware data prefetch—Increases performance on high-end
software applications using high-bandwidth system capability,
especially with DDR memory
TLB enhancements—Increase performance of high-end, data-
intensive applications
…AMD continues to accelerate technology innovations while meeting the
computationally intensive requirements of software applications including:
3D applications—3D modeling, animation, digital visualization, etc.
Multimedia/digital content creation applications—Photo and
video editing, video encoding and decoding, image compression, soft
DVD, MP3 encoding and decoding, etc.
High-end applications—Digital publishing, speech recognition, CAM,
digital prototyping, etc.
IT Infrastructure applications—Web servers, file and application
servers, messaging and database servers
With compelling performance across these and a number of other
applications, the AMD Athlon MP processor with 512KB L2 cache implemented on
0.13-micron technology and featuring Smart MP technology continues to increase the
performance scalability provided by QuantiSpeed architecture by delivering high
clock speeds and excellent processor performance over previous generations. The
AMD Athlon MP processor with 512KB L2 cache and Smart MP technology continues
in the tradition of the AMD Athlon processor family by providing compelling levels of
delivered system performance for today’s and tomorrow’s applications.
AMD Overview
Founded in 1969 and based in Sunnyvale, California, AMD (NYSE: AMD) is a global
supplier of integrated circuits for the personal and networked computer and communications
markets with manufacturing facilities in the United States, Europe, Japan, and Asia. AMD, a
Standard & Poor’s 500 company, produces microprocessors, Flash memory devices, and
silicon-based solutions for communications and networking applications.
© 2003 Advanced Micro Devices, Inc. All rights reserved.
AMD, the AMD Arrow logo, AMD Athlon, and combinations thereof, and 3DNow!, QuantiSpeed, and
AMD PowerNow! are trademarks and AMD-K6 is a registered trademark of Advanced Micro Devices, Inc.
Pentium is a registered trademark and MMX is a trademark of Intel Corporation in the United States and/or
other jurisdictions. HyperTransport is a licensed trademark of the HyperTransport Technology Consortium.
Other product and company names used in this publication are for identification purposes only and may be
trademarks of their respective companies.