www.infineon.com/dsp
Never stop thinking.
CARMEL DSP Core Technical Overview Handbook
Architecture Overview
About this Document
This document was created with Adobe FrameMaker 5.5.6 at Infineon Technologies North America Corp., 1730
North First Street, San Jose, California 95112, USA. Revision number and date are shown on each page. This
document is not controlled, meaning that no distribution list is maintained and the reader is responsible for ensuring
that he/she is not using an obsolete version. Please e-mail your comments, corrections, and feedback to: editor@infineon.com
Copyright © 2000 Infineon Technologies Corp.
All Rights Reserved.
Revision History
Release Version    Release Date    Comments
1.0                06/01/00        Preliminary release
V1.0 2000-06-01
CARMEL™ Technical Overview 4 V1.0 2000-06-01
Attention please!
As far as patents or other rights of third parties are concerned, liability is only assumed for components, not for
applications, processes, and circuits implemented within components or assemblies.
This information describes the type of component and shall not be considered as assured characteristics.
Terms of delivery and rights to change design reserved.
Due to technical requirements, components may contain dangerous substances. For information on the types in
question, please contact your nearest Infineon office.
Infineon Technologies Corp. is an approved CECC manufacturer.
Packing
Please use the recycling operators known to you. We can also help you get in touch with your nearest sales
office. By agreement, we will take packing material back, if it is sorted. You must bear the cost of transport.
For packing material that is returned to us unsorted or which we are not obligated to accept, we shall have the
right to invoice you for any costs incurred.
Components used in life-support devices or systems must be expressly authorized for such purpose!
Critical components 1) of Infineon Technologies Corp. may only be used in life-support devices or systems 2) with
the express written approval of Infineon Technologies Corp.
1) A critical component is a component used in a life-support device whose failure can reasonably be expected to
cause the failure of that life-support device or system, and/or to affect the safety or effectiveness of that device
or system.
2) Life-support devices or systems are intended: (a) to be implanted in the human body, or (b) to support and/or
maintain human life. If they fail, it is reasonable to assume that the health of the user may be endangered.
Table of Contents
1 Introduction
2 The CARMEL™ DSP Core Product Line
    2.1 CARMEL Synthesizable Product Modules
        The CARMEL Core Module
            Arithmetic Functions
            Data Address Generation
            Program and System Control
        CARMEL Memory Modules
        System Interconnect Modules
        System Peripheral And Memory Modules
    2.2 CARMEL Product Development Tools
        Software
        Emulation Hardware
3 The CARMEL Core Architecture
    3.1 Core Functional Units
    3.2 Core Interfaces
    3.3 The Execution Unit (EU)
        Execution Unit Instructions
        Arithmetic Logic Units
        Multiply-Accumulate Units
        Barrel Shifter
        Exponent Unit
    3.4 The Address Unit (AU)
    3.5 The Program Control Unit (PCU)
    3.6 The A and B Data Memory Interface (ABIF)
4 CARMEL Programming Model And Instruction Set Summary
    4.1 CARMEL Core Programming Model
        Operand Data Types and Formats
        Operand Data Registers and Memory Maps
        Data Operand Addressing Modes
    4.2 Instruction Set Architecture
        Instruction Formats
        Instruction Memory Map
        Instruction Addressing Modes
        Instruction Summary
        Configurability And The Instruction Set
5 The CARMEL System Architecture
    5.1 Representative External Systems
    5.2 A Representative CARMEL-Based System-on-a-Chip
    5.3 The CARMEL Core Module
    5.4 CARMEL Memory Modules
        Program Memory
        CLIW Memory
        Data Memories
    5.5 System Interconnect Modules
        Core-to-System Interface
        FPI Bus, Clock and System Configuration Units
        Emulation Unit
        GPIO
    5.6 System Peripheral And Memory Modules
        Interfaces
        Controllers
        FPI-Bus System Memories
6 Example Programs And Benchmarks
    6.1 FIR Filters
        Single-Sample Real Non-Symmetrical FIR Filter
        Block Real Symmetrical FIR Filter
    6.2 Vector Quantization
1 Introduction
The CARMEL Core is the first in a family of 16-bit, fixed-point digital signal processing (DSP) cores that target advanced
communications and consumer applications. Its modular design architecture allows for complete system-on-a-chip (SoC)
implementations using advanced Electronic Design Automation (EDA) methodologies in an integrated development
environment. The CARMEL Core delivers high performance and an efficient DSP instruction rate without compromising
power dissipation or code compactness. The patented Configurable Long Instruction Word (CLIW™) architecture
sets the core apart by providing Very Long Instruction Word (VLIW) performance at the low cost of traditional DSP
architectures. CARMEL’s memory-oriented design is characterized by its flexible instruction set and high-performance
arithmetic and addressing units. Instructions can be customized through CLIW, which provides a high degree of parallelism
with the ability to simultaneously generate four addresses, perform four arithmetic operations and make two transfers.
The important design factors in modern communication and consumer products are performance, power and price. These
factors apply equally to the system’s key component, the digital signal processor. DSP cores and their system solutions
must therefore give the designer the choices needed to optimize all three factors in a system-on-a-chip for a particular
application. The CARMEL Core has been created to enable those design choices. It can be configured for an
application in its instruction set as well as in its hardware.
The CARMEL™ DSP Product Line delivers high performance in DSP tasks, in control tasks and at the system level
because the memories, peripherals and input/output operate in balance with the processing elements.
High performance is not just in the core processor but throughout the system. Low system power comes through careful
overall design. System price is kept low by compact modular designs that include only the needed functions, and by shortened
development times through the use of a complete suite of tools with large libraries of both software functions and
synthesizable circuit modules.
The sections that follow provide a detailed summary of the CARMEL™ DSP Product Line with specifics as to how these
benefits are achieved.
Figure 1-1. The Powerful CARMEL Core Execution Unit
[Block diagram. Labels: Six 40-bit Accumulators; MAC 1; ALU 1; Shifter; Exponent; MAC 2; ALU 2]
2 The CARMEL™ DSP Core
The CARMEL brand name applies specifically to the core digital signal processor, available from Infineon Technologies and
others as a licensable Register Transfer Level (RTL) description or as macro circuit designs. When used with an
RTL synthesizer, this description produces the central portion of a processor that executes the CARMEL Instruction Set.
The CARMEL Product Line also includes all of the shaded items shown in Figure 2-1 to provide a complete design solution
for a System-on-a-Chip (SoC) integrated circuit. These items are peripheral circuit RTL description modules and a full set
of software development tools for writing, testing and debugging programs.
All modules and software are added into the design flow of the licensee for possible use with other system element designs
and tools.
2.1 CARMEL Synthesizable Product Modules
The CARMEL product modules fall into the four functional groups shown in Figure 2-2. The signal processing core connects
directly with its program and data memories. Connections to other system peripherals, larger memories and input/output
devices are made with the Flexible Peripheral Interconnect Bus (FPI Bus). The core itself interfaces to the bus through units
that also provide interrupt processing and DMA transfers. A block diagram of a representative system-on-a-chip using the
CARMEL product modules is in Figure 2-3.
For this CARMEL Core, the data precision is 16 bits fixed-point in an address space of 16 bits. Instructions are 24 bits in an
address space of 24 bits. For higher performance, these fundamental dimensions are increased in some portions: to 32 and
40 bits for data precision, to 24 bits for the data I/O space, and to 48 and 144 bits for instruction formats.
All the CARMEL product modules are designed to be device-technology independent for static circuit implementations, with
all transfers on positive clock transitions of a single-phase clock. This conservative, easy-to-synthesize design style and the
core’s eight-stage pipeline allow high-performance clock rates of at least 250 MHz at 1.8 V in a 0.18 µm process for typical
systems-on-a-chip.
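The quoted clock rate and pipeline depth translate into a few back-of-envelope figures. The sketch below is only an illustration: the function names are hypothetical, and it assumes one result per MAC per clock cycle with no stalls, so the MAC rate is a theoretical peak rather than a measured benchmark.

```python
# Figures quoted in the text; everything derived from them is a theoretical peak.
CLOCK_HZ = 250_000_000   # "at least 250 MHz"
PIPELINE_STAGES = 8      # eight-stage pipeline
MAC_UNITS = 2            # two MAC units in the Execution Unit


def cycle_time_ns(clock_hz: int) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def pipeline_fill_ns(clock_hz: int, stages: int) -> float:
    """Time for one instruction to pass through every stage (no stalls assumed)."""
    return stages * cycle_time_ns(clock_hz)


def peak_macs_per_second(clock_hz: int, mac_units: int) -> int:
    """Upper bound on multiply-accumulates per second (assumes one per MAC per cycle)."""
    return clock_hz * mac_units


if __name__ == "__main__":
    print("cycle time (ns):", cycle_time_ns(CLOCK_HZ))
    print("pipeline fill (ns):", pipeline_fill_ns(CLOCK_HZ, PIPELINE_STAGES))
    print("peak MAC/s:", peak_macs_per_second(CLOCK_HZ, MAC_UNITS))
```

At 250 MHz this gives a 4 ns cycle, 32 ns to fill the pipeline, and a peak of 500 million multiply-accumulates per second under the stated assumptions.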
The CARMEL Core Module
This core is a uniquely crafted matching of a powerful, highly modular hardware architecture with an equally modular
instruction set architecture. The modular core hardware allows for choices in the complexity of DMA, interrupt control and
data memories, for example. The instruction set modularity provides the performance of longer-word multiple parallel
computations when needed or the economy of short-word simple instruction execution when it is not.
The core itself handles arithmetic processing of data, data address generation, the data memory interface, and program and
system control, including address generation for program memories and instruction issuing.
Figure 2-1. The CARMEL Product Line Design Solution
[Block diagram. Labels: CARMEL Core RTL Description; CARMEL Peripherals RTL; CARMEL DSP Library; CARMEL Software
Development Tools; User Design RTL Description; User Software; RTL Logic Synthesizer; Emulator; System-on-a-Chip
(CARMEL Core, Program Memories, Data Memories, Peripherals)]
The following sections detail the specifications and architectures for these core functions.
Arithmetic Functions
The six computation sub-units (two 16 x 16-bit MACs, two 40-bit ALUs, a 40-bit barrel shifter and a 40-bit exponent unit),
together with the six 40-bit accumulators shown in Figure 1-1, provide a full function set:
- 16, double-16, 32 and 40-bit data types plus single-bit manipulations
- SIMD operations on double 16-bit data operand pairs within a single ALU
- Logical and arithmetic shifts, extract, insert and logical operations
- Minimum and maximum operations with address register and Viterbi back trace register
- Fractional and integer arithmetic and a normalizing exponent operation for block floating-point
- Limiting, saturation, automatic FFT scaling and rounding modes of nearest and convergent
- Multiply for signed and unsigned operands and superscalar parallel adder/subtractor accumulation
- Multistate conditional execution
- Iterative division support
- Two secondary accumulators for fast context switching with a bank exchange instruction
Additionally, application specific accelerators or co-processors (e.g. convolutional encoders) are easily integrated.
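To make the MAC and accumulator widths concrete, the following is a minimal behavioral sketch of a signed 16 x 16-bit multiply accumulated into a saturating 40-bit accumulator. It models integer arithmetic only; the function names are hypothetical, and the core's actual saturation, limiting and rounding controls are not described here, so always-on saturation is an assumption of the sketch.

```python
ACC_BITS = 40
ACC_MAX = (1 << (ACC_BITS - 1)) - 1      # largest signed 40-bit value
ACC_MIN = -(1 << (ACC_BITS - 1))         # smallest signed 40-bit value


def to_int16(word: int) -> int:
    """Interpret the low 16 bits of `word` as a signed two's-complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def saturate40(value: int) -> int:
    """Clamp a result to the signed 40-bit accumulator range (assumed always enabled)."""
    return max(ACC_MIN, min(ACC_MAX, value))


def mac(acc: int, a: int, b: int) -> int:
    """acc + a*b with signed 16-bit operands and a saturating 40-bit result."""
    return saturate40(acc + to_int16(a) * to_int16(b))


def dot(xs, ys) -> int:
    """Dot product built from repeated MAC steps, as in an FIR inner loop."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

The eight bits above a 32-bit product leave headroom for accumulating many products before the accumulator saturates, which is the usual reason for a 40-bit accumulator behind a 16 x 16-bit multiplier.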
Data Address Generation
The 30 registers of the address register file, the four address ALUs and a stack ALU provide full data memory addressing
capability:
- Dual data memories A and B in a single 64k-word address space
- Four buses for up to four separate memory banks, which may be odd and even addressed, single and/or dual port
- Four simultaneous addresses per instruction cycle with four address modifications: linear, modulo (aligned), special
  modulo (non-aligned) and bit-reverse
- Independent base, offset or index, displacement, limit and modulo registers, including secondary registers for fast
  context switching with a bank exchange instruction
- Modes for direct (immediate) or indirect addressing, registers or memory, with or without post modification or indexing,
  and single or double word access
- Memory conflicts resolved by automatic wait-state insertion
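Three of the address modifications listed above can be illustrated with a short sketch. The exact register semantics of the CARMEL address ALUs are not given in this overview, so the functions below are hypothetical and follow common DSP conventions: "aligned" modulo is assumed to mean a power-of-two buffer whose base is a multiple of its size, and bit-reverse is shown on an index of a given bit width.

```python
ADDRESS_MASK = 0xFFFF  # 16-bit data address space (64k words)


def linear(addr: int, step: int) -> int:
    """Linear post-modification, wrapping at the 16-bit address space."""
    return (addr + step) & ADDRESS_MASK


def modulo_aligned(addr: int, step: int, size: int) -> int:
    """Circular-buffer update for a power-of-two `size` aligned on a multiple of `size`."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    base = addr & ~(size - 1)                      # buffer start
    return base | ((addr + step) & (size - 1))     # wrap the offset inside the buffer


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the low `bits` bits of `index`, as used to reorder FFT data."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result
```

For example, stepping past the end of an 8-word buffer at 0x0100 returns to 0x0100, and the 3-bit bit-reversed order of 0..7 is 0, 4, 2, 6, 1, 5, 3, 7.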
Figure 2-2. CARMEL Product Line Functional Modules
[Block diagram of a CARMEL-Based System-on-a-Chip. Labels: CARMEL Core; CARMEL Memories; System Interconnect;
System Peripherals and Memories; External System Interfaces]
Program and System Control
The program counter, instruction pipeline with decoding, loop counters and interrupt control provide the following advanced
programming features:
- Single-cycle execution with an 8-stage pipeline
- 24-, 48- and 144-bit instruction words that match code density to processing load
- Configurable Long Instruction Word (CLIW™) of 96 bits defines up to six custom parallel operations for user algorithms
- CLIW operations reusable with different 48-bit operand references and different execution conditions
- Superscalar parallel execution of two 24-bit instructions (MIMD)
- Flexible multiple-state conditional execution strategies for computation instructions
- Full set of conditional branching program control instructions
- Zero-overhead loop and repeat instructions with four nesting levels
- Exception processing: 240 vectored interrupts, trap and breakpoints for emulation support and full stack operations
- Fast context switching with a register bank exchange instruction and conditional execution load instruction
- Single and parallel move operations for data and program with memories and I/O spaces
Figure 2-3. A Representative System-on-a-Chip Using the CARMEL Product Modules
[Block diagram of a CARMEL-Based System-on-a-Chip. Labels by group:
CARMEL Core: Address Unit; Execution Unit; Program Control Unit
CARMEL Memories: Instruction Memories (RAM and ROM on the Program Memory Bus and on the CLIW Memory Bus); Data Memories (A RAM, B RAM)
System Interconnect: Core-to-System I/F (Interrupts, DMA, IO Bus); Core Emulation Unit; FPI Bus
System Peripherals and Memories: Memory I/F; Host I/F; JTAG I/F; System Memories (SRAM, DRAM, ROM, Flash); Co-Processor; Timers; Peripheral Controllers
External connections: External Memory Interface; Host Interface; JTAG Interface; Peripheral Ports]
CARMEL Memory Modules
The sophisticated memory control features built into the CARMEL Core mean that straightforward synchronous memory
designs can be used for the core memories. The three CARMEL memories can have the following characteristics:
- Program Memory:
    - SRAM or ROM
    - 48-bit data
- CLIW Program Memory (optional):
    - SRAM or ROM
    - 96-bit data with 10-bit address selection
- Data Memories:
    - SRAM
    - 16-bit data words with 16-bit address selection into regions that are multiples of 2k words (11 bits)
    - Three configurations: single-port, dual-port with even/odd addressing, true independent dual-port
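The data memory figures above imply that a 16-bit word address can be viewed as a 5-bit region number and an 11-bit offset within a 2k-word region. The sketch below shows that arithmetic. It is illustrative only: how the core actually decodes regions is not described here, and selecting an even/odd bank from the least significant address bit is an assumption, as are all the names used.

```python
ADDRESS_BITS = 16                                   # 16-bit address selection
REGION_BITS = 11                                    # 2k-word granularity
REGION_WORDS = 1 << REGION_BITS                     # 2048 words per region
NUM_REGIONS = 1 << (ADDRESS_BITS - REGION_BITS)     # regions in the 64k-word space


def split_address(addr: int) -> tuple:
    """Return (region, offset) for a 16-bit data word address."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address outside the 16-bit data space")
    return addr >> REGION_BITS, addr & (REGION_WORDS - 1)


def even_odd_bank(addr: int) -> int:
    """Bank index under even/odd addressing (assumed: 0 for even, 1 for odd addresses)."""
    return addr & 1
```

With even/odd banking, two accesses to consecutive addresses fall in different banks, so both can proceed in the same cycle under this assumed mapping.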
System Interconnect Modules
System Peripheral Modules can be thought of as arrayed along a System Interconnect extending from the CARMEL Core.
This interconnect is the basic Flexible Peripheral Interconnect (FPI) Bus augmented with conforming DMA and Interrupt
control signals. The basic FPI Bus has its own controller while the interrupt and DMA controllers are part of the system
interface to the CARMEL Core. The core emulation unit also acts as an interface controller to the core from the system
interconnect for its special function in the system. The general purpose port (GPP) provides eight simple programmed I/Os.
The FPI Bus
- FPI Bus Control: 250 MB/second data transfers on a 16-bit data bus in a 16 M address space with master/slave
  operation. This enables high transfer bandwidths without slowing core program execution.
- System Configuration Control: Distributes system configuration parameters that are determined at reset time
- System Clock & Reset: Distributes system clock and reset signals based on parameters determined at reset time
The CARMEL Core-to-System Interface
- DMA Controller: A modular design with up to eight channels in three priority groups. It makes independent data transfers
  between core data memories and FPI-Bus peripherals as well as peripheral-to-peripheral transfers without processor
  intervention.
- Interrupt Controller: Up to 240 prioritized maskable vectored interrupts
- Core-to-FPI Interface (FPIU): Provides data queueing and address generation for the core I/O operations on the FPI bus
CARMEL Core Emulation Unit
- Its intimate connection to the core permits off-line breakpoint intervention and analysis with an emulation monitor
  program
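The FPI Bus figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that "MB" means 10^6 bytes and "16 M" means 16 x 2^20 addresses; both readings are assumptions, and the function names are hypothetical.

```python
FPI_BYTES_PER_SECOND = 250_000_000       # "250 MB/second", decimal megabytes assumed
FPI_DATA_BITS = 16                       # 16-bit data bus
FPI_ADDRESS_SPACE = 16 * 1024 * 1024     # "16 M address space", binary mega assumed


def transfers_per_second(bytes_per_second: int, data_bits: int) -> int:
    """Bus transfers per second when every transfer carries a full data word."""
    return bytes_per_second // (data_bits // 8)


def address_bits(address_space: int) -> int:
    """Number of address bits needed to reach every location in the space."""
    return (address_space - 1).bit_length()


def block_transfer_us(words: int) -> float:
    """Ideal time in microseconds to move `words` 16-bit words at the peak rate."""
    return words / transfers_per_second(FPI_BYTES_PER_SECOND, FPI_DATA_BITS) * 1e6
```

Under these assumptions the bus carries 125 million 16-bit transfers per second, and a 16 M space needs 24 address bits, which matches the 24-bit data I/O space mentioned earlier.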
System Peripheral And Memory Modules
These are proven designs by Infineon for common system peripherals that reside on the FPI Bus System Interconnect.
External Interfaces
- External Memory Bus Interface: Maps external memories of various types with programmable choices of timing and
  address selection
- Host Interface: A full function buffered interface to big or little endian microprocessors
- JTAG Emulation Interface: A companion to the core’s Emulation Unit that allows an off-chip host to control emulation
  and the on-chip debug support
Controllers
- Programmable Parallel and Serial Interfaces: Generic designs that accommodate a wide variety of synchronous and
  non-synchronous data rates and configurations
- Timer: Multiple counters for real-time system control, time-outs, etc.
FPI-Bus Memories
- Larger, slower, denser memories
2.2 CARMEL Product Development Tools
Infineon’s established CARMEL DSP Alliance initiative provides, through third-party suppliers, a complete suite of hardware
and software tools to aid the system designer. These assist software writing and debugging as well as prototyping, testing
and debugging of system hardware.
Software
The software tools are available as a complete integrated program development environment with uniform interfaces on PC
workstations. A modern editor and a macroassembler are used to write and link programs, which are then run on the
instruction-cycle-accurate and bit-true software simulator. Debugging is easy with breakpoint capability, and optimization
is enhanced with profiling and resource utilization views. As a further aid, CARMEL models run on a high-level DSP design
environment tool suite.
An ANSI C compiler generates optimized code that takes advantage of the unique CARMEL features. Algorithm libraries are
available as both C language and assembly language routines for common DSP applications and functions.
A flexible Real-Time Operating System (RTOS) adds just the real-time controls needed for the multiple tasks in an individual
application. This approach keeps response times fast and RTOS program size small. The RTOS also facilitates task-level
debugging.
Emulation Hardware
These tools permit running programs in real-time and within the actual application’s hardware system.
The CDEV Development Chip is a CARMEL Core with large on-chip memories, a typical DMA and interrupt controller, an
emulation unit and many input/output interfaces for off-chip memories, controllers and additional peripherals.
The CAREB Evaluation Board puts the CDEV chip into a plug-in, ready-to-run system with additional memories, peripherals
and standard interfaces for easy test and debugging of programs. Field programmable gate arrays (FPGAs) speed
prototyping of hardware designs for system elements on the FPI bus.
3 The CARMEL Core Architecture
The system aspects of the CARMEL Product Line are significant, but it is the CARMEL Core that is most important. The
CARMEL Core does the processing and is at the center of the system; the system is defined largely by how the core provides
for extending itself. It is the core that implements the Instruction Set Architecture (ISA) and the man-years of intellectual
property that this represents in applications algorithms, library functions, design verification, test routines, development
software and hardware tools.
This section describes the core’s hardware architecture and its interface to the rest of the system, while the following Section
4 describes the ISA in terms of the programming model and instruction set summary.
3.1 Core Functional Units
A CARMEL system’s high performance comes from a careful matching of the core’s four functional units, shown with their
major components in Figure 3-1. The Execution Unit’s six computational sub-units, with their accumulators, can perform
massively parallel data operations on its two sides.
The A and B Data Memory Interface Unit keeps data available with up to four data streams from the full-sized data
memories, not just a constricted register file. The data Address Unit generates the required four address streams, created not
with simple counters, but with full special-function ALUs and a large register set.
The Program Control Unit fetches appropriate sized instructions, decodes them, sequences them in the pipeline and issues
them to the other units as needed, all while imposing little time or instruction overhead itself and remaining responsive to
other real-time system events.
Figure 3-1. The CARMEL Core’s Functional Units
[Block diagram of the CARMEL Core. Labels by unit:
Address Unit (AU): Register Set 0-3; Address ALU 0,1; Register Set 4-7; Address ALU 2,3; Stack Pointer; Stack ALU
Execution Unit (EU): EU1 (Left) with MAC 1, ALU 1, Exp, Shift; EU2 (Right) with MAC 2, ALU 2; Accumulators
Program Control Unit (PCU): Program Counter; Interrupts; Loop Counters; Instruction Pipeline
AB Data Memory Interface (ABIF)]
3.2 Core Interfaces
Figure 3-2 shows the primary data and address interconnections of the core’s four functional units and its six interfaces with
the remainder of the system. Secondary data and addresses pass between units on the G1 and G2 data buses.
The IO Bus interface serves the processor’s programmed input/output address space for registers and memory. The DMA Bus
interface provides direct access to the data memory address space. The 48-bit program memory attaches to the Program Memory
Bus interface, with the optional 96-bit memory on the CLIW Memory Bus. The four ports for the A and B data memories connect
to the 136 signals of the Data Memory interface.
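The 136-signal figure can be reproduced from the per-port widths labeled in Figure 3-2 (16 address, 16 data and 2 control signals for each of the A1, A2, B1 and B2 ports). The sketch below does that sum and, for comparison, totals the other bus interfaces from the widths listed in Table 3-1. The dictionary layout and function names are illustrative only.

```python
# (address, data, control) signal counts per interface, from Table 3-1.
BUS_WIDTHS = {
    "IO Bus": (24, 16, 4),
    "DMA Bus": (16, 16, 5),
    "Program Memory Bus": (23, 48, 3),
    "CLIW Memory Bus": (10, 96, 2),
}

DATA_PORTS = ("A1", "A2", "B1", "B2")
PORT_WIDTHS = (16, 16, 2)  # address, data, control per data memory port (Figure 3-2)


def bus_signals(name: str) -> int:
    """Total signal count of one bus interface."""
    return sum(BUS_WIDTHS[name])


def data_memory_signals() -> int:
    """Total signal count of the four data memory ports together."""
    return len(DATA_PORTS) * sum(PORT_WIDTHS)


if __name__ == "__main__":
    for name in BUS_WIDTHS:
        print(name, bus_signals(name))
    print("Data Memory interface", data_memory_signals())
```

Four ports of 34 signals each give the 136 signals quoted in the text.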
The System and Control interface provides for program interrupts, general purpose input/output (GPIO) and emulation units
that can govern program execution control. System signals are basic clocks and configuration determiners as shown in Table
3-1. Note from the signal names that the interface designs are simple and straightforward where data and addresses are
clocked by transitions on enable signals or by a simple distribution of the core clock.
Figure 3-2. CARMEL Core Functional Unit Interconnections and Interfaces
[Block diagram of the Address Unit, Execution Unit, Program Control Unit and ABIF, linked by the G1 and G2 data buses.
Interface signal widths shown:
CLIW Memory Bus: Addr 10, Data 96, Ctrl 2
Program Memory Bus: Addr 23, Data 48, Ctrl 3
Data Memory Buses: A1, A2, B1 and B2, each with Addr 16, Data 16, Ctrl 2
IO Bus: IO Addr 24, IO Data 16, IO Ctrl 4
DMA Bus: DMA Addr 16, DMA Data 16, DMA Ctrl 5
System and Control: Mem Config 14, GPP 8, Interrupts 14, General 8, Emulation 23]
3.3 The Execution Unit (EU)
The Execution Unit is divided into two fully independent execution units: EU1 or the Left side and EU2 or the Right side, as
shown in Figure 3-3. Both share equally the data sources and destinations of the accumulators, A and B data memories,
immediate data from instructions and the other core units. EU1 and EU2 can operate in parallel using these shared
resources.
Execution Unit 1 has four computational sub-units: an Arithmetic Logic Unit (ALU), a Multiply-Accumulate unit (MAC), a Barrel
Shifter (Shifter) and an Exponent Unit (Exponent). Execution Unit 2 has only an ALU and MAC since the Shifter and Exponent
operations occur less frequently in most DSP applications. All sub-units execute in a single instruction cycle.
Execution Unit Instructions
The CARMEL Core’s high performance with highly efficient instruction programs comes from the unique way it keeps these
six computation sub-units busy when they are needed and without large instructions when they are not all needed. The core
in general uses three distinct instruction formats to get operation codes to the sub-units:
As one of up to six sub-instructions within a reusable 96-bit CLIW operations block with a 48-bit CLIW reference operand
instruction
As a 24-bit instruction with short direct and indirect operand references
As a 48-bit instruction with longer direct and indirect operand references
Two CLIW sub-instructions can go to each of the execution units EU1 and EU2, with the remaining two sub-instructions used
for simultaneous move operations, for a total of 144 bits of instruction. Two 24-bit instructions can be fetched together and
generally execute in parallel, so both full execution units can process in one instruction cycle with just 48 bits of instruction.
A single execution unit by itself requires just a single 24- or 48-bit instruction.
The following sections describe each component of these two execution units.
Table 3-1. CARMEL Core Interface Signal Summary
Interface | Function | Names | Signals
I/O Bus (IO) | Address | IOA[23:0] | 24
I/O Bus (IO) | Data | IOD[15:0] | 16
I/O Bus (IO) | Control | IORE, IOWE, IOWAIT, IOREG | 4
DMA Bus (D) | Address | DA[15:0] | 16
DMA Bus (D) | Data | DD[15:0] | 16
DMA Bus (D) | Control | DRD, DWR, DREQ, DACK[1:0] | 5
Program Memory Bus (P) | Address | PMA[23:1] | 23
Program Memory Bus (P) | Data | PMD[47:0] | 48
Program Memory Bus (P) | Control | PMWE, PMRE, PMWAIT | 3
CLIW Memory Bus (CI) | Address | CIA[9:0] | 10
CLIW Memory Bus (CI) | Data | CID[95:0] | 96
CLIW Memory Bus (CI) | Control | CIRE, CIWE | 2
Data Memory Buses | A1/A2/B1/B2 Address | A1A[15:0], A2A[15:0], B1A[15:0], B2A[15:0] | 64
Data Memory Buses | A1/A2/B1/B2 Data | A1D[15:0], A2D[15:0], B1D[15:0], B2D[15:0] | 64
Data Memory Buses | A1/A2/B1/B2 Control | A1RE, A1WE, A2RE, A2WE, B1RE, B1WE, B2RE, B2WE | 8
System and Control | General Purpose I/O | GPI[3:0], GPO[3:0] | 8
System and Control | Data Memory Configuration | MEMCFG[13:0] | 14
System and Control | Interrupts | NMIREQ, NMIACK, INTVCT[7:0], VCTREQ, VCTACK, VCTEOI, ERRACK | 14
System and Control | Emulation | Breakpoint and Trace Operands and Control | 23
System and Control | General | CLK, RESET, EXTWAIT, INTWAIT, SWAP, BOOT, PROTECT, PMEM | 8
Arithmetic Logic Units
The ALUs are full 40-bit by 40-bit units with 40-bit outputs, yet they can operate on 32-bit long data words, 16-bit words or
even two unrelated 16-bit operands in a double data word for certain operations (Add, Sub, Min, Max). Basic operations are:
Arithmetic: Add, subtract, negate, absolute value
Comparison: Compare, test field, minimum and maximum including with a serial back trace of results for the Viterbi
algorithm and saving the minimum/maximum data address
Logical: And, Or, Not, Xor
Bit: Set, clear, test, change
Division iteration steps: High or low resolution
Ancillary: Automatic scaling, rounding
Multiply-Accumulate Units
The 16-bit by 16-bit multipliers in combination with the 40-bit accumulators provide multiply, square and multiply-accumulate
operations with these features:
Operand signs: All four possible signed/unsigned operand combinations supported
Formats: Integer or fractional with correct rounding
Mixed precision: Support for 16-bit by 32-bit multiply with aligned-accumulate instruction
ALU only operations: Add, subtract or move operations with CLIW sub-instructions
Dual accumulators: Single 40-bit accumulator may be split into two 16-bit accumulators
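As a rough illustration of the fractional format listed above, the following Python sketch models one multiply-accumulate step on 16-bit fractional operands with a 40-bit accumulator. The Q15 interpretation of the operands, the one-bit left shift of the product, and the helper names `wrap40` and `mac_fractional` are assumptions made for illustration. They follow common fixed-point DSP practice rather than documented CARMEL behavior, and rounding and saturation are left out.

```python
ACC_BITS = 40  # accumulator width given in the text


def wrap40(value: int) -> int:
    """Wrap an integer into the signed 40-bit two's-complement range."""
    mask = (1 << ACC_BITS) - 1
    value &= mask
    # Re-interpret bit 39 as the sign bit.
    return value - (1 << ACC_BITS) if value >> (ACC_BITS - 1) else value


def mac_fractional(acc: int, x: int, y: int) -> int:
    """Return acc + x*y for signed Q15 operands (assumed format)."""
    product = x * y                      # Q15 * Q15 gives a Q30 product
    return wrap40(acc + (product << 1))  # shift once to align to Q31


# 0x4000 is 0.5 in Q15, so the result is 0.25 in Q31 (0x20000000).
result = mac_fractional(0, 0x4000, 0x4000)
```

The eight bits above the 32-bit product act as headroom, so many products can be accumulated before the 40-bit range is exceeded.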
Barrel Shifter
The Shifter in EU1 also is a full 40-bit unit yet supports 16- and 32-bit operands as well. The basic operations are:
Logical shift and arithmetic shift with a 6-bit shift value
Insert and extract bit fields
Rotate-thru-carry by one bit for 16-, 32- and 40-bit operands
Exponent Unit
This unit determines the 6-bit shift value needed in the barrel shifter to normalize a 16-, 32- or 40-bit input operand. It
facilitates using block floating-point.
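The normalization shift can be sketched in Python as a count of redundant sign bits. This is only an illustrative model: the function names `normalize_shift` and `block_shift` are hypothetical, the result for a zero input is a convention chosen here, and the way the hardware treats guard bits or encodes its 6-bit result is not described in this overview.

```python
def normalize_shift(value: int, width: int = 40) -> int:
    """Left-shift count that removes redundant sign bits from a
    two's-complement value of the given width (illustrative model)."""
    limit = 1 << (width - 1)
    if not -limit <= value < limit:
        raise ValueError("value does not fit in the given width")
    # For negative values, invert so leading ones become leading zeros.
    magnitude = ~value if value < 0 else value
    return (width - 1) - magnitude.bit_length()


def block_shift(values, width: int = 16) -> int:
    """Common shift for a block: the smallest per-sample shift, so that
    no sample overflows when the whole block is scaled together."""
    return min(normalize_shift(v, width) for v in values)
```

In block floating-point use, every sample in the block is shifted by the value returned by `block_shift`, and that value is kept as the shared block exponent.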
Figure 3-3. The CARMEL Core Execution Unit
3.4 The Address Unit (AU)
The Address Unit in the core generates four simultaneous 16-bit addresses for the four ports of the data memory. These
provide operand memory access for the computational sub-units in the Execution Unit and implement the stack for the
Program Control Unit. The AU does not create addresses for DMA transfers with the data memory. However, it does
arbitrate and resolve all memory and memory bus conflicts with wait-states, including with the DMA.
The address unit works with all the operand addressing modes of the core including some registers as well as data memory.
The modes for a single 16-bit data address or for 32-bit data with two sequential addresses are:
Direct: Immediate memory address of operand or direct access of operand in a register including post modification
Indirect: Generated memory address of operand including post-modification or indexing
There are four operand address modification modes, all of which execute in a single cycle:
Linear
Bit-Reverse for the FFT
Modulo with aligned boundary addresses
Special Modulo with arbitrary non-aligned boundary addresses
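The four modification modes listed above can be sketched in Python as pointer-update functions. This is a minimal model under stated assumptions: the function names are hypothetical, addresses are treated as 16-bit values, the aligned modulo buffer is assumed to have a power-of-two size, and bit reversal is shown as an index permutation rather than as the hardware's actual update mechanism.

```python
ADDR_MASK = 0xFFFF  # 16-bit data addresses


def linear(addr: int, step: int) -> int:
    """Plain post-modification, wrapping at 16 bits."""
    return (addr + step) & ADDR_MASK


def modulo_aligned(addr: int, step: int, size: int) -> int:
    """Circular buffer whose size is a power of two and whose start
    address is a multiple of that size (assumed meaning of 'aligned')."""
    mask = size - 1
    return (addr & ~mask & ADDR_MASK) | ((addr + step) & mask)


def modulo_arbitrary(addr: int, step: int, lower: int, length: int) -> int:
    """Circular buffer with any lower boundary and any length."""
    return lower + (addr - lower + step) % length


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the low `bits` bits of index, as used to reorder FFT data."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# Access order for an 8-point FFT.
fft_order = [bit_reverse(i, 3) for i in range(8)]
```

The aligned form needs only masking, while the arbitrary form needs a comparison against the buffer boundaries, which is one plausible reason for offering the two as separate modes.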
The data operand addresses are generated by four special ALUs operating out of a 30-entry register file. The primary
registers are in eight sets where each set has a base register and an offset or index register. Every two sets share a modulo
and lower boundary limit register. Six secondary registers shadow their primary equivalents and can be
swapped with them by a single bank exchange instruction to provide a rapid context switch upon interrupts. Immediate values
can be used in address modifications as well as registered values. Each group of four sets has an additional fixed
displacement register for use in CLIW instructions.
3.5 The Program Control Unit (PCU)
The PCU is truly the heart of the CARMEL Core because it sends out program memory addresses, fetches instructions
including CLIW, decodes them and then pumps out instruction commands to all of the units in the proper sequence.
Sequencing is done in an eight-stage pipeline that allows all instructions to execute in a single instruction cycle regardless
of whether they are a single 24- or 48-bit instruction (SISD or SIMD), two superscalar parallel 24-bit instructions (MIMD) or
a 144-bit CLIW reference operand and operation block. This sequencing includes determining if it is a conditional execution
instruction that should execute partially, fully or not at all. In addition, the PCU considers that the execution may be extended
because of various wait conditions to maintain synchronization.
The program counter creates a steady stream of instruction addresses including repeated instructions and instruction loops
that are repeated for a predetermined count, all without any instruction cycle overhead and with nesting up to four levels
deep. The program counter handles exception processing in the core for maskable and non-maskable interrupts, hardware
exception traps and debug breakpoints. Full control and handling of the 240-interrupt vector space is done by an interrupt
controller external to the core, and the breakpoints are enabled by the system’s emulation unit, which can also do single-step
execution.
Rapid handling of exceptions in the program flow is facilitated by the program stack being located in the fast data memory,
by a full set of stack operations that includes conditional ones, and by the bank exchange instruction that in a single
instruction cycle changes a full data address register set and two accumulators.
3.6 The A and B Data Memory Interface (ABIF)
The data memory interface connects the four data memory buses to the other functional units of the core. It resolves data and
addresses from the DMA and I/O interfaces as well as data of the Execution Unit with addresses from the Address Unit. It
synchronizes the transfers with read delays and memory write-back operations so that the memories themselves can be
simple un-registered synchronous designs. The interface assures that read operations use current write-back data and it can
apply saturation arithmetic to write memory data that exceeds the memory size.
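Saturation on a memory write can be illustrated with a short Python sketch. It assumes that the value being stored is wider than the 16-bit memory word and that saturation means clamping to the signed 16-bit range. The function name `saturate` is hypothetical, and the exact conditions under which the interface saturates are not given in this overview.

```python
def saturate(value: int, bits: int = 16) -> int:
    """Clamp a signed value to the range of a `bits`-wide memory word."""
    high = (1 << (bits - 1)) - 1   # 0x7FFF for 16 bits
    low = -(1 << (bits - 1))       # -0x8000 for 16 bits
    return max(low, min(high, value))
```

Clamping keeps an overflowing result at the nearest representable value instead of letting it wrap around to the opposite sign.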
4 CARMEL Programming Model And Instruction Set Summary
Section 3 described the interconnected hardware units of the CARMEL Core. This section summarizes the programmer’s
view of these units: the instruction set, composed of operations and operands, along with the programming model of the
memory and registers that are sources and destinations for the operands.
4.1 CARMEL Core Programming Model
The operand data programming model is defined by the operand data types, the data register and memory map, and the data
memory addressing modes.
Operand Data Types and Formats
The CARMEL Core has a basic data precision of 16 bits and all of the data types follow from this precision. A data Word is
thus 16 bits as shown in Figure 4-1. It may be signed (two’s complement) or unsigned, integer or fractional depending upon
the operation. A 32-bit Double Word (data) is two unrelated 16-bit Words composed of Most- and Least-Significant Word
fields used in Double Operations with two 16-bit operand sets. A 32-bit Long Word (data) is a double-precision word
composed of Most- and Least-Significant Word fields used in 32-bit operations with 32-bit operands. The six accumulators
have a 40-bit Accumulator Word with three fields: Guard, Most-Significant Word and Least-Significant Word.
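The field layout of the Accumulator Word can be sketched in Python as follows. The 8-bit Guard width is inferred here from the 40-bit total and the two 16-bit words, and the function names are hypothetical.

```python
def split_accumulator(acc: int):
    """Split a 40-bit accumulator value into (guard, msw, lsw) fields.
    The guard field is assumed to be the top 8 bits."""
    acc &= (1 << 40) - 1
    return (acc >> 32) & 0xFF, (acc >> 16) & 0xFFFF, acc & 0xFFFF


def join_accumulator(guard: int, msw: int, lsw: int) -> int:
    """Rebuild the raw 40-bit pattern from its three fields."""
    return ((guard & 0xFF) << 32) | ((msw & 0xFFFF) << 16) | (lsw & 0xFFFF)


def sign_extend(value: int, width: int) -> int:
    """Interpret a raw bit pattern of the given width as two's complement."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)
```

A Long Word occupies the MSW and LSW fields, so the Guard field holds only sign extension until an accumulation grows beyond 32 bits.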
Operand Data Registers and Memory Maps
The data operand portion of the programming model is shown in Figure 4-2. The CARMEL Core is not a RISC load/store
architecture, but rather like a conventional DSP where the execution units operate out of the large data memories directly
rather than through an intermediate register file. Thus, most operations are with the data memory and the accumulators in the
top row of Figure 4-2. Unused data address registers may be used for data and some operations use the core’s internal system
registers. In addition, the CARMEL Core has data move operations with the I/O spaces shown in the bottom row of Figure
4-2 including the 512-word portion for registers that are external to the core. Move operations also perform data load and
store operations with the program memory shown.
The 64k-word data memory space is divided into four distinct regions. Zone A and B memories may each have single or dual
ported regions. With the two buses for both A and B, single port memories may be divided into odd and even addresses as
shown to give the effect of dual ported memory transfers without the complexity. The stack can be placed arbitrarily within the
data memory space.
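The odd/even split of a single-port zone can be modeled with a small Python sketch. It assumes that even addresses are served by the A1 bus and odd addresses by the A2 bus, as Figure 4-2 labels them, and that two accesses proceed together only when they use different buses. The function names are hypothetical, and the real arbitration rules are not detailed in this overview.

```python
def port_for(address: int) -> str:
    """Bus that serves an address in a single-port A zone (assumed split)."""
    return "A1" if address % 2 == 0 else "A2"


def no_conflict(addr_x: int, addr_y: int) -> bool:
    """True when two accesses in the same cycle use different buses."""
    return port_for(addr_x) != port_for(addr_y)
```

Under this model the two sequential addresses of a 32-bit operand always fall on different buses, while two even addresses collide and would need a wait-state.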
Data Operand Addressing Modes
Data operands are specified in the instruction set in any of four modes: as immediate data in the instruction, a direct reference
to the operand register including post modification, direct reference to the operand in data memory with an immediate
address or by an indirect reference to the operand in data memory by specifying the data address unit register and address
modification.
Figure 4-1. CARMEL Core Data Types
Addresses for data memory may be 16 or 32 bits where the second 16-bit address in the 32-bit address is for the higher order
word in a 32-bit long or double data word. There are three types of address modification (linear, bit-reverse and modulo), each
with increment, decrement and indexed modifications.
4.2 Instruction Set Architecture
The CARMEL Instruction Set Architecture (ISA) is defined by the instruction formats, the program memory map and
addressing modes, the instruction types and the instructions themselves.
Instruction Formats
Instruction code efficiency is important because smaller code directly lowers program memory cost and it lowers power
consumed in operation. Small code size can come at the expense of performance. The CARMEL Core’s modular instruction
set gives the programmer the benefits of small instructions for simple operations or longer instructions when multiple parallel
operations are required.
The basic instruction Word is 24 bits as shown in Figure 4-3. The Full Word instructions of 48 bits are the same basic
operations with larger immediate operand fields and direct operand references. The third size is the Configurable Long
Instruction Word (CLIW™) of 144 bits. It is composed of a 48-bit reference instruction in the program memory space and a
96-bit block instruction in the CLIW memory space. The reference instruction identifies the appropriate CLIW block instruction
and references up to four operands by their data Memory Address (MA) pointers. The block instruction specifies up to six
parallel operations with CLIW sub-instructions that can use the two ALUs, the two MACs and perform two data moves. Note
that the field separations shown in the figure are only symbolic for the maximum number of operands and operations and do
not represent actual field sizes.
Instructions in the program memory space are fetched 48 bits at a time. Each fetch may be a single Full Word instruction, two
separate 24-bit Word instructions that execute sequentially or a single Parallel Instruction Word composed of two 24-bit Word
instructions that execute in parallel in the two execution units. The Parallel Instruction Word format is shown in Figure 4-3
where the higher order Word is designated Left and the lower order Word is Right. In parallel execution the Left instruction
operates on the Execution Unit 1 designated the Left EU and the Right instruction operates on the Execution Unit 2 which is
designated the Right EU. Instructions in the CLIW memory space are fetched singly 96 bits at a time.
Figure 4-2. Data Operand Programming Model
Instruction Memory Map
The two memory spaces for instructions in the CARMEL Core are shown in Figure 4-4. The program memory space is shown
48-bits wide because two 24-bit words are fetched together even though addressing is for individual words as shown in the
programming model of Figure 4-2. Note that the CLIW memory space is mapped into the program memory at an arbitrary
location for loading and unloading/verification with move operations. The interrupt vector block’s location is set arbitrarily and
the bootstrap program is at the very top of the program memory space as shown.
Instruction Addressing Modes
Program memory addresses are specified directly from the program counter register, usually with post modification. The
program counter is loaded directly in instructions with immediate jumps, interrupts, loop and repeat counters or indirectly from
the stack in data memory. CLIW memory space addresses are always specified in the CLIW reference instruction in the
program memory space.
Figure 4-3. Instruction Formats
Figure 4-4. Program Memory Map
Instruction Summary
The CARMEL Core instruction set is powerful and flexible, and the C-like assembly syntax makes it easy to learn and
program. The 112 distinct instructions of the CARMEL Core are summarized in Table 4-1. They are arranged by operation
type: arithmetic, multiply, logical and single-bit operations, program control and system operations. Within an operation type
they are listed first in order of size with the smallest first where the CLIW sub-instruction is the smallest. The majority of
instructions are available in all three sizes: as a portion of a CLIW instruction, as a 24-bit instruction Word or as a 48-bit Full
Word with extended operand references.
Also listed in Table 4-1 for each instruction is the execution unit side where the operation can take place which is the same
as the side that the 24-bit Instruction Word must occupy in a 48-bit Parallel Instruction Word. The choices of side are:
ANY. Any side Left or Right including both Left and Right.
ONE. Only one such Instruction Word per Parallel Instruction Word as for example with program control. May be in
either side, just not in both.
LEFT. Only in the Left Execution Unit (EU1) and therefore the Left side of the Parallel Instruction Word. Examples are
operations that use the shifter or exponent units that exist only in EU1.
There are five inherently conditional instructions, such as a conditional branch, where execution depends upon a single
selected condition code. Most other instructions, including CLIW sub-instructions, are conditionally executable depending
upon a very flexible condition mechanism. Execution can depend on any of the eight combinations of three selected condition
codes from a choice of sixteen. The condition codes can change dynamically each cycle or statically under program control
by setting the conditional execution register. Execution may be suppressed completely or only partially where pointers are
modified but the final results and flags are not changed.
Double operations are the SIMD (Single Instruction Multiple Data) operations using the 16-bit operands in double data word
formats.
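A double operation can be illustrated in Python with an add that treats a 32-bit double data word as two independent 16-bit lanes. The function name `add_double` is hypothetical and each lane is assumed to wrap on overflow. The core may instead apply saturation or other configured behavior.

```python
def add_double(a: int, b: int) -> int:
    """Add two 32-bit double data words lane by lane.
    No carry passes from the low lane into the high lane."""
    high = ((a >> 16) + (b >> 16)) & 0xFFFF
    low = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF
    return (high << 16) | low


# The low lane overflows and wraps; the high lane is unaffected.
lanes = add_double(0x0001FFFF, 0x00010001)   # 0x00020000
plain = (0x0001FFFF + 0x00010001) & 0xFFFFFFFF  # 0x00030000 for a long add
```

The difference between the two results shows why a double word and a long word are distinct data types even though both occupy 32 bits.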
The Back Trace Registers store from left to right the sequential comparison results of minimum or maximum operations for
accelerating the Viterbi algorithm. The MINMAX address register holds the data address associated with minimum or
maximum operations. The updating of the Shift-Right bit permits automatic FFT scaling.
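The two bookkeeping ideas in this paragraph can be sketched in Python. The sketch models a back trace register as a list of decision bits in the order they were produced, and models the MINMAX address register as the address returned with a running minimum. The function names are hypothetical, and the register widths and bit placement used by the core are not specified here.

```python
def min_update_trace(a: int, b: int, trace: list) -> int:
    """Return min(a, b) and record which operand won (0 = a, 1 = b).
    In a Viterbi decoder these bits are later read back to recover
    the surviving path."""
    if a <= b:
        trace.append(0)
        return a
    trace.append(1)
    return b


def running_min_with_address(values, base_address: int):
    """Return the minimum of a block and the address where it was found
    (the first occurrence wins on ties)."""
    best_value, best_address = values[0], base_address
    for offset, value in enumerate(values[1:], start=1):
        if value < best_value:
            best_value, best_address = value, base_address + offset
    return best_value, best_address


trace = []
survivors = [min_update_trace(a, b, trace) for a, b in [(3, 5), (9, 2), (4, 4)]]
```

Recording the decision as a side effect of the compare is what removes the separate test-and-branch step from the inner loop.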
Table 4-1. Instruction Set Summary
Instruction Operations | Side | Instruction Size (CLIW Sub-Instruction, 24 Bits, 48 Bits)

Arithmetic (56)
Absolute Value, Decrement, Increment, Limit, Negate, Round, Division Step High, Division Step Low, Add, Add With Carry, Add Absolute Value, Add Double, Add High, Add Low, Compare Signed, Compare Unsigned, Test Field For Ones, Test Field For Zeros, Max (2 & 3 Operands), Max & Update Back Trace Register (2 & 3 Operands), Min (2 & 3 Operands), Min & Update Back Trace Register (2 & 3 Operands), Subtract, Subtract With Borrow, Subtract Double, Subtract High, Subtract Low | Any | ✓ ✓ ✓
Max & Update MINMAX Address Register (2 Operands), Min & Update MINMAX Address Register (2 Operands) | One | ✓ ✓ ✓
Exponent, Extract Signed, Extract Unsigned, Insert, Rotate Left Thru Carry, Rotate Right Thru Carry, Shift Arithmetic, Shift Logical | Left | ✓ ✓ ✓
Add & Round, Add & Scale, Clear Accumulator, Subtract & Scale, Subtract & Round | Any | ✓
Max Double, Min Double | Left | ✓
Clear Back Trace Register | Any | ✓
Clear Accumulators, Clear & Round Accumulators, Update Shift-Right Bit, Trace | One | ✓
Max & Update MINMAX Address Register (3 Operands), Min & Update MINMAX (3 Operands) | One | ✓ ✓
Limit 8 Bit | - | ✓

Multiply (11)
Multiply, Multiply & Round, Multiply-Accumulate, Multiply-Accumulate & Round, Multiply-Accumulate Aligned, Multiply-Subtract, Multiply-Subtract & Round, Square, Square & Round, Square-Accumulate, Square-Accumulate & Round | Any | ✓ ✓ ✓

Logical (4)
And, Not, Or, Xor | Any | ✓ ✓ ✓

Single-Bit (4)
Change Bit, Clear Bit, Set Bit, Test Bit | One | ✓ ✓

Program Control (23)
Return, Return Conditional, Return From Interrupt, Return From Interrupt Conditional, Leave, Trap | One | ✓
Branch Absolute (Indirect), Branch Conditional, Branch Relative, Break, Continue, Call Absolute, Call Conditional (Relative), Call Relative, Link, Pop, Push, Repeat, Block Repeat | One | ✓ ✓
Nop | Any | ✓
Set Conditional Execution Flag, Clear Conditional Execution Flag | - | ✓
Load Conditional Execution Flags | - | ✓

System (14)
Move | Any | ✓ ✓ ✓
Move Unsigned, Change Pointer Register | Any | ✓ ✓
Bank Exchange of Pointer Registers, Load Pointer & Mode | One | ✓ ✓
Change Groups of Pointer Registers, Interrupt Enable, Interrupt Disable | One | ✓
Move EXT, Move From Data To Program, Move From Program To Data, Move IO, Swap, Reset | - | ✓
Configurability And The Instruction Set
A major design objective for the CARMEL Core architecture was that it be configurable to optimize speed and code efficiency
for a given application. This objective has clearly been met in a hardware sense with the core’s extensibility in many directions
and the modularity inherent in an individually configurable core design. But configurability can be obtained in the instruction
set also as the CARMEL Core confirms.
The primary configuring mechanism is in the use of the Configurable Long Instruction Word (CLIW) operations block. Once
selected, these 96-bit instructions composed of up to six individual parallel sub-instructions can be called by multiple 48-bit
CLIW reference instructions in a variety of contexts and with very different operands and execution conditions. Their
operation is much like a conditional subroutine with various input operands. Thus, once the algorithm design has been done,
the processing core is configured to repeatedly perform the same efficient operations required by the application.
Figure 4-5 illustrates how this task is achieved.
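The subroutine analogy can be made concrete with a toy Python model. Everything in it is hypothetical: the sub-instructions `mac1` and `mac2`, the `CLIW_MEMORY` table and the `execute_reference` function are illustrative names, the sub-instructions run one after another rather than in parallel, and no real CLIW encoding is implied. The only details taken from the text are the limits of six sub-instructions per block and four memory addresses per reference instruction.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical model: a sub-instruction is a function of
# (data memory, accumulators, addresses from the reference instruction).
SubInstruction = Callable[[List[int], List[int], Sequence[int]], None]


def mac1(mem, acc, ma):
    """Multiply-accumulate on operands 1 and 2 into accumulator 0."""
    acc[0] += mem[ma[0]] * mem[ma[1]]


def mac2(mem, acc, ma):
    """Multiply-accumulate on operands 3 and 4 into accumulator 1."""
    acc[1] += mem[ma[2]] * mem[ma[3]]


# One stored operations block, defined once and reused many times.
CLIW_MEMORY: Dict[int, List[SubInstruction]] = {0: [mac1, mac2]}


def execute_reference(block_index, addresses, mem, acc):
    """Model of a reference instruction: pick a block, supply operands."""
    block = CLIW_MEMORY[block_index]
    if len(block) > 6 or len(addresses) > 4:
        raise ValueError("block or operand list exceeds the CLIW limits")
    for sub in block:  # sequential here; parallel in the real core
        sub(mem, acc, addresses)


memory = [1, 2, 3, 4, 5, 6, 7, 8]
accumulators = [0, 0]
execute_reference(0, (0, 1, 2, 3), memory, accumulators)  # first call
execute_reference(0, (4, 5, 6, 7), memory, accumulators)  # same block, new operands
```

The same stored block performs two different pairs of multiply-accumulates because only the addresses supplied by the reference instruction change.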
The data path itself has a high degree of configurability that tends to be constant throughout an application. Typical
configurable settings are rounding methods, saturation and limiting on overflow, scaling during multiplication and memory
operations, and variable scaling strategies for block floating-point and the FFT.
The CARMEL Core’s unusually extensive set of conditionally executable instructions, including the CLIW and SIMD ones,
provides another means of rapidly re-configuring the architecture in a sense. Consistency of execution time is often critical in
real-time digital signal processing and using conditional execution is an established technique to ensure it. It is particularly
flexible in this core because the conditions can be complex, dynamically determined or statically changed, and they can
suppress only the data execution or the full operation.
Figure 4-5. Using The CLIW Power on All CARMEL Core Execution Units and for Two Data Memory Transfers
5 The CARMEL System Architecture
A system-on-a-chip (SoC) design solution needs a complete system architecture, not just an isolated processing core
architecture. The CARMEL Product Line provides this. Beyond meeting the signal processing needs of the system, the core
must provide for other system functions that may or may not be on the same chip. General system partitioning issues that are
important in assessing an architecture are:
System control: with or without a host that may be an internal or external host
Testing and emulation: both in development and production phases
Support for external buses and standard interfaces
Internal/external component choices: for additional processing power, large memories and large peripherals
System functions: clock generation and synchronization, power management
5.1 Representative External Systems
For the purposes of understanding the CARMEL system architecture consider using an external system like the one shown
in Figure 5-1. It is a composite of most of the possible external components to illustrate how they can be accommodated in
a CARMEL-based SoC. It is not necessarily a typical system.
Given large complex system control tasks, established bodies of proven control software, and large control peripherals, an
external microcontroller host is often dictated in such a system. Thus the digital signal processing system-on-a-chip must
connect effectively to common microcontrollers with standard interfaces. The host is also often the system self-test controller
using JTAG serial scan paths with all system elements.
Because of sheer size, power or mixed-signal requirements, some peripheral circuits may not be able to be included on the
SoC. These peripherals must be easily connected using standard interfaces and buses. They are represented in Figure 5-1
by peripherals 1-4 on a serial and a parallel bus. For systems with large memories the economy of large off-chip DRAMs may
be required. These memories need to be added cost-effectively with their special timing characteristics. Additional
processing power may also be needed, which raises the question of whether the co-processor or accelerator can be internal
or must be external.
Many systems already have a system clock whose frequency is chosen. The signal processing SoC often must operate
synchronously with the system clock. Normal clock rates, sleep clock rates, gated clocks and clock distribution also figure
heavily in system power management. These functions must be controllable in SoC applications.
Historically in small systems and even some large ones, just a few general purpose programmed I/O signals can handle
critical timing and configuration control. They require no special interface circuits or interrupts, and are programmed in as a
general purpose I/O.
Figure 5-1. A Composite External-Host CARMEL-Based System
5.2 A Representative CARMEL-Based System-on-a-Chip
Figure 5-2 shows a CARMEL-based SoC configured to fit into the external system of Figure 5-1. Looking at each module in
turn illustrates how the CARMEL architecture and standard modules can meet the representative system requirements.
Figure 5-2. A Composite CARMEL-Based System-on-a-Chip Illustrating the System Architecture
5.3 The CARMEL Core Module
The core is shown fully connected in Figure 5-2 to use the extended architecture on all six interfaces of Table 3-1.
5.4 CARMEL Memory Modules
For all except the largest DRAMs, cost, speed and power are improved with on-chip memories. Thus all of the direct CARMEL
memory interfaces are designed for on-chip single-cycle synchronous memories, except program memory during boot-up.
Program Memory
Program memory can be all ROM for the greatest size, power and reliability benefits. Most often the ability to download
program RAM to change functionality or provide an upgrade path requires a mix of the memory types. In some cases only
the simplest bootstrap is in ROM. The CARMEL system architecture permits all of these combinations because it allows
move operations between program memory and other system memories. Slower, byte-wide bootstrap memories can be
elsewhere in the system including off-chip as determined by the core configuration settings.
CLIW Memory
Similarly, CLIW memory can be RAM and/or ROM since it appears in the program memory space and may also be
downloaded. This memory is optional if the benefits of the parallel CLIW functionality are not needed.
Data Memories
The choice of memories and non-memory elements in this address space is very large. Both RAM and ROM may be used
as well as memory-mapped co-processors. The highest performance comes with using all four buses with separate A and B
zone memories. Within each zone there can be the smaller single-port designs and the higher performance dual-port ones
including designs with separate even and odd address ports. Co-processors located in this space are typically for arithmetic
acceleration on local data with deterministic processing times.
5.5 System Interconnect Modules
The designer of a system with the CARMEL Core can integrate the core in any manner they choose provided they observe
the protocols at the six signal interfaces in Table 3-1. However, the CARMEL product modules are designed to use the
Flexible Peripheral Interconnect (FPI) Bus to extend the architecture for other than the direct memories described above.
Table 5-1 summarizes the characteristics of the bus as permitted by the FPI Bus specification, supported by the CARMEL
Core, and implemented in the CDEV development chip. The FPI specification only specifies the FPI Bus with master and
slave interfaces for various types of data transfers. Additional so-called sideband bus signals for DMA and Interrupts are
permitted but are allowed to be implementation dependent. Certain System clocks and resets are defined as well.
For the CARMEL product modules, the FPI Bus, the DMA signals, the Interrupt signals and distributed System signals are
all defined and are shown schematically in Figure 5-2. These signals are largely determined by the CARMEL Core-to-System
Interface and they can be changed as desired. For example, although the core has 16-bit data paths, it could easily be
interfaced to a conforming 32-bit FPI Data Bus by redesigning the Core-to-System Interface sub-units.
Core-to-System Interface
As Figure 5-2 shows, three sub-units in the core-to-system interface extend the core’s own DMA, IO and Interrupt interfaces
onto the augmented FPI bus. The DMA unit provides data buffering and dual address generation for eight independent DMA
channels with three priority levels. The modular Interrupt Control Unit (ICU) adds masking, priority and vector multiplexing
control for the core’s interrupt interface, and can be cascaded to extend that control in groups of sixteen. The FPI Unit
(FPIU) buffers and sequences the core’s own IO bus for normal data transfers. When the core must act as a master on the
FPI Bus, as for DMA and most core-initiated I/O, the FPIU also executes the required bus protocol.
FPI Bus, Clock and System Configuration Units
The FPI Bus requires a controller and a default master for its operation. In the CARMEL product line, both roles are filled
by the FPI Bus Control Unit (FBCU) module, although another bus master may instead be the default master that arbitrates bus use. Common
clocks and resets are defined for a CARMEL-based system-on-a-chip. To optimize power and performance trade-offs, the
FPI bus clock, the core clock and the oscillator or external reference clock frequencies can all be determined independently.
Power management clock control signals are distributed in the system signals. Separate reset signals are defined: one for bus
interfaces and peripherals, and a system reset for processors like the core. Configuration signals distribute to the system a
uniform set of information about the FPI bus and system memory configurations.
Emulation Unit
The core emulation unit is interfaced to the FPI bus rather than directly to an external interface. This permits other processor
emulation units to share the single augmented JTAG external interface, and allows all emulation units to work in concert to test
and debug the complete system-on-a-chip.
The CARMEL Core Emulation Unit’s intimate connection allows trapping and breakpoint support for off-line tracing after a
breakpoint or error trap condition using then-resident emulation software.
GPIO
The four programmed inputs and four programmed outputs of the core’s General Purpose Port (GPP) can be distributed on the
chip for direct control without using the FPI Bus, but can also be an external interface as shown for the industry standard
General Purpose Input Output (GPIO).
Table 5-1. FPI Bus Design Choices

Group     | Bus Parameter         | Permitted By The FPI Specification | Supported By The CARMEL Core | Implemented In The CDEV Development Chip
FPI       | Address Bus size      | 16 - 32 bits                       | 24 bits                      | 24 bits
          | Data Bus size         | 16, 32, 64 bits                    | 16 bits                      | 16 bits
          | Data transfer size    | 8, 16, 32, 64 bits                 | 8, 16 bits                   | 8, 16 bits
          | Data transfer modes   | Single, split, block, DMA          | Single, DMA                  | Single, DMA
          | Masters internal      | 16                                 | -                            | 6: DMA(3), FPIU(1), JTAG(2)
          | Masters external      |                                    |                              | 6
          | Slaves internal       | -                                  | -                            | 6: DMA, EMU, JTAG, BIU, HI, SCU
          | Slaves external       |                                    |                              | 5
DMA       | DMA Channels external | -                                  | -                            | 4: 5-2
          | DMA Channels internal |                                    |                              | 4: 7,6 = Host; 1,0 = Core
Interrupt | Interrupts internal   | -                                  | 240                          | Master(12), Slave(12)
          | Interrupts external   |                                    |                              | Master(4), Slave(4)
System    | Clock Rate            | -                                  | -                            | 125 MHz
          | Resets                | System and/or Interfaces           | -                            | System and Interfaces
5.6 System Peripheral And Memory Modules
CARMEL system product modules that go on the FPI Bus are grouped into those that directly provide an interface for off-chip
connection to a specific device, those that provide some autonomous controller function with or without creating an external
generic bus and those that are on-chip system memories.
Interfaces
For a variety of system partitioning reasons, like having replaceable program ROMs or large DRAMs, it may be desirable to
have off-chip memories. They are easily added in the large 24-bit FPI bus address space supported by the core. The system-
wide configuration signals can facilitate the address mapping, data alignment and timing control. DMA priorities assure that
transfers take place at appropriate times for large data movements.
External host interfaces can be added to the FPI Bus with a data format and priority for real-time control that suits the
application. Transfers can be DMA, programmed or interrupt driven. Industry standard JTAG test and on-chip debug support
connections can be made with an Infineon-specified JTAG interface that helps implement a comprehensive test/debug
solution, which is even more important in the SoC context.
Controllers
There are many industry standard interfaces that involve operations that are autonomous or independent of the direct timing
of transfers on the FPI Bus. These interfaces divide into generic ones like the synchronous and asynchronous serial and
parallel interfaces such as the UART. Other interfaces are application specific such as audio codecs or LCD displays.
An asynchronous/synchronous serial interface module covers 8- or 9-bit half or full duplex transfers with programmable rates
from less than one baud to greater than a megabaud with a 25-MHz input clock.
A parallel port interface module is organized as three 8-bit input/output ports for control signals and/or data transfers.
General purpose timers are a common autonomous controller in real-time DSP systems. The CARMEL system product timer
module has three 32-bit counters which are very flexible in configuration and operating mode. They can count events or time
intervals with a variety of clock, signal transition, reload and service request types.
FPI-Bus System Memories
The DMA and FPI Bus transfer modes easily accommodate the various timing constraints of larger, slower internal system
memories. These memories may be additional SRAM and ROM, or embedded DRAM and Flash with their own controllers.
6 Programming Examples
The following two DSP functions have been programmed on the CARMEL Core. These annotated examples show how
straightforward it is to program the core and use some of the features in the instruction set. They also provide performance
numbers for comparison with other processor benchmarks.
6.1 FIR Filters
The Finite-Impulse-Response (FIR) filter is the most common DSP function in many applications. With simple, but highly
repetitive arithmetic and simple data structures it is the primary benchmark to measure computation-limited speed.
A FIR filter is a continuing computation over time of the form:
$$y(t) = \sum_{n=0}^{N-1} x(t-n)\, w(n)$$
where y is the filter output and x is the uniformly sampled input signal, indexed by t over time. N is the number
of points in the filter impulse response, that is, the number of filter taps or coefficients in the weighting function w.
Common algorithm variations are for real or complex input data, real or complex filter coefficients, N being even or odd and
the coefficients being symmetrical (w [N/2 + n] = w [N/2 - n]). Common arithmetic variations involve the signal and coefficient
precisions, amount of accumulation precision and scaling and/or saturation strategies. Additional considerations when
selecting an algorithm can be speed (i.e. maximum real-time bandwidth), minimum total program size or minimum data
memory size including accumulator usage.
Two variations are provided here: one computes a single output point on real data with a non-symmetrical impulse response,
and the other generates two output points with a symmetrical impulse response. The symmetrical case in the following section
shows the power of the CLIW instructions, while the non-symmetrical case demonstrates the power of CARMEL parallelism.
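As a reference point for the two CARMEL listings that follow, the filter equation can be written directly in a high-level language. The sketch below is in Python rather than CARMEL assembly; the function name `fir_output` and the sample data are illustrative assumptions, not part of the handbook.

```python
def fir_output(x, w, t):
    """Return y(t) = sum over n of x[t - n] * w[n] for an N-tap filter.

    x is indexed by sample time and w[n] is the n-th coefficient.
    (Hypothetical helper: the name and argument order are assumptions.)
    """
    n_taps = len(w)
    if t < n_taps - 1 or t >= len(x):
        raise ValueError("t must leave N samples of history inside x")
    return sum(x[t - n] * w[n] for n in range(n_taps))


# Illustrative data: six input samples and a 4-tap, non-symmetrical filter.
x = [1, 2, 3, 4, 5, 6]
w = [1, 0, -1, 2]
y = [fir_output(x, w, t) for t in range(len(w) - 1, len(x))]
print(y)  # [4, 6, 8]
```

Each output point costs N multiplications and N - 1 additions in this form, which is the work the assembly examples spread across parallel execution units.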
Single-Sample Real Non-Symmetrical FIR Filter
Using the data memory and register utilization shown in Figure 6-1, the following program computes a single output sample
for an N tap filter:

FIR1:                                           //Single-Sample Real Non-Symmetrical FIR Filter
{                                               //Prolog
    clr(a0,a1)||rep(N/2)single;                 //Clear accumulators a0 and a1, and repeat the next
                                                //single instruction N/2 times
    {                                           //Kernel
        a0 += *r6++ * *r2++||a1 += *r6++ * *r2++;
                                                //Accumulate in a0 and a1 the products of the even and
                                                //odd data and coefficients respectively as pointed to
                                                //by r6 and r2 which are sequentially post-incremented
                                                //by one
    }
                                                //Epilog
    *(r0++)=a0+a1;                              //Add a0 to a1 and store in location pointed to by r0
}                                               //End

Where || specifies parallel operations, + is addition, * is multiplication, += is accumulation and *r0++ is data at the indirect
address in address register r0 with post-incrementation by one.
Note that two accumulations are calculated in the single parallel word that is repeated in the kernel and that by using the same
pointers in both instructions they are incremented as though the left were executed and then the right. In Figure 6-1 these
address pairs are designated as r6 and r6´ and r2 and r2´. If N is not even, then an additional zero valued coefficient can be
added to make it so without a loss of speed.
The computation time is N/2 + 2 instruction cycles with 4 data memory accesses per cycle. The program size is 15 bytes with
a data memory of 4N + 2 bytes required.
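The even/odd split used by FIR1 can be checked with a small behavioural model. This is a Python sketch, not CARMEL code: the function name, the memory-order convention (element k of the data block is multiplied by element k of the coefficient block) and the one-cycle-per-step counting are assumptions, chosen so that the count reproduces the N/2 + 2 figure quoted for this example.

```python
def fir_dual_accumulator(x_mem, w_mem):
    """Model the FIR1 kernel: two multiply-accumulates per repeated word.

    x_mem and w_mem are in memory order, so element k of one block is
    multiplied by element k of the other (an assumption about layout).
    Returns (y, cycles).
    """
    if len(x_mem) != len(w_mem):
        raise ValueError("data and coefficient blocks must have equal length")
    x_mem, w_mem = list(x_mem), list(w_mem)
    if len(w_mem) % 2:
        # Odd N: append a zero-valued coefficient, as the text suggests.
        x_mem.append(0)
        w_mem.append(0)

    a0 = a1 = 0  # even and odd accumulators
    cycles = 1   # prolog: clear accumulators and set up the repeat (assumed 1 cycle)
    for k in range(0, len(w_mem), 2):
        a0 += x_mem[k] * w_mem[k]          # even products
        a1 += x_mem[k + 1] * w_mem[k + 1]  # odd products
        cycles += 1
    cycles += 1  # epilog: add a0 and a1 and store the result (assumed 1 cycle)
    return a0 + a1, cycles


# x(t-3) ... x(t) paired with w(3) ... w(0), following the order in Figure 6-1.
x_window = [3, 4, 5, 6]
w_mem = [2, -1, 0, 1]
y, cycles = fir_dual_accumulator(x_window, w_mem)
print(y, cycles)  # 8 4
```

Splitting the sum over two accumulators does not change the result; it only lets two products be formed in every kernel cycle, which is where the N/2 term comes from.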
Figure 6-1. Register and Data Memory Usage for a Real Non-Symmetrical FIR Filter
[Figure: accumulators a0 (even accumulation) and a1 (odd accumulation); address unit registers r6 (x pointer), r2 (w pointer)
and r0 (y pointer), with r6´ and r2´ marking the second access of each pair; Data Memory A and Data Memory B holding
x(t-N+1) ... x(t), w(N-1) ... w(0) and y(t).]
Block Real Symmetrical FIR Filter
Using the data memory and register utilization shown in Figure 6-2, this program computes two output samples, y(t) and
y(t+1), for an N tap filter with a symmetrical impulse response:

FIR2:                                           //Block Real Symmetrical FIR Filter
{                                               //Prolog
    cliw firsym1 (r2++,r2++,r4--,r4--)
    {
        nop||a0=*ma2+*ma3||nop||a1=*ma1+*ma4||  //Add two initial pairs of x(t) values and load delay
        ff1=*ma2||ff2=*ma4;                     //registers: ff1 becomes x(t) and ff2 becomes x(t-N+2)
    }
    clr(a2,a3)||rep(N/2-1)single;               //Clear accumulators a2 and a3, and repeat next
                                                //single instruction N/2 - 1 times
    {                                           //Kernel
        cliw firsym2 (r2++,r4--,r6++)
        {
            a2+=a0h**ma3||a0=*ma1+ff2||         //Add two pairs of x(t) values, perform two multiply-
            a3+=a1h**ma3||a1=*ma2+ff1||         //accumulates and load delay registers ff1 and ff2
            ff1=*ma1 ||ff2=*ma2;
        }
    }
                                                //Epilog
    cliw firsym3 (r0++,r0++,r6++)
    {
        *ma1+=a0h**ma3+a2||nop||*ma2+=a1h**ma3+a3;
                                                //Add final products and store two output points
                                                //y(t) and y(t+1)
    }
}                                               //End

Where CLIW instructions are of the form:
cliw name (ma1, ma2, ma3, ma4) {mac1 || alu1 || mac2 || alu2 || move1 || move2};
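The data flow of FIR2 (pre-adding the two samples that share a coefficient, and holding one sample of each pair for the next pass) can be traced with a behavioural model. The Python sketch below is an interpretation of the listing, not CARMEL code: the function name, the time-indexed list `x` and the variable names `recent` and `old` are assumptions, and `w_half` holds only w(0) ... w(N/2-1) with w(n) = w(N-1-n) assumed.

```python
def fir_symmetric_block(x, w_half, t):
    """Return (y(t), y(t+1)) for an N-tap symmetric filter, N = 2 * len(w_half).

    Mirrors the FIR2 structure: a0/a1 hold pre-added data pairs, a2/a3
    accumulate products, ff1/ff2 hold samples for re-use in the next pass.
    """
    half = len(w_half)
    n_taps = 2 * half
    if t - n_taps + 1 < 0 or t + 1 >= len(x):
        raise ValueError("x must cover x(t-N+1) ... x(t+1)")

    # Prolog: first data pair for each output, and initial hold values.
    a0 = x[t] + x[t - n_taps + 1]      # pair for y(t)
    a1 = x[t + 1] + x[t - n_taps + 2]  # pair for y(t+1)
    ff1 = x[t]
    ff2 = x[t - n_taps + 2]
    a2 = a3 = 0

    # Kernel: N/2 - 1 passes, each using one coefficient for both outputs.
    for k in range(half - 1):
        a2 += a0 * w_half[k]
        a3 += a1 * w_half[k]
        recent = x[t - 1 - k]           # next sample from the recent end
        old = x[t - n_taps + 3 + k]     # next sample from the old end
        a0 = recent + ff2               # next pair for y(t)
        a1 = old + ff1                  # next pair for y(t+1)
        ff1, ff2 = recent, old

    # Epilog: last product added to each accumulated sum.
    return a0 * w_half[half - 1] + a2, a1 * w_half[half - 1] + a3


x = [1, 2, 3, 4, 5, 6]
w_half = [1, 2]  # full symmetric response is [1, 2, 2, 1]
outputs = fir_symmetric_block(x, w_half, 4)
print(outputs)  # (21, 27)
```

Each pass reads only two new data samples and one coefficient, yet feeds two multiply-accumulates, because the other operand of each addition comes from a hold register rather than from memory.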
Figure 6-2. Register and Data Memory Usage for a Block Symmetrical FIR Filter
[Figure: accumulators a0 (y(t) data sums), a1 (y(t+1) data sums), a2 (y(t) accumulation) and a3 (y(t+1) accumulation);
execution unit registers ff1 and ff2, one holding forward data and one reverse data; address unit registers r2 and r4
(x data pointers, one forward and one reverse), r6 (w pointer) and r0 (y pointer), with r2´, r4´ and r0´ marking second
accesses; Data Memory A and Data Memory B holding x(t-N+1) ... x(t+1), w(N/2-1) ... w(0), y(t) and y(t+1).]
The successive readings of x(t) from data memory B are shown in Figure 6-3 from left to right. The brackets show the data
pairs that are added from the forward and reverse progression of time. The arrows show the moves of some of the same data
to the ff1 and ff2 hold registers for re-use in the next pass without another memory read. Note that forward progression of
time with increasing t is for numerically decreasing data addresses.
Three CLIW instructions, firsym1 through firsym3, are used. The firsym2 instruction takes advantage of the symmetry by using
a CLIW that can do two data additions and two multiply-accumulates along with two register loads. Both firsym1 and firsym2
use the delay registers (ff1, ff2) available for parallel use with CLIW.
The computation time is now N/4 + 1.5 instruction cycles per output point, with a total of three data memory accesses per cycle
except for four in the prolog. The program size is now 60 bytes with a data memory of 3N + 2 bytes required.
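The figures quoted for the two FIR variants can be put side by side with a few lines of arithmetic. The Python sketch below uses only the formulas stated in the text; the function names and the choice of N = 64 are illustrative assumptions.

```python
def fir1_cost(n_taps):
    """Single-sample non-symmetrical FIR: figures quoted for FIR1."""
    return {
        "cycles_per_output": n_taps / 2 + 2,
        "program_bytes": 15,
        "data_bytes": 4 * n_taps + 2,
    }


def fir2_cost(n_taps):
    """Block symmetrical FIR: figures quoted for FIR2."""
    return {
        "cycles_per_output": n_taps / 4 + 1.5,
        "program_bytes": 60,
        "data_bytes": 3 * n_taps + 2,
    }


n_taps = 64  # illustrative filter length
single = fir1_cost(n_taps)
block = fir2_cost(n_taps)
speedup = single["cycles_per_output"] / block["cycles_per_output"]
print(single["cycles_per_output"], block["cycles_per_output"])  # 34.0 17.5
```

For N = 64 the symmetrical block version needs 17.5 cycles per output point against 34 for the single-sample version, at the price of a program four times larger; the cycle ratio approaches 2 as N grows.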
Figure 6-3. Successive Read Cycles for Data Memory B
[Figure: three panels (Prolog, 1st Pass Kernel, 2nd Pass Kernel), each showing the samples x(t-N+1) ... x(t+1) with brackets
marking the data pairs added for the 1st, 2nd and 3rd y(t) and y(t+1) sums, and arrows marking the values moved to ff1 and ff2.]
6.2 Vector Quantization
Vector quantization is a common form of communications speech encoding that is computationally intensive. The best fit is
found between an incoming speech sample vector and a codebook of reference coefficient vectors. This involves finding the
minimum distance between the sample vector and all codebook entries.
The squared difference between a sample vector xi of length N and a codebook coefficient vector cj of length N is of the form:
$$\mathrm{distance}_{ij} = \sum_{n=0}^{N-1} \left[ x_i(n) - c_j(n) \right]^2$$
Common variations combine it with the minimizing process of comparing other codebook entries, or use a weighting
function on the distances to compensate for non-uniform energies in the codebook entries.
The minimum distance for a sample vector i with a codebook of M entries is of the form:
$$\mathrm{minimum\_distance}_i = \min_{0 \le j \le M-1} \left[ \mathrm{distance}_{ij} \right]$$
The index j of the codebook entry with the minimum distance is the desired result, not the actual minimum distance for
sample vector i.
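The two definitions above translate directly into a reference search. The Python sketch below is only a plain restatement of the equations; the function names and the small codebook are illustrative assumptions, and ties are resolved in favour of the lowest index, which the text does not specify.

```python
def squared_distance(x, c):
    """Sum over n of (x[n] - c[n])**2 for two vectors of equal length."""
    if len(x) != len(c):
        raise ValueError("vectors must have the same length")
    return sum((xn - cn) ** 2 for xn, cn in zip(x, c))


def nearest_codebook_index(x, codebook):
    """Return the index j of the codebook entry closest to x."""
    distances = [squared_distance(x, c) for c in codebook]
    # min() over indices keeps the first entry when distances tie.
    return min(range(len(codebook)), key=distances.__getitem__)


sample = [1, 2]
codebook = [[0, 0], [1, 1], [3, 2], [1, 3]]
best = nearest_codebook_index(sample, codebook)
print(best)  # 1
```

The search returns the index rather than the distance, matching the statement that the index is the desired result.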
Using the data memory and register utilization shown in Figure 6-4, the following program computes the minimum squared
distance between the sample vector xi of length N and M code vectors of length N. This minimum squared distance remains
in accumulator a4, and the address of the last sample of the corresponding code vector is in the MINMAX register.
A single CLIW instruction, vq, is used. It uses both ALUs and both MAC execution units at once in the inner
loop, with only three data memory accesses. Note the speed gained from the three-operand minimum-and-update
instruction minm, which finds not only the minimum but also the corresponding address, which is the important result.
The computation time is (M/2)(N+4) + 1 instruction cycles. The program size is 51 bytes with a data memory of 2(M+1)N bytes
required.
VQ1:                                            //Vector Quantization
{                                               //Outer Loop over M code vectors
    rep(M/2)block;                              //Repeat outer block loop M/2 times
    {
        a0l = *r0++ - *r4++ ||                  //Find first distance with first half code vectors
        a1l = *(r0+rn0) - *(r4+rn4);            //Find first distance with second half code vectors
        clr(a2,a3)||rep(N-1)single;             //Clear accumulators a2 and a3, and repeat next
                                                //single instruction N-1 times
        {                                       //Kernel inner loop over N-1 points
            cliw vq (r0++,r4++,r4+rn4)
            {
                a0l = *ma1-*ma2||a2 += sqr(a0l)||   //Accumulate squared distance with first half code vectors
                a1l = *ma1-*ma3||a3 += sqr(a1l);    //Accumulate squared distance with second half code vectors
            }
        }
        a2 += sqr(a0l) || a3 += sqr(a1l);       //Accumulate last squared distance with first half code
                                                //vectors and with second half code vectors
        minm(a4,a2,r4);                         //The current minimum squared distance is placed in a4
        minm(a4,a3,r4+rn4);                     //and the end address of the corresponding code vector
                                                //is stored in the MINMAX register
    }
}                                               //End
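The structure of VQ1 (two code vectors per outer pass, with each difference formed one step ahead of its square-accumulate) can be modelled as follows. This Python sketch is an interpretation of the listing, not CARMEL code: the function name is hypothetical, a list index stands in for the address kept in the MINMAX register, the running minimum is assumed to start at infinity, ties are assumed to keep the earlier entry, and each instruction is assumed to take one cycle so that the count matches the quoted (M/2)(N+4) + 1.

```python
def vq_search_two_halves(x, codebook):
    """Search codebook entries j and j + M/2 together, as VQ1 does.

    Returns (best_index, best_distance, cycles).
    """
    m, n = len(codebook), len(x)
    if m % 2:
        raise ValueError("M must be even to split the codebook in halves")
    half = m // 2
    best_distance = float("inf")  # models a4 (initial value is an assumption)
    best_index = None             # models the address kept in MINMAX
    cycles = 1                    # outer repeat set-up

    for j in range(half):
        c_lo, c_hi = codebook[j], codebook[j + half]
        d0 = x[0] - c_lo[0]       # first difference, first-half vector
        d1 = x[0] - c_hi[0]       # first difference, second-half vector
        a2 = a3 = 0
        cycles += 2               # first differences, then clear and repeat
        for k in range(1, n):     # kernel: N - 1 passes
            a2 += d0 * d0         # square the previous difference ...
            a3 += d1 * d1
            d0 = x[k] - c_lo[k]   # ... while forming the next one
            d1 = x[k] - c_hi[k]
            cycles += 1
        a2 += d0 * d0             # last squared difference
        a3 += d1 * d1
        cycles += 1
        if a2 < best_distance:    # first minimum-and-update
            best_distance, best_index = a2, j
        if a3 < best_distance:    # second minimum-and-update
            best_distance, best_index = a3, j + half
        cycles += 2
    return best_index, best_distance, cycles


sample = [1, 2]
codebook = [[0, 0], [1, 1], [3, 2], [1, 3]]
result = vq_search_two_halves(sample, codebook)
print(result)  # (1, 1, 13)
```

With M = 4 and N = 2 the model counts 13 cycles, in agreement with (M/2)(N+4) + 1, and each kernel pass touches one sample and two codebook values, which corresponds to the three data memory accesses mentioned in the text.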
Figure 6-4. Register and Data Memory Usage for Vector Quantization
[Figure: accumulators a0 (first half distances), a1 (second half distances), a2 (first half squared-distance accumulation),
a3 (second half squared-distance accumulation) and a4 (current minimum squared distance); address unit registers r0 (x pointer),
r4 (c pointer), rn4 = MN/2 and rn0 = zero, with r4+rn4 addressing the second half of the codebook; Data Memory A and
Data Memory B holding xi(0) ... xi(N-1) and the code vectors c0(0 – N-1) ... cM/2-1(0 – N-1) and cM/2(0 – N-1) ... cM-1(0 – N-1).]