SOC - 0 © by Tien-Fu Chen@CCU
Overview of SOC Architecture Design
Tien-Fu Chen
National Chung Cheng Univ.
Computer Architectures
SOC Design Issues
SOC architecture
Reconfigurable
System-level
Programmable processors
Low-level reconfiguration
On-chip bus
Embedded Software Issues
Embedded Systems vs. General-Purpose Computing - 1
Embedded System
Runs a few applications, often known at design time
Not end-user programmable
Operates under fixed run-time constraints; additional performance may not be useful/valuable
General-purpose computing
Intended to run a fully general set of applications
End-user programmable
Faster is always better
Embedded Systems vs. General-Purpose Computing - 2
Embedded System
Differentiating features:
power
cost
speed (must be predictable)
General-purpose computing
Differentiating features:
speed (need not be fully predictable)
speed
did we mention speed?
cost (largest component power)
[Figure: an embedded computer: CPU, memory, logic, input and output, and analog interfaces]
Embedded System: Examples
[Figure: examples of embedded systems]
Design Complexity Increase
[Figure: SoC characteristics, current design vs. next design: designs with >5 and >9 clock domains, clock speed >133 MHz and >400 MHz, lines of DSP and µP software, units shipped >5M, average gate count, and designs with gate count >1M]
Source: Collett International Research, December 2000 research on 360 IC/ASIC design teams in North America
Embedded System on Chip (SoC) Design
[Figure: embedded systems design flow: requirements and specification (untimed, unclocked, C/C++ level) refined into an implementation (timed, clocked, RTL level) and design export; a testbench characterizes the system environment (Zone 1: in-building pico-cell, Zone 2: urban micro-cell, Zone 3: suburban macro-cell, Zone 4: global satellite); the SoC combines a µP/C core, memory, analog, firmware, and embedded software]
Architectures
Architectures supplement models by specifying how the system will actually be implemented
The goal of each architecture is to describe:
Number of components
Type of each component
Type of each connection among the above components
General classification
Application-specific architectures: DSP
General-purpose architectures: CISC, RISC
Parallel processors: VLIW, SIMD, MIMD
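The three things an architecture must describe can be captured as data. The minimal C sketch below represents an architecture description as a list of typed components and a list of typed connections between them. The component and link types, and the names `count_components` and `is_well_formed`, are hypothetical and serve only as illustration.

```c
#include <stddef.h>

/* Hypothetical encoding of an architecture description: which components
 * exist, what kind each one is, and what kind of link joins each pair. */
typedef enum { COMP_RISC, COMP_DSP, COMP_MEMORY, COMP_ASIC } CompType;
typedef enum { LINK_SHARED_BUS, LINK_POINT_TO_POINT } LinkType;

typedef struct {
    const char *name;
    CompType type;
} Component;

typedef struct {
    size_t from, to;   /* indices into the component array */
    LinkType type;
} Connection;

typedef struct {
    const Component *components;
    size_t num_components;
    const Connection *connections;
    size_t num_connections;
} Architecture;

/* Count the components of a given type, e.g. how many DSPs the design uses. */
size_t count_components(const Architecture *a, CompType t) {
    size_t n = 0;
    for (size_t i = 0; i < a->num_components; i++)
        if (a->components[i].type == t)
            n++;
    return n;
}

/* Well formed = every connection refers to components that exist. */
int is_well_formed(const Architecture *a) {
    for (size_t i = 0; i < a->num_connections; i++)
        if (a->connections[i].from >= a->num_components ||
            a->connections[i].to >= a->num_components)
            return 0;
    return 1;
}
```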
System Architecture Design
System Architecture & Exploration
What
Hardware/software partitioning; processor and memory architecture choices; system timing budget; power management strategy; system verification strategy…
Partitioning into HW block hierarchy, cycle time budgeting, block interfaces, block verification, clock architecture and test strategy
Fixed-point architecture exploration and design
How: quickly assemble architecture(s) for exploration to measure system timing/performance; the bottlenecks need to be modeled accurately (enough)
System Integration & Verification
System Design Environment for HW/SW Refinement, Verification and Integration
What
– Enables hierarchical (manual or automatic) refinement of individual design blocks in the context of the system; maintains system and hierarchical test benches
– Verification of refined hardware/software against the entire system design
– Define next level of clock architecture (derived) and test strategy
How: build a system verification hierarchy that allows integration of HW blocks, system software (HAL), and embedded application SW, and eventually verification of the entire design at the cycle-accurate (or RTL) level
Co-Design and Co-Synthesis
[Figure: the specification is partitioned (co-synthesis) into HW and SW, and each part is synthesized into a detailed representation of the implementation]
HW: HDL (behavioral, dataflow, structural), schematic --> RTL, gate level, transistors, layout
SW: algorithm, textual/graphical representation --> executable or compilable code: the program(s), OS routines
Traditional System Design Process
[Figure: tasks vs. time; system design is followed by separate ASIC design, fabrication and test, PCB test, SW design, and SW test]
System-level Co-design
[Figure: co-design process, tasks vs. time: system design with system-level partitioning, then a shared design phase covering ASIC design and SW design, followed by fabrication and test, PCB test, and SW test]
Configurability and Embedded Systems
Advantages of configuration:
Pay (in power, design time, area) only for what you use
Gain additional performance by adding features tailored to your application
Particularly for embedded systems:
Principally in embedded controller microprocessor applications
Some use in DSP
What to Configure?
Which parts of the microcontroller/microprocessor system should be configurable?
Easy answers:
Memory and cache sizes: get precisely the sizes your application needs
Register file sizes
Interrupt handling and addresses
Harder answers:
Peripherals
Instructions
But first we need more context
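A configurable core is typically described by a small set of parameters covering the "easy answers" above. The C sketch below assumes a hypothetical configuration record and generator rules: caches are either omitted or a power of two in size, the register file has 8 to 64 registers of 16 or 32 bits, and there are at most 32 interrupt lines. None of these limits comes from a real product. `storage_bits` is a crude stand-in for the "pay only for what you use" argument: shrinking the caches or register file directly reduces the storage that must be built.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical configuration record for a synthesizable processor core.
 * Field names and legal ranges are illustrative assumptions. */
typedef struct {
    uint32_t icache_bytes;    /* instruction cache size, 0 = no cache */
    uint32_t dcache_bytes;    /* data cache size, 0 = no cache */
    uint32_t num_registers;   /* general-purpose register file entries */
    uint32_t register_bits;   /* width of each register */
    uint32_t num_interrupts;  /* interrupt lines to support */
} CoreConfig;

static bool is_power_of_two(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* Reject configurations the (assumed) core generator could not build. */
bool config_is_valid(const CoreConfig *c) {
    if (c->icache_bytes != 0 && !is_power_of_two(c->icache_bytes)) return false;
    if (c->dcache_bytes != 0 && !is_power_of_two(c->dcache_bytes)) return false;
    if (c->num_registers < 8 || c->num_registers > 64) return false;
    if (c->num_interrupts > 32) return false;
    return c->register_bits == 16 || c->register_bits == 32;
}

/* Rough storage cost in bits: caches plus register file. */
uint64_t storage_bits(const CoreConfig *c) {
    uint64_t cache_bits = 8ull * ((uint64_t)c->icache_bytes + c->dcache_bytes);
    uint64_t regfile_bits = (uint64_t)c->num_registers * c->register_bits;
    return cache_bits + regfile_bits;
}
```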
Trickle Down Theory of Embedded Architectures
Mainframe/supercomputers
High-end servers/workstations
High-end personal computers
Personal computers
Laptops/palmtops
Gadgets
Watches
...
Features tend to trickle down:
• #bits: 4->8->16->32->64
• ISAs
• Floating point support
• Dynamic scheduling
• Caches
• I/O controllers/processors
• LIW/VLIW
• Superscalar
Configurability in ARM Processor
ARM allows for configurability via the AMBA bus
Offers “PrimeCell” peripherals, which hook into the AMBA Peripheral Bus (APB):
UART
Real Time Clock
Audio Codec Interface
Keyboard and mouse interface
General purpose I/O
Smart card interface
Generic IR interface
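From software, a PrimeCell-style peripheral on the APB appears as a block of memory-mapped registers. The sketch below polls a UART flag register, then writes the data register. The register offsets and the TXFF bit position are assumptions modelled on a PrimeCell UART and should be checked against the peripheral's technical reference manual. On the target, `base` would be the peripheral's bus address; for host-side testing it can point at an ordinary array.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed register layout (check the peripheral's reference manual):
 * data register at offset 0x00, flag register at offset 0x18,
 * flag bit 5 = transmit FIFO full. */
#define UART_DR_OFFSET 0x00u
#define UART_FR_OFFSET 0x18u
#define UART_FR_TXFF   (1u << 5)

/* volatile: on real hardware every access is a bus transaction on the APB
 * and must not be cached or optimised away by the compiler. */
static inline uint32_t reg_read(volatile uint32_t *base, uint32_t offset) {
    return base[offset / 4];
}

static inline void reg_write(volatile uint32_t *base, uint32_t offset, uint32_t v) {
    base[offset / 4] = v;
}

/* Send one byte: wait while the transmit FIFO is full, then write it.
 * Returns 0 on success, -1 if the FIFO stayed full for too many polls. */
int uart_putc(volatile uint32_t *base, uint8_t c, unsigned max_polls) {
    while (reg_read(base, UART_FR_OFFSET) & UART_FR_TXFF) {
        if (max_polls-- == 0)
            return -1;
    }
    reg_write(base, UART_DR_OFFSET, c);
    return 0;
}
```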
ARM7 core
ARM’s AMBA Open Standard
Advanced System Bus (ASB): high performance; CPU, DMA, external
Advanced Peripheral Bus (APB): low speed, low power; parallel I/O, UARTs
External interface
http://www.arm.com/Documentation/Overviews/AMBA_Intro/#intro
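The APB keeps peripherals simple by using a fixed, unpipelined transfer sequence. Below is a minimal C model of that sequence as a state machine. It assumes the basic protocol, in which each transfer spends one cycle in SETUP (select asserted, enable low) and one cycle in ENABLE (enable high). It ignores the wait states that later APB revisions allow, and the C types and function names are illustrative only.

```c
#include <stdbool.h>

/* APB transfer phases: IDLE -> SETUP -> ENABLE, then SETUP again if another
 * transfer is pending, otherwise back to IDLE. Signal names follow the APB
 * convention (PSEL, PENABLE); the C model itself is an assumption. */
typedef enum { APB_IDLE, APB_SETUP, APB_ENABLE } ApbState;

typedef struct {
    bool psel;     /* peripheral select */
    bool penable;  /* enable strobe, high only in the second cycle */
} ApbSignals;

/* Advance one clock; `pending` says whether a transfer is requested. */
ApbState apb_next(ApbState s, bool pending) {
    switch (s) {
    case APB_IDLE:   return pending ? APB_SETUP : APB_IDLE;
    case APB_SETUP:  return APB_ENABLE;          /* setup always lasts one cycle */
    case APB_ENABLE: return pending ? APB_SETUP : APB_IDLE;
    }
    return APB_IDLE;
}

/* Bus control signals driven in each state. */
ApbSignals apb_outputs(ApbState s) {
    ApbSignals o = { s != APB_IDLE, s == APB_ENABLE };
    return o;
}
```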
Ex: ARM Infrared (IR) Interface
Ex: Audio Codec
Another Kind of Configurability
[Figure: synthesis flow: HDL --> RTL synthesis --> netlist --> logic optimization --> netlist --> physical design (with a cell library) --> layout]
Synthesis of a processor core from
an RTL description allows for:
• full range of other types of
configurability
• additional degrees of freedom in
quality of implementation
Examples:
• ARM7
• Motorola Coldfire
• Tensilica Xtensa
Issues in Low-Level Configurable Design
Choice and granularity of computational elements
Choice and granularity of interconnect network
(Re)configuration time and rate:
Fabrication time --> fixed-function devices
Beginning of product use --> Actel/QuickLogic FPGAs
Beginning of usage epoch --> (re)configurable FPGAs
Every cycle --> traditional instruction-set processors
The Choice of the Computational Elements
Reconfigurable logic (CLBs): bit-level operations, e.g. encoding
Reconfigurable datapaths (adder, buffer, registers, mux): dedicated data paths, e.g. filters, AGU
Reconfigurable arithmetic (AddrGen, memory, MAC): arithmetic kernels, e.g. convolution
Reconfigurable control (program memory, instruction decoder & controller, datapath, data memory): RTOS, process management
Multi-granularity Reconfigurable Architecture: The Berkeley Pleiades Architecture
[Figure: a control processor and satellite processors (arithmetic processors, configurable datapath, configurable logic, dedicated arithmetic) joined by a communication network, with a configuration bus and a network interface]
• Computational kernels are “spawned” to satellite processors
• Control processor supports RTOS and reconfiguration
• Order(s) of magnitude energy reduction over traditional programmable architectures
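The "spawning" of kernels can be pictured as a dispatch routine on the control processor. In this hypothetical C sketch, each satellite has a kind (MAC, address generator, ALU) and a busy flag. A kernel goes to the first free satellite of the required kind; if none is free, it falls back to running in software on the control processor. The fallback policy and all names are assumptions and do not describe the actual Pleiades runtime.

```c
#include <stddef.h>

/* Hypothetical satellite model for a Pleiades-style system. */
typedef enum { SAT_MAC, SAT_ADDRGEN, SAT_ALU } SatKind;

typedef struct {
    SatKind kind;
    int busy;        /* 1 while a kernel is mapped onto this satellite */
} Satellite;

#define ON_CONTROL_PROCESSOR (-1)

/* Spawn a kernel needing a satellite of kind `needed`. Returns the index of
 * the satellite used, or ON_CONTROL_PROCESSOR if every matching one is busy
 * (assumed fallback: run the kernel in software instead). */
int spawn_kernel(Satellite *sats, size_t n, SatKind needed) {
    for (size_t i = 0; i < n; i++) {
        if (sats[i].kind == needed && !sats[i].busy) {
            sats[i].busy = 1;
            return (int)i;
        }
    }
    return ON_CONTROL_PROCESSOR;
}

/* Called when a satellite signals completion; frees it for the next kernel. */
void kernel_done(Satellite *sats, int idx) {
    if (idx >= 0)
        sats[idx].busy = 0;
}
```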
Matching Computation and Architecture
[Figure: a convolution kernel mapped onto two AddressGen/Memory/MAC chains under a control processor]
Two models of computation: communicating processes + data-flow
Two architectural models: sequential control + data-driven
Execution Model of a Data-Flow Kernel
for (i = 1; i <= L; i++)
    for (k = i; k <= L; k++)
        phi[i][k] = phi[i-1][k-1]
                  + in[NP-i] * in[NP-k]
                  - in[NA-1-i] * in[NA-1-k];
[Figure: the embedded processor starts the kernel (code segment) and receives its end signal; the kernel itself runs on satellites: AddrGen units feeding MEM: in and MEM: phi, two MPY units, and two ALUs]
• Distributed control and memory
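The kernel above can be turned into a host-runnable C function. The sketch keeps the listing's loop bounds and indexing. The element type, the row-major storage of `phi` with L+1 columns, and the preconditions are assumptions added to make it self-contained: the caller initialises row 0 of `phi`, and NP >= L and NA >= L+1 so that every index into `in` is valid. Only row 0 needs initialising, because for i >= 2 each `phi[i-1][k-1]` was already written in the previous pass of the outer loop.

```c
#include <stddef.h>

/* phi is an (L+1) x (L+1) matrix stored row-major; row 0 is set by the caller.
 * Requires NP >= L and NA >= L + 1 so that in[NP - k] and in[NA - 1 - k]
 * stay inside the input buffer for every k in 1..L. */
void phi_update(double *phi, const double *in, int L, int NP, int NA) {
    const int stride = L + 1;
    for (int i = 1; i <= L; i++)
        for (int k = i; k <= L; k++)
            phi[i * stride + k] = phi[(i - 1) * stride + (k - 1)]
                                + in[NP - i] * in[NP - k]
                                - in[NA - 1 - i] * in[NA - 1 - k];
}
```

Each iteration depends only on the previous diagonal element and four input samples. This is why the slide can map the loop onto address generators, multipliers, and ALUs that stream data without per-iteration intervention from the embedded processor.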
Software Methodology Flow
[Figure: flow from algorithms through kernel detection, estimation/exploration (power & timing estimation of various kernel implementations), and partitioning to software compilation, reconfigurable hardware mapping, and interface code generation, targeting a µproc & accelerator; inputs include C++, SUIF + C-IF, PDA models, premapped kernels, and behavioral C++ module libraries]
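The estimation and partitioning steps can be illustrated with a deliberately simple greedy rule. This C sketch assumes that each kernel has energy and time estimates for a software implementation and for a reconfigurable-hardware implementation. Each kernel takes the lower-energy implementation that meets a time budget, or the faster one if neither meets it. Real partitioners weigh many more factors, such as communication, area, and interface cost; the structure and names here are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-kernel estimates from a power & timing estimation step
 * (units are arbitrary). */
typedef struct {
    const char *name;
    double sw_energy, sw_time;   /* running on the microprocessor */
    double hw_energy, hw_time;   /* mapped onto reconfigurable hardware */
} KernelEstimate;

typedef enum { MAP_SOFTWARE, MAP_HARDWARE } Mapping;

/* Pick the lower-energy option that meets the time budget;
 * if neither meets it, pick the faster one. */
Mapping choose_mapping(const KernelEstimate *k, double time_budget) {
    int sw_ok = k->sw_time <= time_budget;
    int hw_ok = k->hw_time <= time_budget;
    if (sw_ok && hw_ok)
        return k->hw_energy < k->sw_energy ? MAP_HARDWARE : MAP_SOFTWARE;
    if (sw_ok) return MAP_SOFTWARE;
    if (hw_ok) return MAP_HARDWARE;
    return k->hw_time < k->sw_time ? MAP_HARDWARE : MAP_SOFTWARE;
}

/* Partition all kernels; fills out[] and returns how many went to hardware. */
size_t partition(const KernelEstimate *ks, size_t n, double budget, Mapping *out) {
    size_t hw = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = choose_mapping(&ks[i], budget);
        if (out[i] == MAP_HARDWARE)
            hw++;
    }
    return hw;
}
```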
The System-on-a-Chip Nightmare
[Figure: the “Board-on-a-Chip” approach: DMA, CPU, DSP, memory controller, and MPEG on a system bus, bridged to a peripheral bus with I/O cores, plus ad hoc control wires and custom interfaces]
Sonics SOC Integration Architecture
[Figure: cores (DSP, MPEG, CPU, DMA, MEM, I/O) each connect through the Open Core Protocol™ to a SiliconBackplane Agent™ on the SiliconBackplane™ (patented), extended off chip by the MultiChip Backplane™]
Master vs. Slave
[Figure: IP cores attached to an on-chip bus as masters and/or slaves; over the Open Core Protocol, a master (initiator) issues a request and the slave (target) returns a response]
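The request/response split can be sketched as two small message types and a target that only ever answers requests. The C below is a hypothetical illustration in the spirit of a core socket protocol such as OCP. Its field names are not the OCP signal names, and the memory-like target with word-aligned addressing is an assumption.

```c
#include <stdint.h>

/* Illustrative transaction types between an initiator (master) and a
 * target (slave); names are assumptions for this sketch. */
typedef enum { CMD_READ, CMD_WRITE } Command;
typedef enum { RESP_OK, RESP_ERROR } RespCode;

typedef struct { Command cmd; uint32_t addr; uint32_t data; } Request;
typedef struct { RespCode code; uint32_t data; } Response;

#define TARGET_WORDS 64u

/* A memory-like target: it answers requests but never initiates traffic,
 * which is what makes it a slave. */
typedef struct { uint32_t mem[TARGET_WORDS]; } Target;

Response target_handle(Target *t, Request r) {
    Response resp = { RESP_ERROR, 0 };
    uint32_t word = r.addr / 4;
    if (r.addr % 4 != 0 || word >= TARGET_WORDS)
        return resp;                  /* misaligned or out of range */
    if (r.cmd == CMD_WRITE)
        t->mem[word] = r.data;
    else
        resp.data = t->mem[word];
    resp.code = RESP_OK;
    return resp;
}
```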
The Backplane: Why Not Use a Computer Bus?
[Figure: four IP cores on a computer bus with an arbiter, address and data lines, and transmit/receive FIFOs]
• Expensive to decouple
• Not designed for real-time
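On a conventional computer bus, an arbiter decides which master drives the bus next. A round-robin policy, sketched below in C, is one common choice. It is fair, but how long a master waits depends on how many other masters are requesting, which shows why such a bus gives no real-time guarantee by itself. The bit-mask interface (at most 32 masters) is an assumption of this sketch.

```c
#include <stdint.h>

/* Round-robin arbiter: bit i of `requests` is set when master i wants the
 * bus. The search starts just after the previous winner, so no requesting
 * master is skipped for more than one full round.
 * Assumes 1 <= num_masters <= 32 and 0 <= *last_grant < num_masters.
 * Returns the granted master, or -1 if nobody is requesting. */
int rr_arbitrate(uint32_t requests, int num_masters, int *last_grant) {
    for (int step = 1; step <= num_masters; step++) {
        int candidate = (*last_grant + step) % num_masters;
        if (requests & (1u << candidate)) {
            *last_grant = candidate;
            return candidate;
        }
    }
    return -1;
}
```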
Communication Buses Decouple and Guarantee Real Time
[Figure: four IP cores on a communications bus with TDMA slot allocation and transmit/receive FIFOs]
• Connections are expensive
• Poor read latency
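A TDMA bus replaces run-time arbitration with a static slot table. This C sketch assumes a hypothetical 8-slot frame. A core may send only in the slots it owns, so its worst-case wait can be computed from the table alone, independent of other cores' traffic. The same table also shows the cost: a core that owns few slots waits longer, which is one source of the poor read latency listed above.

```c
#include <stddef.h>

/* Hypothetical frame length: the slot table repeats every 8 bus cycles. */
#define FRAME_SLOTS 8

/* slot_owner[s] = index of the core that owns slot s. */
int may_transmit(const int slot_owner[FRAME_SLOTS], int core, unsigned long cycle) {
    return slot_owner[cycle % FRAME_SLOTS] == core;
}

/* Worst-case number of slots `core` must wait (0 = may send immediately),
 * taken over every possible starting slot; -1 if it owns no slot at all. */
int worst_case_wait(const int slot_owner[FRAME_SLOTS], int core) {
    int worst = -1;
    for (int start = 0; start < FRAME_SLOTS; start++) {
        for (int d = 0; d < FRAME_SLOTS; d++) {
            if (slot_owner[(start + d) % FRAME_SLOTS] == core) {
                if (d > worst)
                    worst = d;
                break;   /* found the nearest owned slot from this start */
            }
        }
    }
    return worst;
}
```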
On-Chip Bus for SOC
Example on-chip bus interconnects
ARM’s AMBA bus
IBM’s CoreConnect
Virtual Socket Interface Alliance group
Open Core Protocol group
Example processor cores
ARM
MIPS
PowerPC
Design Example: The Radio-on-a-Chip
[Figure: embedded µP + DSPs, dedicated DSP, FPGA, reconfigurable datapath, and reconfigurable state machines on one chip]
DSP and control intensive
Mixed-mode
Combines programmable, flexible, and application-specific modules
Cost and energy are the key metrics
System-Level Design Science
Design Methodology:
Top Down Aspect:
Orthogonalization of Concerns:
– Separate implementation from conceptual aspects
– Separate computation from communication
Formalization: precise unambiguous semantics
Abstraction: capture the desired system details (do not overspecify)
Decomposition: partitioning the system behavior into simpler behaviors
Successive Refinements: refine the abstraction level down to the
implementation by filling in details and passing constraints
Bottom Up Aspect:
IP Re-use (even at the algorithmic and functional level)
Components of architecture from pre-existing library
Separate Behavior from Micro-architecture
System behavior: functional specification of the system. No notion of hardware or software!
Behavior blocks: Front End (1), Transport Decode (2), User/Sys Control (3), Sensor Synch Control (4), Rate Buffer (5), Video Decode (6), Frame Buffer (7), Video Output (8), Rate Buffer (9), Audio Decode/Output (10), Mem (11), Rate Buffer (12), Mem (13)
Implementation architecture: hardware and software, an optimized computer
Architecture components: Control Processor, DSP Processor, DSP RAM, System RAM, External I/O, MPEG Peripheral, Audio Decode, Processor Bus
Map Between Behavior and Architecture
[Figure: the system-behavior blocks (Front End, Transport Decode, Rate Buffers, Video and Audio Decode, Frame Buffer, Mem, control blocks) mapped onto the implementation architecture (Control Processor, DSP Processor, DSP RAM, System RAM, External I/O, MPEG Peripheral, Audio Decode, Processor Bus)]
Audio decode behavior implemented on dedicated hardware
Transport decode implemented as a software task running on the microcontroller
Communication over the bus
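Once behavior is separated from architecture, the mapping itself can be recorded as data. In the C sketch below, a hypothetical binding table assigns behaviors to architecture resources. Two behaviors bound to different resources must then communicate over the bus. The resource names and the table format are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical architecture resources (from the implementation side). */
typedef enum {
    RES_CONTROL_PROCESSOR,
    RES_DSP,
    RES_AUDIO_DECODE_HW,
    RES_MPEG_PERIPHERAL
} Resource;

/* One entry of the mapping: a behavior name bound to one resource. */
typedef struct { const char *behavior; Resource resource; } Binding;

/* Returns 1 if behaviors a and b sit on different resources (their traffic
 * crosses the bus), 0 if they share a resource, -1 if either is unmapped. */
int needs_bus_transfer(const Binding *map, size_t n, const char *a, const char *b) {
    const Binding *ba = NULL, *bb = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(map[i].behavior, a) == 0) ba = &map[i];
        if (strcmp(map[i].behavior, b) == 0) bb = &map[i];
    }
    if (ba == NULL || bb == NULL)
        return -1;
    return ba->resource != bb->resource;
}
```

For example, the slide's mapping binds "Transport Decode" to `RES_CONTROL_PROCESSOR` and "Audio Decode/Output" to `RES_AUDIO_DECODE_HW`. With that table, `needs_bus_transfer` returns 1 for this pair, matching the slide's "communication over the bus".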
Embedded Software Crisis
Cheaper, more powerful microprocessors
More applications
Increasing time-to-market pressure
Bigger, more complex applications
--> Embedded software crisis
J. Fiddler - WRS
SW: Embedded Software Tools
[Figure: the user works with application source code, a compiler, a simulator, and a debugger; the compiled application (a.out) and the RTOS run on target hardware with CPU, ROM, RAM, and ASICs]
Hardware Platforms Are Not Enough!
The hardware platform has to be abstracted
The interface to the application software is the “API”
A software layer performs the abstraction:
Programmable cores and memory subsystem “hidden” by
RTOS and compilers
I/O subsystem with Device Drivers
Network with Network Communication Software
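One common way to express this abstraction in C is a table of function pointers that the application calls without knowing which driver sits behind it. The sketch below is hypothetical: the `CharDevice` interface and the RAM-backed driver were invented for illustration. The application helper `dev_puts` is written against the interface only. Swapping the driver, for example from a RAM buffer used in host testing to a UART driver on the target, therefore requires no change to application code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical character-device API: the application sees only this. */
typedef struct {
    int (*write)(void *ctx, const uint8_t *buf, size_t len);
    void *ctx;                      /* driver-private state */
} CharDevice;

/* Application-level helper written only against the API. */
int dev_puts(const CharDevice *dev, const char *s) {
    return dev->write(dev->ctx, (const uint8_t *)s, strlen(s));
}

/* One possible driver: collects output in RAM (useful for host-side tests). */
typedef struct { uint8_t data[64]; size_t used; } RamSink;

int ram_sink_write(void *ctx, const uint8_t *buf, size_t len) {
    RamSink *sink = ctx;
    if (sink->used + len > sizeof sink->data)
        return -1;                  /* would overflow the buffer */
    for (size_t i = 0; i < len; i++)
        sink->data[sink->used++] = buf[i];
    return (int)len;
}
```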
Software Platforms
[Figure: layered platform: application software on the platform API; the software platform (RTOS, BIOS, device drivers, network communication, compiler) sits on the hardware platform, whose I/O connects to input devices, output devices, and the network]
Platform-Based Methodology
Platform-based design:
Application mapped onto architecture
Performance evaluation and iterative refinement
Challenges:
complete system simulation
complexity management
composability and reuse
Key elements for composability:
Identification and use of useful models of computation: FSMD, DE, DF, CSP, ...
A flexible, extensible language platform to capture the functionality
Composability can be achieved using object-oriented mechanisms
Final Goal for SOC: Platform-Based Design
Taking Design Block Reuse to the Next Level
[Figure: a foundation block (processor(s), RTOS(es) and SW architecture; CPU, MEM, FPGA; scalable bus, test, power, IO, clock, and timing architectures; plus a reference design) built from pre-qualified/verified foundation IP*, hardware IP, and programmable SW IP; SoC derivative design methodologies, a system-level performance evaluation environment, and rapid prototyping for end-customer evaluation address the application space; methodology/flows include foundry-specific pre-qualification and a foundry targeting flow]
*IP can be hardware (digital or analogue) or software. IP can be hard, soft, or ‘firm’ (HW), source or object (SW).
Summary of SOC Design
[Figure: system specification --> co-synthesis/partitioning --> HW and SW parameter estimation --> HW synthesis (ASIC) and SW synthesis (OS, EXE code) --> system integration --> final verification, with verification after each step]