新闻分析 热点文章 DSP产品 精彩专题 资料下载 信息发布 产品评测报告 市场分析报告 会员专区 聚合
当前页面位置:DSP Watch > 资料下载 > 芯片设计 > Re-Usable Low Power DSP IP embedded in an ARM based SoC Architecture
Re-Usable Low Power DSP IP embedded in an ARM based SoC Architecture
类型: 作者: 最后更新:2004-3-31 21:34:59 推荐指数: 3215
Re-Usable Low Power DSP IP embedded in an
ARM based SoC Architecture
H. H. Hellmich, A. T. Erdogan, and T. Arslan*
Abstract
The integration of different re-usable IPs (Intellectual Properties) to design SoC (System-on-Chip) devices
is widely accepted as the key to achieving higher productivity to meet shorter time-to-market demands.
Nevertheless, productivity improvements suffer because the importance of interface definitions
and consequently the integration of IPs for the targeted SoC architecture is often treated as a secondary
issue. This paper describes a scheme for the integration of a low power DSP IP embedded in an ARM
based SoC architecture which is characterised by having two interfaces intended for two different types
of on-chip bus configurations. A DSP IP has been implemented using this scheme. The power consumption
of the actual FIR filtering algorithm realised within the DSP IP has been compared to a conventional
implementation using an Alcatel 0.35 µm CMOS technology.
1. Introduction
The growing gap between the silicon gate capacity and the engineering productivity has lead to the advance
of complex SoC designs and the need for new forms of design reuse and design methodologies
[Bric99]. In order to overcome this gap, design reuse ismigrating from source reuse to integration-driven
reuse and design methodologies are changing from TDD (Timing-Driven Design) to PBD (Platform-
Based Design) [Chang99]. However, PBD gains its productivity by minimising the amount of custom
interface design and focuses around a standardised bus architecture.
The AMBA (Advanced Microcontroller Bus Architecture), which defines two types of bus configurations
[Amba99], represents an open standardised bus architecture. The system bus, called AHB (Advanced
High-performance Bus), is intended for high clock frequency system modules and enables access
to high bandwidth memory devices. The peripheral bus, called APB (Advanced Peripheral Bus), is optimised
for minimal power consumption and suits low bandwidth peripheral modules (see Figure 1).
This hierarchical bus architecture is more suitable for low power consumption and high speed performance
in comparison to a single bus architecture, such as standardised on-chip busses like the PI-Bus
[Pibus] or the CSI-Bus [Csibus]. The more modules are connected to one bus the slower it operates, and
consequently the latency (execution time of a transaction across the bus) of the bus increases. The total
bandwidth (product of clock frequency and the byte width of the bus) request of all modules is another
reason to form groups of modules around separate busses [Chang99]. Modules, requiring high bandwidth
and probably having an intensive interchange of data among each other, should be considered in
the same group while modules with much lower communication needs should be bundled together. This
provides the facility to powerdown the bus that connects the low bandwidth modules. For complex SoC
designs, the increase in bus partitioning will create more than one system and peripheral bus in order to
keep bus capacitances at realistic levels and to toggle these only when necessary [Aldw99].
Technical University of Berlin
Department of Electrical Engineering
Microelectronics Division
Einsteinufer 17, D-10587 Berlin, Germany.
University of Edinburgh
Department of Electronics and Electrical Engineering
System Level Integration Group
Edinburgh, EH9 3JL, Scotland, UK.
hellmich@mikro.ee.tu-berlin.de {ahmet.erdogan, tughrul.arslan}@ee.ed.ac.uk
* Also affiliated with the Institute for System Level Integration (ISLI) in Livingston, Scotland, UK.
Due to the existence of various standardised on-chip bus architectures, IP core based interface standards
have been published recently. IPs providing such interfaces are considered to be suitable for connection
to any type of on-chip bus architecture using a bus wrapper. The drawback of this approach is that several
layers of interface between the core functionality of the IP and the bus arise which may degrade the performance
of the IP, and certainly will add gates to the design [Bric99]. The VCI (Virtual Component
Interface) standard [Vsia00] and the OCP (Open Core Protocol) [Ocp00] are examples of IP core based
interface standards. The VCI standard is viewed to focus only on the data flow aspects of the communication
between different IPs [Smith00], such as address, data and command signals. Non-dataflow signals,
e.g. interrupts, or test signals are not considered by the VCI standard compared to the OCP.
However, this fact could be considered by the AVCI (Advanced Virtual Component Interface), which
remains the last of the three VCI standards that has to be defined by the VSIA [Vsia00].
Hence, providing IPs with an IP core based interface facilitates fast integration and results in an increase
of productivity which is desperately needed for designing future SoC devices. For low power applications,
it is important that such interfaces maintain low power consumption.
2. Proposed Low Power DSP IP
Our proposed low power DSP IP is presented in Figure 2 and its integration into an ARM based SoC
architecture is implied in Figure 1. The IP consists in its top level of five modules: the actual low power
DSP core, the register block, the VC initiator interface, the VC target interface and a gated clock circuitry.
The VC target interface is connected via the APB initiator bus wrapper to the peripheral bus
APB. The protocol between the APB initiator wrapper and the VC target interface is called PVCI (Peripheral
Virtual Component Interface), which is one of two VCI standards defined by the VSIA up to
now. According to this protocol every valid request by the APB initiator wrapper has to be acknowledged
by providing a response by the VC target interface. The VC initiator interface connects the DSP
IP via the AHB target bus wrapper to the system bus AHB. In this case, the protocol between bus wrapper
and VC interface is the second defined VCI standard called BVCI (Basic Virtual Component Interface).
The BVCI is a superset of the PVCI and provides additional optional protocol signals. The main
difference compared to the PVCI is that the BVCI allows independent handshakes for requests and responses.
This means that when the VC initiator interface provides a valid request, the AHB target bus
wrapper will acknowledge the request. At the time the bus wrapper obtains a valid response it will indicate
this to the VC initiator interface which on the other side will acknowledge the receipt of the valid
response. This kind of independent handshake for requests and responses is very suitable for modules
connected to an on-chip bus where bus arbitration takes place and therefore no guarantee for an immediate
response is given. In addition, the DSP IP operates in two different synchronous clock domains indicated
by the dashed line in Figure 2. The upper part of the DSP IP will operate at the frequency of the
system bus clock (sbclk) which will usually be the maximum frequency of the SoC design and the lower
ARM
Processor
Core
External
Memory
Interface
DMA
Controller
On-chip
RAM
Bus
Bridge
UART
GPIO
Interrupt
Controller
ROM
Interface
Arbiter
AHB
Low Power
DSP IP
Decoder
APB
Figure 1: ARM based SoC architecture
part of the DSP IP will be clocked by the peripheral bus clock (pbclk). It is assumed that the following
condition for both clock domains is met, in order to avoid additional synchronisation circuitry between
the two clock domains within the DSP IP:
(1)
In addition to the VCI protocol signals, the DSP IP provides the signals wakeup and sleep which are intended
to be directly connected to the bus bridge of the APB. When the sleep signal is asserted the bus
bridge knows that the DSP IP is in sleep mode and any access to the register block of the DSP IP could
be indicated by the bus bridge with an error transfer response over the AHB (without transferring any
data over the APB). The wakeup signal could be a select signal output of the bus bridge. If the DSP IP
is in sleep mode, it can be waken up when a given address on the system bus is provided. This address
will be decoded by the bus bridge to generate the select signal which is connected to the wakeup input
of the DSP IP. These two additional signals communicate with the low power DSP core and allow the
integration of the DSP IP into a DPM(Dynamic PowerManagement). DPMshuts down idle devices and
wakes them up when needed [Lu00] and is therefore an efficient method for power savings especially
for portable applications.
2. 1 Low Power DSP Core
The DSP core is the actual high-performance module of the DSP IP and is shown in Figure 3. It consists
of three control units: the PMU (Power Management Unit), the DMU (Data Management Unit) and the
FIR core. The PMU controls with its single output the gated clock circuitry of the DSP IP. It will be the
only active module during the sleeping state of the DSP IP in order to react to an asserted wakeup signal.
Two input signals (sleepmode and entersleepmode) coming from the register block provide information
if and when the DSP IP should enter into sleep mode when the DSP core is not active anymore. The information
whether the DSP core is still active or not is provided by the DMU. The DMU takes care of
reading and writing of data from an external memory device over the AHB. Read data that has to be filtered
will be handed to the FIR core, and valid filtered data will be written back to memory. To provide
a temporarily storage for these data transfers, a read and a write FIFO (RFIFO and WFIFO) are integrated
in the DMU.
wakeup
BVCI
PVCI APB
AHB
sleep
PMU
VC
Target
Interface
Register
Block
AHB
Target Bus
Wrapper
VC
Initiator
Interface
Low Power
DSP Core
APB
Initiator Bus
Wrapper
pbclk
sbclk
Gated
Clock
Circuitry
Low Power DSP IP
Figure 2: Block Diagram of the Low Power DSP IP
fsbclk MOD fpbclk 0 , = fpbclk 2 fsbclk × £
2. 2 FIR Core
The FIR core within the DSP core (see Figure 3) contains two static RAM blocks to store coefficients
(HRAM) and input data (XRAM), an FSM and a component that performs the actual arithmetic operations
that are necessary for direct form FIR filtering (MA(S)U). For the implementation of the conventional
FIR core this component is called MAU (Multiply-Add-Unit) which performs a multiplication of
a coefficient h and an input data x and adds the product with the previous value stored in the register
output y. The MAU is therefore identical to a conventional MAC (Multiplier Accumulator). For the low
power implementation of the FIR core a FIR filtering algorithm based on coefficient segmentation
[Erdo99] is used. According to this algorithm the coefficient h is segmented into two numbers m and s.
For the utilised two’s complement representation of data the following condition is valid:
(2)
The input data x and the number m are applied to the multiplier while a shift operation of x will be performed
according to the shift value of s. The sum of the multiplication and the shift operation is added
to the previous value stored in the register output y. This process of dividing each coefficient into two
parts can reduce power in the multiplier section by more than 50 % for 16-bit wide input data [Erdo99].
Since a multiplication and a shift operation is performed for this low power FIR filtering algorithm, the
component performing the arithmetic operations is called MASU (Multiply-Add-Shift-Unit).
The size of XRAM is assumed to be twice as large as the size of HRAM so that the FIR core is able to
calculate one valid output data without the need of requesting new input data during a folded FIR filtering.
The size of the static RAM blocks depend on the maximum number of coefficients that have to be
stored in the HRAM. In case the size of the RAM blocks exceed 256 bits (data width × address depth) a
RAM macro block of the targeted CMOS technology should be used to ensure a power efficient implementation
of the FIR core.
3. Power Based Simulation Results
The comparison of power consumption between a conventional FIR core implementation and an implementation
using coefficient segmentation has been performed. For logic synthesis and the estimation of
power consumption the Synopsys suite of synthesis tools have been used. Both FIR cores are described
WFIFO
PMU
wakeup
Low Power DSP Core entersleepmode
sleepmode
active
sleep
XRAM
RFIFO
MA(S)U
FIR Core
DMU
VCI
Initiator
Interface
Register
Block
FSM
HRAM
HREG XREG
x h (m& s)
y
Figure 3: Block Diagram of the Low Power DSP Core
hk sk mk + , = mk 0 ³
in VHDL and characteristics like data width and the maximum number of coefficients are parameterisable.
TheMAU and MASU are described in MCL (Module Compiler Language), which syntax is similar
to Verilog, in order to be synthesized by the Synopsys’ data path synthesis tool Module Compiler.
The utilised ASIC library was an Alcatel 0.35 µm CMOS technology.
For the power based gate-level simulation two different low pass filters were realised by the FIR cores
and which filter characteristics are illustrated in Table 1.
For each simulation an unfolded filter structure and 512 randomly generated 16-bit-wide input data were
used. During the simulation it was assumed that at every stage the FIR core requested new input data,
they could be directly provided by the DMU. A reduction of the cell internal power (without considering
the RAM blocks) for the FIR core utilising a MASU was observed. For the first low pass filter (LPF1) a
power reduction of 6.3 % and for the second low pass filter (LPF2) a power reduction of 7.2 % for the
FIR utilising a MASU was obtained compared to the FIR core implemented with a MAU.
4. Conclusion
In this paper the implementation scheme for a low power DSP IP has been presented. The DSP IP is characterised
by having two interfaces intended for two different types of on-chip bus configurations. The
initiator interface is intended for a high-performance, high-bandwidth bus while the target interface is
intended for a low-performance, low-bandwidth bus. The power reduction of 6.3 % and 7.2 % of the FIR
core using coefficient segmentation for the realisation of two different low pass filters encourages the
further investigation and optimisation of an FIR core utilising this kind of FIR filtering algorithm.
5. References
[Aldw99] P. J. Aldworth, ”System-on-a-Chip Bus Architecture for Embedded Applications”, IEEE
International Conference on Computer Design, Oct. 10-13, 1999, Austin, Texas, USA.
[Amba99] AMBA™ Specification, Revision 2.0, May 1999, © ARM Ltd., www.arm.com/Pro+Peripherals/
AMBA.
[Lu00] Y.-H. Lu, E.-Y. Chung, T. Simunic, L. Benini andG. De Micheli, ”Quantitative Comparison of Power
Management Algorithms”, Design, Automation and Test in Europe (DATE),March 27-30, 2000, Paris,
France.
[Bric99] M. Keating and P. Bricaud, ”Reuse MethodologyManual”, Kluwer Academic Publishers, 2nd Edition,
1999.
[Chang99] H. Chang, L. Cooke, M. Hunt, G. Martin, A. McNeilly and L. Todd, ”Surviving the SOC Revolution
– A Guide to Platform-Based Design”, Kluwer Academic Publishers, 1999.
[Csibus] CSI (Configurable System Interconnect) Bus, © Triscend Corp., www.triscend.com/products/
IndexTechLit.html.
[Erdo99] A. T. Erdogan and T. Arslan, ”A Coefficient Segmentation Algorithm for Low Power Implementation
of FIR Filters”, IEEE International Symposium on Circuits and Systems (ISCAS), May 30 – June 2,
1999, Orlando, Florida, USA.
[Ocp00] Open Core Protocol™ Specification 1.0, Document Version 1.1, Jan. 2000, © Sonics Inc.,
www.sonicsinc.com/mnet/ocp_descr.html.
[Pibus] PI (Peripheral Interconnect) Bus, © Open Microprocessor Initiative (OMI), www.sussex.ac.uk/engg/
research/vlsi/projects/pibus.
[Smith00] E. Smith, ”Bus protocols limit design reuse of IP”, EEdesign article, May 2000, www.eedesign.com/
story/OEG20000515S0026.
[Vsia00] VSI Alliance™ Virtual Component Interface Standard (OCB 2 1.0), On-Chip Bus Development
Working Group, March 2000, © Virtual Socket Interface Alliance, www.vsi.org/library/specs/
summary.htm.
Filter
Type
Passband
(kHz)
Stopband
(kHz)
Passband
ripple (dB)
Stopband
attenuation (dB)
Filter
Length
LPF1 / LPF2 0 - 3.375 / 0 - 1 5.625 - 10 / 1.5 - 5 0.002 / 0.0135 90 / 56 42 / 61
Table 1: Low Pass Filter Characteristics
资料来源:
Google
 
Web dsp.blueidea.com
本站声明: 本站所有的文章和下载资源均为个人开发者提供,如有企业用于商业用途,由此引发的法律纠纷本站及站长将不负任何责任。如有任何问题, 请联系我们

相关文章


推荐文章

· Re-Usable Low Power DSP IP emb
· Ten Steps to a Successful DSP
· Design Of Future Systems
· Configuring a VLIW-DSP core fo
关于我们 | 广告服务 | 站点地图 | 联系我们 | 投稿指南 | 程序支持
友情链接: 61IC中国电子在线 | 老古开发网 | 周立功单片机 | IC商贸网 | 电子产品世界 | 中电网 | 中国电子顶级开发网
中国EDA技术网 | EDA专业论坛 | 中国电子商贸网 | 国际电子网 | 中发网 | 中国电子工程师社区 | 北极星电技术网 | 21IC中国电子网
网络平台由蓝色理想提供 意见信箱 欢迎您的咨询、留言、建议和意见
若发现页面中有任何错误或侵犯您的版权,请来信联系我们: dspwatch AT gmail.com
Copyright © 2003 - 2007 DSP Watch, All Rights Reserved 版权所有 | 京ICP备05002321号