At a glance:
- The vector coprocessor can handle variable-length, 1- to 64-bit data.
- Variable-length data enables speed and precision trade-offs.
RC Module’s NeuroMatrix NM6403 dual-core, application-specific DSP is based on the NeuroMatrix architecture and targets video-image-processing and neural-network applications. It provides scalable performance, a programmable operand width of 1 to 64 bits, and clock rates as high as 50 MHz. The programmable operand width allows designers to trade precision for performance to suit their applications. The NM6403 processor includes a 32/64-bit RISC processor and a 1- to 64-bit vector coprocessor that supports vector operations with elements of variable bit lengths (patent pending). Two identical programmable interfaces work with external memory, and two communication ports are hardware-compatible with Texas Instruments’ TMS320C4x, allowing designers to build multiprocessor systems.
The vector coprocessor, which has a SIMD (single-instruction-multiple-data) architecture, works on packed integer data: each 64-bit block holds a set of words whose lengths can vary from 1 to 64 bits. The device supports vector-matrix and matrix-matrix multiplication. The vector coprocessor’s core resembles an array multiplier built from cells that each include a 1-bit memory (flip-flop) surrounded by several logic elements. Using two 64-bit programmable registers, designers can combine the cells into several macrocells; these registers define the borders between the rows and columns of macrocells. Each macrocell multiplies a variable-length input word by a preloaded coefficient and accumulates the result from the macrocells above it in the column. All columns calculate their results simultaneously in one processor cycle. For 8-bit data and coefficients, the vector coprocessor performs 24 MAC (multiply-accumulate) operations with 21-bit results in one 20-nsec processor cycle. The number of MAC operations per cycle depends on the length and number of words packed into a 64-bit block. The engine’s configuration can change dynamically during calculations: An application can start with maximum precision and minimum performance and then increase performance by reducing the data-word lengths. To avoid arithmetic overflow, the NM6403 uses two types of saturation functions with user-programmable saturation boundaries.
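The 8-bit case can be checked with a small software model. The following Python sketch is an illustration only, not the NM6403 instruction set: the function names, the low-bits-first packing order, the unsigned arithmetic, and the wrap-on-overflow behavior are all assumptions made for the example. It treats one register's worth of borders as a list of input-word widths (rows) and the other as a list of result widths (columns), so that eight 8-bit inputs and three 21-bit results yield 8 × 3 = 24 MAC operations per modeled cycle.

```python
# Illustrative model only; names, bit ordering, and unsigned arithmetic are
# assumptions, not the documented behavior of the NM6403.

def pack(values, widths):
    """Pack unsigned fields into one 64-bit block, first field in the low bits."""
    assert sum(widths) <= 64
    block, shift = 0, 0
    for value, width in zip(values, widths):
        assert 0 <= value < (1 << width)
        block |= value << shift
        shift += width
    return block

def unpack(block, widths):
    """Split a 64-bit block back into unsigned fields of the given widths."""
    values, shift = [], 0
    for width in widths:
        values.append((block >> shift) & ((1 << width) - 1))
        shift += width
    return values

def vector_matrix_step(in_block, in_widths, coeffs, out_widths, acc_block=0):
    """Model one cycle: each column multiplies every input word by its
    preloaded coefficient and adds the sum to that column's accumulator."""
    x = unpack(in_block, in_widths)
    acc = unpack(acc_block, out_widths)
    out = []
    for col, width in enumerate(out_widths):
        total = acc[col] + sum(xi * coeffs[row][col] for row, xi in enumerate(x))
        out.append(total & ((1 << width) - 1))  # wraps; saturation is not modeled
    macs = len(in_widths) * len(out_widths)     # rows x columns
    return pack(out, out_widths), macs

in_widths = [8] * 8        # eight 8-bit words fill the 64-bit input block
out_widths = [21] * 3      # three 21-bit results occupy 63 of the 64 bits
coeffs = [[1, 2, 3] for _ in range(8)]   # hypothetical 8 x 3 coefficient matrix
x = [10, 20, 30, 40, 50, 60, 70, 80]

result, macs = vector_matrix_step(pack(x, in_widths), in_widths, coeffs, out_widths)
print(unpack(result, out_widths), macs)  # [360, 720, 1080] 24
```

In this model, the 21-bit result width leaves headroom: the largest possible sum of eight products of 8-bit unsigned values is 8 × 255 × 255 = 520,200, which is below 2^21. At one such step every 20 nsec, 24 MAC operations per cycle correspond to 1.2 billion MAC operations per second.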
The VLIW (very-long-instruction-word) RISC core uses a five-stage pipeline and executes 32- and 64-bit-wide instructions. Each instruction usually executes two operations. Two 64-bit interfaces support SRAM, DRAM, and EDO DRAM and include two separate address-generation units that can address as much as 16 Gbytes. Each interface supports two memory banks and can support a “shared-memory” mode. Two DMA coprocessors transfer data between high-speed I/O-communication ports and external memory.
Addressing and processing modes: The NM6403 supports 32-bit immediate, base, indexed, and relative addressing.
Special instructions or integral-peripheral functions: The NM6403 processor uses vector instructions to handle packets of as many as 32 64-bit data words. These instructions may define operations such as matrix-matrix, matrix-vector, or vector-vector multiplication; vector-vector addition and subtraction with saturation of results; block moving; and bit manipulation. The NM6403 has conditional branch, call, and return instructions.
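As an illustration of vector-vector addition with saturation over a packet of 64-bit words, the following Python sketch clamps each lane's sum to programmable bounds. It is a software model under stated assumptions, not the NM6403's actual instructions or saturation functions: the helper names, the uniform lane width, the two's-complement lane encoding, and the simple clamp are all hypothetical choices for the example.

```python
# Illustrative model only; helper names, uniform lane width, and the clamp
# function are assumptions, not the NM6403's documented saturation behavior.

def to_signed(field, width):
    """Interpret an unsigned field of `width` bits as two's complement."""
    sign_bit = 1 << (width - 1)
    return (field ^ sign_bit) - sign_bit

def pack_lanes(values, width):
    """Pack signed values into a 64-bit block, lane 0 in the low bits."""
    assert len(values) * width <= 64
    mask = (1 << width) - 1
    block = 0
    for lane, value in enumerate(values):
        block |= (value & mask) << (lane * width)
    return block

def unpack_lanes(block, width):
    """Unpack every full lane of a 64-bit block as signed values."""
    mask = (1 << width) - 1
    return [to_signed((block >> (lane * width)) & mask, width)
            for lane in range(64 // width)]

def packed_add_sat(a_block, b_block, width, low, high):
    """Add two blocks lane by lane, clamping each sum to [low, high]."""
    assert -(1 << (width - 1)) <= low <= high < (1 << (width - 1))
    sums = [max(low, min(high, a + b))
            for a, b in zip(unpack_lanes(a_block, width),
                            unpack_lanes(b_block, width))]
    return pack_lanes(sums, width)

def vector_add_sat(a_packet, b_packet, width, low, high):
    """Apply the saturating add to a packet of as many as 32 64-bit words."""
    assert len(a_packet) == len(b_packet) <= 32
    return [packed_add_sat(a, b, width, low, high)
            for a, b in zip(a_packet, b_packet)]

a = [pack_lanes([30000, -30000, 100, -1], 16)]
b = [pack_lanes([10000, -10000, 23, 1], 16)]
out = vector_add_sat(a, b, width=16, low=-32768, high=32767)
print(unpack_lanes(out[0], 16))  # [32767, -32768, 123, 0]
```

Narrowing the bounds, for example to low=-1000 and high=1000, would clamp the first two lanes to 1000 and -1000 instead, which is the kind of effect user-programmable saturation boundaries make possible in this model.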
Development support: The NeuroMatrix Software Development Kit for PCs includes an ANSI X3J16/95-0029 preliminary-standard-compatible C++ compiler, an assembler, an instruction-level simulator, a cycle-accurate simulator, a linker, a source-level debugger, a load/exchange library, and a set of application-specific vector-matrix libraries. RC Module offers PCI and CompactPCI evaluation/development boards for real-time DSP and video-image-processing designs. The vector-matrix library simplifies C-language programming of FFT, DCT, Sobel, and Hadamard-transform operations. RC Module also provides an NM6403 Verilog behavioral model for Sun host platforms for system-level simulation and a synthesizable core targeting Samsung and Fujitsu semiconductor technologies.