Name (registered domicile): 山田 朗 (Osaka Prefecture)

Degree: Doctor (Information Sciences)

Diploma number: 情第39号 (Information Sciences, No. 39)

Date of conferral: July 15, 2004 (Heisei 16)

Requirement for conferral: Article 4, Paragraph 2 of the Degree Regulations

Final education: Completed the master's program in Precision Engineering, Graduate School of Engineering, Shinshu University, March 1984 (Showa 59)

Thesis title: Architecting, Designing, and Implementing Multimedia RISC Processors (マルチメディア RISC プロセッサのアーキテクチャ・設計・構築に関する研究)

Thesis examination committee: (Chief examiner) Professor 中村 維男, Tohoku University; Professor 小林 広明, Tohoku University; Professor 青木 孝文, Tohoku University; Lecturer 鈴木 健一, Tohoku University

## Thesis Abstract

The word "multimedia" means not only the simple combination of text, voice, audio, video, and graphics, but also the interaction among these media, humans, networks, and storage media. Multimedia computing, the key technology for realizing multimedia systems, can be divided into signal processing and data processing. Signal processing handles text, audio, video, graphics, communication, and recognition. Data processing covers areas such as browsers, TCP/IP, GUIs, Linux, and Windows CE. This thesis describes the architecture, design, and implementation of multimedia RISC processors for both the signal processing and the data processing fields of multimedia computing.

For the signal processing field, this thesis proposes a new multimedia RISC processor that realizes multimedia systems with a small chip area and low power consumption. An attractive approach to implementing a multimedia system is to use a basic dual-issue RISC architecture. Combining state-of-the-art process technology, sub-word operations, and special video operations yields a small CPU core capable of real-time MPEG-2 video decoding. This approach also has the advantage of easy programming for general-purpose applications. The dual-issue RISC processor achieves a higher clock frequency, higher throughput, and a higher degree of hardware resource utilization than other multimedia processors that integrate several computational units. The processor proposed for the signal processing field of multimedia applications is the dual-issue RISC processor D30V.
For the data processing field, this thesis proposes a single-issue RISC microcontroller that operates at 400 MHz at 1.8 V. The microcontroller is based on the Mitsubishi M32R architecture. This thesis describes the architecture, design, and implementation that allow the microcontroller to achieve this high-speed operation. As process technology advances, signal integrity becomes a very important design concern. This thesis therefore also discusses the signal integrity design and analysis methods used for the handcrafted circuits in the M32R RISC processor.

This thesis consists of five chapters. Chapter 1, the introduction, describes the background of this research and the technical trends in related fields. It clarifies the difficulties of multimedia computing, states the purpose and significance of the research, and closes with an outline of the thesis.

Chapter 2 introduces the D30V, a RISC processor for signal processing that exploits four-way parallelism. This parallelism is derived from two-level hierarchical parallel execution: two sub-instructions can be issued per clock cycle, and each sub-instruction can perform two computations. In addition to the four-way parallelism, which gives a sustained peak performance of 1000 MOPS at 250 MHz, the D30V processor is enhanced with DSP instructions and special video operation instructions for multimedia applications. The D30V core integrates 300 K transistors in an 8 mm² core area, and it is fabricated on a 6 mm x 6.2 mm chip with 32 kB of instruction RAM and 32 kB of data RAM in a 2.0 V, 0.3-micrometer, four-layer metal CMOS process. The chip consumes 1.2 W when executing an inverse discrete cosine transform (IDCT) at 250 MHz. Through its four-way parallel execution mechanism, its instruction pipeline with bypassing, and its multimedia instructions, the D30V processor achieves a speed-up of about 4.2 times over a single-issue RISC for MPEG video block decoding.
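The arithmetic behind the 1000 MOPS figure, together with one common way for a single instruction to perform two computations, can be sketched as follows. This is a minimal illustration under stated assumptions, not the D30V instruction set: the helper names (`pack2x16`, `unpack2x16`, `add2x16`, `peak_mops`) are hypothetical, and the packed 16-bit add stands in for a generic sub-word operation rather than any specific D30V instruction.

```python
from typing import Tuple

MASK16 = 0xFFFF  # one 16-bit lane


def pack2x16(hi: int, lo: int) -> int:
    """Pack two unsigned 16-bit values into one 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def unpack2x16(word: int) -> Tuple[int, int]:
    """Split a 32-bit word into its (high, low) 16-bit lanes."""
    return (word >> 16) & MASK16, word & MASK16


def add2x16(a: int, b: int) -> int:
    """Hypothetical sub-word add: two independent 16-bit additions.

    Each lane wraps around on overflow, so a carry out of the low lane
    never spills into the high lane.
    """
    a_hi, a_lo = unpack2x16(a)
    b_hi, b_lo = unpack2x16(b)
    return pack2x16(a_hi + b_hi, a_lo + b_lo)


def peak_mops(clock_mhz: float,
              sub_instructions_per_cycle: int = 2,
              ops_per_sub_instruction: int = 2) -> float:
    """Peak operations per second (in millions) for two-level parallelism."""
    return clock_mhz * sub_instructions_per_cycle * ops_per_sub_instruction
```

With the values quoted in the abstract, `peak_mops(250)` evaluates to 1000: two sub-instructions per cycle, two computations each, at 250 MHz. The lane-wise wrap-around in `add2x16` is a design choice of this sketch; a real multimedia instruction set might offer saturating variants instead.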
Chapter 3 presents a viable solution for real-time MPEG-2 signal processing based on a general-purpose dual-issue RISC, the D30V, running at 243 MHz. The MPEG-2 encoding system consists of an MPEG-2 encoder chip, a motion estimation chip, and external memories. The encoder chip includes the D30V cores and dedicated hardware for variable length coding, block loading, and DCT/IDCT. The D30V core performs coding mode decision, subtraction of the previous frame, quantization, inverse quantization, rate control, scan, and reconstruction. The estimated area of the encoder, 23.0 mm² in a 0.3-micrometer CMOS process, is 33% smaller than that of the dedicated hardware approach, and its estimated power consumption is 13% lower. The dual-issue RISC processor approach thus offers a small chip area, low power consumption, and easy programming for multimedia applications.

For MPEG-2 decoding, the D30V core and a small set of dedicated hardware can be integrated on the same chip. In the software approach, the dedicated hardware performs Huffman decoding and half-sample motion compensation prediction, and handles DMA for the D30V core. The D30V core, running at 243 MHz, performs inverse scan, inverse quantization, the IDCT computation, interpolation for motion compensation, and reconstruction. The total chip area of this decoder, 23.3 mm², is 5% smaller than that of the dedicated hardware approach, at the cost of a 33% increase in power consumption. The hybrid approach adds dedicated hardware for the IDCT to reduce the load on the D30V core, so the clock frequency of the core can be lowered from 243 MHz to 121.5 MHz. The estimated power consumption of the hybrid decoder is 3% lower than that of the dedicated hardware approach, with a 2% increase in area.
To minimize chip area, the software approach, with the D30V core running at 243 MHz, is the best decoder solution, at the expense of higher power consumption. Conversely, to reduce power consumption, the hybrid approach, with the D30V core running at 121.5 MHz, is the best solution, at the expense of chip area. To accommodate changes in multimedia algorithms, the software approach with the dual-issue RISC D30V is a good solution, because it is easy to program for various applications. Because the dual-issue RISC processor is simple and has a higher degree of hardware resource utilization, it provides sufficient computational power in a small chip area. The power consumption of multimedia systems built on the dual-issue RISC processor is almost the same as that of systems built on dedicated hardware. Where power consumption is critical, it can be reduced further by providing dedicated hardware to execute the dominant processing in place of the dual-issue RISC core.

Chapter 4 proposes the single-issue RISC microcontroller for data processing, which operates at 400 MHz at 1.8 V, and describes the architecture, design, and implementation that achieve this high-speed operation. At the architectural level, a seven-stage instruction pipeline is adopted. Because memory access is a critical path, the instruction fetch (F) and the operand access (M) are each divided into two stages (F1 and F2, M1 and M2). Compared with a five-stage pipeline, the seven-stage pipeline incurs a penalty of about 8% owing to branch penalties and load latencies. To achieve high-speed operation, several ideas are proposed at the design level. A hierarchical bus structure consisting of a CPU-bus set and a peripheral-bus set is proposed. The CPU-bus set is synchronized by a high-speed CPU clock.
Various peripherals are connected to the peripheral-bus set, which is synchronized by a low-speed peripheral clock. A clock gear circuit is also proposed; with it, suitable ratios among the three clocks in the RISC processor can be selected. To implement the high-speed RISC processor for data processing, new design techniques for bus-line layout, clock distribution, and IR drop analysis are adopted. With the paired-bus structure, the wiring delay is reduced to about 30%. The clock distribution method reduces the clock skew to 135 ps. Placing decoupling capacitors inside the chip is effective in reducing the IR drop. As a result, high-speed operation at 400 MHz is achieved with a power dissipation of 0.96 W at 1.8 V.

The conclusions are given in Chapter 5. With rapid progress toward an information society, the demand for multimedia computing keeps increasing. Process technology continues to advance, but leakage current can no longer be disregarded from about the 0.13-micrometer generation onward. If the threshold voltage of the transistors is set high to keep leakage current small in an advanced process, the current drive capability of the transistors does not improve much. Studies on dynamic power control will therefore become more important for reducing leakage current. On the other hand, the total number of transistors on a chip increases as process technology advances, so abundant transistor resources will be available for multimedia systems. Further studies on multiprocessor-based parallel processing, which makes effective use of this large number of on-chip transistors, will therefore become more important in meeting the demand for multimedia computing.
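## Summary of Thesis Examination Results

In a society increasingly shaped by multimedia, it is extremely important to develop processors that perform, at high speed and low power, both the signal processing that handles audio, images, and similar data and the general-purpose processing required by GUIs and similar software. Based on the requirements placed on signal processing processors and general-purpose processors for multimedia applications, this thesis creates a processor architecture suited to each, evaluates the architectures down to the implementation level, and compiles the results. It consists of five chapters.

Chapter 1 is the introduction.

Chapter 2 proposes the D30V, a dedicated RISC microprocessor architecture for signal processing, and presents the design and implementation of its instruction set. The D30V has two asymmetric execution units, an integer unit and a memory access unit, and aims to process efficiently the operations typically found in signal processing. In addition, while taking 32 bits as its basic data type, it handles 16-bit data flexibly through careful design of its data types and instruction set organization, giving special consideration to image and audio processing. The performance evaluation results show that the D30V is an effective processor architecture for signal processing. This is an important achievement.

Chapter 3 discusses real-time MPEG encoding and decoding processors built around the D30V core proposed in the preceding chapter, from the viewpoints of hardware cost and power consumption. Processing on a RISC processor has generally been regarded as advantageous in chip area but disadvantageous in power consumption compared with dedicated hardware. The evaluation results in this chapter, however, show that a suitably refined RISC processor can achieve power consumption comparable to that of dedicated hardware. This makes clear that the flexibility of a RISC processor can be exploited in multimedia processing as well, and that a low-cost, low-power system can be built. This is a new finding that can be applied to the design of future embedded systems.

Chapter 4 describes the architecture, design, and implementation of a general-purpose processor for embedded applications. Through refinements at the architecture, device, and implementation levels, it achieves a clock frequency three times that of the conventional approach. This shows that a practical embedded general-purpose processor can be built, and it is an important finding.

Chapter 5 is the conclusion.

In summary, this thesis focuses on the respective characteristics of signal processing and general-purpose processing and treats systematically, on a scientific basis, everything from the creation of processor architectures suited to each kind of processing through their design and implementation. It contributes substantially to the advancement of computer science and fundamental information sciences.

Accordingly, this thesis is accepted as a dissertation for the degree of Doctor (Information Sciences).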