Reduced Instruction Set Computing (RISC) Processors
Modern RISC processors operate under stringent constraints of power, area, and performance. Silicon implementations commonly balance 32-bit or 64-bit native word sizes with execution throughput requirements of 1-4 instructions per clock cycle. Contemporary designs must manage thermal envelopes between 1-15W for mobile applications while delivering computation capabilities necessary for increasingly complex workloads.
The fundamental challenge in RISC processor design lies in maintaining instruction set simplicity and execution efficiency while extending capabilities to handle specialized computational domains without compromising the clean architectural model.
This page brings together solutions from recent research—including variable-length opcode extensions for expanded addressing, integrated matrix multiplication acceleration within existing vector units, flexible multi-precision SIMD instruction support, and FPGA-integrated designs for user-defined instruction sets. These and other approaches demonstrate how the RISC philosophy continues to evolve while preserving the architectural clarity that enables efficient implementation across diverse computing environments.
1. ARM CPU Core with Integrated Outer Product Engine and Accumulator Array for Scalable Matrix Extensions Execution
MICROSOFT TECHNOLOGY LICENSING LLC, 2025
An approach to implementing ARM's Scalable Matrix Extensions (SME) instruction set in an ARM CPU core without adding a separate SME accelerator. The design reuses the vector hardware already present in the core to execute the SSVE instructions of the SME instruction set. An outer product engine and an accumulator array are added inside the CPU core to compute outer products and accumulate the results for matrix multiplication. The outer product engine uses temporal single-instruction multiple-data (SIMD) processing, spreading the computation over multiple cycles to reduce memory bandwidth. The CPU core clears the vector registers and the accumulator array when entering and exiting streaming mode.
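As a purely illustrative software model of the arithmetic involved (not the patented hardware, and with a hypothetical function name), matrix multiplication can be written as a sum of outer products accumulated into a cleared accumulator array:

```python
def matmul_outer_product(a, b):
    """Compute C = A x B as the sum over k of outer(A[:, k], B[k, :])."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Accumulator array, cleared before use (software stand-in for the hardware array).
    acc = [[0] * cols for _ in range(rows)]
    for k in range(inner):
        column = [a[i][k] for i in range(rows)]  # k-th column of A
        row = b[k]                               # k-th row of B
        # One outer product per step; each step touches every accumulator cell once.
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += column[i] * row[j]
    return acc


print(matmul_outer_product([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# prints [[19, 22], [43, 50]]
```

Each step reads only one column of A and one row of B while updating the whole accumulator, which is why outer-product formulations tend to be economical with operand bandwidth.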
2. Design of Low Power Control Unit for RISC-V Processor Core
Johannes Chan, Chong Li, Warsuzarina Mat Jubadi - Akademia Baru Publishing, 2024
This research work focuses on the development of a low-power decode logic for a RISC-V processor core with specifications. The goal is to create a controller that performs all six groups of instruction formats outlined in the RV32I Base Integer Instruction Set. The control unit is designed to decode a total of 13 instruction sets, allowing for a comprehensive range of operations. A single instruction pipeline approach is implemented in the design to optimize performance. The synthesis of the design is carried out using the 32 nm standard library, resulting in a maximum operating frequency of 666.67 MHz. To further enhance power efficiency, clock gating techniques are employed, leading to a reduction in power consumption by 18.72% from 112.15 W to 91.45 W. Additionally, the layout of the design is optimized, resulting in an area of 354.74 mm². The successful development of this low-power decode logic demonstrates its potential for integration into larger RISC-V processor cores. Future enhancements can include expanding the instruction decoding capability to encompass the full range...
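To make the decoding task concrete, here is a minimal sketch of classifying a 32-bit RV32I instruction word into one of the six base formats (R, I, S, B, U, J) from its 7-bit opcode. The table covers only a subset of RV32I opcodes, and the function and dictionary names are hypothetical; this is not the control unit described in the paper.

```python
# Major opcodes (bits 6:0) for a subset of RV32I, mapped to their base format.
FORMAT_BY_OPCODE = {
    0x33: "R",  # OP      (add, sub, and, ...)
    0x13: "I",  # OP-IMM  (addi, andi, ...)
    0x03: "I",  # LOAD
    0x67: "I",  # JALR
    0x23: "S",  # STORE
    0x63: "B",  # BRANCH
    0x37: "U",  # LUI
    0x17: "U",  # AUIPC
    0x6F: "J",  # JAL
}


def decode_fields(instr):
    """Slice the fixed-position fields out of a 32-bit instruction word.

    All slices are returned raw; which of them are meaningful depends on
    the format (for example, S and B formats have no rd field).
    """
    opcode = instr & 0x7F
    fmt = FORMAT_BY_OPCODE.get(opcode)
    if fmt is None:
        raise ValueError(f"opcode {opcode:#04x} not in this sketch's table")
    return {
        "format": fmt,
        "opcode": opcode,
        "rd": (instr >> 7) & 0x1F,
        "funct3": (instr >> 12) & 0x7,
        "rs1": (instr >> 15) & 0x1F,
        "rs2": (instr >> 20) & 0x1F,
        "funct7": (instr >> 25) & 0x7F,
    }


fields = decode_fields(0x002081B3)  # add x3, x1, x2
print(fields["format"], fields["rd"], fields["rs1"], fields["rs2"])
# prints R 3 1 2
```

Because the register fields sit at the same bit positions in every format, a hardware decoder can read them before the format is known, which keeps the decode logic small.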
3. Implementation of a Multiclocked Pipelined Processor Based on RISC-V using RV32I
REST Publisher, 2024
This study's primary objective is to create a 32-bit pipelined processor based on the open-source RV32I Version 2.0 RISC-V ISA that operates across several clock domains. A processor known as a RISC (Reduced Instruction Set Computer) employs less hardware than a CISC (Complex Instruction Set Computer) in order to reduce the complexity of the instruction set and accelerate the execution time per instruction. In addition, we built this processor with five pipelining layers, which allows for concurrent processing of instructions. All of the procedures are thoroughly explained, supported with the required block diagrams. To guarantee that variable delays, such as clock skew and meta-stability, are avoided inside the stage pipeline registers, multiple clock domains using two clock sources are employed.
4. Synchronization Support in 64-bit Out-Of-Order Superscalar Dual-Core RISC-V Processor
Shubham Yadav, Manish Kumar, S Sajin - IEEE, 2024
This paper discusses the implementation of atomic instructions in a dual-core, 64-bit, out-of-order superscalar processor based on the open-source RISC-V instruction set architecture. Leveraging the modularity of RISC-V, each core implements the RV64IMAFDC extensions and optional supervisor and user mode privilege levels. In this paper, we focus on the A-extension, the atomic instruction set extension. This extension introduces instructions that provide atomic memory operations, enabling synchronization across multiple RISC-V harts within the same memory space. Our goal is to present an efficient execution flow of atomic memory operation instructions and Load-Reserved/Store-Conditional instructions for a dual-core System-on-Chip. We subsequently verify the synchronization capabilities through the execution of a standalone game application on an SoC implemented on a Xilinx Kintex UltraScale KU085 FPGA-based board.
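The Load-Reserved/Store-Conditional behaviour mentioned above can be illustrated with a small software model. This is a simplified sketch, not the paper's execution flow: the class and function names are hypothetical, reservations are tracked per exact address (real implementations choose their own reservation granularity), and harts are simulated sequentially.

```python
class ReservationMemory:
    """Toy shared memory with per-hart LR/SC reservations."""

    def __init__(self):
        self.mem = {}
        self.reservations = {}  # hart id -> reserved address

    def _invalidate(self, addr):
        # A write to addr cancels every reservation on that address.
        self.reservations = {h: a for h, a in self.reservations.items() if a != addr}

    def store(self, addr, value):
        self.mem[addr] = value
        self._invalidate(addr)

    def load_reserved(self, hart, addr):
        self.reservations[hart] = addr
        return self.mem.get(addr, 0)

    def store_conditional(self, hart, addr, value):
        """Return 0 on success and 1 on failure, mirroring the SC result code."""
        # An SC always consumes the hart's reservation, whether or not it succeeds.
        held = self.reservations.pop(hart, None)
        if held != addr:
            return 1
        self.mem[addr] = value
        self._invalidate(addr)
        return 0


def atomic_add(memory, hart, addr, delta):
    """Retry loop: LR, compute, SC, repeat until the SC succeeds."""
    while True:
        old = memory.load_reserved(hart, addr)
        if memory.store_conditional(hart, addr, old + delta) == 0:
            return old
```

If two harts both load-reserve the same address, only the first store-conditional succeeds; the other hart sees a nonzero result and retries, which is the property that lock and counter implementations rely on.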
5. Matrix Multiplication Instruction with Configurable Vector Register Groups for RISC-V Processors
APPLE INC, 2024
A matrix multiplication instruction for RISC-V processors that enables efficient matrix operations by specifying a target vector register group for the result and source vector register groups for the input matrices, allowing for flexible matrix sizes and vector lengths.
6. Multi-Voltage Design of RISC Processor for Low Power Application: A Survey
Dheeraj Sharma, R Vikram - Deanship of Scientific Research, 2024
Power management is becoming an important aspect of design as transistor sizes shrink. For processor design, Reduced Instruction Set Computer (RISC) architecture is preferable to Complex Instruction Set Computer (CISC) architecture because of its simplicity and availability. To design a low-power RISC processor, a few techniques have been used earlier, such as a) pipelining and b) the Common Power Format language to generate the power intent of the RISC processor design. In the present work, a multi-voltage design technique has been used to design a 16-bit RISC processor with low power consumption. In this technique, different supply voltages are provided to different blocks of the design. This technique is implemented with the help of the Unified Power Format (UPF). Further, various operations such as ADD, SUB, INVERT, AND, OR, Right Shift, Left Shift, and Less Than are verified in ModelSim for the designed 16-bit RISC processor.
7. Generation of Coverage based Verification Benchmark Programs for RISC-V Processor
Sudeendra Kumar, Adarsh Hegde, V V Likhita - IEEE, 2024
The RISC-V architecture has gained significant popularity as an open and extensible instruction set architecture (ISA) for a wide range of computing applications. This paper introduces a novel approach to enhance coverage in RISC-V processor verification through the design and development of benchmark verification programs. To achieve comprehensive coverage, a CRIG (Constrained Random Instruction Generator) is designed that is capable of generating a diverse set of random instruction encodings while adhering to the RISC-V ISA specifications, enabling the creation of randomized instruction sequences based on constraints. The utilization of coverage groups ensures critical aspects of the processor are adequately tested. Through extensive testing, the paper identifies a benchmark constraint randomized file that showcases exceptional coverage, serving as a valuable reference for future verification projects. The resulting benchmark programs form a comprehensive suite that rigorously tests the RISC-V processor's functionality, providing confidence in its compliance with the RISCV ISA spe...
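The idea of constrained random generation with a coverage measure can be sketched in a few lines. The example below is an assumption-laden toy, not the paper's CRIG: it handles only RV32I R-type instructions, the constraints are just an allowed-mnemonic list and an allowed-register list, and all names are hypothetical.

```python
import random

OPCODE_OP = 0x33  # major opcode shared by the RV32I R-type ALU instructions

# mnemonic -> (funct3, funct7) for the RV32I R-type instructions
R_TYPE_OPS = {
    "add": (0x0, 0x00), "sub": (0x0, 0x20), "sll": (0x1, 0x00),
    "slt": (0x2, 0x00), "sltu": (0x3, 0x00), "xor": (0x4, 0x00),
    "srl": (0x5, 0x00), "sra": (0x5, 0x20), "or": (0x6, 0x00),
    "and": (0x7, 0x00),
}


def encode_r_type(mnemonic, rd, rs1, rs2):
    """Pack an R-type instruction into its 32-bit encoding."""
    funct3, funct7 = R_TYPE_OPS[mnemonic]
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | OPCODE_OP)


def generate(count, allowed_ops, registers, seed=0):
    """Draw `count` random instructions that satisfy the given constraints."""
    rng = random.Random(seed)  # seeded so a failing sequence can be replayed
    program = []
    for _ in range(count):
        op = rng.choice(allowed_ops)
        rd, rs1, rs2 = (rng.choice(registers) for _ in range(3))
        program.append((op, encode_r_type(op, rd, rs1, rs2)))
    return program


def opcode_coverage(program, allowed_ops):
    """Fraction of the allowed mnemonics that appear at least once."""
    seen = {op for op, _ in program}
    return len(seen & set(allowed_ops)) / len(allowed_ops)


program = generate(50, ["add", "sub", "xor"], list(range(1, 8)), seed=1)
print(f"{opcode_coverage(program, ['add', 'sub', 'xor']):.2f}")
```

A real flow would track many more coverage points (register pairs, immediates, hazards between neighbouring instructions) and feed the coverage report back into the constraints.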
8. Design a 5-stage pipeline RISC-V CPU and optimise its ALU
Lifu Deng - EWA Publishing, 2024
The RISC-V instruction set has advanced and expanded significantly in recent years. It is an open instruction set architecture (ISA) based on the concept of Reduced Instruction Set Computing (RISC). This article uses Verilog to design a 5-stage pipeline CPU based on the RISC-V architecture in Vivado 2022.2. The CPU can execute 38 instructions, and its arithmetic logic unit (ALU) is optimised by improving the adders, shifters, and multipliers. A testbench is then written in the simulation software to verify the functionality of the CPU. RTL diagrams and reports are generated to verify the design structure and evaluate resource allocation. Finally, the CPU successfully executes the instructions and obtains the correct results, and the LUT resources occupied by the shifter are reduced. This work serves as an important reference for system-on-chip (SoC) and computer design in general. It not only highlights the potential of the RISC-V architecture but also demonstrates the success of optimisation efforts. This paves the way for more powerful and efficient computing systems.
9. RISC-V processor enhanced with a dynamic micro-decoder unit
J. Pottier, Thomas Nieddu, Bertrand Le Gal, 2024
For years, the open-source RISC-V instruction set has been driving innovation in processor design, spanning from high-end cores to low-cost or low-power cores. After a decade of evolution, RISC architectures are now as mature as the CISC architectures popularized by industry giant Intel. Security and energy efficiency are now joining execution speed among the design constraints. In this article, we assess the benefits and costs associated with integrating a micro-decoding unit inspired by CISC processors into a RISC-V core. This unit, added in a specific pipeline stage, should enable the execution of dynamic custom instruction sequences, which could be used, for instance, to compress binaries, obfuscate behavior, etc.
10. A Performance Modelling-Driven Approach to Hardware Resource Scaling
Alexandre Rodrigues, Leonel Sousa, Aleksandar Ilić - Springer Nature Switzerland, 2024
The continuous demand for higher computational performance and the stagnating developments in the general purpose processor landscape have led to a surge in interest for highly specialized and efficient hardware. Combined with the rising popularity of parameterizable hardware, a new opportunity to optimize these architectures for particular workloads arises, largely driven by the RISC-V Instruction Set Architecture (ISA). This work presents an application-specific optimization methodology for general purpose processors, enabling the development of architectures which are faster and more efficient for their designated workloads. Driven by the Cache-Aware Roofline Model (CARM) insights, the methodology guides the configuration of the memory and computational subsystems of the processor. We apply this methodology to two applications, demonstrating up to a 2.67× performance increase and a 1.34× improvement to energy efficiency.
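For readers unfamiliar with roofline models, the basic bound they express can be sketched as follows. This is the generic roofline formula, attainable performance = min(peak compute, bandwidth × arithmetic intensity), evaluated once per memory level as CARM does; the function names and all numbers are hypothetical and are not taken from the paper.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound for one memory level.

    arithmetic_intensity is in FLOP/byte, bandwidth in GB/s, result in GFLOP/s.
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine: one compute roof and one bandwidth roof per memory level.
PEAK = 100.0
BANDWIDTH = {"L1": 400.0, "L2": 200.0, "DRAM": 50.0}

for level, bw in BANDWIDTH.items():
    print(level, attainable_gflops(0.5, PEAK, bw), ridge_point(PEAK, bw))
```

A kernel whose measured performance sits under a bandwidth roof suggests scaling the memory subsystem, while one pinned at the compute roof suggests adding functional units; that is the kind of insight such a model can provide when sizing hardware resources.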
11. Design of RISCV processor using verilog
E. Jaya, B. Maneesha, G. Sriram - i-manager Publications, 2024
The main goal of this paper is to develop a 32-bit pipelined processor with several clock domains based on the RISCV (open source RV32I Version 2.0) ISA. To minimize the complexity of the instruction set and speed up the execution time per instruction, a RISC (Reduced Instruction Set Computer) processor that uses less hardware than a CISC (Complex Instruction Set Computer) is used. Furthermore, this paper constructed this processor with five levels of pipelining with the aid of necessary block diagrams, and all of the processes are well described. In this paper, a RISCV processor is designed and simulated using Verilog. The design of the RISCV processor provides an alternative for software and hardware design to the computer designers as it provides free and open instruction set architecture (ISA). Besides, the designed RISCV processor will be using 5-stage pipeline techniques to improve the overall performance of the processor. This system is started by implementing several main modules, such as alu, aludec, maindec, imem, dmem, regfile, pc_mux, result_mux, pipeline register (IF/ID,...
12. Out-of-Order Execution of Instructions for In-Order Five-Stage RISC-V Processor
Sushmita Hubballi, Saroja V. Siddamal - Springer Nature Singapore, 2024
In recent years, there have been remarkable advancements in Integrated Circuit (IC) technology, enabling the development of highly sophisticated computer systems on a single chip. Custom System on Chip (SoC) designs, where the processor core(s) and cache represent a smaller portion of the overall chip, have gained widespread popularity. Nowadays, it is challenging to come across an electronic product of any size that does not incorporate a processor. Open-source instruction set architecture, such as RISC-V-based processors, has gained traction in custom SoC design. A processor is the core of an electronic system. In a five-stage pipelined RISC-V processor, instructions are executed in the sequence that they are given. In this work, the architecture suggests an out-of-order execution of instructions when the resources are available yet all instructions are to be blocked owing to a multi-cycle instruction. In this architecture, in comparison with an in-order execution, we notice a difference of 120 ns, i.e., six clock cycles being used efficiently in an out-of-order execution when a mu...
13. Optimizing CNN Computation Using RISC-V Custom Instruction Sets for Edge Platforms
Shihang Wang, Xingbo Wang, Zhiyuan Xu - Institute of Electrical and Electronics Engineers (IEEE), 2024
Benefiting from its custom instruction extension capabilities, the RISC-V architecture can be optimized for many domain-specific applications. In this paper, we propose seven RISC-V SIMD (single instruction multiple data) custom instructions that can significantly optimize the convolution, activation and pool operations in CNN inference computation. More specifically, instruction CONV23 can greatly speed up the operation of F(2×2, 3×3). With the adoption of the Winograd algorithm, the number of multiplications can be reduced from 36 to 16, and the execution time is also reduced from 140 to 21 clock cycles. These custom instructions can be executed in batch mode within the acceleration module where the immediate data can be reused, so the latency and energy overhead associated with excess memory accesses can be eliminated. Using inline assembler in C language, the custom instructions can be called and compiled together with C source code. A revised RISC-V processor, RI5CY-Accel is construct...
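The reduction from 36 to 16 multiplications can be checked with a small software sketch of the standard Winograd minimal filtering formulation for a 2×2 output tile and a 3×3 kernel. This uses the commonly published transform matrices and makes no claim about how the CONV23 instruction is implemented internally; the function names are hypothetical.

```python
from fractions import Fraction as Fr

# Standard transform matrices for the F(2x2, 3x3) Winograd formulation.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[Fr(1), Fr(0), Fr(0)],
     [Fr(1, 2), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 2), Fr(-1, 2), Fr(1, 2)],
     [Fr(0), Fr(0), Fr(1)]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def winograd_2x2_3x3(tile, kernel):
    """2x2 output from a 4x4 input tile and a 3x3 kernel (CNN-style correlation)."""
    u = matmul(matmul(G, kernel), transpose(G))    # kernel transform, 4x4
    v = matmul(matmul(BT, tile), transpose(BT))    # input transform, 4x4
    # The only data-dependent multiplications: 16 element-wise products.
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(AT, m), transpose(AT))    # output transform, 2x2


def direct_2x2_3x3(tile, kernel):
    """Reference: 4 outputs x 9 taps = 36 multiplications."""
    return [[sum(tile[i + r][j + c] * kernel[r][c]
                 for r in range(3) for c in range(3))
             for j in range(2)] for i in range(2)]


tile = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
assert winograd_2x2_3x3(tile, kernel) == direct_2x2_3x3(tile, kernel)
```

The kernel transform can be computed once per filter and reused across tiles, and the input and output transforms need only additions, subtractions and halvings, so the 16 element-wise products dominate the multiplier cost.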
14. RISC-V V Vector Extension (RVV) with reduced number of vector registers
Eino Jacobs, Dmitry Utyansky, Muhammad Hassan, 2024
To reduce the area of the RISC-V Vector extension (RVV) in small processors, the authors consider one simple modification: reducing the number of registers in the vector register file. The standard 'V' extension requires 32 vector registers, which we propose to reduce to 16 or 8 registers. Other features of RVV are still supported. Reducing the number of vector registers does not generate a completely new programming model: although the resulting core does not have binary code compatibility with standard RVV, compiling for it just requires parameterization of the vector register file size in the compiler. The reduced vector register file still allows high utilization of the RVV processor core. Many useful signal processing kernels require few registers, and become efficient at 1:4 chaining ratio.
15. RISC-V Processor for IOT Applications
Rajveer Singh et al. - Auricle Technologies, Pvt., Ltd., 2023
RISC-V is a recently introduced instruction-set architecture (ISA) that offers innovative advantages, including low power consumption, affordability, and scalability. Utilizing an open, non-proprietary Instruction Set Architecture (ISA) enables the creation of on-the-fly design of soft error countermeasures at the microarchitecture level. This may significantly enhance the resilience of Application Specific Standard Products (ASSP) and FPGA implementations. This paper offers a quick overview of the RISC-V architecture. This paper presents a plan to create and execute a 32-bit single-cycle RISC-V processor using Verilog HDL in the Vivado software.
16. How to Design an ISA
David Chisnall - Association for Computing Machinery (ACM), 2023
Over the past decade I've been involved in several projects that have designed either ISA (instruction set architecture) extensions or clean-slate ISAs for various kinds of processors (you'll even find my name in the acknowledgments for the RISC-V spec, right back to the first public version). When I started, I had very little idea about what makes a good ISA, and, as far as I can tell, this isn't formally taught anywhere. With the rise of RISC-V as an open base for custom instruction sets, however, the barrier to entry has become much lower and the number of people trying to design some or all of an instruction set has grown immeasurably.
17. Design of Decoded Instruction Cache
Takero Magara, Nobuyuki Yamasaki - IEEE, 2023
Recent microprocessors improve performance by extracting various levels of parallelism. Among these, out-of-order processors focus on ILP to improve performance. On the other hand, out-of-order processors consume a lot of power because they fetch and decode many instructions. We propose a Decoded Instruction Cache (DIC), in which the control signals generated by decoding RISC instructions are stored as decoded instructions in the DIC. The scheme improves performance and reduces power consumption because the results of fetch and decode can be reused. The DIC also supports multi-threaded execution, so TLP is also improved. When implemented in a multithreaded RISC processor, the DIC improves IPC by 2.39%.
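The reuse idea behind a decoded instruction cache can be illustrated with a toy software model. Everything here is an assumption for illustration: the class name, the use of the program counter as the lookup key, and the LRU replacement policy are not taken from the paper.

```python
from collections import OrderedDict


class DecodedInstructionCache:
    """Toy cache of decoded control signals, keyed by program counter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # pc -> decoded control signals
        self.hits = 0
        self.misses = 0

    def lookup(self, pc, fetch_and_decode):
        """Return decoded signals for pc, running fetch and decode only on a miss."""
        if pc in self.entries:
            self.hits += 1
            self.entries.move_to_end(pc)      # mark as most recently used
            return self.entries[pc]
        self.misses += 1
        decoded = fetch_and_decode(pc)
        self.entries[pc] = decoded
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return decoded


def fake_decode(pc):
    # Stand-in for the real fetch and decode stages.
    return {"pc": pc, "alu_op": "add"}


dic = DecodedInstructionCache(capacity=8)
for _ in range(3):                    # a four-instruction loop executed three times
    for pc in (0x0, 0x4, 0x8, 0xC):
        dic.lookup(pc, fake_decode)
print(dic.hits, dic.misses)
# prints 8 4
```

In this model, every hit is a fetch and decode that did not have to be repeated, which is the source of both the performance and the power benefit the abstract describes for loop-heavy code.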
18. An Open-Source FPGA Platform for Shared-Memory Heterogeneous Many-Core Architecture Exploration
Rafael Tornero, David R. Rodriguez, José Maria Martínez - IEEE, 2023
Many-core architectures, especially those with heterogeneous components, are gaining momentum due to the benefits of having an Open Source Instruction Set Architecture (ISA), such as RISC-V. In this paper we present a new computing platform for developing and analysing future heterogeneous architectures where specific custom accelerators are integrated with more standard RISC-V computing cores. The platform implements a coherent shared memory model which simplifies programmability and enables efficient communication support to all heterogeneous components. We detail the network and memory subsystems and provide preliminary evaluation results showing the benefits when using two systolic accelerators managed by two computing cores.
19. RISC processor implementation 32-bit MIPS-based: an approach to teaching and learning
Francisco Silva e Serpa, Alan Marcel Fernandes de Souza, Hélio Fernando Bentzen Pessoa Filho - Uniao Atlantica de Pesquisadores, 2023
This article describes the development of the design of a processor based on the RISC architecture, taking the 32-bit MIPS microprocessor as a basis. The RISC architecture, which stands for Reduced Instruction Set Computer, is characterized by having a reduced instruction set, aiming to optimize the processor's overall performance. The designed MIPS processor follows a 5-stage pipeline, which comprises the instruction fetch, instruction decode, execution, preparation and memory access phases. The main objective of this article is to carry out the structural development of the processor, using the hardware description language. This implies the creation of a Verilog representation that will later be used to generate the extraction of the processor's logic circuit. Furthermore, the project involves generating a timing diagram that illustrates the temporal behavior of processor operations and, ultimately, the physical implementation of the processor core. This work seeks to contribute knowledge in the field of computer architecture, providing a practical implementation of a RISC process...
20. Vectorized Nonlinear Functions with the RISC-V Vector Extension
Eric Bavier, Nicholas Knight, Hugues de Lassus Saint-Geniès - IEEE, 2023
The RISC-V Vector instruction set extension (RVV) provides scalable data-parallel instructions suitable for accurate and performant implementations of numerical algorithms across many application domains [1]. The primary objective of this paper is to share our experience implementing vector C99 <math.h> (libm) functions using RVV. Our contributions are threefold: First, we contributed an RVV port of SLEEF, a multi-platform open-source vector libm. Second, we show that while SLEEF simplifies porting efforts, it also precludes some RVV-specific optimization opportunities. With SiFive's X280 vector processor micro-architecture as a case-study, we highlight RVV features that optimized code can use. We also expand the discussion to how these features might be used differently when optimizing for other cores. Third, we compare the performance of our SLEEF RVV port to our own RVV-native routines. We present results from 1-ulp accurate implementations of Libm functions in a cycle-accurate simulation of the X280 pipeline to show the impact of RVV-enabled optimizations.
