5 - RISC-V

ucla | CS M151B | 2024-01-23 16:36



RISC-V ISA Background

  • open source and free to use; this course uses the 32-bit integer base ISA (RV32I)
  • focus on efficiency, extensibility, accessibility
  • 5th generation of RISC: the previous (4th) generation was widely used in ARM, the 3rd in embedded systems
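  • the base ISA can be extended with open, standardized extensions (e.g., M for integer multiply/divide, A for atomics, F/D for floating point, C for compressed instructions)

Instructions Background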

  • base instructions are 32-bit words with a 7-bit opcode (bits [6:0])
  • the C (compressed) extension adds half-word (16-bit) instructions
  • variable-length encodings are supported through extensions; the lowest bits identify the length (bits [1:0] != 11 means a 16-bit compressed instruction; bits [1:0] = 11 with bits [4:2] != 111 means a 32-bit instruction)
  • instruction formats (R, I, S, B, U, J) depend on the operation type
  • Base Instruction Set (full listing in chapter 9)
  • an opcode names a set of sub-instructions; the 32-bit instruction also contains funct fields (funct3, funct7) that select which sub-instruction of the opcode is meant
    • e.g., opcode = BRANCH, funct3 selects BEQ, BNE, BLT, BGE (and BLTU, BGEU)
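The length rule and the opcode/funct3 split above can be sketched in a few lines of Python. This is a minimal illustration assuming the standard RV32I encodings: it handles only the 16- and 32-bit length cases and only the BRANCH opcode, and the function names are hypothetical.

```python
# Sketch: decode instruction length and a BRANCH sub-instruction (RV32I encodings).

BRANCH_OPCODE = 0b1100011  # 7-bit opcode shared by all conditional branches
BRANCH_FUNCT3 = {0b000: "BEQ", 0b001: "BNE", 0b100: "BLT",
                 0b101: "BGE", 0b110: "BLTU", 0b111: "BGEU"}


def instr_length(low_half: int) -> int:
    """Length in bytes, from the lowest 16 bits (only 16/32-bit cases handled)."""
    if (low_half & 0b11) != 0b11:
        return 2  # C extension: compressed 16-bit instruction
    if ((low_half >> 2) & 0b111) != 0b111:
        return 4  # standard 32-bit instruction
    raise ValueError("48-bit and longer encodings are not handled in this sketch")


def decode_branch(instr: int) -> str:
    opcode = instr & 0x7F          # bits [6:0]
    funct3 = (instr >> 12) & 0x7   # bits [14:12]
    if opcode != BRANCH_OPCODE:
        raise ValueError("not a BRANCH instruction")
    return BRANCH_FUNCT3[funct3]   # KeyError for funct3 values no branch uses
```

For example, `decode_branch(0x00001063)` (the encoding of `bne x0, x0, 0`) returns `"BNE"`, and `instr_length(0x0001)` (a `c.nop`) returns 2.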

    Integer (Zero/Sign) Extension

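  • motivation: increase the number of bits of a binary number while preserving its value, so values of different data widths can be used together in arithmetic
  • zero extension fills the new upper bits with 0s (in written binary this looks like prepending 0s on the left); used for unsigned values
  • sign extension fills the new upper bits with copies of the sign bit (MSB), which preserves the value of a signed (two's-complement) number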
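A minimal Python sketch of both extensions, assuming values are passed as non-negative bit patterns of width `from_bits`; the helper names are illustrative only.

```python
def zero_extend(value: int, from_bits: int) -> int:
    # keep the low `from_bits` bits; every new upper bit is 0
    return value & ((1 << from_bits) - 1)


def sign_extend(value: int, from_bits: int) -> int:
    # copy the sign bit (bit from_bits-1) into every new upper bit;
    # the result is the signed value the bit pattern represents
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit


def as_u32(value: int) -> int:
    # view a (possibly negative) value as 32-bit register contents
    return value & 0xFFFFFFFF
```

For the byte `0xF6`, `zero_extend(0xF6, 8)` gives 246 (`0x000000F6`), while `as_u32(sign_extend(0xF6, 8))` gives `0xFFFFFFF6`, i.e. -10.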

    Immediate variants

  • I, S, and B formats carry 12-bit immediates; U and J carry 20-bit immediates
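To see how a 12-bit immediate is pulled out of an instruction and sign-extended, here is a small Python sketch for the I and S formats, assuming the standard RV32I bit positions (the S format splits its immediate across two fields). Function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit  # assumes value already fits in `bits` bits


def imm_i(instr: int) -> int:
    # I-type: imm[11:0] = instr[31:20]
    return sign_extend((instr >> 20) & 0xFFF, 12)


def imm_s(instr: int) -> int:
    # S-type: imm[11:5] = instr[31:25], imm[4:0] = instr[11:7]
    hi = (instr >> 25) & 0x7F
    lo = (instr >> 7) & 0x1F
    return sign_extend((hi << 5) | lo, 12)
```

For example, `imm_i(0xFFF00093)` (`addi x1, x0, -1`) returns -1 and `imm_s(0x0020A423)` (`sw x2, 8(x1)`) returns 8.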
    Integer Register-Immediate Instructions

    Register - Register Ops

    Control Flow (Branch/Jump) Ops

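  • JAL: jumps to PC (the calling instruction's address) + offset (the offset can be positive or negative); rd saves PC+4 (the address of the next instruction)
  • JALR: adds the offset to a base register rs1 and jumps there (with the lowest bit cleared); rd stores PC+4 of the calling instruction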
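The two jump rules can be modeled as pure functions returning the next PC and the link value written to rd. This is a sketch assuming 32-bit wrap-around arithmetic; register-file updates are left out.

```python
MASK32 = 0xFFFFFFFF


def jal(pc: int, offset: int):
    """Return (next PC, value written to rd) for JAL."""
    return (pc + offset) & MASK32, (pc + 4) & MASK32


def jalr(pc: int, rs1: int, offset: int):
    """Return (next PC, value written to rd) for JALR; bit 0 of the target is cleared."""
    target = ((rs1 + offset) & MASK32) & ~1
    return target, (pc + 4) & MASK32
```

A call from `0x1000` to a function at `0x1400` is `jal(0x1000, 0x400)`, giving `(0x1400, 0x1004)`; returning with `jalr x0, 0(ra)` from `0x1480` is `jalr(0x1480, 0x1004, 0)`, which jumps back to `0x1004` (the link value is discarded because rd = x0).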

Load and Store Ops

Loading (to registers)

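  • loads are sensitive to memory alignment; variants load a byte (LB), half-word (LH), word (LW), double-word (LD, RV64 only), unsigned byte (LBU), unsigned half-word (LHU), etc.
  • the MSB of funct3 selects how a byte or half-word is extended: 0 sign-extends (LB, LH), 1 zero-extends (LBU, LHU)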
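The funct3 encoding above can be read directly in code: the low two bits give the size and the MSB chooses zero- vs sign-extension. A Python sketch, assuming a little-endian memory system (the common case) given as a `bytes` object; misaligned addresses are simply rejected here rather than emulated or trapped.

```python
VALID_RV32_LOADS = {0b000, 0b001, 0b010, 0b100, 0b101}  # LB, LH, LW, LBU, LHU


def load(mem: bytes, addr: int, funct3: int) -> int:
    """Return the 32-bit register value produced by a load with this funct3."""
    if funct3 not in VALID_RV32_LOADS:
        raise ValueError("not an RV32I load funct3")
    size = 1 << (funct3 & 0b11)      # funct3[1:0]: 0 -> byte, 1 -> half, 2 -> word
    unsigned = bool(funct3 & 0b100)  # funct3[2] (the MSB): 1 -> zero-extend
    if addr % size != 0:
        raise ValueError("misaligned load")
    value = int.from_bytes(mem[addr:addr + size], "little")
    if not unsigned:
        sign_bit = 1 << (8 * size - 1)
        value = (value ^ sign_bit) - sign_bit  # sign-extend to a Python int
    return value & 0xFFFFFFFF
```

With `mem = bytes([0xF6, 0x00, 0x80, 0xFF])`, LB at address 0 yields `0xFFFFFFF6` while LBU yields `0x000000F6`.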

    Loading Immediates

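  • LUI loads a 20-bit immediate into the upper 20 bits of a register (the lower 12 bits become 0); an ADDI then supplies the lower 12 bits, so LUI + ADDI builds any 32-bit constant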
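One subtlety when building a constant with LUI + ADDI: ADDI sign-extends its 12-bit immediate, so if bit 11 of the constant is set, the upper 20 bits must be incremented to compensate. A Python sketch of the split (this is the same idea an assembler uses when expanding `li`; the function names are illustrative):

```python
def lui_addi_split(value: int):
    """Split a 32-bit constant into (LUI imm20, ADDI imm12) so lui+addi rebuilds it."""
    value &= 0xFFFFFFFF
    lo12 = value & 0xFFF
    if lo12 >= 0x800:   # ADDI sign-extends its immediate, so this part is negative...
        lo12 -= 0x1000
    hi20 = ((value - lo12) >> 12) & 0xFFFFF  # ...and the upper part is incremented
    return hi20, lo12


def lui_then_addi(hi20: int, lo12: int) -> int:
    rd = (hi20 << 12) & 0xFFFFFFFF   # LUI: upper 20 bits set, lower 12 bits zero
    return (rd + lo12) & 0xFFFFFFFF  # ADDI: add the sign-extended 12-bit immediate
```

For example, `0x12345FFF` splits into `(0x12346, -1)`: LUI gives `0x12346000`, and ADDI of -1 brings it back to `0x12345FFF`.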

    Storing (to memory)

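  • memory is accessed in whole words: to load a half-word or byte, the hardware reads the word and shifts/extracts/extends the needed part. Stores are trickier: naively writing a word back may overwrite the full word even when storing a single byte, so the other bytes must be preserved (e.g., with per-byte write enables or a read-modify-write)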
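The byte-store problem can be sketched with a word-addressed memory modeled as a Python list of 32-bit words. The naive version clobbers the neighboring bytes; the read-modify-write version replaces only the addressed byte lane. This assumes little-endian byte lanes; both function names are hypothetical.

```python
def store_byte_naive(words, addr: int, value: int) -> None:
    # writes a whole word: the other three bytes are overwritten with 0s
    shift = 8 * (addr % 4)
    words[addr // 4] = (value & 0xFF) << shift


def store_byte(words, addr: int, value: int) -> None:
    # read-modify-write: replace only the addressed byte (little-endian lanes)
    index, lane = divmod(addr, 4)
    shift = 8 * lane
    mask = 0xFF << shift
    old = words[index]                               # read
    new = (old & ~mask) | ((value & 0xFF) << shift)  # modify
    words[index] = new & 0xFFFFFFFF                  # write
```

Starting from the word `0x11223344`, storing `0xAB` at byte address 1 gives `0x1122AB44` with `store_byte`, but `0x0000AB00` with the naive version.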

    Control and Status Ops

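  • each CSR op (CSRRW, CSRRS, CSRRC) has a register-source form and an immediate form (I suffix: CSRRWI, CSRRSI, CSRRCI) that takes a 5-bit unsigned (zero-extended) immediate
  • the available CSRs and their flag bits are listed in the CSR table; each CSR is selected by a 12-bit address in the instruction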
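The read/write/set/clear behavior of the six CSR instructions can be sketched as one Python function that returns the old CSR value (what rd receives) and the new CSR value. This simplified model ignores the spec's side-effect rules, e.g. that CSRRS/CSRRC with rs1 = x0 do not write the CSR at all; the function name is hypothetical.

```python
CSR_OPS = {"CSRRW": ("W", False), "CSRRS": ("S", False), "CSRRC": ("C", False),
           "CSRRWI": ("W", True), "CSRRSI": ("S", True), "CSRRCI": ("C", True)}


def csr_instr(mnemonic: str, csr_old: int, rs1_value: int = 0, uimm: int = 0):
    """Return (value written to rd, new CSR value)."""
    op, uses_imm = CSR_OPS[mnemonic]
    if uses_imm:
        if not 0 <= uimm < 32:
            raise ValueError("uimm is a 5-bit unsigned field")
        src = uimm                    # zero-extended 5-bit immediate
    else:
        src = rs1_value & 0xFFFFFFFF  # register source
    if op == "W":
        new = src                     # write
    elif op == "S":
        new = csr_old | src           # set the bits given in src
    else:
        new = csr_old & ~src & 0xFFFFFFFF  # clear the bits given in src
    return csr_old, new               # rd always gets the old CSR value
```

For example, `csr_instr("CSRRSI", 0, uimm=0b1000)` sets bit 3 and returns `(0, 8)`.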

    Privilege Levels

  • privilege levels dictate the software access level, enabling access control (what an app can do) and security (running apps in isolation)

    User Level (U) CSR

  • lowest privilege; accessible from user mode
  • exposes only instructions and CSRs that are safe for applications
  • e.g., performance counters

    Hypervisor (H) CSR

  • optional level in newer RISC-V versions
  • for virtualization (VMs)
  • between M and S modes

    Supervisor (S) CSR

  • system level for OS
  • can manage resources but is less privileged than machine mode

    Machine (M) CSR

  • most privileged, machine-level
  • controls low-level functions such as interrupts, exception handling, and physical memory access
  • used at boot
  • usually shared

    Switching/Usage

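  • the CPU starts at machine level (M)
  • software lowers the privilege level through the mstatus CSR: it writes the target level into the MPP field, then executes mret
  • mret, sret, and uret return from a trap handled at M, S, or U level, restoring the saved privilege level
  • the CPU raises an exception (illegal instruction) if an instruction or CSR is used without sufficient privilege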
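The switching sequence can be modeled with a tiny hart object that tracks the current privilege level and the mstatus bits mret uses (MIE at bit 3, MPIE at bit 7, MPP at bits [12:11], with U = 0, S = 1, M = 3). This is a simplified sketch: other mstatus fields are omitted, and Python's `PermissionError` stands in for the illegal-instruction exception. The `Hart` class and `set_mpp` helper are hypothetical.

```python
U, S, M = 0, 1, 3               # privilege-level encodings
MIE = 1 << 3                    # mstatus.MIE: M-mode interrupt enable
MPIE = 1 << 7                   # mstatus.MPIE: previous MIE
MPP_SHIFT = 11
MPP_MASK = 0b11 << MPP_SHIFT    # mstatus.MPP: previous privilege, bits [12:11]


class Hart:
    def __init__(self):
        self.priv = M     # the CPU starts in machine mode
        self.mstatus = 0
        self.mepc = 0     # PC that mret returns to
        self.pc = 0

    def set_mpp(self, level: int) -> None:
        # what M-mode software would do with csrrw/csrrs/csrrc on mstatus
        self.mstatus = (self.mstatus & ~MPP_MASK) | (level << MPP_SHIFT)

    def mret(self) -> None:
        if self.priv != M:
            raise PermissionError("mret executed below M-mode")
        self.priv = (self.mstatus & MPP_MASK) >> MPP_SHIFT
        mie = MIE if self.mstatus & MPIE else 0
        # MIE <- MPIE, MPIE <- 1, MPP <- U (encoded as 0)
        self.mstatus = (self.mstatus & ~(MIE | MPP_MASK)) | mie | MPIE
        self.pc = self.mepc
```

Setting `mepc` to the user program's entry point, calling `set_mpp(U)`, and then `mret()` drops the hart to U-mode at that address; a second `mret()` from U-mode raises the stand-in exception.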

    Common CSR: Counters

  • the counters (cycle, time, instret) are 64-bit; on RV32, the h-suffixed CSRs (cycleh, timeh, instreth) read the upper 32 bits
  • but reading in two 32-bit parts is racy: the lower half can overflow between the two reads, making the combined value wrong. Solution: read the upper half, then the lower half, then the upper half again, and retry if the two upper reads differ
    • loop is the label of the retry loop: the sequence branches back to loop until the upper half is unchanged, i.e., no overflow occurred
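The retry loop can be demonstrated against a simulated counter that advances on every CSR read (a hypothetical stand-in for a real cycle counter). Reading lower-then-upper near a 32-bit boundary produces a value that is off by about 2^32, while the upper/lower/upper loop returns a consistent value.

```python
class SimCounter:
    """Hypothetical 64-bit counter that advances by `tick` on every CSR read."""

    def __init__(self, start: int, tick: int = 1):
        self.value, self.tick = start, tick

    def _read(self, shift: int) -> int:
        half = (self.value >> shift) & 0xFFFFFFFF
        self.value += self.tick  # time passes between reads
        return half

    def read_lo(self) -> int:
        return self._read(0)     # like rdcycle

    def read_hi(self) -> int:
        return self._read(32)    # like rdcycleh


def read64_naive(c: SimCounter) -> int:
    lo = c.read_lo()
    hi = c.read_hi()             # the lower half may have wrapped by now
    return (hi << 32) | lo


def read64_safe(c: SimCounter) -> int:
    while True:                  # the "loop" label
        hi = c.read_hi()
        lo = c.read_lo()
        if c.read_hi() == hi:    # upper half unchanged: no carry out of the lower half
            return (hi << 32) | lo
```

Starting at `0xFFFFFFFF`, the naive read returns `0x1FFFFFFFF` (lower half read before the carry, upper half after it); starting at `0xFFFFFFFE`, the safe read retries once and returns `0x100000002`.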

RISC-V ABI

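  • ra is the return address that the program jumps to at the end of the function call, e.g., if main calls foo, ra holds the address of the instruction in main right after the call (PC+4 of the jal)
  • a0 (x10), together with a1 (x11), stores the function return value
  • "r0" below refers to a0; it is not x0, which is hardwired to zero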

    Stack Pointer

  • the stack grows from high addresses downward (sp is decremented to allocate stack space)
    • stack grows down, heap grows up (dynamic data)
  • the CPU reads the predetermined stack base address from the symbol STACK_BASE_ADDR
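A toy model of downward stack allocation: sp starts at the base and is decremented to allocate a frame, as `addi sp, sp, -size` would do. The `STACK_BASE_ADDR` value below is a hypothetical placeholder, and frames are rounded to 16 bytes on the assumption of the standard RISC-V calling convention, which keeps sp 16-byte aligned.

```python
STACK_BASE_ADDR = 0x7FFF_F000  # hypothetical value; the real one is fixed by the setup


class Stack:
    def __init__(self, base: int = STACK_BASE_ADDR):
        self.sp = base

    @staticmethod
    def _frame_size(nbytes: int) -> int:
        return (nbytes + 15) & ~15  # round up to a multiple of 16 bytes

    def alloc(self, nbytes: int) -> int:
        self.sp -= self._frame_size(nbytes)  # like: addi sp, sp, -size
        return self.sp                       # lowest address of the new frame

    def free(self, nbytes: int) -> None:
        self.sp += self._frame_size(nbytes)  # like: addi sp, sp, size
```

Allocating 12 bytes moves sp down by a full 16-byte frame; freeing the same amount restores it to the base.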

    Global pointer

  • gp starts at the middle of the static (global) data region, whose size is fixed at compile time because the program knows how much static data it needs
  • because gp sits in the middle, data on either side can be reached with a signed 12-bit offset from gp (-2048 to +2047 bytes), so a single load/store covers a 4 KiB window
  • the linker exports the pointer location address as the symbol __global_pointer$ (GNU toolchain)
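Why the middle placement helps can be checked numerically: a gp-relative load like `lw rd, offset(gp)` only reaches addresses within a signed 12-bit offset. The sketch below assumes a hypothetical 4 KiB static region starting at `0x10000` with gp placed `0x800` bytes into it.

```python
IMM12_MIN, IMM12_MAX = -2048, 2047  # signed 12-bit immediate range


def gp_offset(gp: int, addr: int):
    """Offset for `lw rd, offset(gp)`, or None if addr is out of single-instruction reach."""
    off = addr - gp
    return off if IMM12_MIN <= off <= IMM12_MAX else None


REGION_START = 0x10000         # hypothetical start of the static data region
GP = REGION_START + 0x800      # gp in the middle of a 4 KiB window
```

With this layout, both the first byte (`0x10000`, offset -2048) and the last byte (`0x10FFF`, offset +2047) of the region are reachable, while `0x11000` is not.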

    Thread pointer

  • TLS (thread-local storage) holds thread-local variables
  • so each thread needs a thread pointer (tp) to its values in thread-local storage (allocated on the heap): there is one TLS area for the program, and each thread's variables live in the part of it that the thread's tp points to
  • the linker/compiler knows at compile time how much TLS space to allocate, so a static TLS frame is allocated and tp identifies its storage location
  • in a multithreaded app, the register file is NOT shared
    • each thread has its own register state, including its own tp
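A toy model of tp-relative addressing: every thread runs the same instruction (conceptually `lw a0, VAR_OFFSET(tp)`), but because each thread has its own tp, the access lands in that thread's own block. The region base, block size, and variable offset are all hypothetical values chosen for illustration.

```python
TLS_REGION = 0x2000_0000  # hypothetical base of the area holding all threads' TLS blocks
TLS_BLOCK_SIZE = 64       # hypothetical per-thread size, known at compile/link time
VAR_OFFSET = 8            # hypothetical offset of one thread-local variable in a block


def make_threads(n: int):
    # each thread has its own register state; only tp is modeled here
    return [{"tp": TLS_REGION + i * TLS_BLOCK_SIZE} for i in range(n)]


def tls_var_addr(thread) -> int:
    # same offset in every thread, different address because tp differs
    return thread["tp"] + VAR_OFFSET
```

With two threads, the variable lives at `0x20000008` for thread 0 and `0x20000048` for thread 1, so writes from one thread never touch the other's copy.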