5 - RISC-V
ucla | CS M151B | 2024-01-23 16:36
Table of Contents
- RISC-V ISA Background
- Instructions Background
- Int Reg - Immediate Instructions
- Register - Register Ops
- Control Flow (Branch/Jump) Ops
- Load and Store Ops
- Control and Status Ops
- RISC-V ABI
RISC-V ISA Background
- open source, free to use - will do 32-bit int
- focus on efficiency, extensibility, accessibility
- RISC-V is the 5th generation of RISC ISA; the previous (4th) gen was widely used in ARM, the 3rd gen in embedded
- different versions can be extended with open extensions
- base 32-bit words, 7-bit opcode
- C (compressed instr) extension allows half-word instrs
- variable-length instructions are supported by extensions; the least-significant bits of an instruction identify its length
- instruction formats depend on operation type, shown below
- Base Instruction Set (full listed in chapter 9)
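- An instruction opcode names a set of sub-instructions; the 32-bit instruction can also contain a funct field (func_type) that defines which sub-instruction of the opcode is meant
- extension: the motivation is to increase the number of bits of a binary number while preserving its value -> needed for arithmetic on varying data widths
- zero-extension fills the new upper bits with 0s (written out in binary this looks like prepending 0s, as a logical shift right does)
- sign-extension fills the new upper bits with copies of the sign bit (MSB), so a two's-complement value keeps its sign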
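A minimal Python sketch of the opcode/funct split and the length check described above. The function names are made up for illustration; the field positions are those of the base R-type format, and the sketch only tells 16-bit and 32-bit encodings apart.

```python
def instr_length_bytes(low_halfword: int) -> int:
    """Instruction length from the lowest bits of the first half-word."""
    if low_halfword & 0b11 != 0b11:
        return 2  # compressed (C extension) instruction
    if (low_halfword >> 2) & 0b111 != 0b111:
        return 4  # standard 32-bit instruction
    raise ValueError("longer encodings are not handled in this sketch")


def decode_r_type(instr: int) -> dict:
    """Split a 32-bit R-type instruction word into its fields."""
    return {
        "opcode": instr & 0x7F,          # bits 6:0, 7-bit opcode
        "rd": (instr >> 7) & 0x1F,       # bits 11:7
        "funct3": (instr >> 12) & 0x7,   # bits 14:12
        "rs1": (instr >> 15) & 0x1F,     # bits 19:15
        "rs2": (instr >> 20) & 0x1F,     # bits 24:20
        "funct7": (instr >> 25) & 0x7F,  # bits 31:25
    }


ADD_X3_X1_X2 = 0x002081B3  # add x3, x1, x2
SUB_X3_X1_X2 = 0x402081B3  # sub x3, x1, x2 (same opcode, different funct7)

print(decode_r_type(ADD_X3_X1_X2))
print(decode_r_type(SUB_X3_X1_X2))
```

ADD and SUB share opcode 0x33 and funct3 0; only funct7 tells them apart, which is the "sub-instruction" idea in the first bullet.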
Immediate variants
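- I, S and B immediates are encoded in 12 bits, U and J immediates in 20 bits; B and J offsets additionally get an implicit 0 as their lowest bit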
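A small sketch of how a 12-bit immediate is pulled out of an instruction word and sign-extended. The helper names are illustrative, and only the I and S formats are shown; instruction words are assumed to be unsigned 32-bit integers.

```python
def zero_extend(value: int, bits: int) -> int:
    """Keep the low `bits` bits; every upper bit becomes 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def imm_i(instr: int) -> int:
    """I-type: imm[11:0] sits in bits 31:20."""
    return sign_extend(instr >> 20, 12)


def imm_s(instr: int) -> int:
    """S-type: imm[11:5] in bits 31:25, imm[4:0] in bits 11:7."""
    raw = ((instr >> 25) << 5) | ((instr >> 7) & 0x1F)
    return sign_extend(raw, 12)


ADDI_X1_X0_M1 = 0xFFF00093  # addi x1, x0, -1
SW_X2_M4_X1 = 0xFE20AE23    # sw x2, -4(x1)

print(imm_i(ADDI_X1_X0_M1), imm_s(SW_X2_M4_X1))
```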
Int Reg - Immediate Instructions
Register - Register Ops
Control Flow (Branch/Jump) Ops
- JAL: jumps to PC (the calling address) + a signed offset (+ or -); the dest register saves PC+4 (the next instruction)
- JALR: the offset is applied to a base register and the CPU jumps there; the dest register stores PC+4 from the calling address
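The two jump flavours can be sketched as follows. This is a toy model, assuming a 32-entry Python list as the register file; `jal` and `jalr` here are illustrative functions that return the next PC.

```python
MASK32 = 0xFFFFFFFF


def jal(pc: int, regs: list, rd: int, offset: int) -> int:
    """PC-relative jump: link PC+4 into rd, return the new PC."""
    if rd != 0:  # x0 is hard-wired to zero
        regs[rd] = (pc + 4) & MASK32
    return (pc + offset) & MASK32


def jalr(pc: int, regs: list, rd: int, rs1: int, offset: int) -> int:
    """Register-relative jump: target is base register + offset, low bit cleared."""
    target = (regs[rs1] + offset) & MASK32 & ~1  # computed before rd is written
    if rd != 0:
        regs[rd] = (pc + 4) & MASK32
    return target


regs = [0] * 32
pc = jal(0x1000, regs, rd=1, offset=-16)      # call: ra (x1) <- 0x1004
print(hex(pc), hex(regs[1]))
pc = jalr(pc, regs, rd=0, rs1=1, offset=0)    # return: jump to ra, discard link
print(hex(pc))
```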
Load and Store Ops
Loading (to registers)
- load ops are heavy on memory alignment; you can load BYTE, HALF, WORD, DOUBLE, U_BYTE, U_HALF, etc.
- the MSB of funct3 selects how a byte- or half-word-sized value is extended: clear means sign-extend (LB, LH), set means zero-extend (LBU, LHU)
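A sketch of how funct3 drives the load width and the extension, assuming RV32, little-endian byte-addressed memory modelled as a Python `bytes` object, and an illustrative `load` helper that returns the 32-bit register pattern.

```python
LB, LH, LW, LBU, LHU = 0b000, 0b001, 0b010, 0b100, 0b101  # funct3 values


def load(mem: bytes, addr: int, funct3: int) -> int:
    """Return the 32-bit register value produced by a load."""
    size = 1 << (funct3 & 0b011)  # low two bits: 1, 2 or 4 bytes
    raw = int.from_bytes(mem[addr:addr + size], "little")
    if funct3 & 0b100:            # MSB of funct3 set: zero-extend
        return raw
    sign = 1 << (8 * size - 1)    # MSB clear: sign-extend to 32 bits
    return ((raw ^ sign) - sign) & 0xFFFFFFFF


mem = bytes([0x80, 0xFF, 0x34, 0x12])
print(hex(load(mem, 0, LB)), hex(load(mem, 0, LBU)))
print(hex(load(mem, 0, LH)), hex(load(mem, 0, LHU)))
print(hex(load(mem, 0, LW)))
```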
Loading Immediates
- LUI loads an immediate into the upper 20 bits of a register and zeroes the lower 12 bits; the lower 12 bits can then be filled in with an ADDI
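Building a full 32-bit constant takes an upper-20-bit load followed by a 12-bit add. Because ADDI sign-extends its immediate, the upper part has to be rounded up when bit 11 of the constant is set. A sketch with illustrative helper names:

```python
def split_hi_lo(value: int) -> tuple:
    """Split a 32-bit constant into (hi20 for LUI, signed lo12 for ADDI)."""
    value &= 0xFFFFFFFF
    hi20 = ((value + 0x800) >> 12) & 0xFFFFF  # round up if lo12 will be negative
    lo12 = value & 0xFFF
    if lo12 >= 0x800:
        lo12 -= 0x1000                        # ADDI sees this as a negative number
    return hi20, lo12


def lui_addi(hi20: int, lo12: int) -> int:
    """Model `lui rd, hi20` followed by `addi rd, rd, lo12` on RV32."""
    reg = (hi20 << 12) & 0xFFFFFFFF           # LUI: upper 20 bits, lower 12 zero
    return (reg + lo12) & 0xFFFFFFFF          # ADDI: add the sign-extended lo12


hi, lo = split_hi_lo(0xDEADBEEF)
print(hex(hi), lo, hex(lui_addi(hi, lo)))
```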
Storing (to memory)
- all memory/address ops, including load/store, deal with whole words: a load shifts/extracts/extends to get the half-word or byte, and a store may rewrite the full containing word even if you are storing a single byte
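One way to picture a byte store on word-wide memory is a read-modify-write with a byte mask. This is a sketch under the assumption of a little-endian memory modelled as a list of 32-bit words; real hardware may use byte-enable signals instead, and the function names are illustrative.

```python
def store_byte(words: list, addr: int, value: int) -> None:
    """Merge one byte into the 32-bit word that contains `addr`."""
    index, lane = addr >> 2, addr & 0b11  # which word, which byte within it
    shift = 8 * lane
    mask = 0xFF << shift
    # The whole word is rewritten; only the masked byte actually changes.
    words[index] = (words[index] & ~mask) | ((value & 0xFF) << shift)


def load_byte_unsigned(words: list, addr: int) -> int:
    """Shift and extract one byte out of its containing word."""
    return (words[addr >> 2] >> (8 * (addr & 0b11))) & 0xFF


words = [0x11223344]
store_byte(words, 1, 0xAB)
print(hex(words[0]), hex(load_byte_unsigned(words, 1)))
```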
Control and Status Ops
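- there are register variants (CSRRW, CSRRS, CSRRC) and immediate variants (I postfix: CSRRWI, CSRRSI, CSRRCI) that take a 5-bit unsigned immediate in place of the source register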
- There exists a CSR flag table
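The three CSR operations are read-and-write, read-and-set-bits, and read-and-clear-bits; each returns the old CSR value. A toy model, assuming the CSR file is a plain dict keyed by CSR address (0x300 is `mstatus`); the Python function names are illustrative.

```python
MASK32 = 0xFFFFFFFF
MSTATUS = 0x300


def csrrw(csrs: dict, addr: int, src: int) -> int:
    """Atomic swap: return the old value, write src."""
    old = csrs[addr]
    csrs[addr] = src & MASK32
    return old


def csrrs(csrs: dict, addr: int, src: int) -> int:
    """Return the old value, set the bits that are 1 in src."""
    old = csrs[addr]
    csrs[addr] = (old | src) & MASK32
    return old


def csrrc(csrs: dict, addr: int, src: int) -> int:
    """Return the old value, clear the bits that are 1 in src."""
    old = csrs[addr]
    csrs[addr] = old & ~src & MASK32
    return old


def uimm5(imm: int) -> int:
    """The I-postfix variants pass a zero-extended 5-bit immediate as src."""
    return imm & 0x1F


csrs = {MSTATUS: 0b1000}
print(csrrs(csrs, MSTATUS, 0b0010), bin(csrs[MSTATUS]))
print(csrrc(csrs, MSTATUS, uimm5(0b1000)), bin(csrs[MSTATUS]))
```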
Privilege Levels
- privilege levels dictate the software access level, enabling access control (what the app can do) and security (running an app in isolation)
User Level (U) CSR
- lowest privilege can be accessed by user mode
- instructions safe to expose to apps
- e.g., performance counters
Hypervisor (H) CSR
- optional level in new RISC-V
- for virtualization (VMs)
- between M and S modes
Supervisor (S) CSR
- system level for OS
- can control resources but at a lower level than machine
Machine (M) CSR
- most privileged, machine-level
- controls low level stuff like interrupts, exception handling, and physical memory access
- used at boot
- usually shared
Switching/Usage
- CPU starts at machine level
- S/W can use the CSR mstatus to lower the privilege level
- mret, sret, uret are used to return to a privilege level
- the CPU raises an exception if the wrong privilege is invoked
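One concrete privilege check is on CSR accesses: the CSR address itself encodes the lowest privilege level allowed to touch it (bits 9:8) and whether it is read-only (bits 11:10 equal to 0b11). A sketch of that check, with an illustrative function name and U/S/M encoded as 0/1/3:

```python
U, S, M = 0, 1, 3  # privilege level encodings


def csr_access_ok(addr: int, priv: int, write: bool) -> bool:
    """True if the access is legal; a CPU would raise an exception otherwise."""
    required = (addr >> 8) & 0b11              # lowest privilege that may access
    read_only = (addr >> 10) & 0b11 == 0b11    # top two address bits both set
    return priv >= required and not (write and read_only)


CYCLE, SSTATUS, MSTATUS = 0xC00, 0x100, 0x300

print(csr_access_ok(CYCLE, U, write=False))    # user code may read the cycle counter
print(csr_access_ok(MSTATUS, S, write=False))  # supervisor may not touch mstatus
```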
Common CSR: Counters
- the counts are stored as 64-bit values; the h modifier (e.g. rdcycleh) can be used to access the upper 32 bits
- but reading in 2 parts (lower, then upper) has an issue: the lower 32 bits could overflow into the upper half in the time it takes to read the upper, so the solution is to read the upper half twice
- loop is the label at the start of the read sequence: we read upper, lower, upper again, and branch back to loop until the two upper reads match (no overflow in between)
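The retry loop can be simulated in Python. `CycleCounter` is a made-up stand-in for the hardware counter that advances by one on every read, so an overflow of the low half can be forced between two reads.

```python
class CycleCounter:
    """Toy 64-bit counter that ticks once per read (stand-in for rdcycle/rdcycleh)."""

    def __init__(self, start: int):
        self.value = start

    def read_low(self) -> int:
        low = self.value & 0xFFFFFFFF
        self.value += 1
        return low

    def read_high(self) -> int:
        high = self.value >> 32
        self.value += 1
        return high


def read_naive(counter: CycleCounter) -> int:
    """Lower then upper: wrong if the low half overflows in between."""
    low = counter.read_low()
    return (counter.read_high() << 32) | low


def read_safe(counter: CycleCounter) -> int:
    """Upper, lower, upper again; retry until both upper reads agree."""
    while True:
        high1 = counter.read_high()
        low = counter.read_low()
        high2 = counter.read_high()
        if high1 == high2:
            return (high1 << 32) | low


print(hex(read_naive(CycleCounter(0xFFFFFFFF))))  # mixes old low with new high
print(hex(read_safe(CycleCounter(0xFFFFFFFF))))   # consistent after one retry
```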
RISC-V ABI
- ra is the return address that the program jumps to at the end of the function call, e.g. if main calls foo, ra stores the address in main to return to
- a0 is the function return value storage (r0 == a0 below)
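For reference, the ABI names map onto the 32 integer registers x0..x31 as in the sketch below; the list is built so that each name's index is its register number.

```python
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"]  # x0..x9
    + [f"a{i}" for i in range(8)]       # x10..x17: arguments / return values
    + [f"s{i}" for i in range(2, 12)]   # x18..x27: callee-saved
    + [f"t{i}" for i in range(3, 7)]    # x28..x31: temporaries
)


def reg_number(abi_name: str) -> int:
    """Translate an ABI name such as 'ra' into its x-register number."""
    return ABI_NAMES.index(abi_name)


for name in ("ra", "sp", "gp", "tp", "a0"):
    print(f"{name} = x{reg_number(name)}")
```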
Stack Pointer
- Stack moves from high address down (i.e. we decrement to increase stack space)
- stack moves down, heap moves up (dynamic data)
- the CPU reads the predetermined stack base address from the keyword STACK_BASE_ADDR
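A downward-growing stack in miniature. The `STACK_BASE_ADDR` value used here is an arbitrary placeholder and the `Stack` class is illustrative; memory is modelled as a dict from address to 32-bit word.

```python
STACK_BASE_ADDR = 0x80000000  # placeholder value, not taken from the course setup


class Stack:
    def __init__(self):
        self.sp = STACK_BASE_ADDR  # sp starts at the base (highest address)
        self.mem = {}

    def push(self, word: int) -> None:
        self.sp -= 4               # decrement sp to grow the stack
        self.mem[self.sp] = word

    def pop(self) -> int:
        word = self.mem.pop(self.sp)
        self.sp += 4               # increment sp to shrink it again
        return word


stack = Stack()
stack.push(0x1004)  # e.g. saving ra before a nested call
stack.push(7)
print(hex(stack.sp), stack.pop(), hex(stack.pop()))
```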
Global pointer
- starts in the middle of the static data space, whose size is predetermined at compile time because the program knows the amount of static data required
- because gp begins in the middle of the static (global) space, data can be reached with both positive and negative offsets from it
- the linker exports the pointer location address as the keyword
__global_pointer
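Why the middle: loads and stores take a signed 12-bit offset, so a base register reaches 2048 bytes below and 2047 bytes above itself. A sketch with hypothetical addresses (`STATIC_START` is made up) comparing gp at the start of the static area against gp 0x800 bytes in:

```python
def gp_offset(gp: int, addr: int):
    """Signed 12-bit offset from gp to addr, or None if out of reach."""
    offset = addr - gp
    return offset if -2048 <= offset <= 2047 else None


STATIC_START = 0x10000              # hypothetical start of the static data area
GP_MIDDLE = STATIC_START + 0x800    # gp placed 2 KiB into the area

print(gp_offset(GP_MIDDLE, STATIC_START))          # lowest reachable address
print(gp_offset(GP_MIDDLE, STATIC_START + 0xFFF))  # highest: a full 4 KiB window
print(gp_offset(STATIC_START, STATIC_START + 0x800))  # gp at the start: out of reach
```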
Thread pointer
- the TLS (thread local storage) holds thread-local variables
- so we need a thread pointer to point to the values in the thread local storage (allocated on the heap)
- there is only 1 TLS for the program, and each thread allocates within the TLS based on where its tp points
- the linker/compiler knows how much TLS space to allocate at compile time, so we just allocate a static TLS frame and use the tp to identify the storage location
- ON MULTITHREADED APP - register file is NOT shared
- each thread has its own physical registers
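One possible layout, sketched under assumptions: a single TLS region split into equal per-thread slices, with each thread's tp pointing at its own slice. `TLS_BASE`, `TLS_BYTES_PER_THREAD` and both helpers are hypothetical names and values, not the course's actual runtime.

```python
TLS_BASE = 0x20000000          # hypothetical start of the program's TLS region
TLS_BYTES_PER_THREAD = 64      # hypothetical size, known at compile/link time


def thread_pointer(thread_id: int) -> int:
    """tp for a thread: the start of its slice of the TLS region."""
    return TLS_BASE + thread_id * TLS_BYTES_PER_THREAD


def tls_address(thread_id: int, var_offset: int) -> int:
    """Address of a thread-local variable: the same offset, a different tp."""
    if not 0 <= var_offset < TLS_BYTES_PER_THREAD:
        raise ValueError("offset outside this thread's TLS slice")
    return thread_pointer(thread_id) + var_offset


# The same variable (offset 8) lands at a different address for each thread.
print(hex(tls_address(0, 8)), hex(tls_address(1, 8)))
```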