Should beginners start with RV32 or RV64?
RV32 is usually easier for building the first mental model. Register width, load/store behavior, and immediates are more direct. Add RV64 once you start dealing with pointers, system calls, ABI details, and address width.
When do I need to distinguish lw from ld?
First ask whether you are loading a 32-bit value or an address/pointer. Use lw for 32-bit data; in RV64, ld is the common choice for 64-bit addresses or values. Also note that lw in RV64 sign-extends the loaded 32-bit value to 64 bits, while ld loads the full 64 bits.
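The difference in register contents can be sketched with a small Python model. This is an illustration, not a simulator: load_rv64 and sign_extend are hypothetical helper names, memory is assumed little-endian, and lwu (the zero-extending RV64 word load, not discussed above) is included only for contrast.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def load_rv64(memory, addr, op):
    """Return the 64-bit register contents after lw, lwu, or ld (hypothetical model)."""
    if op == "ld":
        # ld reads all 8 bytes; nothing to extend.
        return int.from_bytes(memory[addr:addr + 8], "little")
    word = int.from_bytes(memory[addr:addr + 4], "little")
    if op == "lw":
        # lw copies bit 31 into bits 63:32.
        return sign_extend(word, 32) & MASK64
    if op == "lwu":
        # lwu fills bits 63:32 with zeros.
        return word
    raise ValueError(f"unsupported op: {op}")

# Word 0x80000000 at offset 0, word 0x00000001 at offset 4 (little-endian bytes).
memory = bytes.fromhex("00000080" "01000000")

lw_value = load_rv64(memory, 0, "lw")
lwu_value = load_rv64(memory, 0, "lwu")
ld_value = load_rv64(memory, 0, "ld")
```

With bit 31 set in the loaded word, lw and lwu leave different upper halves in the register, which is why using lw on half of a pointer gives a wrong address.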
Why do we need addw if add already exists?
Because add in RV64 follows XLEN-wide result semantics, while addw intentionally keeps 32-bit word result semantics and then sign-extends the result back to 64 bits. This is crucial for C 32-bit int results or emulating 32-bit program behavior; ordinary 64-bit pointer address arithmetic should use XLEN-wide operations, not addw truncation.
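A minimal Python sketch of the two result semantics on RV64 follows. The function names mirror the instructions but are only a model of the arithmetic, with registers represented as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def add(rs1, rs2):
    """RV64 add: full XLEN-wide (64-bit) wraparound sum."""
    return (rs1 + rs2) & MASK64

def addw(rs1, rs2):
    """RV64 addw: keep the low 32 bits of the sum, then sign-extend to 64 bits."""
    low = (rs1 + rs2) & MASK32
    if low & 0x80000000:
        low |= MASK64 ^ MASK32  # replicate bit 31 into bits 63:32
    return low

# C int overflow case: INT_MAX + 1 wraps to INT_MIN under 32-bit semantics.
int_sum_add = add(0x7FFFFFFF, 1)
int_sum_addw = addw(0x7FFFFFFF, 1)

# Pointer arithmetic case: addw drops the upper half of a 64-bit address.
ptr_add = add(0x1_0000_0000, 8)
ptr_addw = addw(0x1_0000_0000, 8)
```

The first pair shows why a compiler uses addw for 32-bit int arithmetic; the second shows why it must not be used on 64-bit pointers.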
What is the difference between interrupts and exceptions, and how does hardware distinguish them?
Interrupts are asynchronous events that are not caused by the instruction currently executing (software interrupt, timer, external device). Exceptions are synchronous and are triggered by the currently executing instruction (illegal instruction, misaligned address, page fault, ecall). Hardware distinguishes them via the MSB of mcause/scause: bit[XLEN-1]=1 means interrupt, 0 means exception. The lower bits hold the cause code, e.g. interrupt 7 is Machine Timer Interrupt (MTI), exception 2 is Illegal Instruction.
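Decoding a cause register can be sketched as below. decode_cause is a hypothetical helper, and the cause value is assumed to have already been read from mcause or scause into a Python integer.

```python
def decode_cause(cause, xlen=64):
    """Split an mcause/scause value into (is_interrupt, cause_code)."""
    is_interrupt = ((cause >> (xlen - 1)) & 1) == 1   # MSB selects interrupt vs exception
    code = cause & ((1 << (xlen - 1)) - 1)            # remaining bits are the cause code
    return is_interrupt, code

# Machine Timer Interrupt on RV64: MSB set, code 7.
mti_rv64 = decode_cause((1 << 63) | 7)

# Illegal Instruction exception: MSB clear, code 2.
illegal = decode_cause(2)

# The same timer interrupt on RV32, where the MSB is bit 31.
mti_rv32 = decode_cause(0x80000007, xlen=32)
```

A trap handler usually branches on the first element, then indexes a separate table of interrupt or exception handlers with the second.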
What exactly does the hardware do after an ecall?
ecall raises an environment-call exception; which handler runs then depends on the current privilege level, the delegation configuration, and the target trap mode. Hardware writes the corresponding epc/cause/status state and updates the xstatus previous-interrupt-enable and previous-privilege fields. The epc register records the address of the ecall instruction itself, not the next instruction, so a handler that services the call typically adds 4 to epc before returning; otherwise the ecall would execute again. For M-mode traps check mepc/mcause/mtvec/mstatus; for S-mode traps check sepc/scause/stvec/sstatus.
What is the difference between sret and mret? When is each used?
mret returns from an M-mode trap handler, sret from an S-mode handler. If M-mode handles a trap from U/S directly, use mret. If M-mode delegates the trap to S-mode, the S-mode handler uses sret. mret restores the PC from mepc, the privilege mode from MPP, and MIE from MPIE; sret does the same with sepc, SPP, and SPIE.
Why does RISC-V need dedicated CSR instructions instead of load/store?
CSRs live in a separate 12-bit CSR address space (numbers 0-4095), not in the ordinary memory address space, so load/store cannot reach them. Software must use csrrw, csrrs, csrrc, and their immediate forms (csrrwi, csrrsi, csrrci). Each one returns the old CSR value in rd and, as a single atomic read-modify-write, either writes a new value (csrrw), sets the bits selected by a mask (csrrs), or clears them (csrrc).
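The three register-form operations can be modeled in a few lines of Python. CsrFile is a hypothetical class for illustration; it treats every CSR as a plain 64-bit read/write register and ignores read-only fields, WARL behavior, privilege checks, and side effects that real CSRs have. The address 0x300 is mstatus.

```python
MASK64 = (1 << 64) - 1
MSTATUS = 0x300  # CSR address of mstatus

class CsrFile:
    """Toy CSR file: a dict keyed by 12-bit CSR number (illustrative only)."""

    def __init__(self):
        self.regs = {}

    def _read(self, addr):
        if not 0 <= addr < 4096:
            raise ValueError("CSR numbers are 12 bits wide (0-4095)")
        return self.regs.get(addr, 0)

    def csrrw(self, addr, value):
        """Swap: return the old value, write the new one."""
        old = self._read(addr)
        self.regs[addr] = value & MASK64
        return old

    def csrrs(self, addr, mask):
        """Return the old value, then set every bit that is 1 in mask."""
        old = self._read(addr)
        self.regs[addr] = (old | mask) & MASK64
        return old

    def csrrc(self, addr, mask):
        """Return the old value, then clear every bit that is 1 in mask."""
        old = self._read(addr)
        self.regs[addr] = old & ~mask & MASK64
        return old

csrs = CsrFile()
old_after_write = csrs.csrrw(MSTATUS, 0x08)   # write MIE (bit 3)
old_after_set = csrs.csrrs(MSTATUS, 0x80)     # additionally set MPIE (bit 7)
old_after_clear = csrs.csrrc(MSTATUS, 0x08)   # clear MIE again
final_value = csrs.regs[MSTATUS]
```

Because each call both returns the previous value and applies the update, a sequence such as "disable interrupts and remember whether they were enabled" needs only one csrrc.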
What do MIE, MPIE, and MPP bits in mstatus do?
MIE (bit 3) controls global machine interrupt enable when the hart is currently running in M-mode; lower-privilege execution also depends on privilege level, delegation, and per-level interrupt-enable rules. On M-mode trap entry, hardware saves MIE into MPIE (bit 7) and clears MIE, while MPP (bits 12:11) records the previous privilege mode; mret restores from these fields. S-mode uses the corresponding SIE/SPIE/SPP fields.
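The save and restore of these fields can be sketched as two pure functions on an mstatus value. trap_enter_m and mret are hypothetical model functions; the sketch assumes a hart that implements U-mode (so mret resets MPP to U) and ignores other mstatus fields such as MPRV.

```python
MIE = 1 << 3
MPIE = 1 << 7
MPP_SHIFT = 11
MPP_MASK = 0b11 << MPP_SHIFT

PRIV_U, PRIV_S, PRIV_M = 0, 1, 3

def trap_enter_m(mstatus, prev_priv):
    """Model the mstatus update when a trap is taken into M-mode."""
    mie = (mstatus >> 3) & 1
    mstatus &= ~(MIE | MPIE | MPP_MASK)
    mstatus |= mie << 7                   # MPIE <- old MIE
    mstatus |= prev_priv << MPP_SHIFT     # MPP  <- privilege mode before the trap
    return mstatus                        # MIE is now 0

def mret(mstatus):
    """Model the mstatus update on mret; returns (new_mstatus, new_privilege)."""
    mpie = (mstatus >> 7) & 1
    new_priv = (mstatus >> MPP_SHIFT) & 0b11
    mstatus &= ~(MIE | MPP_MASK)          # MPP <- U (assumes U-mode is implemented)
    mstatus |= mpie << 3                  # MIE <- MPIE
    mstatus |= MPIE                       # MPIE <- 1
    return mstatus, new_priv

# Trap from S-mode with machine interrupts enabled, then return.
in_handler = trap_enter_m(MIE, PRIV_S)
after_return, priv_after_return = mret(in_handler)
```

Between the two calls the handler runs with MIE clear, which is what prevents a second interrupt from overwriting mepc and mcause before they are saved.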
What is the difference between LA and LLA pseudo-instructions?
LA is the general load-address pseudo-instruction, and its expansion depends on code model, PIC/non-PIC mode, relocation, and symbol visibility. LLA expresses link-time-local address calculation and commonly expands to a PC-relative auipc+addi sequence without using the GOT for external symbols. In bare-metal or kernel assembly, choose LA or LLA based on whether the symbol is local and whether PIC/GOT addressing is needed.
Why do I frequently see auipc + addi/jalr patterns in RISC-V code?
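A 32-bit RISC-V instruction cannot carry a full 32-bit or 64-bit address as an immediate. AUIPC adds the 20-bit U-immediate shifted left by 12 to the current PC and writes the result to rd; a following ADDI adds the sign-extended low 12 bits to form a PC-relative data address, while JALR applies its own 12-bit offset to form a PC-relative call/jump target. Because the low 12 bits are sign-extended, the upper 20 bits must be rounded up by one whenever bit 11 of the offset is set. Absolute addresses, GOT, and PIC expansions depend on code model and relocation rules, so they should not be collapsed into a simple AUIPC+ADDI/JALR rule.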
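How an offset is split across the two instructions can be sketched in Python. split_pcrel, auipc, and addi are hypothetical model functions for RV64; the split mirrors what the assembler's %pcrel_hi and %pcrel_lo relocations compute, but the code here is only an arithmetic illustration with example addresses chosen for this sketch.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def split_pcrel(offset):
    """Split a PC-relative offset into (hi20 for auipc, signed lo12 for addi)."""
    lo12 = sign_extend(offset, 12)
    # Subtracting the signed low part rounds the high part up when bit 11 is set.
    hi20 = ((offset - lo12) >> 12) & 0xFFFFF
    return hi20, lo12

def auipc(pc, imm20):
    """rd = pc + sign-extended (imm20 << 12)."""
    return (pc + sign_extend(imm20 << 12, 32)) & MASK

def addi(rs1, imm12):
    """rd = rs1 + sign-extended 12-bit immediate."""
    return (rs1 + sign_extend(imm12, 12)) & MASK

pc = 0x80000000
target = 0x80001800              # offset 0x1800 has bit 11 set
hi20, lo12 = split_pcrel(target - pc)
result = addi(auipc(pc, hi20), lo12)

# A small backward offset needs hi20 == 0 and a negative lo12.
back_hi20, back_lo12 = split_pcrel(-4)
back_result = addi(auipc(pc, back_hi20), back_lo12)
```

In the first case the high part comes out as 2 rather than 1, and the low part as -2048, which is the rounding the sign-extended ADDI immediate requires.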
What is the difference between LR/SC and AMO atomics? Why have both?
AMO (e.g. amoadd, amoswap) is a single-instruction atomic read-modify-write: the set of operations is fixed, but it is simple and needs no retry loop. LR/SC is a split approach: LR reads memory and sets a reservation, and SC writes only if the reservation is still held; otherwise it fails and reports a nonzero code in rd. LR/SC is more flexible because software chooses the computation between LR and SC, which enables primitives like compare-and-swap. The cost is that SC may fail spuriously, so a retry loop is required, and the architecture only guarantees eventual success for short, constrained sequences (a few integer instructions, with no other loads or stores between LR and SC). The two mechanisms complement each other: AMO for simple fetch-and-op atomics, LR/SC for general lock-free data structures.
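The reservation idea and the retry loop can be illustrated with a toy single-hart model in Python. ReservedMemory, compare_and_swap, and the interfere callback are all hypothetical names; real reservations are tracked by hardware at an implementation-defined granularity, and this sketch only shows the control flow.

```python
class ReservedMemory:
    """Toy word memory with one reservation slot for the modeled hart."""

    def __init__(self):
        self.words = {}
        self.reservation = None

    def lr(self, addr):
        """Load-reserved: read the word and remember the address."""
        self.reservation = addr
        return self.words.get(addr, 0)

    def sc(self, addr, value):
        """Store-conditional: returns 0 on success, 1 on failure (as in rd)."""
        ok = self.reservation == addr
        self.reservation = None          # any SC consumes the reservation
        if ok:
            self.words[addr] = value
        return 0 if ok else 1

    def store_from_other_hart(self, addr, value):
        """A competing write breaks a reservation on the same address."""
        self.words[addr] = value
        if self.reservation == addr:
            self.reservation = None

    def amoadd(self, addr, value):
        """Single-step fetch-and-add on a 32-bit word: no reservation, no retry."""
        old = self.words.get(addr, 0)
        self.words[addr] = (old + value) & 0xFFFFFFFF
        return old

def compare_and_swap(mem, addr, expected, desired, interfere=None):
    """CAS built from LR/SC; returns (success, number_of_attempts)."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.lr(addr)
        if old != expected:
            return False, attempts
        if interfere is not None:
            interfere()                  # simulate another hart writing once
            interfere = None
        if mem.sc(addr, desired) == 0:
            return True, attempts

mem = ReservedMemory()
mem.words[0x100] = 5
plain = compare_and_swap(mem, 0x100, 5, 9)
mismatch = compare_and_swap(mem, 0x100, 5, 1)

mem.words[0x100] = 5
contended = compare_and_swap(
    mem, 0x100, 5, 9,
    interfere=lambda: mem.store_from_other_hart(0x100, 5),
)
fetched = mem.amoadd(0x200, 3)
```

In the contended case the other hart writes the same value back, yet the first SC still fails and the loop retries; the reservation tracks writes to the location, not the value stored there.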