riscv instruction

riscv instruction helper

I (integer)

lui

LUI (load upper immediate) is used to build 32-bit constants and uses the U-type format. LUI places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros.

lui rd, immediate           # x[rd] = sext(immediate[31:12] << 12)

lui t0, 1                   # t0 = 1 << 12 = 0x1000
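The value LUI produces can be checked with a small Python model. This is only a sketch of the RV32 semantics described above; the function name lui and the MASK32 constant are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register (RV32 assumption)

def lui(imm20: int) -> int:
    """Return the 32-bit value LUI writes to rd for a 20-bit U-immediate."""
    # The immediate lands in bits 31:12; bits 11:0 are filled with zeros.
    return ((imm20 & 0xFFFFF) << 12) & MASK32

print(hex(lui(1)))        # 0x1000
print(hex(lui(0xFFFFF)))  # 0xfffff000
```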

auipc

AUIPC (add upper immediate to pc) is used to build pc-relative addresses and uses the U-type format. AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the address of the AUIPC instruction, then places the result in register rd.

auipc rd, immediate                  # x[rd] = pc + sext(immediate[31:12] << 12)

0x80000028: auipc t0, 1              # t0 = 0x80000028 + (1 << 12) = 0x80001028
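The same arithmetic can be reproduced in Python. This is a rough model under the assumption of a 32-bit pc; the helper name auipc is illustrative only.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit pc and register (RV32 assumption)

def auipc(pc: int, imm20: int) -> int:
    """Return the value AUIPC writes to rd: pc plus the upper immediate."""
    offset = (imm20 & 0xFFFFF) << 12   # low 12 bits of the offset are zero
    return (pc + offset) & MASK32      # 32-bit wraparound handles negative offsets

print(hex(auipc(0x80000028, 1)))        # 0x80001028
print(hex(auipc(0x80000028, 0xFFFFF)))  # 0x7ffff028, i.e. pc - 0x1000
```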

jal

The jump and link (JAL) instruction uses the J-type format, where the J-immediate encodes a signed offset in multiples of 2 bytes. The offset is sign-extended and added to the address of the jump instruction to form the jump target address. Jumps can therefore target a ±1 MiB range. JAL stores the address of the instruction following the jump (pc+4) into register rd. The standard software calling convention uses x1 as the return address register and x5 as an alternate link register.

jal rd, offset                                 # x[rd] = pc+4; pc += sext(offset)

0x80000028:      jal ra, 1f                    # ra = 0x8000002c; pc = pc + 4 = 0x8000002c
0x8000002c: 1:   nop

JAL is typically used for function calls. The offset is usually given as a symbol (label), so the toolchain computes the offset for you.
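The ±1 MiB range and the link value can be sketched in Python. The function below is a hypothetical model for RV32, not part of any toolchain; it rejects offsets that the J-type immediate cannot encode.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit pc (RV32 assumption)

def jal(pc: int, offset: int) -> tuple[int, int]:
    """Return (link, new_pc) for a JAL at address pc."""
    # The J-immediate encodes offset[20:1], so offsets are even and 21-bit signed.
    if offset % 2 != 0 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("offset must be even and within +/-1 MiB")
    link = (pc + 4) & MASK32          # address of the following instruction
    return link, (pc + offset) & MASK32

link, target = jal(0x80000028, 4)
print(hex(link), hex(target))  # 0x8000002c 0x8000002c
```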

jalr

The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target address is obtained by adding the sign-extended 12-bit I-immediate to the register rs1, then setting the least-significant bit of the result to zero. The address of the instruction following the jump (pc+4) is written to register rd.

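jalr rd, offset(rs1)                  # t = pc+4; pc = (x[rs1]+sext(offset)) & ~1; x[rd] = t

pc+4 is written to rd. The offset is added to the value of rs1, the lowest bit of the sum is cleared, and the result is written to pc.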

0x80000028: auipc t0, 0               # t0 = pc = 0x80000028
0x8000002c: jalr ra, 9(t0)            # ra = pc+4 = 0x80000030; pc = (t0+9) & ~1 = 0x80000030

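Clearing the lowest bit keeps pc aligned: the value in t0 may be unaligned, and the full immediate imm[11:0] (including bit 0) is added to it, so the sum can be odd before the bit is cleared.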
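The target computation, including the cleared low bit, can be modeled in a few lines of Python. This is a sketch assuming RV32; sext12 and jalr are illustrative helper names.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit pc (RV32 assumption)

def sext12(imm: int) -> int:
    """Sign-extend a 12-bit I-immediate to a Python int."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm

def jalr(pc: int, rs1_value: int, imm: int) -> tuple[int, int]:
    """Return (link, new_pc) for a JALR at address pc."""
    link = (pc + 4) & MASK32
    target = (rs1_value + sext12(imm)) & ~1 & MASK32  # clear bit 0 of the sum
    return link, target

print([hex(v) for v in jalr(0x8000002C, 0x80000028, 9)])
# ['0x80000030', '0x80000030']
```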

The rd operand can also be omitted: jalr rs1 is shorthand for jalr ra, 0(rs1).

0x80000028: auipc t0, 0               # t0 = pc = 0x80000028
0x8000002c: addi t0, t0, 0xc          # t0 = 0x80000028 + 0xc = 0x80000034
0x80000030: jalr t0                   # ra = pc+4 = 0x80000034; pc = 0x80000034

jr rs1       # pc = x[rs1]            # jalr x0, 0(rs1)

beq/bne/blt/bltu/bge/bgeu

BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

beq  rs1, rs2, offset      # if (rs1 == rs2) pc += sext(offset)
bne  rs1, rs2, offset      # if (rs1 != rs2) pc += sext(offset)
blt  rs1, rs2, offset      # if (rs1 <s rs2) pc += sext(offset)
bltu rs1, rs2, offset      # if (rs1 <u rs2) pc += sext(offset)
bge  rs1, rs2, offset      # if (rs1 ≥s rs2) pc += sext(offset)
bgeu rs1, rs2, offset      # if (rs1 ≥u rs2) pc += sext(offset)
beqz rs1, offset           # if (rs1 == 0) pc += sext(offset)      # beq rs1, x0, offset
bgez rs1, offset           # if (rs1 ≥s 0) pc += sext(offset)      # bge rs1, x0, offset
bgt  rs1, rs2, offset      # if (rs1 >s rs2) pc += sext(offset)    # blt rs2, rs1, offset
bgtu rs1, rs2, offset      # if (rs1 >u rs2) pc += sext(offset)    # bltu rs2, rs1, offset
bgtz rs1, offset           # if (rs1 >s 0) pc += sext(offset)      # blt x0, rs1, offset
ble  rs1, rs2, offset      # if (rs1 ≤s rs2) pc += sext(offset)    # bge rs2, rs1, offset
bleu rs1, rs2, offset      # if (rs1 ≤u rs2) pc += sext(offset)    # bgeu rs2, rs1, offset
blez rs1, offset           # if (rs1 ≤s 0) pc += sext(offset)      # bge x0, rs1, offset
bltz rs1, offset           # if (rs1 <s 0) pc += sext(offset)      # blt rs1, x0, offset
bnez rs1, offset           # if (rs1 != 0) pc += sext(offset)      # bne rs1, x0, offset
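The difference between the signed and unsigned comparisons, and the operand swap behind the pseudo-instructions, can be illustrated with a small Python model. This is only a sketch with XLEN assumed to be 32; the functions return the branch condition rather than updating pc.

```python
XLEN = 32                 # register width assumed for this sketch
MASK = (1 << XLEN) - 1

def to_signed(value: int) -> int:
    """Interpret an XLEN-bit register pattern as a two's-complement number."""
    value &= MASK
    return value - (1 << XLEN) if value >> (XLEN - 1) else value

def blt(a: int, b: int) -> bool:   # signed less-than
    return to_signed(a) < to_signed(b)

def bltu(a: int, b: int) -> bool:  # unsigned less-than
    return (a & MASK) < (b & MASK)

def bge(a: int, b: int) -> bool:   # signed greater-or-equal
    return not blt(a, b)

def bgt(a: int, b: int) -> bool:   # pseudo-instruction: blt with operands swapped
    return blt(b, a)

def ble(a: int, b: int) -> bool:   # pseudo-instruction: bge with operands swapped
    return bge(b, a)

minus_one = 0xFFFFFFFF             # -1 as a signed value, 2**32 - 1 as unsigned
print(blt(minus_one, 1), bltu(minus_one, 1))  # True False
```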

lb/lh/lw/lbu/lhu/lwu/ld/ sb/sh/sw/sd

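lb: load byte. Reads one byte from address x[rs1] + sign-extend(offset), sign-extends it, and writes it to x[rd].

lbu: load byte unsigned. Reads one byte from address x[rs1] + sign-extend(offset), zero-extends it, and writes it to x[rd].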

lb  rd, offset(rs1)    # x[rd] = sext(M[x[rs1] + sext(offset)][7:0])
lbu rd, offset(rs1)    # x[rd] = M[x[rs1] + sext(offset)][7:0]
ld  rd, offset(rs1)    # x[rd] = M[x[rs1] + sext(offset)][63:0]         (RV64I only)
lh  rd, offset(rs1)    # x[rd] = sext(M[x[rs1] + sext(offset)][15:0])
lhu rd, offset(rs1)    # x[rd] = M[x[rs1] + sext(offset)][15:0]
lw  rd, offset(rs1)    # x[rd] = sext(M[x[rs1] + sext(offset)][31:0])
lwu rd, offset(rs1)    # x[rd] = M[x[rs1] + sext(offset)][31:0]         (RV64I only)
sb rs2, offset(rs1)    # M[x[rs1] + sext(offset)] = x[rs2][7:0]
sd rs2, offset(rs1)    # M[x[rs1] + sext(offset)] = x[rs2][63:0]        (RV64I only)
sh rs2, offset(rs1)    # M[x[rs1] + sext(offset)] = x[rs2][15:0]
sw rs2, offset(rs1)    # M[x[rs1] + sext(offset)] = x[rs2][31:0]
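Sign extension versus zero extension on loads, and truncation on stores, can be tried out with a toy memory in Python. This is a minimal sketch assuming a little-endian memory and 64-bit registers; load and store are illustrative helpers, with size and signed standing in for the choice of instruction.

```python
XLEN = 64                 # register width assumed for this sketch
MASK = (1 << XLEN) - 1

def load(mem: bytearray, addr: int, size: int, signed: bool) -> int:
    """Model lb/lh/lw (signed=True) or lbu/lhu/lwu (signed=False)."""
    raw = int.from_bytes(mem[addr:addr + size], "little", signed=signed)
    return raw & MASK     # negative values become the sign-extended bit pattern

def store(mem: bytearray, addr: int, size: int, value: int) -> None:
    """Model sb/sh/sw/sd: only the low size*8 bits of the register are written."""
    low_bits = value & ((1 << (8 * size)) - 1)
    mem[addr:addr + size] = low_bits.to_bytes(size, "little")

mem = bytearray(8)
store(mem, 0, 1, 0x12345680)                 # sb keeps only the low byte 0x80
print(hex(load(mem, 0, 1, signed=True)))     # lb  -> 0xffffffffffffff80
print(hex(load(mem, 0, 1, signed=False)))    # lbu -> 0x80
```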

addi/slti/sltiu/xori/ori/andi/ addiw

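SLTI (set less than immediate) compares x[rs1] with the sign-extended immediate as signed numbers. If x[rs1] is smaller, it writes 1 to x[rd]; otherwise it writes 0.

addiw adds the sign-extended immediate to x[rs1], truncates the result to 32 bits, and writes the sign-extended result to x[rd]. Arithmetic overflow is ignored.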

addi  rd, rs1, immediate     # x[rd] = x[rs1] + sext(immediate)
slti  rd, rs1, immediate     # x[rd] = (x[rs1] <s sext(immediate))
sltiu rd, rs1, immediate     # x[rd] = (x[rs1] <u sext(immediate))
xori  rd, rs1, immediate     # x[rd] = x[rs1] ^ sext(immediate)
ori   rd, rs1, immediate     # x[rd] = x[rs1] | sext(immediate)
andi  rd, rs1, immediate     # x[rd] = x[rs1] & sext(immediate)
addiw rd, rs1, immediate     # x[rd] = sext((x[rs1] + sext(immediate))[31:0])
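Two details are easy to miss: sltiu still sign-extends its immediate before the unsigned comparison, and addiw can turn a positive 32-bit overflow into a negative 64-bit value. The following Python sketch models both, assuming 64-bit registers; all function names are illustrative.

```python
XLEN = 64                 # register width assumed for this sketch
MASK = (1 << XLEN) - 1

def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def slti(rs1: int, imm: int) -> int:
    return int(sext(rs1, XLEN) < sext(imm, 12))

def sltiu(rs1: int, imm: int) -> int:
    # The immediate is sign-extended first, then compared as unsigned.
    return int((rs1 & MASK) < (sext(imm, 12) & MASK))

def addiw(rs1: int, imm: int) -> int:
    # Add, keep the low 32 bits, sign-extend to XLEN; overflow is ignored.
    return sext(rs1 + sext(imm, 12), 32) & MASK

print(slti(5, -1), sltiu(5, -1))   # 0 1
print(hex(addiw(0x7FFFFFFF, 1)))   # 0xffffffff80000000
```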

slli/srli/srai/ slliw/srliw/sraiw

SLLI is a logical left shift (zeros are shifted into the lower bits).

slli shifts register x[rs1] left by shamt bits, fills the vacated bits with zeros, and writes the result to x[rd]. For RV32I, the instruction is valid only when shamt[5] = 0.

SRLI is a logical right shift (zeros are shifted into the upper bits).

SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits).

srai shifts register x[rs1] right by shamt bits, fills the vacated bits with the most significant bit of x[rs1], and writes the result to x[rd]. For RV32I, the instruction is valid only when shamt[5] = 0.

SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously defined but operate on 32-bit values and produce signed 32-bit results. SLLIW, SRLIW, and SRAIW encodings with imm[5] ≠ 0 are reserved.

slliw shifts register x[rs1] left by shamt bits, fills the vacated bits with zeros, truncates the result to 32 bits, and writes the sign-extended result to x[rd]. The instruction is valid only when shamt[5] = 0.

slli  rd, rs1, shamt      # x[rd] = x[rs1] ≪ shamt
slliw rd, rs1, shamt      # x[rd] = sext((x[rs1] ≪ shamt)[31:0])
srai  rd, rs1, shamt      # x[rd] = (x[rs1] ≫s shamt)
sraiw rd, rs1, shamt      # x[rd] = sext(x[rs1][31:0] ≫s shamt)
srli  rd, rs1, shamt      # x[rd] = (x[rs1] ≫u shamt)
srliw rd, rs1, shamt      # x[rd] = sext(x[rs1][31:0] ≫u shamt)
li a0, 1                   # a0 = 1
slli a0, a0, 0x1f          # a0 = a0 << 31 = 0x80000000
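The logical, arithmetic, and word-sized shifts differ only in what fills the vacated bits and in the final sign extension. A small Python sketch, assuming 64-bit registers and using illustrative function names, makes the difference visible.

```python
XLEN = 64                 # register width assumed for this sketch
MASK = (1 << XLEN) - 1

def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def srli(x: int, shamt: int) -> int:
    return (x & MASK) >> shamt              # zeros enter from the top

def srai(x: int, shamt: int) -> int:
    return (sext(x, XLEN) >> shamt) & MASK  # copies of the sign bit enter from the top

def slliw(x: int, shamt: int) -> int:
    return sext(x << shamt, 32) & MASK      # keep 32 bits, then sign-extend

x = 0x8000000000000000
print(f"{srli(x, 4):016x}")    # 0800000000000000
print(f"{srai(x, 4):016x}")    # f800000000000000
print(f"{slliw(1, 31):016x}")  # ffffffff80000000
```

Note that slliw a0, a0, 31 sign-extends its 32-bit result, whereas slli a0, a0, 31 on RV64 leaves the upper 32 bits clear.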

add/sub/sll/slt/sltu/xor/srl/sra/or/and

addw/subw/sllw/srlw/sraw

add  rd, rs1, rs2     # x[rd] = x[rs1] + x[rs2]
and  rd, rs1, rs2     # x[rd] = x[rs1] & x[rs2]
sub  rd, rs1, rs2     # x[rd] = x[rs1] − x[rs2]
sll  rd, rs1, rs2     # x[rd] = x[rs1] ≪ x[rs2]
slt  rd, rs1, rs2     # x[rd] = (x[rs1] <s x[rs2])
sltu rd, rs1, rs2     # x[rd] = (x[rs1] <u x[rs2])
xor  rd, rs1, rs2     # x[rd] = x[rs1] ^ x[rs2]
srl  rd, rs1, rs2     # x[rd] = (x[rs1] ≫u x[rs2])
sra  rd, rs1, rs2     # x[rd] = (x[rs1] ≫s x[rs2])
or   rd, rs1, rs2     # x[rd] = x[rs1] | x[rs2]

addw rd, rs1, rs2     # x[rd] = sext((x[rs1] + x[rs2])[31:0])
subw rd, rs1, rs2     # x[rd] = sext((x[rs1] − x[rs2])[31:0])
sllw rd, rs1, rs2     # x[rd] = sext((x[rs1] ≪ x[rs2][4:0])[31:0])
srlw rd, rs1, rs2     # x[rd] = sext(x[rs1][31:0] ≫u x[rs2][4:0])
sraw rd, rs1, rs2     # x[rd] = sext(x[rs1][31:0] ≫s x[rs2][4:0])

ecall/ebreak

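ebreak requests the debugger by raising a breakpoint exception.

ecall makes a request to the execution environment by raising an environment-call exception.

ebreak      # RaiseException(Breakpoint)
ecall       # RaiseException(EnvironmentCall)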

fence

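fence.i                             # Fence(Store, Fetch)

fence.i synchronizes the instruction stream (Fence Instruction Stream). I-type, RV32I and RV64I. It makes earlier writes to instruction memory visible to subsequent instruction fetches.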

csr

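CSRRW (Control and Status Register Read and Write): let t be the value of the control and status register csr. Write the value of x[rs1] to csr, then write t to x[rd].

CSRRS (Control and Status Register Read and Set): let t be the value of the control and status register csr. Write the bitwise OR of t and x[rs1] to csr, then write t to x[rd].

CSRRC (Control and Status Register Read and Clear): let t be the value of the control and status register csr. Write the bitwise AND of t and the complement of x[rs1] to csr, then write t to x[rd].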

csrrw  rd, csr, rs1               # t = CSRs[csr]; CSRs[csr] = x[rs1]; x[rd] = t
csrrs  rd, csr, rs1               # t = CSRs[csr]; CSRs[csr] = t | x[rs1]; x[rd] = t
csrrc  rd, csr, rs1               # t = CSRs[csr]; CSRs[csr] = t & ~x[rs1]; x[rd] = t
csrrwi rd, csr, zimm[4:0]         # x[rd] = CSRs[csr]; CSRs[csr] = zimm
csrrsi rd, csr, zimm[4:0]         # t = CSRs[csr]; CSRs[csr] = t | zimm; x[rd] = t
csrrci rd, csr, zimm[4:0]         # t = CSRs[csr]; CSRs[csr] = t & ~zimm; x[rd] = t
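The read-modify-write behaviour of the three register forms can be modeled with a dictionary standing in for the CSR file. This is a simplified sketch: it ignores read-only fields, privilege checks, and side effects, and the initial mstatus value is an arbitrary example.

```python
def csrrw(csrs: dict, csr: str, rs1_value: int) -> int:
    """Swap: return the old CSR value and overwrite it with rs1."""
    old = csrs[csr]
    csrs[csr] = rs1_value
    return old

def csrrs(csrs: dict, csr: str, rs1_value: int) -> int:
    """Set bits: return the old value and OR in the bits of rs1."""
    old = csrs[csr]
    csrs[csr] = old | rs1_value
    return old

def csrrc(csrs: dict, csr: str, rs1_value: int) -> int:
    """Clear bits: return the old value and clear the bits set in rs1."""
    old = csrs[csr]
    csrs[csr] = old & ~rs1_value
    return old

csrs = {"mstatus": 0b1000}                     # arbitrary example contents
print(csrrs(csrs, "mstatus", 0b0011), bin(csrs["mstatus"]))  # 8 0b1011
print(csrrc(csrs, "mstatus", 0b1001), bin(csrs["mstatus"]))  # 11 0b10
```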

M

pseudo-instruction

symbol

pseudo-instruction          base instruction                       meaning
la rd, symbol (non-PIC)     auipc rd, delta[31:12] + delta[11]     load absolute address,
                            addi rd, rd, delta[11:0]               where delta = symbol - pc
la rd, symbol (PIC)         auipc rd, delta[31:12] + delta[11]     load absolute address,
                            l{w|d} rd, delta[11:0](rd)             where delta = GOT[symbol] - pc

Example:

    la t0, 1f;
    csrw mtvec, t0;
    csrwi sptbr, 0;        # sptbr is the older name of satp
    .align 2;
1:

=>

80000054: auipc t0, 0x0
80000058: addi t0, t0, 16         # 80000064
8000005c: csrw mtvec, t0
80000060: csrwi satp, 0
80000064:
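The "delta[31:12] + delta[11]" term in the expansion exists because addi sign-extends its 12-bit immediate: when bit 11 of delta is set, the low part is negative and the upper part must be rounded up to compensate. A minimal Python sketch of that split follows; split_pcrel is a hypothetical helper, not an assembler API.

```python
def split_pcrel(delta: int) -> tuple[int, int]:
    """Split a pc-relative delta into (auipc immediate, addi immediate)."""
    hi20 = ((delta + 0x800) >> 12) & 0xFFFFF   # delta[31:12] + delta[11]
    lo12 = delta & 0xFFF                       # delta[11:0] ...
    if lo12 & 0x800:
        lo12 -= 0x1000                         # ... as the signed value addi sees
    return hi20, lo12

# The example above: the label sits 16 bytes after the auipc.
print(split_pcrel(0x80000064 - 0x80000054))    # (0, 16)
# A hypothetical delta with bit 11 set needs the rounded-up upper part.
print(split_pcrel(0x1800))                     # (2, -2048)
```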

li

In RV32I, li is equivalent to lui and/or addi. In RV64I, it can expand to an instruction sequence such as lui, addi, slli, addi, slli, addi, slli, addi.

pseudo-instruction     base instruction     meaning
li rd, immediate       Myriad sequences     Load immediate

Example

li a0, 0x00001800

=>

lui a0, 0x2
addiw a0, a0, -2048        # (2 << 12) - 2048 = 0x2000 - 0x800 = 0x1800

In RV64I, LUI (load upper immediate) uses the same opcode as RV32I. LUI places the 20-bit U-immediate into bits 31–12 of register rd and places zero in the lowest 12 bits. The 32-bit result is sign-extended to 64 bits.

li t6, 8

=>

[00800f93] addi t6,x0,8
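The machine word shown in brackets can be reproduced by packing the I-type fields by hand. The sketch below assumes the standard I-type layout (imm[11:0], rs1, funct3, rd, opcode); encode_i_type and the constant names are illustrative.

```python
def encode_i_type(opcode: int, rd: int, funct3: int, rs1: int, imm: int) -> int:
    """Pack an I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

OP_IMM = 0b0010011      # major opcode shared by addi, slti, xori, ...
ADDI_FUNCT3 = 0b000
T6, ZERO = 31, 0        # t6 is x31, x0 is the zero register

print(f"{encode_i_type(OP_IMM, T6, ADDI_FUNCT3, ZERO, 8):08x}")  # 00800f93
```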

csr

pseudo-instruction   base instruction            meaning
rdinstret[h] rd      csrrs rd, instret[h], x0    read instruction-retired counter
rdcycle[h] rd        csrrs rd, cycle[h], x0      read cycle counter
rdtime[h] rd         csrrs rd, time[h], x0       read real-time clock
csrr rd, csr         csrrs rd, csr, x0           read csr
csrw csr, rs         csrrw x0, csr, rs           write csr
csrs csr, rs         csrrs x0, csr, rs           set bits in csr
csrc csr, rs         csrrc x0, csr, rs           clear bits in csr
csrwi csr, imm       csrrwi x0, csr, imm         write csr, immediate
csrsi csr, imm       csrrsi x0, csr, imm         set bits in csr, immediate
csrci csr, imm       csrrci x0, csr, imm         clear bits in csr, immediate
frcsr rd             csrrs rd, fcsr, x0          read FP control/status register
fscsr rd, rs         csrrw rd, fcsr, rs          swap FP control/status register
fscsr rs             csrrw x0, fcsr, rs          write FP control/status register
frrm rd              csrrs rd, frm, x0           read FP rounding mode
fsrm rd, rs          csrrw rd, frm, rs           swap FP rounding mode
fsrm rs              csrrw x0, frm, rs           write FP rounding mode
frflags rd           csrrs rd, fflags, x0        read FP exception flags
fsflags rd, rs       csrrw rd, fflags, rs        swap FP exception flags
fsflags rs           csrrw x0, fflags, rs        write FP exception flags