CPU Specifications
CPU
CPU Registers
CPU Opcode Encoding
CPU Load/Store Opcodes
CPU ALU Opcodes
CPU Jump Opcodes
CPU Coprocessor Opcodes
CPU Pseudo Opcodes
System Control Coprocessor (COP0)
COP0 - Register Summary
COP0 - Exception Handling
COP0 - Misc
COP0 - Debug Registers
CPU Registers
All registers are 32bit wide.
Name Alias Common Usage
R0 zero Constant (always 0)
R1 at Assembler temporary (destroyed by some assembler pseudoinstructions!)
R2-R3 v0-v1 Subroutine return values, may be changed by subroutines
R4-R7 a0-a3 Subroutine arguments, may be changed by subroutines
R8-R15 t0-t7 Temporaries, may be changed by subroutines
R16-R23 s0-s7 Static variables, must be saved by subs
R24-R25 t8-t9 Temporaries, may be changed by subroutines
R26-R27 k0-k1 Reserved for kernel (destroyed by some IRQ handlers!)
R28 gp Global pointer (rarely used)
R29 sp Stack pointer
R30 fp(s8) Frame Pointer, or 9th Static variable, must be saved
R31 ra Return address (used so by JAL,BLTZAL,BGEZAL opcodes)
- pc Program counter
- hi,lo Multiply/divide results, may be changed by subroutines
R31 can be used as general purpose register, however, some opcodes are using it to store the return address: JAL, BLTZAL, BGEZAL. (Note: JALR can optionally store the return address in R31, or in R1..R30. Exceptions store the return address in cop0r14 - EPC).
R29 (SP) - Full Decrementing Wasted Stack Pointer
The CPU doesn't explicitly have stack-related registers or opcodes, however,
conventionally, R29 is used as stack pointer (SP). The stack can be accessed
with normal load/store opcodes, which do not automatically increase/decrease
SP, so the SP register must be manually modified to (de-)allocate data.
The PSX BIOS is using "Full Decrementing Wasted Stack".
Decrementing means that SP gets decremented when allocating data (that's common
for most CPUs) - Full means that SP points to the first ALLOCATED word on the
stack, so the allocated memory is at SP+0 and above, free memory at SP-1 and
below, Wasted means that when calling a sub-function with N parameters, then
the caller must pre-allocate N works on stack, and the sub-function may freely
use and destroy these words; at [SP+0..N*4-1].
For example, "push ra,r16,r17" would be implemented as:
sub sp,20h
mov [sp+14h],ra
mov [sp+18h],r16
mov [sp+1Ch],r17
[sp+00h..0Fh] wasted stack (may, or may not, be used by sub-functions)
[sp+10h..13h] 8-byte alignment padding (not used)
[sp+14h..1Fh] pushed registers
CPU Opcode Encoding
Primary opcode field (Bit 26..31)
00h=SPECIAL 08h=ADDI 10h=COP0 18h=N/A 20h=LB 28h=SB 30h=LWC0 38h=SWC0
01h=BcondZ 09h=ADDIU 11h=COP1 19h=N/A 21h=LH 29h=SH 31h=LWC1 39h=SWC1
02h=J 0Ah=SLTI 12h=COP2 1Ah=N/A 22h=LWL 2Ah=SWL 32h=LWC2 3Ah=SWC2
03h=JAL 0Bh=SLTIU 13h=COP3 1Bh=N/A 23h=LW 2Bh=SW 33h=LWC3 3Bh=SWC3
04h=BEQ 0Ch=ANDI 14h=N/A 1Ch=N/A 24h=LBU 2Ch=N/A 34h=N/A 3Ch=N/A
05h=BNE 0Dh=ORI 15h=N/A 1Dh=N/A 25h=LHU 2Dh=N/A 35h=N/A 3Dh=N/A
06h=BLEZ 0Eh=XORI 16h=N/A 1Eh=N/A 26h=LWR 2Eh=SWR 36h=N/A 3Eh=N/A
07h=BGTZ 0Fh=LUI 17h=N/A 1Fh=N/A 27h=N/A 2Fh=N/A 37h=N/A 3Fh=N/A
Secondary opcode field (Bit 0..5) (when Primary opcode = 00h)
00h=SLL 08h=JR 10h=MFHI 18h=MULT 20h=ADD 28h=N/A 30h=N/A 38h=N/A
01h=N/A 09h=JALR 11h=MTHI 19h=MULTU 21h=ADDU 29h=N/A 31h=N/A 39h=N/A
02h=SRL 0Ah=N/A 12h=MFLO 1Ah=DIV 22h=SUB 2Ah=SLT 32h=N/A 3Ah=N/A
03h=SRA 0Bh=N/A 13h=MTLO 1Bh=DIVU 23h=SUBU 2Bh=SLTU 33h=N/A 3Bh=N/A
04h=SLLV 0Ch=SYSCALL 14h=N/A 1Ch=N/A 24h=AND 2Ch=N/A 34h=N/A 3Ch=N/A
05h=N/A 0Dh=BREAK 15h=N/A 1Dh=N/A 25h=OR 2Dh=N/A 35h=N/A 3Dh=N/A
06h=SRLV 0Eh=N/A 16h=N/A 1Eh=N/A 26h=XOR 2Eh=N/A 36h=N/A 3Eh=N/A
07h=SRAV 0Fh=N/A 17h=N/A 1Fh=N/A 27h=NOR 2Fh=N/A 37h=N/A 3Fh=N/A
Opcode/Parameter Encoding
31..26 |25..21|20..16|15..11|10..6 | 5..0 |
6bit | 5bit | 5bit | 5bit | 5bit | 6bit |
-------+------+------+------+------+--------+------------
000000 | N/A | rt | rd | imm5 | 0000xx | shift-imm
000000 | rs | rt | rd | N/A | 0001xx | shift-reg
000000 | rs | N/A | N/A | N/A | 001000 | jr
000000 | rs | N/A | rd | N/A | 001001 | jalr
000000 | <-----comment20bit------> | 00110x | sys/brk
000000 | N/A | N/A | rd | N/A | 0100x0 | mfhi/mflo
000000 | rs | N/A | N/A | N/A | 0100x1 | mthi/mtlo
000000 | rs | rt | N/A | N/A | 0110xx | mul/div
000000 | rs | rt | rd | N/A | 10xxxx | alu-reg
000001 | rs | 00000| <--immediate16bit--> | bltz
000001 | rs | 00001| <--immediate16bit--> | bgez
000001 | rs | 10000| <--immediate16bit--> | bltzal
000001 | rs | 10001| <--immediate16bit--> | bgezal
00001x | <---------immediate26bit---------> | j/jal
00010x | rs | rt | <--immediate16bit--> | beq/bne
00011x | rs | N/A | <--immediate16bit--> | blez/bgtz
001xxx | rs | rt | <--immediate16bit--> | alu-imm
001111 | N/A | rt | <--immediate16bit--> | lui-imm
100xxx | rs | rt | <--immediate16bit--> | load rt,[rs+imm]
101xxx | rs | rt | <--immediate16bit--> | store rt,[rs+imm]
x1xxxx | <------coprocessor specific------> | coprocessor (see below)
Coprocessor Opcode/Parameter Encoding
31..26 |25..21|20..16|15..11|10..6 | 5..0 |
6bit | 5bit | 5bit | 5bit | 5bit | 6bit |
-------+------+------+------+------+--------+------------
0100nn |0|0000| rt | rd | N/A | 000000 | MFCn rt,rd_dat ;rt = dat
0100nn |0|0010| rt | rd | N/A | 000000 | CFCn rt,rd_cnt ;rt = cnt
0100nn |0|0100| rt | rd | N/A | 000000 | MTCn rt,rd_dat ;dat = rt
0100nn |0|0110| rt | rd | N/A | 000000 | CTCn rt,rd_cnt ;cnt = rt
0100nn |0|1000|00000 | <--immediate16bit--> | BCnF target ;jump if false
0100nn |0|1000|00001 | <--immediate16bit--> | BCnT target ;jump if true
0100nn |1| <--------immediate25bit--------> | COPn imm25
010000 |1|0000| N/A | N/A | N/A | 000001 | COP0 01h ;=TLBR, unused on PS1
010000 |1|0000| N/A | N/A | N/A | 000010 | COP0 02h ;=TLBWI, unused on PS1
010000 |1|0000| N/A | N/A | N/A | 000110 | COP0 06h ;=TLBWR, unused on PS1
010000 |1|0000| N/A | N/A | N/A | 001000 | COP0 08h ;=TLBP, unused on PS1
010000 |1|0000| N/A | N/A | N/A | 010000 | COP0 10h ;=RFE
1100nn | rs | rt | <--immediate16bit--> | LWCn rt_dat,[rs+imm]
1110nn | rs | rt | <--immediate16bit--> | SWCn rt_dat,[rs+imm]
Illegal Opcodes
All opcodes that are marked as "N/A" in the Primary and Secondary opcode tables
are causing a Reserved Instruction Exception (excode=0Ah).
The unused operand bits (eg. Bit21-25 for LUI opcode) should be usually zero,
but do not necessarily trigger exceptions if set to nonzero values.
CPU Load/Store Opcodes
Load instructions
lb rt,imm(rs) rt=[imm+rs] ;byte sign-extended
lbu rt,imm(rs) rt=[imm+rs] ;byte zero-extended
lh rt,imm(rs) rt=[imm+rs] ;halfword sign-extended
lhu rt,imm(rs) rt=[imm+rs] ;halfword zero-extended
lw rt,imm(rs) rt=[imm+rs] ;word
Load and store instructions can generate address error exceptions if the memory address is not properly aligned (To a halfword boundary for lh/lhu/sh or a word boundary for lw/sw. lwl/lwr/swl/swr can't access misaligned address as they force align the memory address). Additionally, accessing certain invalid memory locations will cause a bus error exception. If an exception occurs during a load instruction, the rt register is left untouched.
Caution - Load Delay
The loaded data is NOT available to the next opcode, ie. the target register
isn't updated until the next opcode has completed. So, if the next opcode tries
to read from the load destination register, then it would (usually) receive the
OLD value of that register (unless an IRQ occurs between the load and next
opcode, in that case the load would complete during IRQ handling, and so, the
next opcode would receive the NEW value).
MFC2/CFC2 also have a 1-instruction delay until the target register is loaded with its new value (more info in the GTE section).
Store instructions
sb rt,imm(rs) [imm+rs]=(rt AND FFh) ;store 8bit
sh rt,imm(rs) [imm+rs]=(rt AND FFFFh) ;store 16bit
sw rt,imm(rs) [imm+rs]=rt ;store 32bit
Caution - 8/16-bit writes to certain IO registers
During an 8-bit or 16-bit store, all 32 bits of the GPR are placed on the bus.
As such, when writing to certain 32-bit IO registers with an 8 or 16-bit store, it will behave like a 32-bit store, using the register's full value.
The soundscope on some shells is known to rely on this, as it uses sh
to write to certain DMA registers.
If this is not properly emulated, the soundscope will hang, waiting for an interrupt that will never be fired.
Load/Store Alignment
Halfword addresses must be aligned by 2, word addresses must be aligned by 4,
trying to access mis-aligned addresses will cause an exception. There's no
alignment restriction for bytes.
Unaligned Load/Store
lwr rt,imm(rs) load right bits of rt from memory (usually imm+0)
lwl rt,imm(rs) load left bits of rt from memory (usually imm+3)
swr rt,imm(rs) store right bits of rt to memory (usually imm+0)
swl rt,imm(rs) store left bits of rt to memory (usually imm+3)
lwl r2,$0003(t0) ;\no delay required between these
lwr r2,$0000(t0) ;/(although both access r2)
nop ;-requires load delay HERE (before reading from r2)
and r2,r2,0ffffh ;-access r2 (eg. reducing it to unaligned 16bit data)
Unaligned Load/Store (Details)
LWR/SWR transfers the right (=lower) bits of Rt, up-to 32bit memory boundary:
lwr/swr [N*4+0] transfer whole 32bit of Rt to/from [N*4+0..3]
lwr/swr [N*4+1] transfer lower 24bit of Rt to/from [N*4+1..3]
lwr/swr [N*4+2] transfer lower 16bit of Rt to/from [N*4+2..3]
lwr/swr [N*4+3] transfer lower 8bit of Rt to/from [N*4+3]
lwl/swl [N*4+0] transfer upper 8bit of Rt to/from [N*4+0]
lwl/swl [N*4+1] transfer upper 16bit of Rt to/from [N*4+0..1]
lwl/swl [N*4+2] transfer upper 24bit of Rt to/from [N*4+0..2]
lwl/swl [N*4+3] transfer whole 32bit of Rt to/from [N*4+0..3]
Note: The aligned variant can also misused for blocking memory access on
aligned addresses (in that case, if the address is known to be aligned, only
one of the opcodes are needed, either LWL or LWR).... Uhhhhhhhm, OR is that NOT
allowed... more PROBABLY that doesn't work?
CPU ALU Opcodes
arithmetic instructions
add rd,rs,rt rd=rs+rt (with overflow trap)
addu rd,rs,rt rd=rs+rt
sub rd,rs,rt rd=rs-rt (with overflow trap)
subu rd,rs,rt rd=rs-rt
addi rt,rs,imm rt=rs+(-8000h..+7FFFh) (with ov.trap)
addiu rt,rs,imm rt=rs+(-8000h..+7FFFh)
comparison instructions
slt rd,rs,rt if rs<rt (signed comparison) then rd=1 else rd=0
sltu rd,rs,rt if rs<rt (unsigned comparison) then rd=1 else rd=0
slti rt,rs,imm if rs<(sign-extended immediate in range [-8000h..+7FFFh], signed comparison) then rt=1 else rt=0
sltiu rt,rs,imm if rs<(sign-extended immediate in range [0..7FFFh] U [FFFF8000h..FFFFFFFFh], unsigned comparison) then rt=1 else rt=0
logical instructions
and rd,rs,rt rd = rs AND rt
or rd,rs,rt rd = rs OR rt
xor rd,rs,rt rd = rs XOR rt
nor rd,rs,rt rd = FFFFFFFFh XOR (rs OR rt)
andi rt,rs,imm rt = rs AND (0000h..FFFFh)
ori rt,rs,imm rt = rs OR (0000h..FFFFh)
xori rt,rs,imm rt = rs XOR (0000h..FFFFh)
shifting instructions
sllv rd,rt,rs rd = rt SHL (rs AND 1Fh)
srlv rd,rt,rs rd = rt SHR (rs AND 1Fh)
srav rd,rt,rs rd = rt SAR (rs AND 1Fh)
sll rd,rt,imm rd = rt SHL (00h..1Fh)
srl rd,rt,imm rd = rt SHR (00h..1Fh)
sra rd,rt,imm rd = rt SAR (00h..1Fh)
lui rt,imm rt = (0000h..FFFFh) SHL 16
The hardware does NOT generate exceptions on SHL overflows.
Multiply/divide
mult rs,rt hi:lo = rs*rt (signed)
multu rs,rt hi:lo = rs*rt (unsigned)
div rs,rt lo = rs/rt, hi=rs mod rt (signed)
divu rs,rt lo = rs/rt, hi=rs mod rt (unsigned)
mfhi rd rd=hi ;move from hi
mflo rd rd=lo ;move from lo
mthi rs hi=rs ;move to hi
mtlo rs lo=rs ;move to lo
__multu_execution_time_____________________________________________________
Fast (6 cycles) rs = 00000000h..000007FFh
Med (9 cycles) rs = 00000800h..000FFFFFh
Slow (13 cycles) rs = 00100000h..FFFFFFFFh
__mult_execution_time_____________________________________________________
Fast (6 cycles) rs = 00000000h..000007FFh, or rs = FFFFF800h..FFFFFFFFh
Med (9 cycles) rs = 00000800h..000FFFFFh, or rs = FFF00000h..FFFFF801h
Slow (13 cycles) rs = 00100000h..7FFFFFFFh, or rs = 80000000h..FFF00001h
__divu/div_execution_time________________________________________________
Fixed (36 cycles) no matter of rs and rt values
The hardware does NOT generate exceptions on divide overflows, instead, divide errors are returning the following values:
Opcode Rs Rt Hi/Remainder Lo/Result
divu 0..FFFFFFFFh 0 --> Rs FFFFFFFFh
div 0..+7FFFFFFFh 0 --> Rs -1
div -80000000h..-1 0 --> Rs +1
div -80000000h -1 --> 0 -80000000h
Note: After accessing the lo/hi registers, there seems to be a strange rule that one should not touch the lo/hi registers in the next 2 cycles or so... not yet understood if/when/how that rule applies...?
CPU Jump Opcodes
jumps and branches
Note that the instruction following the branch will always be executed.
j dest pc=(pc and F0000000h)+(imm26bit*4)
jal dest pc=(pc and F0000000h)+(imm26bit*4),ra=$+8
jr rs pc=rs
jalr (rd,)rs(,rd) pc=rs, rd=$+8 ;see caution
beq rs,rt,dest if rs=rt then pc=$+4+(-8000h..+7FFFh)*4
bne rs,rt,dest if rs<>rt then pc=$+4+(-8000h..+7FFFh)*4
bltz rs,dest if rs<0 then pc=$+4+(-8000h..+7FFFh)*4
bgez rs,dest if rs>=0 then pc=$+4+(-8000h..+7FFFh)*4
bgtz rs,dest if rs>0 then pc=$+4+(-8000h..+7FFFh)*4
blez rs,dest if rs<=0 then pc=$+4+(-8000h..+7FFFh)*4
bltzal rs,dest if rs<0 then pc=$+4+(..)*4; ra=$+8;
bgezal rs,dest if rs>=0 then pc=$+4+(..)*4; ra=$+8;
jr/jalr can be used to jump to an unaligned address, in which case an address error (AdEL) exception will be raised on the next instruction fetch.
Additionally, bltzal/bgezal will always place the return address in $ra, whether or not the branch is taken. Additionally, if rs
is $ra, then the value used for the comparison is $ra's value before linking.
JALR cautions
Caution: The JALR source code syntax varies (IDT79R3041 specs say "jalr rs,rd",
but MIPS32 specs say "jalr rd,rs"). Moreover, JALR may not use the same
register for both operands (eg. "jalr r31,r31") (doing so would destroy the
target address; which is normally no problem, but it can be a problem if an IRQ
occurs between the JALR opcode and the following branch delay opcode; in that
case BD gets set, and EPC points "back" to the JALR opcode, so JALR is executed
twice, with destroyed target address in second execution).
exception opcodes
Unlike for jump/branch opcodes, exception opcodes are immediately executed (ie.
without executing the following opcode).
syscall imm20 generates a system call exception
break imm20 generates a breakpoint exception
CPU Coprocessor Opcodes
Coprocessor Instructions (COP0..COP3)
mfc# rt,rd ;rt = cop#datRd ;data regs
cfc# rt,rd ;rt = cop#cntRd ;control regs
mtc# rt,rd ;cop#datRd = rt ;data regs
ctc# rt,rd ;cop#cntRd = rt ;control regs
cop# imm25 ;exec cop# command 0..1FFFFFFh
lwc# rt,imm(rs) ;cop#dat_rt = [rs+imm] ;word
swc# rt,imm(rs) ;[rs+imm] = cop#dat_rt ;word
bc#f dest ;if cop#flg=false then pc=$+disp
bc#t dest ;if cop#flg=true then pc=$+disp
rfe ;return from exception (COP0)
tlb<xx> ;virtual memory related (COP0), unused in the PS1
Caution - Load Delay
When reading from a coprocessor register, the next opcode cannot use the
destination register as operand (much the same as the Load Delays that occur
when reading from memory; see there for details).
Reportedly, the Load Delay applies for the next TWO opcodes after coprocessor
reads, but, that seems to be nonsense (the PSX does finish both COP0 and COP2
reads after ONE opcode).
Caution - Store Delay
In some cases, a similar delay occurs when writing to a coprocessor register.
COP0 is more or less free of store delays (eg. one can read from a cop0
register immediately after writing to it), the only known exception is the cop2
enable bit in cop0r12.bit30 (setting that cop0 bit acts delayed, and cop2 isn't
actually enabled until after 2 clock cycles or so).
Writing to cop2 registers has a delay of 2..3 clock cycles. In most cases, that
is probably (?) only 2 cycles, but special cases like writing to IRGB (which
does additionally affect IR1,IR2,IR3) take 3 cycles until the result arrives in
all registers).
Note that Store Delays are counted in numbers of clock cycles (not in numbers
of opcodes). For 3 cycle delay, one must usually insert 3 cached opcodes (or
one uncached opcode).
CPU Pseudo Opcodes
Pseudo instructions (native/spasm)
nop ;alias for sll r0,r0,0
move rd,rs ;alias for addu rd,rs,r0
la rx,imm32 ;load address (alias for lui rx / addiu rx)
li rx,imm32 ;load immediate (alias for lui rx / ori rx)
li rx,imm16 ;load immediate (alias for ori, range 0..FFFFh)
li rx,-imm15 ;load immediate (alias for addiu, range -1..-8000h)
li rx,imm16*10000h ;load immediate (alias for lui)
lw rx,imm32 ;load from address (lui rx / lw rx,rx)
sw rx,imm32 ;store to address (lui r1 / sw rx,r1) (destroys r1!)
lb,lh,lwl,lwr,lbu,lhu;as above pseudo lw
sb,sh,swl,swr ;as above pseudo sw (ie. also destroys r1!)
alu rx,op ;alias for alu rx,rx,op
alu(u) rx,rx,imm ;alias for alui(u) rx,rx,imm
jalr rx ;alias for jalr (RA,)rx(,RA)
subi(u) rt,rs,imm ;alias for addi(u) rt,rs,-imm
beqz rx,dest ;alias for beq rx,r0,dest
bnez rx,dest ;alias for bne rx,r0,dest
b dest ;alias for beq r0,r0,dest (jump relative/spasm)
bra dest ;alias for bgez r0, r0, dest
bal dest ;alias for bgezal r0, r0, dest
Pseudo instructions (nocash/a22i, not present on most other assemblers)
mov rx,NNNN0000h ;alias for lui rx,NNNNh
mov rx,0000NNNNh ;alias for or rx,r0,NNNNh ;max +FFFFh
mov rx,-imm15 ;alias for add rx,r0,-NNNNh ;min -8000h
mov rx,ry ;alias for or rx,ry,0 (or "addiu")
jrel dest ;alias for blez R0,dest ;relative jump
crel dest ;alias for callns R0,dest ;relative call
jz rx,dest ;alias for je rx,R0,dest
jnz rx,dest ;alias for jne rx,R0,dest
call rx ;alias for call rx,ret=RA
ret ;alias for jmp ra
subt rt,rs,imm ;alias for addt rt,rs,-imm
sub rt,rs,imm ;alias for add rt,rs,-imm
alu rx,op ;alias for alu rx,rx,op
neg(t) rx,ry ;alias for sub(t) rx,R0,ry
not rx,ry ;alias for nor rx,R0,ry
neg(t)/not rx ;alias for neg(t)/not rx,rx
setz rx,ry ;alias for setb rx,ry,1 (set if zero)
setnz rx,ry ;alias for setb rx,R0,ry (set if nonzero)
syscall/break ;alias for syscall/break 000000h
movp rx,imm32 ;alias for lui rx,imm16 -plus- or rx,rx,imm16)
mov(bhs)p rx,[imm32] ;load from address (lui rx,imm16 / mov rx,[rx+imm16])
movu [rs+imm] ;alias for lwr/swr [rs+imm] plus lwl/swl [rs+imm+3]
reti ;alias for jmp k0 plus rfe
push rlist ;alias for sub sp,n*4 -- mov [sp+(1..n)*4],r1..rn
pop rlist ;alias for mov r1..rn,[sp+(1..n)*4] -- add sp,n*4
pop pc,rlist ;alias for pop ra,rlist -- jmp ra
Directives (nocash)
.mips ;select MIPS instruction set (alternately .hc05 for MC68HC05)
.bios ;create a .ROM file (instead of .EXE)
.auto_nop ;append NOPs to jumps ;unless next opcode starts with a +
org imm ;assume following code to be originated at address "imm"
db n(,n(..))) ;define 8bit data values(s) or quoted ASCII strings
dw n(,n(..))) ;define 16bit data values(s) (not 32bit data!)
dd n(,n(..))) ;define 32bit data values(s)
.align imm
0 ;alias for immediate 0 and register R0 (whichever fits)
Directives (native)
org imm ;self-explaining (but, default=$80010000 for spasm!)
align imm ;self-explaining (probably zeropadded?)
db n(,n(..))) ;define 8bit data values(s) or quoted ASCII strings
dh n(,n(..))) ;define 16bit data values(s)
dw n(,n(..))) ;define 32bit data values(s) (not 16bit data!)
dcb len,value ;fill <len> bytes by <value> (different as DCB on ARM CPUs)
xyz ;define label "xyz" at current address (without colon)
xyz equ n ;assign value n to xyz
xyz = n ;probably same/sililar as "equ"
;xyz ;comments invoked with semicolon (spasm)
incbin file.bin ;import binary file
include file.asm ;import asm file
zero ;alias for r0
>imm32 ;alias for (i-(i AND 8000h))/10000h, and/or i/10000h ?
<imm32 ;alias for (i AND 0FFFFh), used for SW(+/-) and ORI(+)?
end ;N/A ;no "end" or ".end" directive needed/used by spasm
r1 aka at ;N/A ;some assemblers may (optionally) reject to use r1/at
Syntax for unknown assembler (for pad.s)
It uses "0x" for HEX values (but doesn't use "$" for registers).
It uses "#" instead of ";" for comments.
It uses ":" for labels (fortunately).
The assembler has at least one directive: ".byte" (equivalent to "db" on other
assemblers).
I've no clue which assembler is used for that syntax... could that be the Psy-Q
assembler?
COP0 - Register Summary
COP0 Register Summary
cop0r0-r2 - N/A
cop0r3 - BPC - Breakpoint on execute (R/W)
cop0r4 - N/A
cop0r5 - BDA - Breakpoint on data access (R/W)
cop0r6 - JUMPDEST - Randomly memorized jump address (R)
cop0r7 - DCIC - Breakpoint control (R/W)
cop0r8 - BadVaddr - Bad Virtual Address (R)
cop0r9 - BDAM - Data Access breakpoint mask (R/W)
cop0r10 - N/A
cop0r11 - BPCM - Execute breakpoint mask (R/W)
cop0r12 - SR - System status register (R/W)
cop0r13 - CAUSE - Describes the most recently recognised exception (R)
cop0r14 - EPC - Return Address from Trap (R)
cop0r15 - PRID - Processor ID (R)
cop0r16-r31 - Garbage
cop0r32-r63 - N/A - None such (Control regs)
COP0 - Exception Handling
cop0r13 - CAUSE - (Read-only, except, Bit8-9 are R/W)
Describes the most recently recognised exception
0-1 - Not used (zero)
2-6 Excode Describes what kind of exception occured:
00h INT Interrupt
01h MOD Tlb modification (none such in PSX)
02h TLBL Tlb load (none such in PSX)
03h TLBS Tlb store (none such in PSX)
04h AdEL Address error, Data load or Instruction fetch
05h AdES Address error, Data store
The address errors occur when attempting to read
outside of KUseg in user mode and when the address
is misaligned. (See also: BadVaddr register)
06h IBE Bus error on Instruction fetch
07h DBE Bus error on Data load/store
08h Syscall Generated unconditionally by syscall instruction
09h BP Breakpoint - break instruction
0Ah RI Reserved instruction
0Bh CpU Coprocessor unusable
0Ch Ov Arithmetic overflow
0Dh-1Fh Not used
7 - Not used (zero)
8-15 Ip Interrupt pending field. Bit 8 and 9 are R/W, and
contain the last value written to them. As long
as any of the bits are set they will cause an
interrupt if the corresponding bit is set in IM.
16-27 - Not used (zero)
28-29 CE Contains the coprocessor number if the exception
occurred because of a coprocessor instuction for
a coprocessor which wasn't enabled in SR.
30 - Not used (zero)
31 BD Is set when last exception points to the
branch instuction instead of the instruction
in the branch delay slot, where the exception
occurred.
cop0r12 - SR - System status register (R/W)
0 IEc Current Interrupt Enable (0=Disable, 1=Enable) ;rfe pops IUp here
1 KUc Current Kernel/User Mode (0=Kernel, 1=User) ;rfe pops KUp here
2 IEp Previous Interrupt Enable ;rfe pops IUo here
3 KUp Previous Kernel/User Mode ;rfe pops KUo here
4 IEo Old Interrupt Enable ;left unchanged by rfe
5 KUo Old Kernel/User Mode ;left unchanged by rfe
6-7 - Not used (zero)
8-15 Im 8 bit interrupt mask fields. When set the corresponding
interrupts are allowed to cause an exception.
16 Isc Isolate Cache (0=No, 1=Isolate)
When isolated, all load and store operations are targetted
to the Data cache, and never the main memory.
(Used by PSX Kernel, in combination with Port FFFE0130h)
17 Swc Swapped cache mode (0=Normal, 1=Swapped)
Instruction cache will act as Data cache and vice versa.
Use only with Isc to access & invalidate Instr. cache entries.
(Not used by PSX Kernel)
18 PZ When set cache parity bits are written as 0.
19 CM Shows the result of the last load operation with the D-cache
isolated. It gets set if the cache really contained data
for the addressed memory location.
20 PE Cache parity error (Does not cause exception)
21 TS TLB shutdown. Gets set if a programm address simultaneously
matches 2 TLB entries.
(initial value on reset allows to detect extended CPU version?)
22 BEV Boot exception vectors in RAM/ROM (0=RAM/KSEG0, 1=ROM/KSEG1)
23-24 - Not used (zero)
25 RE Reverse endianness (0=Normal endianness, 1=Reverse endianness)
Reverses the byte order in which data is stored in
memory. (lo-hi -> hi-lo)
(Affects only user mode, not kernel mode) (?)
(The bit doesn't exist in PSX ?)
26-27 - Not used (zero)
28 CU0 COP0 Enable (0=Enable only in Kernel Mode, 1=Kernel and User Mode)
29 CU1 COP1 Enable (0=Disable, 1=Enable) (none in PSX)
30 CU2 COP2 Enable (0=Disable, 1=Enable) (GTE in PSX)
31 CU3 COP3 Enable (0=Disable, 1=Enable) (none in PSX)
cop0r14 - EPC - Return Address from Trap (R)
0-31 Return Address from Exception
Interrupts should always return to EPC+0, no matter of the BD flag. That way, if BD=1, the branch gets executed again, that's required because EPC stores only the current program counter, but not additionally the branch destination address.
Other exceptions may require to handle BD. In simple cases, when BD=0, the exception handler may return to EPC+0 (retry execution of the opcode), or to EPC+4 (skip the opcode that caused the exception). Note that jumps to faulty memory locations are executed without exception, but will trigger address errors and bus errors at the target location, ie. EPC (and BadVAddr, in case of address errors) point to the faulty address, not to the opcode that has jumped to that address).
Interrupts vs GTE Commands
If an interrupt occurs "on" a GTE command (cop2cmd), then the GTE command is
executed, but nethertheless, the return address in EPC points to the GTE
command. So, if the exeception handler would return to EPC as usually, then the
GTE command would be executed twice. In best case, this would be a waste of
clock cycles, in worst case it may lead to faulty result (if the results from
the 1st execution are re-used as incoming parameters in the 2nd execution). To
fix the problem, the exception handler must do:
if (cause AND 7Ch)=00h ;if excode=interrupt
if ([epc] AND FE000000h)=4A000000h ;and opcode=cop2cmd
epc=epc+4 ;then skip that opcode
Of course, the above exeption handling won't work in branch delays (where BD gets set to indicate that EPC was modified) (best workaround is not to use GTE commands in branch delays).
Several games are known to rely on this, notably including the Crash Bandicoot trilogy, Jinx and Spyro the Dragon, all of which will render broken geometry if running on an emulator which doesn't emulate this, or if the installed interrupt service routine doesn't account for it.
cop0cmd=10h - RFE opcode - Prepare Return from Exception
The RFE opcode moves some bits in cop0r12 (SR): bit2-3 are copied to bit0-1,
and bit4-5 are copied to bit2-3, all other bits (including bit4-5) are left
unchanged.
The RFE opcode does NOT automatically jump to EPC. Instead, the exception
handler must copy EPC into a register (usually R26 aka K0), and then jump to
that address. Because of branch delays, that would look like so:
mov k0,epc ;get return address
push k0 ;save epc in memory (if you expect nested exceptions)
... ;whatever (ie. process CAUSE)
pop k0 ;restore from memory (if you expect nested exceptions)
jmp k0 ;jump to K0 (after executing the next opcode)
+rfe ;move SR bit4/5 --> bit2/3 --> bit0/1
cop0r8 - BadVaddr - Bad Virtual Address (R)
Contains the address whose reference caused an exception. Set on any MMU type
of exceptions, on references outside of kuseg (in User mode) and on any
misaligned reference. BadVaddr is updated ONLY by Address errors (Excode 04h
and 05h), all other exceptions (including bus errors) leave BadVaddr unchanged.
Exception Vectors (depending on BEV bit in SR register)
Exception BEV=0 BEV=1
Reset BFC00000h BFC00000h (Reset)
UTLB Miss 80000000h BFC00100h (Virtual memory, none such in PSX)
COP0 Break 80000040h BFC00140h (Debug Break)
General 80000080h BFC00180h (General Interrupts & Exceptions)
The PSX uses only the BEV=0 vectors (aside from the reset vector, the PSX BIOS ROM doesn't contain any of the BEV=1 vectors).
Exception Priority
Reset At any time (highest) ;-reset
AdEL Memory (Load instruction) ;\
AdES Memory (Store instruction) ; memory (data load/store)
DBE Memory (Load or store) ;/
MOD ALU (Data TLB) ;\
TLBL ALU (DTLB Miss) ; none such
TLBS ALU (DTLB Miss) ;/
Ovf ALU ;-overflow
Int ALU ;-interrupt
Sys RD (Instruction Decode) ;\
Bp RD (Instruction Decode) ;
RI RD (Instruction Decode) ;
CpU RD (Instruction Decode) ;/
TLBL I-Fetch (ITLB Miss) ;-none such
AdEL IVA (Instruction Virtual Address) ;\memory (opcode fetch)
IBE RD (end of I-Fetch, lowest) ;/
COP0 - Misc
cop0r15 - PRID - Processor ID (R)
0-7 Revision
8-15 Implementation
16-31 Not used
Unknown if/which other Playstation CPU versions have other values...?
cop0r6 - JUMPDEST - Randomly memorized jump address (R)
The is a rather strange totally useless register. After certain exceptions, the
CPU does memorize a jump destination address in the register. Once when it has
memorized an address, the register becomes locked, and the memorized value
won't change until it becomes unlocked by a new exception. Exceptions that do
unlock the register are Reset and Interrupts (cause.bit10). Exceptions that do
NOT unlock the register are syscall/break opcodes, and software generated
interrupts (eg. cause.bit8).
In the unlocked state, the CPU does more or less randomly memorize one of the
next some jump destinations - eg. the destination from the second jump after
reset, or from a jump that occured immediately before executing the IRQ
handler, or from a jump inside of the IRQ handler, or even from a later jump
that occurs shortly after returning from the IRQ handler.
The register seems to be read-only (although the Kernel initialization code
writes 0 to it for whatever reason).
cop0r0..r2, cop0r4, cop0r10, cop0r32..r63 - N/A
Registers 0,1,2,4,10 control virtual memory on some MIPS processors (but
there's none such in the PSX), and Registers 32..63 (aka "control registers")
aren't used in any MIPS processors. Trying to read any of these registers
causes a Reserved Instruction Exception (excode=0Ah).
cop0cmd=01h,02h,06h,08h - TLBR,TLBWI,TLBWR,TLBP
The PSX supports only one cop0cmd (cop0cmd=10h aka RFE). Trying to execute the
TLBxx opcodes causes a Reserved Instruction Exception (excode=0Ah).
jf/jt cop0flg,dest - conditional cop0 jumps
mov [mem],cop0reg / mov cop0reg,[mem] - coprocessor cop0 load/store
Not supported by the CPU. Trying to execute these opcodes causes a Coprocessor
Unusable Exception (excode=0Bh, ie. unlike above, not 0Ah).
cop0r16-r31 - Garbage
Trying to read these registers returns garbage (but does not trigger an
exception). When reading one of the garbage registers shortly after reading a
valid cop0 register, the garbage value is usually the same as that of the valid
register. When doing the read later on, the return value is usually 00000020h,
or when reading much later it returns 00000040h, or even 00000100h. No idea
what is causing that effect...?
Note: The garbage registers can be accessed (without causing an exception) even
in "User mode with cop0 disabled" (SR.Bit1=1 and SR.Bit28=0); accessing any
other existing cop0 registers (or executing the rfe opcode) in that state is
causing Coprocessor Unusable Exceptions (excode=0Bh).
COP0 - Debug Registers
The COP0 debug registers seem to be PSX specific, normal R30xx CPUs like IDT's
R3041 and R3051 don't have anything similar.
cop0r7 - DCIC - Breakpoint control (R/W)
0 Automatically set by hardware upon Any break (R/W)
1 Automatically set by hardware upon BPC Code break (R/W)
2 Automatically set by hardware upon BDA Data break (R/W)
3 Automatically set by hardware upon BDA Data-Read break (R/W)
4 Automatically set by hardware upon BDA Data-Write break (R/W)
5 Automatically set by hardware upon any-jump break (R/W)
6-11 Not used (always zero)
12-13 Jump Redirection (0=Disable, 1..3=Enable) (see note) (R/W)
14-15 Unknown? (R/W)
16-22 Not used (always zero)
23 Super-Master Enable 1 for bit24-29
24 Execution breakpoint (0=Disabled, 1=Enabled) (see BPC, BPCM)
25 Data access breakpoint (0=Disabled, 1=Enabled) (see BDA, BDAM)
26 Break on Data-Read (0=No, 1=Break/when Bit25=1)
27 Break on Data-Write (0=No, 1=Break/when Bit25=1)
28 Break on any-jump (0=No, 1=Break on branch/jump/call/etc.)
29 Master Enable for bit28 (..and/or exec-break at address>=80000000h?)
30 Master Enable for bit24-27
31 Super-Master Enable 2 for bit24-29
cop0r7.bit12-13 - Jump Redirection Note
If one or both of these bits are nonzero, then the PSX seems to check for the
following opcode sequence,
mov rx,[mem] ;load rx from memory
... ;one or more opcodes that do not change rx
jmp/call rx ;jump or call to rx
cop0r5 - BDA - Breakpoint on Data Access Address (R/W)
cop0r9 - BDAM - Breakpoint on Data Access Mask (R/W)
Break condition is "((addr XOR BDA) AND BDAM)=0".
cop0r3 - BPC - Breakpoint on Execute Address (R/W)
cop0r11 - BPCM - Breakpoint on Execute Mask (R/W)
Break condition is "((PC XOR BPC) AND BPCM)=0".
Note (BREAK Opcode)
Additionally, the BREAK opcode can be used to create further breakpoints by
patching the executable code. The BREAK opcode uses the same Excode value (09h)
in CAUSE register. However, the BREAK opcode jumps to the normal exception
handler at 80000080h (not 80000040h).
Note (LibCrypt)
The debug registers are mis-used by "Legacy of Kain: Soul Reaver" (and maybe
also other games) for storing libcrypt copy-protection related values (ie. just
as a "hidden" location for storing data, not for actual debugging purposes).
CDROM Protection - LibCrypt
Note (Cheat Devices/Expansion ROMs)
The Expansion ROM header supports only Pre-Boot and Post-Boot vectors, but no
Mid-Boot vector. Cheat Devices are often using COP0 breaks for Mid-Boot Hooks,
either with BPC=BFC06xxxh (break address in ROM, used in older cheat
firmwares), or with BPC=80030000h (break address in RAM aka relocated GUI
entrypoint, used in later cheat firmwares). Moreover, aside from the Mid-Boot
Hook, the Xplorer cheat device is also supporting a special cheat code that
uses the COP0 break feature.
Note (Datasheet)
Note: COP0 debug registers are described in LSI's "L64360" datasheet, chapter
14. And in their LR33300/LR33310 datasheet, chapter 4.