System Control Coprocessor (COP0)
All registers are 32bit wide.
  Name       Alias    Common Usage
  (R0)       zero     Constant (always 0) (this one isn't a real register)
  R1         at       Assembler temporary (destroyed by some pseudo opcodes!)
  R2-R3      v0-v1    Subroutine return values, may be changed by subroutines
  R4-R7      a0-a3    Subroutine arguments, may be changed by subroutines
  R8-R15     t0-t7    Temporaries, may be changed by subroutines
  R16-R23    s0-s7    Static variables, must be saved by subs
  R24-R25    t8-t9    Temporaries, may be changed by subroutines
  R26-R27    k0-k1    Reserved for kernel (destroyed by some IRQ handlers!)
  R28        gp       Global pointer (rarely used)
  R29        sp       Stack pointer
  R30        fp(s8)   Frame Pointer, or 9th Static variable, must be saved
  R31        ra       Return address (used so by JAL,BLTZAL,BGEZAL opcodes)
  -          pc       Program counter
  -          hi,lo    Multiply/divide results, may be changed by subroutines
R0 is always zero.
R31 can be used as general purpose register, however, some opcodes are using it to store the return address: JAL, BLTZAL, BGEZAL. (Note: JALR can optionally store the return address in R31, or in R1..R30. Exceptions store the return address in cop0r14 - EPC).
R29 (SP) - Full Decrementing Wasted Stack Pointer
The CPU doesn't explicitly have stack-related registers or opcodes, however,
conventionally, R29 is used as stack pointer (SP). The stack can be accessed
with normal load/store opcodes, which do not automatically increase/decrease
SP, so the SP register must be manually modified to (de-)allocate data.
The PSX BIOS is using "Full Decrementing Wasted Stack".
Decrementing means that SP gets decremented when allocating data (that's common for most CPUs).
Full means that SP points to the first ALLOCATED word on the stack, so the allocated memory is at SP+0 and above, free memory at SP-1 and below.
Wasted means that when calling a sub-function with N parameters, then the caller must pre-allocate N words on stack, and the sub-function may freely use and destroy these words; at [SP+0..N*4-1].
For example, "push ra,r16,r17" would be implemented as:
  sub  sp,20h
  mov  [sp+14h],ra
  mov  [sp+18h],r16
  mov  [sp+1Ch],r17
where the allocated 20h bytes have the following purpose:
  [sp+00h..0Fh]  wasted stack (may, or may not, be used by sub-functions)
  [sp+10h..13h]  8-byte alignment padding (not used)
  [sp+14h..1Fh]  pushed registers
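The layout above can be reproduced with a small calculation. This is only a sketch: the function name and parameters are made up for illustration, and it assumes four wasted words (matching the 10h bytes in the example) with the alignment padding placed between the wasted area and the pushed registers.

```python
def frame_layout(num_saved_regs, wasted_words=4, align=8):
    """Return (frame_size, offsets) for a 'push'-style prologue.

    Assumptions (not stated by the hardware, only by the example above):
    wasted_words=4 words reserved at [sp+0..], frame rounded up to 8 bytes,
    saved registers placed at the top of the frame.
    """
    wasted = wasted_words * 4
    raw_size = wasted + num_saved_regs * 4
    size = (raw_size + align - 1) // align * align   # round up to alignment
    first = size - num_saved_regs * 4                # registers sit at the top
    offsets = [first + 4 * i for i in range(num_saved_regs)]
    return size, offsets


size, offsets = frame_layout(3)                      # push ra,r16,r17
print(hex(size), [hex(o) for o in offsets])
```

For three registers this yields a 20h byte frame with the registers at 14h, 18h and 1Ch, the same as the hand-written example.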
CPU Opcode Encoding
Primary opcode field (Bit 26..31)
  00h=SPECIAL 08h=ADDI  10h=COP0 18h=N/A 20h=LB  28h=SB  30h=LWC0 38h=SWC0
  01h=BcondZ  09h=ADDIU 11h=COP1 19h=N/A 21h=LH  29h=SH  31h=LWC1 39h=SWC1
  02h=J       0Ah=SLTI  12h=COP2 1Ah=N/A 22h=LWL 2Ah=SWL 32h=LWC2 3Ah=SWC2
  03h=JAL     0Bh=SLTIU 13h=COP3 1Bh=N/A 23h=LW  2Bh=SW  33h=LWC3 3Bh=SWC3
  04h=BEQ     0Ch=ANDI  14h=N/A  1Ch=N/A 24h=LBU 2Ch=N/A 34h=N/A  3Ch=N/A
  05h=BNE     0Dh=ORI   15h=N/A  1Dh=N/A 25h=LHU 2Dh=N/A 35h=N/A  3Dh=N/A
  06h=BLEZ    0Eh=XORI  16h=N/A  1Eh=N/A 26h=LWR 2Eh=SWR 36h=N/A  3Eh=N/A
  07h=BGTZ    0Fh=LUI   17h=N/A  1Fh=N/A 27h=N/A 2Fh=N/A 37h=N/A  3Fh=N/A
Secondary opcode field (Bit 0..5) (when Primary opcode = 00h)
  00h=SLL   08h=JR      10h=MFHI 18h=MULT  20h=ADD  28h=N/A  30h=N/A  38h=N/A
  01h=N/A   09h=JALR    11h=MTHI 19h=MULTU 21h=ADDU 29h=N/A  31h=N/A  39h=N/A
  02h=SRL   0Ah=N/A     12h=MFLO 1Ah=DIV   22h=SUB  2Ah=SLT  32h=N/A  3Ah=N/A
  03h=SRA   0Bh=N/A     13h=MTLO 1Bh=DIVU  23h=SUBU 2Bh=SLTU 33h=N/A  3Bh=N/A
  04h=SLLV  0Ch=SYSCALL 14h=N/A  1Ch=N/A   24h=AND  2Ch=N/A  34h=N/A  3Ch=N/A
  05h=N/A   0Dh=BREAK   15h=N/A  1Dh=N/A   25h=OR   2Dh=N/A  35h=N/A  3Dh=N/A
  06h=SRLV  0Eh=N/A     16h=N/A  1Eh=N/A   26h=XOR  2Eh=N/A  36h=N/A  3Eh=N/A
  07h=SRAV  0Fh=N/A     17h=N/A  1Fh=N/A   27h=NOR  2Fh=N/A  37h=N/A  3Fh=N/A
  31..26 | 25..21 | 20..16 | 15..11 | 10..6  | 5..0   |
  6bit   | 5bit   | 5bit   | 5bit   | 5bit   | 6bit   |
  -------+--------+--------+--------+--------+--------+------------
  000000 | N/A    | rt     | rd     | imm5   | 0000xx | shift-imm
  000000 | rs     | rt     | rd     | N/A    | 0001xx | shift-reg
  000000 | rs     | N/A    | N/A    | N/A    | 001000 | jr
  000000 | rs     | N/A    | rd     | N/A    | 001001 | jalr
  000000 | <----------comment20bit---------> | 00110x | sys/brk
  000000 | N/A    | N/A    | rd     | N/A    | 0100x0 | mfhi/mflo
  000000 | rs     | N/A    | N/A    | N/A    | 0100x1 | mthi/mtlo
  000000 | rs     | rt     | N/A    | N/A    | 0110xx | mul/div
  000000 | rs     | rt     | rd     | N/A    | 10xxxx | alu-reg
  000001 | rs     | 00000  | <----immediate16bit----> | bltz
  000001 | rs     | 00001  | <----immediate16bit----> | bgez
  000001 | rs     | 10000  | <----immediate16bit----> | bltzal
  000001 | rs     | 10001  | <----immediate16bit----> | bgezal
  00001x | <-------------immediate26bit-------------> | j/jal
  00010x | rs     | rt     | <----immediate16bit----> | beq/bne
  00011x | rs     | N/A    | <----immediate16bit----> | blez/bgtz
  001xxx | rs     | rt     | <----immediate16bit----> | alu-imm
  001111 | N/A    | rt     | <----immediate16bit----> | lui-imm
  100xxx | rs     | rt     | <----immediate16bit----> | load rt,[rs+imm]
  101xxx | rs     | rt     | <----immediate16bit----> | store rt,[rs+imm]
  x1xxxx | <----------coprocessor specific----------> | coprocessor (see below)
Coprocessor Opcode/Parameter Encoding
  31..26 | 25..21 | 20..16 | 15..11 | 10..6  | 5..0   |
  6bit   | 5bit   | 5bit   | 5bit   | 5bit   | 6bit   |
  -------+--------+--------+--------+--------+--------+------------
  0100nn | 0|0000 | rt     | rd     | N/A    | 000000 | MFCn rt,rd_dat  ;rt = dat
  0100nn | 0|0010 | rt     | rd     | N/A    | 000000 | CFCn rt,rd_cnt  ;rt = cnt
  0100nn | 0|0100 | rt     | rd     | N/A    | 000000 | MTCn rt,rd_dat  ;dat = rt
  0100nn | 0|0110 | rt     | rd     | N/A    | 000000 | CTCn rt,rd_cnt  ;cnt = rt
  0100nn | 0|1000 | 00000  | <----immediate16bit----> | BCnF target     ;jump if false
  0100nn | 0|1000 | 00001  | <----immediate16bit----> | BCnT target     ;jump if true
  0100nn | 1|<------------immediate25bit------------> | COPn imm25
  010000 | 1|0000 | N/A    | N/A    | N/A    | 000001 | COP0 01h        ;=TLBR
  010000 | 1|0000 | N/A    | N/A    | N/A    | 000010 | COP0 02h        ;=TLBWI
  010000 | 1|0000 | N/A    | N/A    | N/A    | 000110 | COP0 06h        ;=TLBWR
  010000 | 1|0000 | N/A    | N/A    | N/A    | 001000 | COP0 08h        ;=TLBP
  010000 | 1|0000 | N/A    | N/A    | N/A    | 010000 | COP0 10h        ;=RFE
  1100nn | rs     | rt     | <----immediate16bit----> | LWCn rt_dat,[rs+imm]
  1110nn | rs     | rt     | <----immediate16bit----> | SWCn rt_dat,[rs+imm]
All opcodes that are marked as "N/A" in the Primary and Secondary opcode tables
are causing a Reserved Instruction Exception (excode=0Ah).
The unused operand bits (eg. Bit21-25 for LUI opcode) should be usually zero, but do not necessarily trigger exceptions if set to nonzero values.
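As a rough illustration of the bit fields listed in the encoding tables, the following sketch splits a 32bit opcode into its fields. The function and key names are made up for this example; which fields are meaningful depends on the primary/secondary opcode, so a real decoder would pick only the ones that apply.

```python
def decode_fields(opcode):
    """Split a 32bit MIPS opcode into the raw bit fields of the tables above."""
    return {
        "op":    (opcode >> 26) & 0x3F,   # primary opcode, bit 26..31
        "rs":    (opcode >> 21) & 0x1F,   # bit 21..25
        "rt":    (opcode >> 16) & 0x1F,   # bit 16..20
        "rd":    (opcode >> 11) & 0x1F,   # bit 11..15
        "imm5":  (opcode >> 6) & 0x1F,    # shift amount, bit 6..10
        "funct": opcode & 0x3F,           # secondary opcode, bit 0..5
        "imm16": opcode & 0xFFFF,         # 16bit immediate
        "imm26": opcode & 0x3FFFFFF,      # 26bit jump target
    }


fields = decode_fields(0x8FA20008)        # lw r2,8(r29)
print(hex(fields["op"]), fields["rs"], fields["rt"], fields["imm16"])
```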
CPU Load/Store Opcodes
  lb  rt,imm(rs)    rt=[imm+rs]  ;byte sign-extended
  lbu rt,imm(rs)    rt=[imm+rs]  ;byte zero-extended
  lh  rt,imm(rs)    rt=[imm+rs]  ;halfword sign-extended
  lhu rt,imm(rs)    rt=[imm+rs]  ;halfword zero-extended
  lw  rt,imm(rs)    rt=[imm+rs]  ;word
Load instructions can read from the data cache (if the data is not in the
cache, or if the memory region is uncached, then the CPU gets halted until it
has read the data) (however, the PSX doesn't have a data cache).
Load and store instructions can generate address error exceptions if the memory address is not properly aligned (To a halfword boundary for lh/lhu/sh or a word boundary for lw/sw. lwl/lwr/swl/swr can't access misaligned address as they force align the memory address). Additionally, accessing certain invalid memory locations will cause a bus error exception. If an exception occurs during a load instruction, the rt register is left untouched.
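The extension and alignment rules for the aligned loads could be modelled roughly as below. This is a sketch for emulator authors, not a description of the hardware: `load`, `sign_extend` and `AddressError` are hypothetical names, memory is a plain little-endian byte array, and bus errors are not modelled.

```python
class AddressError(Exception):
    """Stands in for the AdEL exception raised on misaligned loads."""


def sign_extend(value, bits):
    """Sign-extend a 'bits' wide value to 32bit (returned as unsigned)."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & 0xFFFFFFFF


def load(mem, addr, size, signed):
    """size=1/2/4 for lb/lh/lw; signed=False gives the lbu/lhu variants."""
    if addr % size:
        raise AddressError(hex(addr))     # rt is left untouched by the caller
    raw = int.from_bytes(mem[addr:addr + size], "little")
    return sign_extend(raw, size * 8) if signed else raw


mem = bytes([0x80, 0xFF, 0x34, 0x12])
print(hex(load(mem, 0, 1, True)), hex(load(mem, 0, 1, False)))   # lb / lbu
```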
Caution - Load Delay
The loaded data is NOT available to the next opcode, ie. the target register
isn't updated until the next opcode has completed. So, if the next opcode tries
to read from the load destination register, then it would (usually) receive the
OLD value of that register (unless an IRQ occurs between the load and next
opcode, in that case the load would complete during IRQ handling, and so, the
next opcode would receive the NEW value).
MFC2/CFC2 also have a 1-instruction delay until the target register is loaded with its new value (more info in the GTE section).
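A minimal model of the load delay, assuming no IRQ occurs in between: the loaded value is parked in a pending slot and only written to the register file after the following opcode has executed. The class and method names are invented for this sketch, and the case where the following opcode itself writes the load's target register is deliberately not modelled.

```python
class Cpu:
    def __init__(self):
        self.regs = [0] * 32
        self.pending = None               # (reg, value) of a load in flight

    def step(self, op):
        """Run one opcode. 'op' is a callable taking the cpu; a load
        returns (reg, value), any other opcode returns None."""
        landing = self.pending            # load issued by the previous opcode
        self.pending = op(self)           # this opcode still sees old regs
        if landing is not None:
            reg, value = landing
            self.regs[reg] = value        # load completes after this opcode


cpu = Cpu()
cpu.regs[2] = 0x1111
seen = []
cpu.step(lambda c: (2, 0x2222))               # lw r2,... (value 2222h)
cpu.step(lambda c: seen.append(c.regs[2]))    # delay slot: reads OLD r2
cpu.step(lambda c: seen.append(c.regs[2]))    # reads NEW r2
print([hex(v) for v in seen])
```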
  sb  rt,imm(rs)    [imm+rs]=(rt AND FFh)   ;store 8bit
  sh  rt,imm(rs)    [imm+rs]=(rt AND FFFFh) ;store 16bit
  sw  rt,imm(rs)    [imm+rs]=rt             ;store 32bit
Store operations are passed to the write-queue, so they can execute within a
single clock cycle (unless the write-queue was full, in that case the CPU gets
halted until there's room in the queue). The write-queue itself is described separately.
Caution - 8/16-bit writes to certain IO registers
During an 8-bit or 16-bit store, all 32 bits of the GPR are placed on the bus.
As such, when writing to certain 32-bit IO registers with an 8 or 16-bit store, it will behave like a 32-bit store, using the register's full value.
The soundscope on some shells is known to rely on this, as it uses
sh to write to certain DMA registers.
If this is not properly emulated, the soundscope will hang, waiting for an interrupt that will never be fired.
Halfword addresses must be aligned by 2, word addresses must be aligned by 4,
trying to access mis-aligned addresses will cause an exception. There's no
alignment restriction for bytes.
  lwr rt,imm(rs)    load right bits of rt from memory (usually imm+0)
  lwl rt,imm(rs)    load left bits of rt from memory (usually imm+3)
  swr rt,imm(rs)    store right bits of rt to memory (usually imm+0)
  swl rt,imm(rs)    store left bits of rt to memory (usually imm+3)
There's no delay required between lwl and lwr, so you can use them directly
following each other, eg. to load a word anywhere in memory without regard to
alignment:
  lwl  r2,$0003(t0)   ;\no delay required between these
  lwr  r2,$0000(t0)   ;/(although both access r2)
  nop                 ;-requires load delay HERE (before reading from r2)
  and  r2,r2,0ffffh   ;-access r2 (eg. reducing it to unaligned 16bit data)
Unaligned Load/Store (Details)
LWR/SWR transfers the right (=lower) bits of Rt, up-to 32bit memory boundary:
  lwr/swr [N*4+0]   transfer whole 32bit of Rt to/from [N*4+0..3]
  lwr/swr [N*4+1]   transfer lower 24bit of Rt to/from [N*4+1..3]
  lwr/swr [N*4+2]   transfer lower 16bit of Rt to/from [N*4+2..3]
  lwr/swr [N*4+3]   transfer lower  8bit of Rt to/from [N*4+3]
LWL/SWL transfers the left (=upper) bits of Rt, down-to 32bit memory boundary:
  lwl/swl [N*4+0]   transfer upper  8bit of Rt to/from [N*4+0]
  lwl/swl [N*4+1]   transfer upper 16bit of Rt to/from [N*4+0..1]
  lwl/swl [N*4+2]   transfer upper 24bit of Rt to/from [N*4+0..2]
  lwl/swl [N*4+3]   transfer whole 32bit of Rt to/from [N*4+0..3]
The CPU has four separate byte-access signals, so, within a 32bit location, it
can transfer all fragments of Rt at once (including for odd 24bit amounts). The
transferred data is not zero- or sign-expanded, eg. when transferring 8bit
data, the other 24bit of Rt and [mem] will remain intact.
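The LWL/LWR rules can be written down as simple mask-and-shift operations. The sketch below assumes little-endian memory (as on the PSX) and uses invented function names; it models only the loads, and ignores the load delay.

```python
def _aligned_word(mem, addr):
    """Read the little-endian 32bit word that contains 'addr'."""
    base = addr & ~3
    return int.from_bytes(mem[base:base + 4], "little")


def lwr(rt, mem, addr):
    """Load the right (lower) bits of rt from [addr..next word boundary-1]."""
    k = addr & 3
    keep = ~(0xFFFFFFFF >> (8 * k)) & 0xFFFFFFFF   # upper bits of rt survive
    return (rt & keep) | (_aligned_word(mem, addr) >> (8 * k))


def lwl(rt, mem, addr):
    """Load the left (upper) bits of rt from [word boundary..addr]."""
    k = addr & 3
    keep = 0x00FFFFFF >> (8 * k)                   # lower bits of rt survive
    return (rt & keep) | ((_aligned_word(mem, addr) << (24 - 8 * k)) & 0xFFFFFFFF)


mem = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
r2 = lwl(0, mem, 1 + 3)        # lwl r2,3(t0) with t0=1
r2 = lwr(r2, mem, 1 + 0)       # lwr r2,0(t0)
print(hex(r2))                 # unaligned word at address 1
```

With t0=1 the pair assembles the bytes 22h,33h,44h,55h into one register, which is the usual lwl/lwr idiom shown earlier.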
Note: The aligned variant can also be misused for blocking memory access on
aligned addresses (in that case, if the address is known to be aligned, only
one of the opcodes is needed, either LWL or LWR). However, it is unclear
whether that is allowed; more probably that doesn't work.
CPU ALU Opcodes
  add   rd,rs,rt         rd=rs+rt (with overflow trap)
  addu  rd,rs,rt         rd=rs+rt
  sub   rd,rs,rt         rd=rs-rt (with overflow trap)
  subu  rd,rs,rt         rd=rs-rt
  addi  rt,rs,imm        rt=rs+(-8000h..+7FFFh) (with ov.trap)
  addiu rt,rs,imm        rt=rs+(-8000h..+7FFFh)
The opcodes "with overflow trap" do trigger an exception (and leave rd
unchanged) in case of overflows.
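The difference between the trapping and non-trapping adds could be sketched as follows. `OverflowTrap` is a hypothetical stand-in for the Ov exception (excode=0Ch); registers are treated as unsigned 32bit values.

```python
class OverflowTrap(Exception):
    """Stands in for the arithmetic overflow exception."""


def addu(rs, rt):
    """Wrapping 32bit add, never traps."""
    return (rs + rt) & 0xFFFFFFFF


def add(rs, rt):
    """Signed add; traps (and produces no result) on signed overflow."""
    result = (rs + rt) & 0xFFFFFFFF
    # overflow: both operands have the same sign, the result has the other
    if ~(rs ^ rt) & (rs ^ result) & 0x80000000:
        raise OverflowTrap()
    return result


print(hex(addu(0x7FFFFFFF, 1)))   # wraps silently
print(hex(add(0xFFFFFFFF, 1)))    # -1 + 1, no signed overflow
```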
  setlt slt   rd,rs,rt  if rs<rt then rd=1 else rd=0 (signed)
  setb  sltu  rd,rs,rt  if rs<rt then rd=1 else rd=0 (unsigned)
  setlt slti  rt,rs,imm if rs<(-8000h..+7FFFh)  then rt=1 else rt=0 (signed)
  setb  sltiu rt,rs,imm if rs<(FFFF8000h..7FFFh) then rt=1 else rt=0(unsigned)
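One detail in the comparison table is easy to miss: sltiu sign-extends its 16bit immediate first and only then compares unsigned, which is where the FFFF8000h..7FFFh range comes from. A small sketch, with helper names invented for illustration:

```python
def to_signed(x):
    """Interpret an unsigned 32bit value as two's complement."""
    return x - (1 << 32) if x & 0x80000000 else x


def sext16(imm16):
    """Sign-extend a 16bit immediate to an unsigned 32bit value."""
    return (imm16 - 0x10000 if imm16 & 0x8000 else imm16) & 0xFFFFFFFF


def slt(rs, rt):
    return int(to_signed(rs) < to_signed(rt))


def sltu(rs, rt):
    return int(rs < rt)


def slti(rs, imm16):
    return slt(rs, sext16(imm16))


def sltiu(rs, imm16):
    return sltu(rs, sext16(imm16))     # sign-extended, then unsigned compare


print(slt(0xFFFFFFFF, 0), sltu(0xFFFFFFFF, 0), sltiu(0xFFFF0000, 0x8000))
```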
  and  rd,rs,rt         rd = rs AND rt
  or   rd,rs,rt         rd = rs OR  rt
  xor  rd,rs,rt         rd = rs XOR rt
  nor  rd,rs,rt         rd = FFFFFFFFh XOR (rs OR rt)
  andi rt,rs,imm        rt = rs AND (0000h..FFFFh)
  ori  rt,rs,imm        rt = rs OR  (0000h..FFFFh)
  xori rt,rs,imm        rt = rs XOR (0000h..FFFFh)
  sllv rd,rt,rs         rd = rt SHL (rs AND 1Fh)
  srlv rd,rt,rs         rd = rt SHR (rs AND 1Fh)
  srav rd,rt,rs         rd = rt SAR (rs AND 1Fh)
  sll  rd,rt,imm        rd = rt SHL (00h..1Fh)
  srl  rd,rt,imm        rd = rt SHR (00h..1Fh)
  sra  rd,rt,imm        rd = rt SAR (00h..1Fh)
  lui  rt,imm           rt = (0000h..FFFFh) SHL 16
Unlike many other opcodes, shifts use 'rt' as second (not third) operand.
The hardware does NOT generate exceptions on SHL overflows.
  mult  rs,rt     hi:lo = rs*rt (signed)
  multu rs,rt     hi:lo = rs*rt (unsigned)
  div   rs,rt     lo = rs/rt, hi=rs mod rt (signed)
  divu  rs,rt     lo = rs/rt, hi=rs mod rt (unsigned)
  mfhi  rd        rd=hi  ;move from hi
  mflo  rd        rd=lo  ;move from lo
  mthi  rs        hi=rs  ;move to hi
  mtlo  rs        lo=rs  ;move to lo
The mul/div opcodes are starting the multiply/divide operation, starting takes
only a single clock cycle, however, trying to read the result from the hi/lo
registers while the mul/div operation is busy will halt the CPU until the
mul/div has completed. For multiply, the execution time depends on rs (ie.
"small*large" can be much faster than "large*small").
  __multu_execution_time_____________________________________________________
  Fast  (6 cycles)   rs = 00000000h..000007FFh
  Med   (9 cycles)   rs = 00000800h..000FFFFFh
  Slow  (13 cycles)  rs = 00100000h..FFFFFFFFh
  __mult_execution_time_____________________________________________________
  Fast  (6 cycles)   rs = 00000000h..000007FFh, or rs = FFFFF800h..FFFFFFFFh
  Med   (9 cycles)   rs = 00000800h..000FFFFFh, or rs = FFF00000h..FFFFF801h
  Slow  (13 cycles)  rs = 00100000h..7FFFFFFFh, or rs = 80000000h..FFF00001h
  __divu/div_execution_time________________________________________________
  Fixed (36 cycles)  no matter of rs and rt values
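For emulators that track timing, the multu rows of the table translate into a simple range check. The function name is made up, and only the unsigned case is sketched here because its ranges are unambiguous.

```python
def multu_cycles(rs):
    """Execution time of 'multu rs,rt' in cycles, per the table above."""
    rs &= 0xFFFFFFFF
    if rs <= 0x000007FF:
        return 6        # fast
    if rs <= 0x000FFFFF:
        return 9        # medium
    return 13           # slow


print(multu_cycles(0x123), multu_cycles(0x12345678))
```

Since only rs matters, "multu 123h,12345678h" lands in the fast row, while swapping the operands would land in the slow one.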
For example, when executing "multu 123h,12345678h" and "mflo r1", one can
insert up to six (cached) ALU opcodes, or read one value from PSX Main RAM
(which has 6 cycle access time) between the "multu" and "mflo" opcodes without
additional slowdown.
The hardware does NOT generate exceptions on divide overflows, instead, divide errors are returning the following values:
  Opcode  Rs              Rt       Hi/Remainder  Lo/Result
  divu    0..FFFFFFFFh    0   -->  Rs            FFFFFFFFh
  div     0..+7FFFFFFFh   0   -->  Rs            -1
  div     -80000000h..-1  0   -->  Rs            +1
  div     -80000000h      -1  -->  0             -80000000h
For divu, the result is more or less correct (as close to infinite as
possible). For div, the results are total garbage (about furthest away from
the desired result as possible).
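The special cases in the table can be folded into a divide routine as sketched below. The function names are invented, both functions return (hi, lo) as unsigned 32bit values, and the non-error path assumes the usual MIPS behaviour of truncating towards zero with the remainder taking the sign of the dividend.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    return x - (1 << 32) if x & 0x80000000 else x


def divu(rs, rt):
    """Unsigned divide, returns (hi=remainder, lo=quotient)."""
    if rt == 0:
        return rs, 0xFFFFFFFF
    return rs % rt, rs // rt


def div(rs, rt):
    """Signed divide, returns (hi=remainder, lo=quotient)."""
    a, b = to_signed(rs), to_signed(rt)
    if b == 0:
        return rs, (0xFFFFFFFF if a >= 0 else 1)   # lo = -1 or +1
    if a == -0x80000000 and b == -1:
        return 0, 0x80000000                       # overflow case
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q                                     # truncate towards zero
    r = a - q * b
    return r & MASK, q & MASK


print([hex(v) for v in div(5, 0)], [hex(v) for v in div(0x80000000, 0xFFFFFFFF)])
```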
Note: After accessing the lo/hi registers, there seems to be a strange rule that one should not touch the lo/hi registers in the next 2 cycles or so... not yet understood if/when/how that rule applies...?
CPU Jump Opcodes
jumps and branches
Note that the instruction following the branch will always be executed.
  j      dest        pc=(pc and F0000000h)+(imm26bit*4)
  jal    dest        pc=(pc and F0000000h)+(imm26bit*4),ra=$+8
  jr     rs          pc=rs
  jalr   (rd,)rs(,rd) pc=rs, rd=$+8  ;see caution
  beq    rs,rt,dest  if rs=rt  then pc=$+4+(-8000h..+7FFFh)*4
  bne    rs,rt,dest  if rs<>rt then pc=$+4+(-8000h..+7FFFh)*4
  bltz   rs,dest     if rs<0   then pc=$+4+(-8000h..+7FFFh)*4
  bgez   rs,dest     if rs>=0  then pc=$+4+(-8000h..+7FFFh)*4
  bgtz   rs,dest     if rs>0   then pc=$+4+(-8000h..+7FFFh)*4
  blez   rs,dest     if rs<=0  then pc=$+4+(-8000h..+7FFFh)*4
  bltzal rs,dest     if rs<0   then pc=$+4+(..)*4; ra=$+8;
  bgezal rs,dest     if rs>=0  then pc=$+4+(..)*4; ra=$+8;
jr/jalr can be used to jump to an unaligned address, in which case an address error (AdEL) exception will be raised on the next instruction fetch.
Additionally, bltzal/bgezal will always place the return address in $ra, whether or not the branch is taken. If
rs is $ra, then the value used for the comparison is $ra's value before linking.
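The target formulas from the table can be sketched as below. The names are made up; `pc` is taken to be the address of the jump/branch opcode itself ("$" in the table), which for the j/jal formula is an assumption that only matters at a 256MB region boundary.

```python
def branch_target(pc, imm16):
    """Target of beq/bne/bltz/etc: $+4 plus the sign-extended offset*4."""
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (pc + 4 + offset * 4) & 0xFFFFFFFF


def jump_target(pc, imm26):
    """Target of j/jal: upper 4 bits of pc combined with imm26*4."""
    return (pc & 0xF0000000) + (imm26 & 0x3FFFFFF) * 4


def link_address(pc):
    """Return address stored by jal/jalr/bltzal/bgezal ($+8)."""
    return (pc + 8) & 0xFFFFFFFF


print(hex(branch_target(0x80010000, 0xFFFF)))   # branch to itself
print(hex(jump_target(0x80010000, 0x4000)))
```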
Caution: The JALR source code syntax varies (IDT79R3041 specs say "jalr rs,rd",
but MIPS32 specs say "jalr rd,rs"). Moreover, JALR may not use the same
register for both operands (eg. "jalr r31,r31") (doing so would destroy the
target address; which is normally no problem, but it can be a problem if an IRQ
occurs between the JALR opcode and the following branch delay opcode; in that
case BD gets set, and EPC points "back" to the JALR opcode, so JALR is executed
twice, with destroyed target address in second execution).
Unlike for jump/branch opcodes, exception opcodes are immediately executed (ie.
without executing the following opcode).
  syscall  imm20        generates a system call exception
  break    imm20        generates a breakpoint exception
The 20bit immediate doesn't affect the CPU (however, the exception handler may
interpret it by software; by examining the opcode bits at [epc-4]).
CPU Coprocessor Opcodes
Coprocessor Instructions (COP0..COP3)
  mfc# rt,rd       ;rt = cop#datRd ;data regs
  cfc# rt,rd       ;rt = cop#cntRd ;control regs
  mtc# rt,rd       ;cop#datRd = rt ;data regs
  ctc# rt,rd       ;cop#cntRd = rt ;control regs
  cop# imm25       ;exec cop# command 0..1FFFFFFh
  lwc# rt,imm(rs)  ;cop#dat_rt = [rs+imm]  ;word
  swc# rt,imm(rs)  ;[rs+imm] = cop#dat_rt  ;word
  bc#f dest        ;if cop#flg=false then pc=$+disp
  bc#t dest        ;if cop#flg=true  then pc=$+disp
  rfe              ;return from exception (COP0)
  tlb<xx>          ;virtual memory related (COP0)
Unknown if any tlb-opcodes (tlbr,tlbwi,tlbwr,tlbp) are implemented in the psx?
Caution - Load Delay
When reading from a coprocessor register, the next opcode cannot use the
destination register as operand (much the same as the Load Delays that occur
when reading from memory; see there for details).
Reportedly, the Load Delay applies for the next TWO opcodes after coprocessor reads, but, that seems to be nonsense (the PSX does finish both COP0 and COP2 reads after ONE opcode).
Caution - Store Delay
In some cases, a similar delay occurs when writing to a coprocessor register.
COP0 is more or less free of store delays (eg. one can read from a cop0
register immediately after writing to it), the only known exception is the cop2
enable bit in cop0r12.bit30 (setting that cop0 bit acts delayed, and cop2 isn't
actually enabled until after 2 clock cycles or so).
Writing to cop2 registers has a delay of 2..3 clock cycles. In most cases, that is probably (?) only 2 cycles, but special cases like writing to IRGB (which does additionally affect IR1,IR2,IR3) take 3 cycles until the result arrives in all registers).
Note that Store Delays are counted in numbers of clock cycles (not in numbers of opcodes). For 3 cycle delay, one must usually insert 3 cached opcodes (or one uncached opcode).
CPU Pseudo Opcodes
Pseudo instructions (native/spasm)
  nop                  ;alias for sll r0,r0,0
  move rd,rs           ;alias for addu rd,rs,r0
  la rx,imm32          ;load address (alias for lui rx / addiu rx)
  li rx,imm32          ;load immediate (alias for lui rx / ori rx)
  li rx,imm16          ;load immediate (alias for ori, range 0..FFFFh)
  li rx,-imm15         ;load immediate (alias for addiu, range -1..-8000h)
  li rx,imm16*10000h   ;load immediate (alias for lui)
  lw rx,imm32          ;load from address (lui rx / lw rx,rx)
  sw rx,imm32          ;store to address (lui r1 / sw rx,r1) (destroys r1!)
  lb,lh,lwl,lwr,lbu,lhu;as above pseudo lw
  sb,sh,swl,swr        ;as above pseudo sw (ie. also destroys r1!)
  alu rx,op            ;alias for alu rx,rx,op
  alu(u) rx,rx,imm     ;alias for alui(u) rx,rx,imm
  jalr rx              ;alias for jalr (RA,)rx(,RA)
  subi(u) rt,rs,imm    ;alias for addi(u) rt,rs,-imm
  beqz rx,dest         ;alias for beq rx,r0,dest
  bnez rx,dest         ;alias for bne rx,r0,dest
  b   dest             ;alias for beq r0,r0,dest (jump relative/spasm)
  bra dest             ;alias for bgez r0, r0, dest
  bal dest             ;alias for bgezal r0, r0, dest
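When an assembler expands the 32bit "li" and "la" pseudos, the two halves are not split the same way: ori zero-extends its immediate, while addiu sign-extends it, so the lui half must be adjusted by one whenever bit15 of the value is set. A sketch of both splits, with invented helper names:

```python
def split_lui_ori(imm32):
    """Halves for 'lui rx,hi' + 'ori rx,rx,lo' (zero-extended low half)."""
    return (imm32 >> 16) & 0xFFFF, imm32 & 0xFFFF


def split_lui_addiu(imm32):
    """Halves for 'lui rx,hi' + 'addiu rx,rx,lo' (sign-extended low half)."""
    lo = imm32 & 0xFFFF
    hi = ((imm32 + 0x8000) >> 16) & 0xFFFF     # +1 when lo is negative
    return hi, lo


def rebuild_addiu(hi, lo):
    """What the CPU computes for the lui/addiu pair."""
    signed_lo = lo - 0x10000 if lo & 0x8000 else lo
    return ((hi << 16) + signed_lo) & 0xFFFFFFFF


hi, lo = split_lui_addiu(0x8001FFFC)
print(hex(hi), hex(lo), hex(rebuild_addiu(hi, lo)))
```

The same adjusted split applies to the "lw rx,imm32" and "sw rx,imm32" pseudos, since the offset of a load/store is sign-extended as well.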
Pseudo instructions (nocash/a22i)
  mov  rx,NNNN0000h    ;alias for lui rx,NNNNh
  mov  rx,0000NNNNh    ;alias for or  rx,r0,NNNNh  ;max +FFFFh
  mov  rx,-imm15       ;alias for add rx,r0,-NNNNh ;min -8000h
  mov  rx,ry           ;alias for or  rx,ry,0  (or "addiu")
  nop                  ;alias for shl r0,r0,0
  jrel dest            ;alias for blez R0,dest   ;relative jump
  crel dest            ;alias for callns R0,dest ;relative call
  jz   rx,dest         ;alias for je  rx,R0,dest
  jnz  rx,dest         ;alias for jne rx,R0,dest
  call rx              ;alias for call rx,ret=RA
  ret                  ;alias for jmp ra
  subt rt,rs,imm       ;alias for addt rt,rs,-imm
  sub  rt,rs,imm       ;alias for add  rt,rs,-imm
  alu  rx,op           ;alias for alu  rx,rx,op
  neg(t) rx,ry         ;alias for sub(t) rx,R0,ry
  not    rx,ry         ;alias for nor    rx,R0,ry
  neg(t)/not rx        ;alias for neg(t)/not rx,rx
  setz   rx,ry         ;alias for setb rx,ry,1   (set if zero)
  setnz  rx,ry         ;alias for setb rx,R0,ry  (set if nonzero)
  syscall/break        ;alias for syscall/break 000000h
Below are pseudo instructions combined of two 32bit opcodes...
  movp rx,imm32        ;alias for lui rx,imm16 -plus- or rx,rx,imm16
  mov(bhs)p rx,[imm32] ;load from address (lui rx,imm16 / mov rx,[rx+imm16])
  movu [rs+imm]        ;alias for lwr/swr [rs+imm] plus lwl/swl [rs+imm+3]
  reti                 ;alias for jmp k0 plus rfe
Below are pseudo instructions combined of two or more 32bit opcodes...
  push rlist           ;alias for sub sp,n*4 -- mov [sp+(1..n)*4],r1..rn
  pop  rlist           ;alias for mov r1..rn,[sp+(1..n)*4] -- add sp,n*4
  pop  pc,rlist        ;alias for pop ra,rlist -- jmp ra
Possible more Pseudos...
  call x0000000h  ;call y0000000h (could be half-working for mem mirrors?)
  setae,setge     ;--> setb,setlt with swapped operands
  .mips         ;select MIPS instruction set (alternately .hc05 for MC68HC05)
  .bios         ;create a .ROM file (instead of .EXE)
  .auto_nop     ;append NOPs to jumps
                ;unless next opcode starts with a +
  org imm       ;assume following code to be originated at address "imm"
  db n(,n(..))) ;define 8bit data values(s) or quoted ASCII strings
  dw n(,n(..))) ;define 16bit data values(s) (not 32bit data!)
  dd n(,n(..))) ;define 32bit data values(s)
  .align imm
  0             ;alias for immediate 0 and register R0 (whichever fits)
  org imm           ;self-explaining (but, default=$80010000 for spasm!)
  align imm         ;self-explaining (probably zeropadded?)
  db n(,n(..)))     ;define 8bit data values(s) or quoted ASCII strings
  dh n(,n(..)))     ;define 16bit data values(s)
  dw n(,n(..)))     ;define 32bit data values(s) (not 16bit data!)
  dcb len,value     ;fill <len> bytes by <value> (different as DCB on ARM CPUs)
  xyz               ;define label "xyz" at current address (without colon)
  xyz equ n         ;assign value n to xyz
  xyz = n           ;probably same/similar as "equ"
  ;xyz              ;comments invoked with semicolon (spasm)
  incbin file.bin   ;import binary file
  include file.asm  ;import asm file
  zero              ;alias for r0
  >imm32            ;alias for (i-(i AND 8000h))/10000h, and/or i/10000h ?
  <imm32            ;alias for (i AND 0FFFFh), used for SW(+/-) and ORI(+)?
  end       ;N/A    ;no "end" or ".end" directive needed/used by spasm
  r1 aka at ;N/A    ;some assemblers may (optionally) reject to use r1/at
Syntax for unknown assembler (for pad.s)
It uses "0x" for HEX values (but doesn't use "$" for registers).
It uses "#" instead of ";" for comments.
It uses ":" for labels (fortunately).
The assembler has at least one directive: ".byte" (equivalent to "db" on other assemblers).
I've no clue which assembler is used for that syntax... could that be the Psy-Q assembler?
COP0 - Register Summary
COP0 Register Summary
  cop0r0-r2   - N/A
  cop0r3      - BPC - Breakpoint on execute (R/W)
  cop0r4      - N/A
  cop0r5      - BDA - Breakpoint on data access (R/W)
  cop0r6      - JUMPDEST - Randomly memorized jump address (R)
  cop0r7      - DCIC - Breakpoint control (R/W)
  cop0r8      - BadVaddr - Bad Virtual Address (R)
  cop0r9      - BDAM - Data Access breakpoint mask (R/W)
  cop0r10     - N/A
  cop0r11     - BPCM - Execute breakpoint mask (R/W)
  cop0r12     - SR - System status register (R/W)
  cop0r13     - CAUSE - (R)  Describes the most recently recognised exception
  cop0r14     - EPC - Return Address from Trap (R)
  cop0r15     - PRID - Processor ID (R)
  cop0r16-r31 - Garbage
  cop0r32-r63 - N/A - None such (Control regs)
COP0 - Exception Handling
cop0r13 - CAUSE - (Read-only, except, Bit8-9 are R/W)
Describes the most recently recognised exception
  0-1   -      Not used (zero)
  2-6   Excode Describes what kind of exception occurred:
                 00h INT     Interrupt
                 01h MOD     Tlb modification (none such in PSX)
                 02h TLBL    Tlb load         (none such in PSX)
                 03h TLBS    Tlb store        (none such in PSX)
                 04h AdEL    Address error, Data load or Instruction fetch
                 05h AdES    Address error, Data store
                             The address errors occur when attempting to read
                             outside of KUseg in user mode and when the address
                             is misaligned. (See also: BadVaddr register)
                 06h IBE     Bus error on Instruction fetch
                 07h DBE     Bus error on Data load/store
                 08h Syscall Generated unconditionally by syscall instruction
                 09h BP      Breakpoint - break instruction
                 0Ah RI      Reserved instruction
                 0Bh CpU     Coprocessor unusable
                 0Ch Ov      Arithmetic overflow
                 0Dh-1Fh     Not used
  7     -      Not used (zero)
  8-15  Ip     Interrupt pending field. Bit 8 and 9 are R/W, and
               contain the last value written to them. As long
               as any of the bits are set they will cause an
               interrupt if the corresponding bit is set in IM.
  16-27 -      Not used (zero)
  28-29 CE     Contains the coprocessor number if the exception
               occurred because of a coprocessor instruction for
               a coprocessor which wasn't enabled in SR.
  30    -      Not used (zero)
  31    BD     Is set when last exception points to the
               branch instruction instead of the instruction
               in the branch delay slot, where the exception
               occurred.
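An exception handler (or an emulator's debugger view) would typically pull the CAUSE fields apart as sketched below. The function and key names are only illustrative.

```python
def decode_cause(cause):
    """Extract the documented fields of cop0r13 (CAUSE)."""
    return {
        "excode": (cause >> 2) & 0x1F,    # bit 2-6
        "ip":     (cause >> 8) & 0xFF,    # bit 8-15, interrupt pending
        "ce":     (cause >> 28) & 0x03,   # bit 28-29, coprocessor number
        "bd":     (cause >> 31) & 0x01,   # bit 31, branch delay
    }


fields = decode_cause(0x80000020)         # syscall in a branch delay slot
print(hex(fields["excode"]), fields["bd"])
```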
cop0r12 - SR - System status register (R/W)
  0     IEc Current Interrupt Enable  (0=Disable, 1=Enable) ;rfe pops IEp here
  1     KUc Current Kernel/User Mode  (0=Kernel, 1=User)    ;rfe pops KUp here
  2     IEp Previous Interrupt Enable                       ;rfe pops IEo here
  3     KUp Previous Kernel/User Mode                       ;rfe pops KUo here
  4     IEo Old Interrupt Enable                       ;left unchanged by rfe
  5     KUo Old Kernel/User Mode                       ;left unchanged by rfe
  6-7   -   Not used (zero)
  8-15  Im  8 bit interrupt mask fields. When set the corresponding
            interrupts are allowed to cause an exception.
  16    Isc Isolate Cache (0=No, 1=Isolate)
              When isolated, all load and store operations are targeted
              to the Data cache, and never the main memory.
              (Used by PSX Kernel, in combination with Port FFFE0130h)
  17    Swc Swapped cache mode (0=Normal, 1=Swapped)
              Instruction cache will act as Data cache and vice versa.
              Use only with Isc to access & invalidate Instr. cache entries.
              (Not used by PSX Kernel)
  18    PZ  When set cache parity bits are written as 0.
  19    CM  Shows the result of the last load operation with the D-cache
            isolated. It gets set if the cache really contained data
            for the addressed memory location.
  20    PE  Cache parity error (Does not cause exception)
  21    TS  TLB shutdown. Gets set if a program address simultaneously
            matches 2 TLB entries.
            (initial value on reset allows to detect extended CPU version?)
  22    BEV Boot exception vectors in RAM/ROM (0=RAM/KSEG0, 1=ROM/KSEG1)
  23-24 -   Not used (zero)
  25    RE  Reverse endianness   (0=Normal endianness, 1=Reverse endianness)
              Reverses the byte order in which data is stored in
              memory. (lo-hi -> hi-lo)
              (Affects only user mode, not kernel mode) (?)
              (The bit doesn't exist in PSX ?)
  26-27 -   Not used (zero)
  28    CU0 COP0 Enable (0=Enable only in Kernel Mode, 1=Kernel and User Mode)
  29    CU1 COP1 Enable (0=Disable, 1=Enable) (none in PSX)
  30    CU2 COP2 Enable (0=Disable, 1=Enable) (GTE in PSX)
  31    CU3 COP3 Enable (0=Disable, 1=Enable) (none in PSX)
cop0r14 - EPC - Return Address from Trap (R)
0-31 Return Address from Exception
This address is the instruction at which the exception took place, unless BD is
set in CAUSE, then the instruction was at EPC+4.
Interrupts should always return to EPC+0, no matter of the BD flag. That way, if BD=1, the branch gets executed again, that's required because EPC stores only the current program counter, but not additionally the branch destination address.
Other exceptions may need to handle BD. In simple cases, when BD=0, the exception handler may return to EPC+0 (retry execution of the opcode), or to EPC+4 (skip the opcode that caused the exception). Note that jumps to faulty memory locations are executed without exception, but will trigger address errors and bus errors at the target location, ie. EPC (and BadVAddr, in case of address errors) point to the faulty address, not to the opcode that has jumped to that address.
Interrupts vs GTE Commands
If an interrupt occurs "on" a GTE command (cop2cmd), then the GTE command is
executed, but nevertheless, the return address in EPC points to the GTE
command. So, if the exception handler would return to EPC as usual, then the
GTE command would be executed twice. In best case, this would be a waste of
clock cycles, in worst case it may lead to faulty results (if the results from
the 1st execution are re-used as incoming parameters in the 2nd execution). To
fix the problem, the exception handler must do:
  if (cause AND 7Ch)=00h                   ;if excode=interrupt
    if ([epc] AND FE000000h)=4A000000h     ;and opcode=cop2cmd
      epc=epc+4                            ;then skip that opcode
Note: The above exception handling is working only in newer PSX BIOSes, but in
very old PSX BIOSes, it is only incompletely implemented (see "BIOS Patches"
chapter for common workarounds; or write your own exception handler without
using the BIOS).
Of course, the above exception handling won't work in branch delays (where BD gets set to indicate that EPC was modified) (best workaround is not to use GTE commands in branch delays).
Several games are known to rely on this, notably including the Crash Bandicoot trilogy, Jinx and Spyro the Dragon, all of which will render broken geometry if running on an emulator which doesn't emulate this, or if the installed interrupt service routine doesn't account for it.
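The EPC adjustment for interrupted GTE commands, written out as a function an interrupt service routine (or a high-level BIOS emulation) might use. The name and signature are hypothetical; as noted above, the check does not help when the GTE command sits in a branch delay slot, since EPC then points to the branch instead.

```python
def interrupt_return_address(cause, epc, opcode_at_epc):
    """Return the address to resume at after an exception.

    cause          value of cop0r13
    epc            value of cop0r14
    opcode_at_epc  the 32bit opcode stored at [epc]
    """
    is_interrupt = (cause & 0x7C) == 0x00                    # excode=interrupt
    is_cop2cmd = (opcode_at_epc & 0xFE000000) == 0x4A000000  # opcode=cop2cmd
    if is_interrupt and is_cop2cmd:
        return (epc + 4) & 0xFFFFFFFF     # command already ran, skip it
    return epc


print(hex(interrupt_return_address(0x400, 0x80010000, 0x4A180001)))
```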
cop0cmd=10h - RFE opcode - Prepare Return from Exception
The RFE opcode moves some bits in cop0r12 (SR): bit2-3 are copied to bit0-1,
and bit4-5 are copied to bit2-3, all other bits (including bit4-5) are left
unchanged.
The RFE opcode does NOT automatically jump to EPC. Instead, the exception handler must copy EPC into a register (usually R26 aka K0), and then jump to that address. Because of branch delays, that would look like so:
  mov  k0,epc  ;get return address
  push k0      ;save epc in memory (if you expect nested exceptions)
  ...          ;whatever (ie. process CAUSE)
  pop  k0      ;restore from memory (if you expect nested exceptions)
  jmp  k0      ;jump to K0 (after executing the next opcode)
  +rfe         ;move SR bit4/5 --> bit2/3 --> bit0/1
If you expect exceptions to be nested deeply, also push/pop SR. Note that
there's no way to leave all registers intact (ie. above code destroys K0).
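The effect of RFE on SR is small enough to express in one line. The sketch below (function name invented) pops the interrupt-enable/mode stack as described: bit2-5 move down to bit0-3, while bit4-5 and all higher bits keep their values.

```python
def rfe(sr):
    """Return the new SR value after executing RFE."""
    return (sr & 0xFFFFFFF0) | ((sr >> 2) & 0x0F)


print(hex(rfe(0x00000034)))
```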
cop0r8 - BadVaddr - Bad Virtual Address (R)
Contains the address whose reference caused an exception. Set on any MMU type
of exceptions, on references outside of kuseg (in User mode) and on any
misaligned reference. BadVaddr is updated ONLY by Address errors (Excode 04h
and 05h), all other exceptions (including bus errors) leave BadVaddr unchanged.
Exception Vectors (depending on BEV bit in SR register)
  Exception     BEV=0         BEV=1
  Reset         BFC00000h     BFC00000h   (Reset)
  UTLB Miss     80000000h     BFC00100h   (Virtual memory, none such in PSX)
  COP0 Break    80000040h     BFC00140h   (Debug Break)
  General       80000080h     BFC00180h   (General Interrupts & Exceptions)
Note: Changing vectors at 800000xxh (kseg0) seems to be automatically reflected
to the instruction cache without needing to flush cache (at least it worked
SOMETIMES in my test proggy... but NOT always? ...anyways, it'd be highly
recommended to flush cache when changing any opcodes), whilst changing mirrors
at 000000xxh (kuseg) seems to require to flush cache.
The PSX uses only the BEV=0 vectors (aside from the reset vector, the PSX BIOS ROM doesn't contain any of the BEV=1 vectors).
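For completeness, the vector table can be expressed as a lookup keyed by the BEV bit (SR bit22). The dictionary keys and function name are made up for this sketch.

```python
# (BEV=0 vector, BEV=1 vector) per exception type, from the table above
VECTORS = {
    "reset":      (0xBFC00000, 0xBFC00000),
    "utlb_miss":  (0x80000000, 0xBFC00100),
    "cop0_break": (0x80000040, 0xBFC00140),
    "general":    (0x80000080, 0xBFC00180),
}


def exception_vector(kind, sr):
    """Pick the handler address for an exception, depending on SR.BEV."""
    bev = (sr >> 22) & 1
    return VECTORS[kind][bev]


print(hex(exception_vector("general", 0)))
```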
  Reset  At any time (highest)               ;-reset
  AdEL   Memory (Load instruction)           ;\
  AdES   Memory (Store instruction)          ; memory (data load/store)
  DBE    Memory (Load or store)              ;/
  MOD    ALU (Data TLB)                      ;\
  TLBL   ALU (DTLB Miss)                     ; none such
  TLBS   ALU (DTLB Miss)                     ;/
  Ovf    ALU                                 ;-overflow
  Int    ALU                                 ;-interrupt
  Sys    RD (Instruction Decode)             ;\
  Bp     RD (Instruction Decode)             ;
  RI     RD (Instruction Decode)             ;
  CpU    RD (Instruction Decode)             ;/
  TLBL   I-Fetch (ITLB Miss)                 ;-none such
  AdEL   IVA (Instruction Virtual Address)   ;\memory (opcode fetch)
  IBE    RD (end of I-Fetch, lowest)         ;/
COP0 - Misc
cop0r15 - PRID - Processor ID (R)
  0-7   Revision
  8-15  Implementation
  16-31 Not used
For a Playstation with CXD8606CQ CPU, the PRID value is 00000002h.
Unknown if/which other Playstation CPU versions have other values...?
cop0r6 - JUMPDEST - Randomly memorized jump address (R)
This is a rather strange, totally useless register. After certain exceptions, the
CPU does memorize a jump destination address in the register. Once it has
memorized an address, the register becomes locked, and the memorized value
won't change until it becomes unlocked by a new exception. Exceptions that do
unlock the register are Reset and Interrupts (cause.bit10). Exceptions that do
NOT unlock the register are syscall/break opcodes, and software generated
interrupts (eg. cause.bit8).
In the unlocked state, the CPU does more or less randomly memorize one of the next few jump destinations - eg. the destination from the second jump after reset, or from a jump that occurred immediately before executing the IRQ handler, or from a jump inside of the IRQ handler, or even from a later jump that occurs shortly after returning from the IRQ handler.
The register seems to be read-only (although the Kernel initialization code writes 0 to it for whatever reason).
cop0r0..r2, cop0r4, cop0r10, cop0r32..r63 - N/A
Registers 0,1,2,4,10 control virtual memory on some MIPS processors (but
there's none such in the PSX), and Registers 32..63 (aka "control registers")
aren't used in any MIPS processors. Trying to read any of these registers
causes a Reserved Instruction Exception (excode=0Ah).
cop0cmd=01h,02h,06h,08h - TLBR,TLBWI,TLBWR,TLBP
The PSX supports only one cop0cmd (cop0cmd=10h aka RFE). Trying to execute the
TLBxx opcodes causes a Reserved Instruction Exception (excode=0Ah).
jf/jt cop0flg,dest - conditional cop0 jumps
mov [mem],cop0reg / mov cop0reg,[mem] - coprocessor cop0 load/store
Not supported by the CPU. Trying to execute these opcodes causes a Coprocessor
Unusable Exception (excode=0Bh, ie. unlike above, not 0Ah).
cop0r16-r31 - Garbage
Trying to read these registers returns garbage (but does not trigger an
exception). When reading one of the garbage registers shortly after reading a
valid cop0 register, the garbage value is usually the same as that of the valid
register. When doing the read later on, the return value is usually 00000020h,
or when reading much later it returns 00000040h, or even 00000100h. No idea
what is causing that effect...?
Note: The garbage registers can be accessed (without causing an exception) even in "User mode with cop0 disabled" (SR.Bit1=1 and SR.Bit28=0); accessing any other existing cop0 register (or executing the rfe opcode) in that state causes a Coprocessor Unusable Exception (excode=0Bh).
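As a rough summary of the read behaviour described in this chapter, the outcome of reading a cop0 register could be classified as below. The function name read_cop0_outcome and the result labels are invented for this sketch. Reading an N/A register while cop0 is disabled is not covered by the text above, so the sketch reports that case as "unknown" instead of guessing.

```python
EXCODE_RI = 0x0A   # Reserved Instruction
EXCODE_CPU = 0x0B  # Coprocessor Unusable

NA_REGS = {0, 1, 2, 4, 10} | set(range(32, 64))  # no such registers on PSX
GARBAGE_REGS = set(range(16, 32))                # readable, return garbage


def read_cop0_outcome(reg, cop0_usable=True):
    """Return (result, excode) for a read of cop0 register 'reg' (0..63).

    cop0_usable=False models "User mode with cop0 disabled"
    (SR.Bit1=1 and SR.Bit28=0).
    """
    if not 0 <= reg <= 63:
        raise ValueError("cop0 register number must be 0..63")
    if reg in GARBAGE_REGS:
        return ("garbage", None)        # never triggers an exception
    if reg in NA_REGS:
        if cop0_usable:
            return ("exception", EXCODE_RI)
        return ("unknown", None)        # not described in the text
    if not cop0_usable:
        return ("exception", EXCODE_CPU)
    return ("value", None)              # existing register, normal read


print(read_cop0_outcome(15))                     # PRID, normal read
print(read_cop0_outcome(12, cop0_usable=False))  # SR with cop0 disabled
```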
COP0 - Debug Registers
The COP0 debug registers seem to be PSX specific, normal R30xx CPUs like IDT's
R3041 and R3051 don't have anything similar.
cop0r7 - DCIC - Breakpoint control (R/W)
  0      Automatically set by hardware upon Any break              (R/W)
  1      Automatically set by hardware upon BPC Code break         (R/W)
  2      Automatically set by hardware upon BDA Data break         (R/W)
  3      Automatically set by hardware upon BDA Data-Read break    (R/W)
  4      Automatically set by hardware upon BDA Data-Write break   (R/W)
  5      Automatically set by hardware upon any-jump break         (R/W)
  6-11   Not used (always zero)
  12-13  Jump Redirection (0=Disable, 1..3=Enable) (see note)      (R/W)
  14-15  Unknown?                                                  (R/W)
  16-22  Not used (always zero)
  23     Super-Master Enable 1 for bit24-29
  24     Execution breakpoint     (0=Disabled, 1=Enabled) (see BPC, BPCM)
  25     Data access breakpoint   (0=Disabled, 1=Enabled) (see BDA, BDAM)
  26     Break on Data-Read       (0=No, 1=Break/when Bit25=1)
  27     Break on Data-Write      (0=No, 1=Break/when Bit25=1)
  28     Break on any-jump        (0=No, 1=Break on branch/jump/call/etc.)
  29     Master Enable for bit28 (..and/or exec-break at address>=80000000h?)
  30     Master Enable for bit24-27
  31     Super-Master Enable 2 for bit24-29
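For tools that need to inspect a DCIC value, the bit list above can be unpacked as in the following sketch. The function name decode_dcic and the key names are hypothetical. The sketch only extracts the fields; it makes no claim about how the various master enable bits combine in hardware.

```python
def decode_dcic(dcic):
    """Unpack a 32-bit DCIC value into the fields listed above."""
    def bit(n):
        return bool((dcic >> n) & 1)

    return {
        "any_break_hit": bit(0),
        "bpc_code_break_hit": bit(1),
        "bda_data_break_hit": bit(2),
        "bda_data_read_break_hit": bit(3),
        "bda_data_write_break_hit": bit(4),
        "any_jump_break_hit": bit(5),
        "jump_redirection": (dcic >> 12) & 3,   # bits 12-13
        "unknown_14_15": (dcic >> 14) & 3,      # bits 14-15
        "super_master_enable_1": bit(23),
        "exec_break_enable": bit(24),
        "data_break_enable": bit(25),
        "break_on_data_read": bit(26),
        "break_on_data_write": bit(27),
        "break_on_any_jump": bit(28),
        "master_enable_bit28": bit(29),
        "master_enable_bit24_27": bit(30),
        "super_master_enable_2": bit(31),
    }


# Illustrative value with bits 23, 24, 29, 30 and 31 set.
example = (1 << 31) | (1 << 30) | (1 << 29) | (1 << 24) | (1 << 23)
print(hex(example), decode_dcic(example)["exec_break_enable"])
```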
When a breakpoint address match occurs the PSX jumps to 80000040h (ie. unlike
normal exceptions, not to 80000080h). The Excode value in the CAUSE register is
set to 09h (same as BREAK opcode), and EPC contains the return address, as
usual. One of the first things to be done in the exception handler is to
disable breakpoints (eg. if the any-jump break is enabled, then it must be
disabled BEFORE jumping from 80000040h to the actual exception handler).
cop0r7.bit12-13 - Jump Redirection Note
If one or both of these bits are nonzero, then the PSX seems to check for the
following opcode sequence,
  mov  rx,[mem]     ;load rx from memory
  ...               ;one or more opcodes that do not change rx
  jmp/call rx       ;jump or call to rx
if it does sense that sequence, then it sets PC=[00000000h], but does not store
any useful information in any cop0 registers, namely it does not store the
return address in EPC, so it's impossible to determine which opcode has caused
the exception. For the jump target address, there are 31 registers, so one
could only guess which of them contains the target value; for "POP PC" code
it'd be usually R31, but for "JMP [vector]" code it may be any register. So far
the feature seems to be more or less unusable...?
cop0r5 - BDA - Breakpoint on Data Access Address (R/W)
cop0r9 - BDAM - Breakpoint on Data Access Mask (R/W)
Break condition is "((addr XOR BDA) AND BDAM)=0".
cop0r3 - BPC - Breakpoint on Execute Address (R/W)
cop0r11 - BPCM - Breakpoint on Execute Mask (R/W)
Break condition is "((PC XOR BPC) AND BPCM)=0".
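Both break conditions have the same shape, so one helper covers the BDA/BDAM and BPC/BPCM pairs. The following is a minimal sketch of that formula; the function name break_match and the example addresses and masks are chosen purely for illustration.

```python
def break_match(addr, target, mask):
    """True when ((addr XOR target) AND mask) = 0, using 32-bit values.

    For data breaks pass BDA/BDAM, for execution breaks pass BPC/BPCM.
    """
    return (((addr ^ target) & mask) & 0xFFFFFFFF) == 0


# Mask FFFFFFFFh compares all bits: only the exact address matches.
print(break_match(0x80030000, 0x80030000, 0xFFFFFFFF))  # True
print(break_match(0x80030004, 0x80030000, 0xFFFFFFFF))  # False

# Mask FFFFF000h ignores the low 12 bits: any address in that 4K block matches.
print(break_match(0x80030FFC, 0x80030000, 0xFFFFF000))  # True
```

Note that bits cleared in the mask are ignored by the comparison, so a mask of 00000000h makes the condition true for every address.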
Note (BREAK Opcode)
Additionally, the BREAK opcode can be used to create further breakpoints by
patching the executable code. The BREAK opcode uses the same Excode value (09h)
in CAUSE register. However, the BREAK opcode jumps to the normal exception
handler at 80000080h (not 80000040h).
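Since both kinds of break report the same Excode, a handler or emulator can only tell them apart by the vector that was entered. The sketch below summarizes that difference for the BEV=0 vectors described above; the function name break_vector and the source labels "cop0_breakpoint" and "break_opcode" are hypothetical.

```python
EXCODE_BP = 0x09  # shared by COP0 breakpoints and the BREAK opcode


def break_vector(source):
    """Return (vector address, excode) for the given break source (BEV=0)."""
    if source == "cop0_breakpoint":   # BPC/BDA address match
        return (0x80000040, EXCODE_BP)
    if source == "break_opcode":      # BREAK opcode in the code stream
        return (0x80000080, EXCODE_BP)
    raise ValueError("unknown break source: %r" % (source,))


print([hex(v) for v in break_vector("cop0_breakpoint")])
print([hex(v) for v in break_vector("break_opcode")])
```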
The debug registers are mis-used by "Legacy of Kain: Soul Reaver" (and maybe
also other games) for storing libcrypt copy-protection related values (ie. just
as a "hidden" location for storing data, not for actual debugging purposes).
CDROM Protection - LibCrypt
Note (Cheat Devices/Expansion ROMs)
The Expansion ROM header supports only Pre-Boot and Post-Boot vectors, but no
Mid-Boot vector. Cheat Devices often use COP0 breaks for Mid-Boot Hooks,
either with BPC=BFC06xxxh (break address in ROM, used in older cheat
firmwares), or with BPC=80030000h (break address in RAM aka relocated GUI
entrypoint, used in later cheat firmwares). Moreover, aside from the Mid-Boot
Hook, the Xplorer cheat device also supports a special cheat code that
uses the COP0 break feature.
Note: COP0 debug registers are described in LSI's "L64360" datasheet, chapter
14. And in their LR33300/LR33310 datasheet, chapter 4.