Geometry Transformation Engine (GTE)
GTE Overview
GTE Registers
GTE Saturation
GTE Opcode Summary
GTE Coordinate Calculation Commands
GTE General Purpose Calculation Commands
GTE Color Calculation Commands
GTE Division Inaccuracy
GTE Overview
GTE Operation
The GTE doesn't have any memory or I/O ports mapped to the CPU memory bus,
instead, it's solely accessed via coprocessor opcodes:
mov cop0r12,rt ;-enable/disable COP2 (GTE) via COP0 status register
mov cop2r0-63,rt ;\write parameters to GTE registers
mov cop2r0-31,[rs+imm] ;/
mov cop2cmd,imm25 ;-issue GTE command
mov rt,cop2r0-63 ;\read results from GTE registers
mov [rs+imm],cop2r0-31 ;/
jt cop2flg,dest ;-jump never ;\implemented (no exception), but,
jf cop2flg,dest ;-jump always ;/flag seems to be always "false"
GTE Load Delay Slots
Using CFC2/MFC2 has a delay of 1 instruction until the GPR is loaded with its new value.
Certain games are sensitive to this, with the notable example of Tekken 2
which will be filled with broken geometry on emulators which don't emulate this properly.
GTE (memory-?) load and store instructions have a delay of 2 instructions, for
any GTE commands or operations accessing that register. Any? That's wrong!
GTE instructions and functions should not be used in
- Delay slots of jumps and branches
- Event handlers or interrupts (sounds like nonsense?) (need push/pop though)
GTE Command Encoding (COP2 imm25 opcodes)
31-25 Must be 0100101b for "COP2 imm25" instructions
20-24 Fake GTE Command Number (00h..1Fh) (ignored by hardware)
19 sf - Shift Fraction in IR registers (0=No fraction, 1=12bit fraction)
17-18 MVMVA Multiply Matrix (0=Rotation. 1=Light, 2=Color, 3=Reserved)
15-16 MVMVA Multiply Vector (0=V0, 1=V1, 2=V2, 3=IR/long)
13-14 MVMVA Translation Vector (0=TR, 1=BK, 2=FC/Bugged, 3=None)
11-12 Always zero (ignored by hardware)
10 lm - Saturate IR1,IR2,IR3 result (0=To -8000h..+7FFFh, 1=To 0..+7FFFh)
6-9 Always zero (ignored by hardware)
0-5 Real GTE Command Number (00h..3Fh) (used by hardware)
The "sf" and "lm" bits are usually fixed (either set, or cleared, depending on the command) (for MVMVA, the bits are variable) (also, "sf" can be changed for some commands like SQR) (although they are usually fixed for most other opcodes, changing them might have some effect on some/all opcodes)?
GTE Data Register Summary (cop2r0-31)
cop2r0-1 3xS16 VXY0,VZ0 Vector 0 (X,Y,Z)
cop2r2-3 3xS16 VXY1,VZ1 Vector 1 (X,Y,Z)
cop2r4-5 3xS16 VXY2,VZ2 Vector 2 (X,Y,Z)
cop2r6 4xU8 RGBC Color/code value
cop2r7 1xU16 OTZ Average Z value (for Ordering Table)
cop2r8 1xS16 IR0 16bit Accumulator (Interpolate)
cop2r9-11 3xS16 IR1,IR2,IR3 16bit Accumulator (Vector)
cop2r12-15 6xS16 SXY0,SXY1,SXY2,SXYP Screen XY-coordinate FIFO (3 stages)
cop2r16-19 4xU16 SZ0,SZ1,SZ2,SZ3 Screen Z-coordinate FIFO (4 stages)
cop2r20-22 12xU8 RGB0,RGB1,RGB2 Color CRGB-code/color FIFO (3 stages)
cop2r23 4xU8 (RES1) Prohibited
cop2r24 1xS32 MAC0 32bit Maths Accumulators (Value)
cop2r25-27 3xS32 MAC1,MAC2,MAC3 32bit Maths Accumulators (Vector)
cop2r28-29 1xU15 IRGB,ORGB Convert RGB Color (48bit vs 15bit)
cop2r30-31 2xS32 LZCS,LZCR Count Leading-Zeroes/Ones (sign bits)
GTE Control Register Summary (cop2r32-63)
cop2r32-36 9xS16 RT11RT12,..,RT33 Rotation matrix (3x3) ;cnt0-4
cop2r37-39 3x 32 TRX,TRY,TRZ Translation vector (X,Y,Z) ;cnt5-7
cop2r40-44 9xS16 L11L12,..,L33 Light source matrix (3x3) ;cnt8-12
cop2r45-47 3x 32 RBK,GBK,BBK Background color (R,G,B) ;cnt13-15
cop2r48-52 9xS16 LR1LR2,..,LB3 Light color matrix source (3x3) ;cnt16-20
cop2r53-55 3x 32 RFC,GFC,BFC Far color (R,G,B) ;cnt21-23
cop2r56-57 2x 32 OFX,OFY Screen offset (X,Y) ;cnt24-25
cop2r58 BuggyU16 H Projection plane distance. ;cnt26
cop2r59 S16 DQA Depth queing parameter A (coeff) ;cnt27
cop2r60 32 DQB Depth queing parameter B (offset);cnt28
cop2r61-62 2xS16 ZSF3,ZSF4 Average Z scale factors ;cnt29-30
cop2r63 U20 FLAG Returns any calculation errors ;cnt31
GTE Registers
Note in some functions format is different from the one that's given here.
Matrix Registers
Rotation matrix (RT) Light matrix (LLM) Light Color matrix (LCM)
cop2r32.lsbs=RT11 cop2r40.lsbs=L11 cop2r48.lsbs=LR1
cop2r32.msbs=RT12 cop2r40.msbs=L12 cop2r48.msbs=LR2
cop2r33.lsbs=RT13 cop2r41.lsbs=L13 cop2r49.lsbs=LR3
cop2r33.msbs=RT21 cop2r41.msbs=L21 cop2r49.msbs=LG1
cop2r34.lsbs=RT22 cop2r42.lsbs=L22 cop2r50.lsbs=LG2
cop2r34.msbs=RT23 cop2r42.msbs=L23 cop2r50.msbs=LG3
cop2r35.lsbs=RT31 cop2r43.lsbs=L31 cop2r51.lsbs=LB1
cop2r35.msbs=RT32 cop2r43.msbs=L32 cop2r51.msbs=LB2
cop2r36 =RT33 cop2r44 =L33 cop2r52 =LB3
Translation Vector (TR) (Input, R/W?)
cop2r37 (cnt5) - TRX - Translation vector X (R/W?)
cop2r38 (cnt6) - TRY - Translation vector Y (R/W?)
cop2r39 (cnt7) - TRZ - Translation vector Z (R/W?)
Used only for MVMVA, RTPS, RTPT commands.
Background Color (BK) (Input?, R/W?)
cop2r45 (cnt13) - RBK - Background color red component
cop2r46 (cnt14) - GBK - Background color green component
cop2r47 (cnt15) - BBK - Background color blue component
Far Color (FC) (Input?) (R/W?)
cop2r53 (cnt21) - RFC - Far color red component
cop2r54 (cnt22) - GFC - Far color green component
cop2r55 (cnt23) - BFC - Far color blue component
Screen Offset and Distance (Input, R/W?)
cop2r56 (cnt24) - OFX - Screen offset X
cop2r57 (cnt25) - OFY - Screen offset Y
cop2r58 (cnt26) - H - Projection plane distance
cop2r59 (cnt27) - DQA - Depth queing parameter A.(coeff.)
cop2r60 (cnt28) - DQB - Depth queing parameter B.(offset.)
The H value is 16bit unsigned (0bit sign, 16bit integer, 0bit fraction). BUG: When reading the H register, the hardware does accidently \<sign-expand> the \<unsigned> 16bit value (ie. values +8000h..+FFFFh are returned as FFFF8000h..FFFFFFFFh) (this bug applies only to "mov rd,cop2r58" opcodes; the actual calculations via RTPS/RTPT opcodes are working okay).
The DQA value is only 16bit (1bit sign, 7bit integer, 8bit fraction).
The DQB value is 32bit (1bit sign, 7bit integer, 24bit? fraction).
Used only for RTPS/RTPT commands.
Average Z Registers (ZSF3/ZSF4=Input, R/W?) (OTZ=Result, R)
cop2r61 (cnt29) ZSF3 | 0|ZSF3 1,3,12| Z3 average scale factor (normally 1/3)
cop2r62 (cnt30) ZSF4 | 0|ZSF4 1,3,12| Z4 average scale factor (normally 1/4)
cop2r7 OTZ (R) | |OTZ 0,15, 0| Average Z value (for Ordering Table)
Screen XYZ Coordinate FIFOs
cop2r12 - SXY0 rw|SY0 1,15, 0|SX0 1,15, 0| Screen XY fifo (older)
cop2r13 - SXY1 rw|SY1 1,15, 0|SX1 1,15, 0| Screen XY fifo (old)
cop2r14 - SXY2 rw|SY2 1,15, 0|SX2 1,15, 0| Screen XY fifo (new)
cop2r15 - SXYP rw|SYP 1,15, 0|SXP 1,15, 0| SXY2-mirror with move-on-write
cop2r16 - SZ0 rw| 0|SZ0 0,16, 0| Screen Z fifo (oldest)
cop2r17 - SZ1 rw| 0|SZ1 0,16, 0| Screen Z fifo (older)
cop2r18 - SZ2 rw| 0|SZ2 0,16, 0| Screen Z fifo (old)
cop2r19 - SZ3 rw| 0|SZ3 0,16, 0| Screen Z fifo (new)
The SZn Fifo has 4 stages (required for AVSZ4 command), the SXYn Fifo has only 3 stages, and a special mirrored register: SXYP is a mirror of SXY2, the difference is that writing to SXYP moves SXY2/SXY1 to SXY1/SXY0, whilst writing to SXY2 (or any other SXYn or SZn registers) changes only the written register, but doesn't move any other Fifo entries.
16bit Vectors (R/W)
Vector 0 (V0) Vector 1 (V1) Vector 2 (V2) Vector 3 (IR)
cop2r0.lsbs - VX0 cop2r2.lsbs - VX1 cop2r4.lsbs - VX2 cop2r9 - IR1
cop2r0.msbs - VY0 cop2r2.msbs - VY1 cop2r4.msbs - VY2 cop2r10 - IR2
cop2r1 - VZ0 cop2r3 - VZ1 cop2r5 - VZ2 cop2r11 - IR3
Color Register and Color FIFO
cop2r6 - RGBC rw|CODE |B |G |R | Color/code
cop2r20 - RGB0 rw|CD0 |B0 |G0 |R0 | Characteristic color fifo.
cop2r21 - RGB1 rw|CD1 |B1 |G1 |R1 |
cop2r22 - RGB2 rw|CD2 |B2 |G2 |R2 |
cop2r23 - (RES1) | | Prohibited
Interpolation Factor
cop2r8 IR0 rw|Sign |IR0 1, 3,12| Intermediate value 0.
XX...
cop2r24 MAC0 rw|MAC0 1,31,0 | Sum of products value 0
XX...
cop2r25 MAC1 rw|MAC1 1,31,0 | Sum of products value 1
cop2r26 MAC2 rw|MAC2 1,31,0 | Sum of products value 2
cop2r27 MAC3 rw|MAC3 1,31,0 | Sum of products value 3
cop2r28 - IRGB - Color conversion Input (R/W)
Expands 5:5:5 bit RGB (range 0..1Fh) to 16:16:16 bit RGB (range 0000h..0F80h).
0-4 Red (0..1Fh) (R/W) ;multiplied by 80h, and written to IR1
5-9 Green (0..1Fh) (R/W) ;multiplied by 80h, and written to IR2
10-14 Blue (0..1Fh) (R/W) ;multiplied by 80h, and written to IR3
15-31 Not used (always zero) (Read only)
cop2r29 - ORGB - Color conversion Output (R)
Collapses 16:16:16 bit RGB (range 0000h..0F80h) to 5:5:5 bit RGB (range
0..1Fh). Negative values (8000h..FFFFh/80h) are saturated to 00h, large
positive values (1000h..7FFFh/80h) are saturated to 1Fh, there are no overflow
or saturation flags set in cop2r63 though.
0-4 Red (0..1Fh) (R) ;IR1 divided by 80h, saturated to +00h..+1Fh
5-9 Green (0..1Fh) (R) ;IR2 divided by 80h, saturated to +00h..+1Fh
10-14 Blue (0..1Fh) (R) ;IR3 divided by 80h, saturated to +00h..+1Fh
15-31 Not used (always zero) (Read only)
cop2r30 - LZCS - Count Leading Bits Source data (R/W)
cop2r31 - LZCR - Count Leading Bits Result (R)
Reading LZCR returns the leading 0 count of LZCS if LZCS is positive and the
leading 1 count of LZCS if LZCS is negative. The results are in range 1..32.
cop2r63 (cnt31) - FLAG - Returns any calculation errors.
See GTE Saturation chapter.
GTE Saturation
Maths overflows are indicated in FLAG register. In most cases, the result is
saturated to MIN/MAX values (except MAC0,MAC1,MAC2,MAC3 which aren't
saturated). For IR1,IR2,IR3 many commands allow to select the MIN value via
"lm" bit of the GTE opcode (though not all commands, RTPS/RTPT always act as if
lm=0).
cop2r63 (cnt31) - FLAG - Returns any calculation errors.
31 Error Flag (Bit30..23, and 18..13 ORed together) (Read only)
30 MAC1 Result larger than 43 bits and positive
29 MAC2 Result larger than 43 bits and positive
28 MAC3 Result larger than 43 bits and positive
27 MAC1 Result larger than 43 bits and negative
26 MAC2 Result larger than 43 bits and negative
25 MAC3 Result larger than 43 bits and negative
24 IR1 saturated to +0000h..+7FFFh (lm=1) or to -8000h..+7FFFh (lm=0)
23 IR2 saturated to +0000h..+7FFFh (lm=1) or to -8000h..+7FFFh (lm=0)
22 IR3 saturated to +0000h..+7FFFh (lm=1) or to -8000h..+7FFFh (lm=0)
21 Color-FIFO-R saturated to +00h..+FFh
20 Color-FIFO-G saturated to +00h..+FFh
19 Color-FIFO-B saturated to +00h..+FFh
18 SZ3 or OTZ saturated to +0000h..+FFFFh
17 Divide overflow. RTPS/RTPT division result saturated to max=1FFFFh
16 MAC0 Result larger than 31 bits and positive
15 MAC0 Result larger than 31 bits and negative
14 SX2 saturated to -0400h..+03FFh
13 SY2 saturated to -0400h..+03FFh
12 IR0 saturated to +0000h..+1000h
0-11 Not used (always zero) (Read only)
Bit31 is apparently intended for RTPS/RTPT commands, since it triggers only on flags that are affected by these two commands, but even for that commands it's totally useless since one could as well check if FLAG is nonzero.
Note: Writing 32bit values to 16bit GTE registers by software does not trigger any overflow/saturation flags (and does not do any saturation), eg. writing 12008900h (positive 32bit) to a signed 16bit register sets that register to FFFF8900h (negative 16bit).
GTE Opcode Summary
GTE Command Summary (sorted by Real Opcode bits) (bit0-5)
Opc Name Clk Expl.
00h - N/A (modifies similar registers than RTPS...)
01h RTPS 15 Perspective Transformation single
0xh - N/A
06h NCLIP 8 Normal clipping
0xh - N/A
0Ch OP(sf) 6 Cross product of 2 vectors
0xh - N/A
10h DPCS 8 Depth Cueing single
11h INTPL 8 Interpolation of a vector and far color vector
12h MVMVA 8 Multiply vector by matrix and add vector (see below)
13h NCDS 19 Normal color depth cue single vector
14h CDP 13 Color Depth Que
15h - N/A
16h NCDT 44 Normal color depth cue triple vectors
1xh - N/A
1Bh NCCS 17 Normal Color Color single vector
1Ch CC 11 Color Color
1Dh - N/A
1Eh NCS 14 Normal color single
1Fh - N/A
20h NCT 30 Normal color triple
2xh - N/A
28h SQR(sf)5 Square of vector IR
29h DCPL 8 Depth Cue Color light
2Ah DPCT 17 Depth Cueing triple (should be fake=08h, but isn't)
2xh - N/A
2Dh AVSZ3 5 Average of three Z values
2Eh AVSZ4 6 Average of four Z values
2Fh - N/A
30h RTPT 23 Perspective Transformation triple
3xh - N/A
3Dh GPF(sf)5 General purpose interpolation
3Eh GPL(sf)5 General purpose interpolation with base
3Fh NCCT 39 Normal Color Color triple vector
GTE Command Summary (sorted by Fake Opcode bits) (bit20-24)
The fake opcode number in bit20-24 has absolutely no effect on the hardware, it
seems to be solely used to (or not to) confuse developers. Having the opcodes
sorted by their fake numbers gives a more or less well arranged list:
Fake Name Clk Expl.
00h - N/A
01h RTPS 15 Perspective Transformation single
02h RTPT 23 Perspective Transformation triple
03h - N/A
04h MVMVA 8 Multiply vector by matrix and add vector (see below)
05h - N/A
06h DCPL 8 Depth Cue Color light
07h DPCS 8 Depth Cueing single
08h DPCT 17 Depth Cueing triple (should be fake=08h, but isn't)
09h INTPL 8 Interpolation of a vector and far color vector
0Ah SQR(sf)5 Square of vector IR
0Bh - N/A
0Ch NCS 14 Normal color single
0Dh NCT 30 Normal color triple
0Eh NCDS 19 Normal color depth cue single vector
0Fh NCDT 44 Normal color depth cue triple vectors
10h NCCS 17 Normal Color Color single vector
11h NCCT 39 Normal Color Color triple vector
12h CDP 13 Color Depth Que
13h CC 11 Color Color
14h NCLIP 8 Normal clipping
15h AVSZ3 5 Average of three Z values
16h AVSZ4 6 Average of four Z values
17h OP(sf) 6 Cross product of 2 vectors
18h - N/A
19h GPF(sf)5 General purpose interpolation
1Ah GPL(sf)5 General purpose interpolation with base
1Bh - N/A
1Ch - N/A
1Dh - N/A
1Eh - N/A
1Fh - N/A
Additional Functions
The LZCS/LZCR registers offer a Count-Leading-Zeroes/Leading-Ones function.
The IRGB/ORGB registers allow to convert between 48bit and 15bit RGB colors.
These registers work without needing to send any COP2 commands. However, unlike
for commands (which do automatically halt the CPU when needed), one must insert
dummy opcodes between writing and reading the registers.
GTE Coordinate Calculation Commands
COP2 0180001h - 15 Cycles - RTPS - Perspective Transformation (single)
COP2 0280030h - 23 Cycles - RTPT - Perspective Transformation (triple)
RTPS performs final Rotate, translate and perspective transformation on vertex
V0. Before writing to the FIFOs, the older entries are moved one stage down.
RTPT is same as RTPS, but repeats for V1 and V2. The "sf" bit should be usually
set.
IR1 = MAC1 = (TRX*1000h + RT11*VX0 + RT12*VY0 + RT13*VZ0) SAR (sf*12)
IR2 = MAC2 = (TRY*1000h + RT21*VX0 + RT22*VY0 + RT23*VZ0) SAR (sf*12)
IR3 = MAC3 = (TRZ*1000h + RT31*VX0 + RT32*VY0 + RT33*VZ0) SAR (sf*12)
SZ3 = MAC3 SAR ((1-sf)*12) ;ScreenZ FIFO 0..+FFFFh
MAC0=(((H*20000h/SZ3)+1)/2)*IR1+OFX, SX2=MAC0/10000h ;ScrX FIFO -400h..+3FFh
MAC0=(((H*20000h/SZ3)+1)/2)*IR2+OFY, SY2=MAC0/10000h ;ScrY FIFO -400h..+3FFh
MAC0=(((H*20000h/SZ3)+1)/2)*DQA+DQB, IR0=MAC0/1000h ;Depth cueing 0..+1000h
GTE Division Inaccuracy
For "far plane clipping", one can use the SZ3 saturation flag (MaxZ=FFFFh), or the IR3 saturation flag (MaxZ=7FFFh) (eg. used by Wipeout 2097), or one can compare the SZ3 value with any desired MaxZ value by software.
Note: The command does saturate IR1,IR2,IR3 to -8000h..+7FFFh (regardless of lm bit). When using RTP with sf=0, then the IR3 saturation flag (FLAG.22) gets set \<only> if "MAC3 SAR 12" exceeds -8000h..+7FFFh (although IR3 is saturated when "MAC3" exceeds -8000h..+7FFFh).
COP2 1400006h - 8 Cycles - NCLIP - Normal clipping
MAC0 = SX0*SY1 + SX1*SY2 + SX2*SY0 - SX0*SY2 - SX1*SY0 - SX2*SY1
COP2 158002Dh - 5 Cycles - AVSZ3 - Average of three Z values (for Triangles)
COP2 168002Eh - 6 Cycles - AVSZ4 - Average of four Z values (for Quads)
MAC0 = ZSF3*(SZ1+SZ2+SZ3) ;for AVSZ3
MAC0 = ZSF4*(SZ0+SZ1+SZ2+SZ3) ;for AVSZ4
OTZ = MAC0/1000h ;for both (saturated to 0..FFFFh)
GPU Depth Ordering
The scaling factors would be usually ZSF3=N/30h and ZSF4=N/40h, where "N" is the number of entries in the OT (max 10000h). SZn and OTZ are unsigned 16bit values, for whatever reason ZSFn registers are signed 16bit values (negative values would allow a negative result in MAC0, but would saturate OTZ to zero).
GTE General Purpose Calculation Commands
COP2 0400012h - 8 Cycles - MVMVA(sf,mx,v,cv,lm)
Multiply vector by matrix and vector addition.
Mx = matrix specified by mx ;RT/LLM/LCM - Rotation, light or color matrix
Vx = vector specified by v ;V0, V1, V2, or [IR1,IR2,IR3]
Tx = translation vector specified by cv ;TR or BK or Bugged/FC, or None
MAC1 = (Tx1*1000h + Mx11*Vx1 + Mx12*Vx2 + Mx13*Vx3) SAR (sf*12)
MAC2 = (Tx2*1000h + Mx21*Vx1 + Mx22*Vx2 + Mx23*Vx3) SAR (sf*12)
MAC3 = (Tx3*1000h + Mx31*Vx1 + Mx32*Vx2 + Mx33*Vx3) SAR (sf*12)
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
The GTE also allows selection of the far color vector (FC), but this vector is not added correctly by the hardware: The return values are reduced to the last portion of the formula, ie. MAC1=(Mx13*Vx3) SAR (sf*12), and similar for MAC2 and MAC3, nethertheless, some bits in the FLAG register seem to be adjusted as if the full operation would have been executed. Setting Mx=3 selects a garbage matrix (with elements -60h, +60h, IR0, RT13, RT13, RT13, RT22, RT22, RT22).
COP2 0A00428h+sf*80000h - 5 Cycles - SQR(sf) - Square vector
[MAC1,MAC2,MAC3] = [IR1*IR1,IR2*IR2,IR3*IR3] SHR (sf*12)
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3] ;IR1,IR2,IR3 saturated to max 7FFFh
COP2 170000Ch+sf*80000h - 6 Cycles - OP(sf,lm) - Cross product of 2 vectors
[MAC1,MAC2,MAC3] = [IR3*D2-IR2*D3, IR1*D3-IR3*D1, IR2*D1-IR1*D2] SAR (sf*12)
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3] ;copy result
The official Sony documentation refers to this opcode as the Outer Product, but this is likely the result of a bad translation from Japanese: "外積 - gaiseki" can be translated to "cross product", "vector product", or "outer product".
LZCS/LZCR registers - ? Cycles - Count-Leading-Zeroes/Leading-Ones
The LZCS/LZCR registers offer a Count-Leading-Zeroes/Leading-Ones function.
GTE Color Calculation Commands
COP2 0C8041Eh - 14 Cycles - NCS - Normal color (single)
COP2 0D80420h - 30 Cycles - NCT - Normal color (triple)
COP2 108041Bh - 17 Cycles - NCCS - Normal Color Color (single vector)
COP2 118043Fh - 39 Cycles - NCCT - Normal Color Color (triple vector)
COP2 0E80413h - 19 Cycles - NCDS - Normal color depth cue (single vector)
COP2 0F80416h - 44 Cycles - NCDT - Normal color depth cue (triple vectors)
In: V0=Normal vector (for triple variants repeated with V1 and V2),
BK=Background color, RGBC=Primary color/code, LLM=Light matrix, LCM=Color
matrix, IR0=Interpolation value.
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3] = (LLM*V0) SAR (sf*12)
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3] = (BK*1000h + LCM*IR) SAR (sf*12)
[MAC1,MAC2,MAC3] = [R*IR1,G*IR2,B*IR3] SHL 4 ;<--- for NCDx/NCCx
[MAC1,MAC2,MAC3] = MAC+(FC-MAC)*IR0 ;<--- for NCDx only
[MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SAR (sf*12) ;<--- for NCDx/NCCx
Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
COP2 138041Ch - 11 Cycles - CC(lm=1) - Color Color
COP2 1280414h - 13 Cycles - CDP(...) - Color Depth Que
In: [IR1,IR2,IR3]=Vector, RGBC=Primary color/code, LCM=Color matrix,
BK=Background color, and, for CDP, IR0=Interpolation value, FC=Far color.
[IR1,IR2,IR3] = [MAC1,MAC2,MAC3] = (BK*1000h + LCM*IR) SAR (sf*12)
[MAC1,MAC2,MAC3] = [R*IR1,G*IR2,B*IR3] SHL 4
[MAC1,MAC2,MAC3] = MAC+(FC-MAC)*IR0 ;<--- for CDP only
[MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SAR (sf*12)
Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
COP2 0680029h - 8 Cycles - DCPL - Depth Cue Color light
COP2 0780010h - 8 Cycles - DPCS - Depth Cueing (single)
COP2 0x8002Ah - 17 Cycles - DPCT - Depth Cueing (triple)
COP2 0980011h - 8 Cycles - INTPL - Interpolation of a vector and far color
In: [IR1,IR2,IR3]=Vector, FC=Far Color, IR0=Interpolation value, CODE=MSB of
RGBC, and, for DCPL, R,G,B=LSBs of RGBC.
[MAC1,MAC2,MAC3] = [R*IR1,G*IR2,B*IR3] SHL 4 ;<--- for DCPL only
[MAC1,MAC2,MAC3] = [IR1,IR2,IR3] SHL 12 ;<--- for INTPL only
[MAC1,MAC2,MAC3] = [R,G,B] SHL 16 ;<--- for DPCS/DPCT
[MAC1,MAC2,MAC3] = MAC+(FC-MAC)*IR0
[MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SAR (sf*12)
Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
COP2 190003Dh - 5 Cycles - GPF(sf,lm) - General purpose Interpolation
COP2 1A0003Eh - 5 Cycles - GPL(sf,?) - General Interpolation with base
[MAC1,MAC2,MAC3] = [0,0,0] ;<--- for GPF only
[MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SHL (sf*12) ;<--- for GPL only
[MAC1,MAC2,MAC3] = (([IR1,IR2,IR3] * IR0) + [MAC1,MAC2,MAC3]) SAR (sf*12)
Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
Details on "MAC+(FC-MAC)*IR0"
[IR1,IR2,IR3] = (([RFC,GFC,BFC] SHL 12) - [MAC1,MAC2,MAC3]) SAR (sf*12)
[MAC1,MAC2,MAC3] = (([IR1,IR2,IR3] * IR0) + [MAC1,MAC2,MAC3])
Details on "(LLM*V0) SAR (sf*12)" and "(BK*1000h + LCM*IR) SAR (sf*12)"
Works like MVMVA command (see there), but with fixed Tx/Vx/Mx parameters, the
sf/lm bits can be changed and do affect the results (although normally both
bits should be set for use with color matrices).
Notes
The 8bit RGB values written to the top of Color Fifo are the 32bit MACn values
divided by 16, and saturated to +00h..+FFh, and of course, the older Fifo
entries are moved downwards. Note that, at the GPU side, the meaning of the RGB
values depends on whether or not texture blending is used (for untextured
polygons FFh is max brightness) (for texture blending FFh is double brightness
and 80h is normal brightness).
The 8bit CODE value is intended to contain a GP0(20h..7Fh) Rendering command,
allowing to automatically merge the 8bit command number, with the 24bit color
value.
The IRGB/ORGB registers allow to convert between 48bit and 15bit RGB colors.
Although the result of the commands in this chapter is written to the Color
FIFO, some commands like GPF/GPL may be also used for other purposes (eg. to
scale or scale/translate single vertices).
GTE Division Inaccuracy
GTE Division Inaccuracy (for RTPS/RTPT commands)
Basically, the GTE division does (attempt to) work as so (using 33bit maths):
n = (((H*20000h/SZ3)+1)/2)
n = ((H*10000h+SZ3/2)/SZ3)
if n>1FFFFh or division_by_zero then n=1FFFFh, FLAG.Bit17=1, FLAG.Bit31=1
if (H < SZ3*2) then ;check if overflow
z = count_leading_zeroes(SZ3) ;z=0..0Fh (for 16bit SZ3)
n = (H SHL z) ;n=0..7FFF8000h
d = (SZ3 SHL z) ;d=8000h..FFFFh
u = unr_table[(d-7FC0h) SHR 7] + 101h ;u=200h..101h
d = ((2000080h - (d * u)) SHR 8) ;d=10000h..0FF01h
d = ((0000080h + (d * u)) SHR 8) ;d=20000h..10000h
n = min(1FFFFh, (((n*d) + 8000h) SHR 16)) ;n=0..1FFFFh
else n = 1FFFFh, FLAG.Bit17=1, FLAG.Bit31=1 ;n=1FFFFh plus overflow flag
FFh,FDh,FBh,F9h,F7h,F5h,F3h,F1h,EFh,EEh,ECh,EAh,E8h,E6h,E4h,E3h ;\
E1h,DFh,DDh,DCh,DAh,D8h,D6h,D5h,D3h,D1h,D0h,CEh,CDh,CBh,C9h,C8h ; 00h..3Fh
C6h,C5h,C3h,C1h,C0h,BEh,BDh,BBh,BAh,B8h,B7h,B5h,B4h,B2h,B1h,B0h ;
AEh,ADh,ABh,AAh,A9h,A7h,A6h,A4h,A3h,A2h,A0h,9Fh,9Eh,9Ch,9Bh,9Ah ;/
99h,97h,96h,95h,94h,92h,91h,90h,8Fh,8Dh,8Ch,8Bh,8Ah,89h,87h,86h ;\
85h,84h,83h,82h,81h,7Fh,7Eh,7Dh,7Ch,7Bh,7Ah,79h,78h,77h,75h,74h ; 40h..7Fh
73h,72h,71h,70h,6Fh,6Eh,6Dh,6Ch,6Bh,6Ah,69h,68h,67h,66h,65h,64h ;
63h,62h,61h,60h,5Fh,5Eh,5Dh,5Dh,5Ch,5Bh,5Ah,59h,58h,57h,56h,55h ;/
54h,53h,53h,52h,51h,50h,4Fh,4Eh,4Dh,4Dh,4Ch,4Bh,4Ah,49h,48h,48h ;\
47h,46h,45h,44h,43h,43h,42h,41h,40h,3Fh,3Fh,3Eh,3Dh,3Ch,3Ch,3Bh ; 80h..BFh
3Ah,39h,39h,38h,37h,36h,36h,35h,34h,33h,33h,32h,31h,31h,30h,2Fh ;
2Eh,2Eh,2Dh,2Ch,2Ch,2Bh,2Ah,2Ah,29h,28h,28h,27h,26h,26h,25h,24h ;/
24h,23h,22h,22h,21h,20h,20h,1Fh,1Eh,1Eh,1Dh,1Dh,1Ch,1Bh,1Bh,1Ah ;\
19h,19h,18h,18h,17h,16h,16h,15h,15h,14h,14h,13h,12h,12h,11h,11h ; C0h..FFh
10h,0Fh,0Fh,0Eh,0Eh,0Dh,0Dh,0Ch,0Ch,0Bh,0Ah,0Ah,09h,09h,08h,08h ;
07h,07h,06h,06h,05h,05h,04h,04h,03h,03h,02h,02h,01h,01h,00h,00h ;/
00h ;<-- one extra table entry (for "(d-7FC0h)/80h"=100h) ;-100h
Some special cases: NNNNh/0001h uses a big multiplier (d=20000h), in practice, this can occur only for 0000h/0001h and 0001h/0001h (due to the H\<SZ3*2 overflow check).
The min(1FFFFh) limit is needed for cases like FE3Fh/7F20h, F015h/780Bh, etc. (these do produce UNR result 20000h, and are saturated to 1FFFFh, but without setting overflow FLAG bits).