Graphics Processing Unit (GPU)
The GPU can render Polygons, Lines, or Rectangles to the Drawing Buffer, and
sends the Display Buffer to the Television Set. Polygons are useful for 3D
graphics (or rotated/scaled 2D graphics), Rectangles are useful for 2D graphics
and Text output.
GPU I/O Ports, DMA Channels, Commands, VRAM
GPU Render Polygon Commands
GPU Render Line Commands
GPU Render Rectangle Commands
GPU Rendering Attributes
GPU Memory Transfer Commands
GPU Other Commands
GPU Display Control Commands (GP1)
GPU Status Register
GPU Versions
GPU Depth Ordering
GPU Video Memory (VRAM)
GPU Texture Caching
GPU Timings
GPU (MISC)
GPU I/O Ports, DMA Channels, Commands, VRAM
GPU I/O Ports (1F801810h and 1F801814h in Read/Write Directions)
Port Name Expl.
1F801810h-Write GP0 Send GP0 Commands/Packets (Rendering and VRAM Access)
1F801814h-Write GP1 Send GP1 Commands (Display Control) (and DMA Control)
1F801810h-Read GPUREAD Receive responses to GP0(C0h) and GP1(10h) commands
1F801814h-Read GPUSTAT Receive GPU Status Register
Optionally, Port 1F801810h (Read/Write) can be also accessed via DMA2.
The communication between the CPU and the GPU is a 32-bit data-only bus called the VBUS. Aside from address line 2, which is connected in order to distinguish between port 0 and port 1, there are no other address lines between the two chips.
Thus the GPU can be seen as a black box that executes 32-bit commands.
GPU Timers / Synchronization
Most of the Timers are bound to GPU timings, see Timers and Interrupts.
GPU-related DMA Channels (DMA2 and DMA6)
Channel Recommended for
DMA2 in Linked Mode - Sending rendering commands ;GP0(20h..7Fh,E1h..E6h)
DMA2 in Continuous Mode - VRAM transfers to/from GPU ;GP0(A0h,C0h)
DMA6 - Initializing the Link List ;Main RAM
DMA2 is equivalent to accessing Port 1F801810h (GP0/GPUREAD) by software.
DMA6 just initializes data in Main RAM (not physically connected to the GPU).
GPU Command Summary
While it is probably simpler for the MIPS software to see GPU commands
as a collection of bytes, the GPU only sees the 32-bit words being sent to it.
Therefore, while the Sony libraries fill up structures to send to the GPU
using byte-level granularity, it is much simpler to see these as bitfields
from the GPU's point of view.
So when processing commands on GP0, the GPU will first inspect the top 3 bits
of the 32-bit command word being sent. Depending on the value of these 3 bits,
further decoding of the other bits can be done.
Commands sent to GP1 are simpler to decode.
Top 3 bits of a GP0 command:
0 (000) Misc commands
1 (001) Polygon primitive
2 (010) Line primitive
3 (011) Rectangle primitive
4 (100) VRAM-to-VRAM blit
5 (101) CPU-to-VRAM blit
6 (110) VRAM-to-CPU blit
7 (111) Environment commands
The astute reader will notice that some bits are shared between primitives, such as the gouraud shading flag.
Unlike all the others, the environment commands are more clearly seen as single 8-bit commands, therefore the rest of the document will refer to them by their full 8-bit value.
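The top-level decoding described above can be sketched as follows. This is only an illustration: the group names and the helper functions gp0_group and gp0_opcode are hypothetical labels chosen here, not names used by the hardware or by any library.

```python
# Hypothetical names for the eight GP0 command groups, indexed by bits 31-29.
GP0_GROUPS = (
    "misc", "polygon", "line", "rectangle",
    "vram_to_vram", "cpu_to_vram", "vram_to_cpu", "environment",
)

def gp0_group(word):
    """Return the command group selected by the top 3 bits of a GP0 word."""
    return GP0_GROUPS[(word >> 29) & 0x7]

def gp0_opcode(word):
    """Return the full 8-bit command value (bits 31-24), as used for E1h..E6h."""
    return (word >> 24) & 0xFF
```

For instance, gp0_group(0xE1000000) yields the environment group, and gp0_opcode then gives the full value E1h by which the document refers to that command.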
Clear Cache
1st Command (01000000h)
Note that it is possible to abuse the texture cache by changing pixels in VRAM that the GPU loaded in its cache, therefore creating weird drawing effects, but this is only seen in some demos, and never in actual games.
Quick Rectangle Fill
1st Color+Command (02BbGgRrh) ;24bit RGB value (see note)
2nd Top Left Corner (YyyyXxxxh) ;Xpos counted in halfwords, steps of 10h
3rd Width+Height (YsizXsizh) ;Xsiz counted in halfwords, steps of 10h
The "Color" parameter is a 24bit RGB value, however, the actual fill data is 16bit: The hardware linearly converts the 24bit RGB value to 15bit RGB by dropping the lower 3 bits of each color value and additionally sets the mask bit (bit15) to 0.
Rectangle filling is not affected by the GP0(E6h) mask setting, acting as if GP0(E6h).0 and GP0(E6h).1 are both zero.
This command is typically used to do a quick clear, as it'll be faster to run than an equivalent Render Rectangle command.
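The 24bit to 15bit conversion used by the fill command can be sketched as below. The sketch assumes the usual 15bit VRAM pixel layout with Red in bits 0-4, Green in bits 5-9 and Blue in bits 10-14; the function name is hypothetical.

```python
def fill_color_24_to_15(rgb24):
    """Convert a 24bit BBGGRR fill color to the 16bit value written to VRAM.

    The lower 3 bits of each component are dropped and the mask bit
    (bit15) is left at 0. Assumes Red in bits 0-4, Green in 5-9, Blue in 10-14.
    """
    r = rgb24 & 0xFF
    g = (rgb24 >> 8) & 0xFF
    b = (rgb24 >> 16) & 0xFF
    return (r >> 3) | ((g >> 3) << 5) | ((b >> 3) << 10)
```

Any bits above bit 23 (such as the 02h command byte) are ignored by the helper, so the complete first command word can be passed in directly.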
VRAM Overview / VRAM Addressing
VRAM can be 1 MB or 2 MB (not mapped to the CPU bus) (it can be read/written
only via I/O or DMA). The memory is used for:
Framebuffer(s) ;Usually 2 buffers (Drawing Area, and Display Area)
Texture Page(s) ;Required when using Textures
Texture Palette(s) ;Required when using 4bit/8bit Textures
Unit = 4bit 8bit 16bit 24bit Halfwords | Unit = Lines
Width = 4096 2048 1024 682.66 1024 | Height = 512/1024
GPU Render Polygon Commands
When the upper 3 bits of the first GP0 command are set to 1 (001), then the command can be decoded using the following bitfield:
bit number value meaning
31-29 001 polygon render
28 1/0 gouraud / flat shading
27 1/0 4 / 3 vertices
26 1/0 textured / untextured
25 1/0 semi-transparent / opaque
24 1/0 raw texture / modulation
23-0 rgb first color value.
Subsequent data sent to GP0 to complete this command will be the vertex data for the command. The meaning and count of these words will be altered by the initial flags sent in the first command.
If doing flat rendering, no further color will be sent. If doing gouraud shading, there will be one more color per vertex sent, and the initial color will be the one for vertex 0.
If doing textured rendering, each vertex sent will also have a U/V texture coordinate attached to it, as well as a CLUT index.
So each vertex data can be seen as the following set of words:
Color xxBBGGRR - optional, only present for gouraud shading
Vertex YYYYXXXX - required, two signed 16 bits values
UV ClutVVUU or PageVVUU - optional, only present for textured polygons
The upper 16 bits of the first two UV words contain extra information. The first word holds the Clut index. The second word contains texture page information. Any further clut/page bits should be set to 0.
So for example, a solid flat red triangle (color 0000FFh, ie. Red=FFh) with vertices at (X,Y) = (20h,10h), (40h,30h), (60h,50h) will be drawn using the following draw call data:
200000FF
00100020
00300040
00500060
And a quad with gouraud shading texture-blend (command 3Ch) will have the following structure:
3CB1G1R1
Yyy1Xxx1
ClutV1U1
00B2G2R2
Yyy2Xxx2
PageV2U2
00B3G3R3
Yyy3Xxx3
0000V3U3
00B4G4R4
Yyy4Xxx4
0000V4U4
Some combinations of these flags are nonsensical, but it's important to realize that the GPU will still process them properly. For instance, specifying gouraud shading without modulation will force the user to send the colors for each vertex to satisfy the GPU's state machine, without them actually being used for the rendering.
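Since the flags in the first word determine how many words follow, the total packet length can be derived from the command word alone. The following is a minimal sketch of that calculation based on the bitfield above; the function name is hypothetical.

```python
def polygon_word_count(cmd):
    """Return the total number of GP0 words in a polygon command packet."""
    if (cmd >> 29) & 0x7 != 1:
        raise ValueError("not a polygon command")
    gouraud = bool(cmd & (1 << 28))
    vertices = 4 if cmd & (1 << 27) else 3
    textured = bool(cmd & (1 << 26))
    words = 1 + vertices       # command+color word, plus one position word per vertex
    if gouraud:
        words += vertices - 1  # one extra color word for each vertex after the first
    if textured:
        words += vertices      # one UV word per vertex
    return words
```

A flat untextured triangle (20h) comes out at 4 words and a gouraud textured quad (3Ch) at 12 words, matching the lengths of the two listings above.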
Notes
Polygons are displayed up to <excluding> their lower-right coordinates.
Quads are internally processed as two triangles, the
first consisting of vertices 1,2,3, and the second of vertices 2,3,4. This is an
important detail, as splitting the quad into triangles affects the way colours
are interpolated.
Within the triangle, the ordering of the vertices doesn't matter on
the GPU side (a front-back check, based on clockwise or anti-clockwise
ordering, can be implemented at the GTE side).
Dither enable (in Texpage command) affects ONLY polygons that do use gouraud
shading or modulation.
GPU Render Line Commands
When the upper 3 bits of the first GP0 command are set to 2 (010), then the command can be decoded using the following bitfield:
bit number value meaning
31-29 010 line render
28 1/0 gouraud / flat shading
27 1/0 polyline / single line
25 1/0 semi-transparent / opaque
23-0 rgb first color value.
So each vertex can be seen as the following list of words:
Color xxBBGGRR - optional, only present for gouraud shading
Vertex YYYYXXXX - required, two signed 16 bits values
When polyline mode is active, at least two vertices must be sent to the GPU.
The vertex list is terminated by bits 12-15 and 28-31 both equaling 0x5, ie.
(word & 0xF000F000) == 0x50005000. The terminator value occurs on the first
word of the vertex (ie. the color word if the line is gouraud shaded).
If the 2 vertices in a line overlap, then the GPU will draw a 1x1 rectangle in
the location of the 2 vertices using the colour of the first vertex.
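The terminator rule can be illustrated with a small parser for a polyline packet. This is a sketch under stated assumptions: the packet is given as a list of 32bit words starting with the command word, the first two vertices are always consumed (since at least two must be sent), and the function names are hypothetical.

```python
TERMINATOR_MASK = 0xF000F000
TERMINATOR_VALUE = 0x50005000

def is_terminator(word):
    """True if bits 12-15 and 28-31 of the word both equal 5."""
    return (word & TERMINATOR_MASK) == TERMINATOR_VALUE

def split_polyline(words):
    """Split a polyline packet into a list of (color, vertex_word) pairs."""
    cmd = words[0]
    gouraud = bool(cmd & (1 << 28))
    color = cmd & 0xFFFFFF            # first color comes with the command word
    vertices = [(color, words[1])]
    i = 2
    while i < len(words):
        # The terminator is looked for on the first word of each further vertex.
        if len(vertices) >= 2 and is_terminator(words[i]):
            break
        if gouraud:
            color = words[i] & 0xFFFFFF   # per-vertex color word
            i += 1
        vertices.append((color, words[i]))
        i += 1
    return vertices
```

For flat polylines the color from the command word is simply repeated for every vertex in the returned list.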
Note
Lines are displayed up to <including> their lower-right coordinates (ie.
unlike as for polygons, the lower-right coordinate is not excluded).
If dithering is enabled (via Texpage command), then both monochrome and shaded
lines are drawn with dithering (this differs from monochrome polygons and
monochrome rectangles).
Wire-Frame
Poly-Lines can be used (among others) to create Wire-Frame polygons (by setting
the last Vertex equal to Vertex 1).
GPU Render Rectangle Commands
Rectangles are drawn much faster than polygons. Unlike polygons, gouraud
shading is not possible, dithering isn't applied, the rectangle always
has horizontal and vertical edges, textures cannot be rotated or scaled, and,
of course, the GPU does render Rectangles as a single entity, without splitting
them into two triangles.
The Rectangle command can be decoded using the following bitfield:
bit number value meaning
31-29 011 rectangle render
28-27 ss rectangle size
26 1/0 textured / untextured
25 1/0 semi-transparent / opaque
24 1/0 raw texture / modulation
23-0 rgb first color value.
The size parameter can be seen as the following enum:
0 (00) variable size
1 (01) single pixel (1x1)
2 (10) 8x8 sprite
3 (11) 16x16 sprite
Therefore, the whole draw call can be seen as the following sequence of words:
Color ccBBGGRR - command + color; color is ignored when textured
Vertex1 YYYYXXXX - required, indicates the upper left corner to render
UV ClutVVUU - optional, only present for textured rectangles
Width+Height YsizXsiz - optional, dimensions for variable sized rectangles (max 1023x511)
Unlike for Textured-Polygons, the "Texpage" must be set up separately for
Rectangles, via GP0(E1h). Width and Height can be up to 1023x511, however, the
maximum size of the texture window is 256x256 (so the source data will be
repeated when trying to use sizes larger than 256x256).
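As with polygons, the rectangle command word determines which optional words follow. A minimal sketch of that decoding, with a hypothetical function name, might look like this:

```python
# Fixed sizes selected by bits 28-27 (0 means a Width+Height word follows).
FIXED_SIZES = {1: (1, 1), 2: (8, 8), 3: (16, 16)}

def rectangle_layout(cmd):
    """Return (word_count, fixed_size) for a rectangle command word.

    fixed_size is None for variable sized rectangles.
    """
    if (cmd >> 29) & 0x7 != 3:
        raise ValueError("not a rectangle command")
    size = (cmd >> 27) & 0x3
    textured = bool(cmd & (1 << 26))
    words = 2                 # command+color word and the Vertex1 word
    if textured:
        words += 1            # Clut+UV word
    if size == 0:
        words += 1            # Width+Height word
    return words, FIXED_SIZES.get(size)
```

A textured 16x16 sprite (7Ch) therefore takes 3 words, while a textured variable sized rectangle (64h) takes 4.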
Texture Origin and X/Y-Flip
Vertex & Texcoord specify the upper-left corner of the rectangle. And,
normally, screen coords and texture coords are both incremented during
rendering the rectangle pixels.
Optionally, X/Y-Flip bits can be set in Texpage.Bit12/13, these bits cause the
texture coordinates to be decremented (instead of incremented). The X/Y-Flip
bits do affect only Rectangles (not Polygons, nor VRAM Transfers).
Caution: Reportedly, the X/Y-Flip feature isn't supported on old PSX consoles
(unknown which ones exactly, maybe such with PU-7 mainboards, and unknown how
to detect flipping support; except of course by reading VRAM).
Note
There are also two VRAM Transfer commands which work similarly to GP0(60h) and
GP0(65h). Possibly, those commands might be even faster... although not sure
if they do use the Texture Cache?
The difference is that VRAM Transfers do not clip to the Drawing Area boundary,
do not support fully-transparent nor semi-transparent texture pixels, and do
not convert color depths (eg. without 4bit texture to 16bit framebuffer
conversion).
GPU Rendering Attributes
Vertex (Parameter for Polygon, Line, Rectangle commands)
0-10 X-coordinate (signed, -1024..+1023)
11-15 Not used (usually sign-extension, but ignored by hardware)
16-26 Y-coordinate (signed, -1024..+1023)
27-31 Not used (usually sign-extension, but ignored by hardware)
If portions of the polygon/line/rectangle are located outside of the drawing area, then the hardware renders only the portion that is inside of the drawing area. Not sure if the hardware is skipping all clipped pixels at once (within a single clock cycle), or if it's (slowly) processing them pixel by pixel?
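Because only the low 11 bits of each coordinate are used, a vertex word can be decoded by sign-extending from bit 10. The helper names below are hypothetical and the code is only a sketch of the layout listed above.

```python
def sign_extend_11(value):
    """Interpret the low 11 bits of value as a signed number (-1024..+1023)."""
    value &= 0x7FF
    return value - 0x800 if value & 0x400 else value

def decode_vertex(word):
    """Return (x, y) from a YYYYXXXX vertex word; the upper bits are ignored."""
    return sign_extend_11(word), sign_extend_11(word >> 16)
```

For example, decode_vertex(0x00100020) gives X=32 and Y=16.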
Color Attribute (Parameter for all Rendering commands, except Raw Texture)
0-7 Red (0..FFh)
8-15 Green (0..FFh)
16-23 Blue (0..FFh)
24-31 Command (in first parameter) (don't care in further parameters)
Texpage Attribute (Parameter for Textured-Polygons commands)
0-8 Same as GP0(E1h).Bit0-8 (see there)
9-10 Unused (does NOT change GP0(E1h).Bit9-10)
11 Same as GP0(E1h).Bit11 (see there)
12-13 Unused (does NOT change GP0(E1h).Bit12-13)
14-15 Unused (should be 0)
Clut Attribute (Color Lookup Table, aka Palette)
This attribute is used in all Textured Polygon/Rectangle commands. Of course,
it's relevant only for 4bit/8bit textures (don't care for 15bit textures).
0-5 X coordinate X/16 (ie. in 16-halfword steps)
6-14 Y coordinate 0-511 (ie. in 1-line steps) ;\on v0 GPU (max 1 MB VRAM)
15 Unused (should be 0) ;/
6-15 Y coordinate 0-1023 (ie. in 1-line steps) ;on v2 GPU (max 2 MB VRAM)
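The Clut attribute can be turned into a VRAM position as sketched below. The function name and the two_mb_vram switch are hypothetical; the switch merely selects between the two Y field widths listed above.

```python
def decode_clut(attr, two_mb_vram=False):
    """Return the (x, y) VRAM position of the palette, x in halfwords."""
    x = (attr & 0x3F) * 16                   # bits 0-5, in 16-halfword steps
    y_mask = 0x3FF if two_mb_vram else 0x1FF
    y = (attr >> 6) & y_mask                 # bits 6-14 (or 6-15 with 2 MB VRAM)
    return x, y
```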
GP0(E1h) - Draw Mode setting (aka "Texpage")
0-3 Texture page X Base (N*64) (ie. in 64-halfword steps) ;GPUSTAT.0-3
4 Texture page Y Base 1 (N*256) (ie. 0, 256, 512 or 768) ;GPUSTAT.4
5-6 Semi-transparency (0=B/2+F/2, 1=B+F, 2=B-F, 3=B+F/4) ;GPUSTAT.5-6
7-8 Texture page colors (0=4bit, 1=8bit, 2=15bit, 3=Reserved);GPUSTAT.7-8
9 Dither 24bit to 15bit (0=Off/strip LSBs, 1=Dither Enabled) ;GPUSTAT.9
10 Drawing to display area (0=Prohibited, 1=Allowed) ;GPUSTAT.10
11 Texture page Y Base 2 (N*512) (only for 2 MB VRAM) ;GPUSTAT.15
12 Textured Rectangle X-Flip (BIOS does set this bit on power-up...?)
13 Textured Rectangle Y-Flip (BIOS does set it equal to GPUSTAT.13...?)
14-23 Not used (should be 0)
24-31 Command (E1h)
Texture page colors setting 3 (reserved) is same as setting 2 (15bit).
Bits 4 and 11 are the LSB and MSB of the 2-bit texture page Y coordinate. Normally only bit 4 is used as retail consoles only have 1 MB VRAM. Setting bit 11 (Y>=512) on a retail console with a v2 GPU will result in textures disappearing if 2 MB VRAM support was previously enabled using GP1(09h), as the VRAM chip select will no longer be active. Bit 11 is always ignored by v0 GPUs that do not support 2 MB VRAM.
Note: GP0(00h) seems to be often inserted between Texpage and Rectangle commands, maybe it acts as a NOP, which may be required between those commands, for timing reasons...?
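A decoder for the main Draw Mode fields might look as follows. It is a sketch only: the function name and dictionary keys are hypothetical, and bit 11 is added into the Y base unconditionally, although it is only meaningful on v2 GPUs with 2 MB VRAM enabled.

```python
def decode_texpage(value):
    """Decode the lower bits of a GP0(E1h) word into a dictionary."""
    return {
        "x_base": (value & 0xF) * 64,
        # Bit 4 is the LSB and bit 11 the MSB of the 2-bit Y page number.
        "y_base": ((value >> 4) & 1) * 256 + ((value >> 11) & 1) * 512,
        "semi_transparency": (value >> 5) & 3,
        # Setting 3 (reserved) behaves the same as setting 2 (15bit).
        "color_depth": (4, 8, 15, 15)[(value >> 7) & 3],
        "dither": bool(value & (1 << 9)),
        "draw_to_display": bool(value & (1 << 10)),
    }
```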
GP0(E2h) - Texture Window setting
0-4 Texture window Mask X (in 8 pixel steps)
5-9 Texture window Mask Y (in 8 pixel steps)
10-14 Texture window Offset X (in 8 pixel steps)
15-19 Texture window Offset Y (in 8 pixel steps)
20-23 Not used (zero)
24-31 Command (E2h)
Texcoord = (Texcoord AND (NOT (Mask*8))) OR ((Offset AND Mask)*8)
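Reading Mask as the set of coordinate bits to be replaced and Offset as the new values for those bits, the adjustment can be sketched for a single 8bit texture coordinate as below. The function name is hypothetical, and the same helper would be applied separately to the X and Y coordinates.

```python
def apply_texture_window(coord, mask, offset):
    """Apply the texture window to one 8bit texture coordinate.

    mask and offset are the 5bit values from GP0(E2h), in 8 pixel steps.
    """
    kept = coord & ~(mask * 8) & 0xFF      # clear the bits selected by Mask
    forced = (offset & mask) * 8           # replace them with the Offset bits
    return kept | forced
```

With Mask=1Fh every coordinate is confined to an 8 pixel wide window starting at Offset*8, so the texture repeats every 8 pixels; with Mask=0 the coordinate passes through unchanged.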
GP0(E3h) - Set Drawing Area top left (X1,Y1)
GP0(E4h) - Set Drawing Area bottom right (X2,Y2)
0-9 X-coordinate (0..1023)
10-18 Y-coordinate (0..511) ;\on v0 GPU (max 1 MB VRAM)
19-23 Not used (zero) ;/
10-19 Y-coordinate (0..1023) ;\on v2 GPU (max 2 MB VRAM)
20-23 Not used (zero) ;/
24-31 Command (Exh)
GP0(E5h) - Set Drawing Offset (X,Y)
0-10 X-offset (-1024..+1023) (usually within X1,X2 of Drawing Area)
11-21 Y-offset (-1024..+1023) (usually within Y1,Y2 of Drawing Area)
22-23 Not used (zero)
24-31 Command (E5h)
GP0(E6h) - Mask Bit Setting
0 Set mask while drawing (0=TextureBit15, 1=ForceBit15=1) ;GPUSTAT.11
1 Check mask before draw (0=Draw Always, 1=Draw if Bit15=0) ;GPUSTAT.12
2-23 Not used (zero)
24-31 Command (E6h)
When bit1 is on, any (old) pixels in the framebuffer with bit15=1 are write-protected, and cannot be overwritten by (new) rendering commands.
The mask setting affects all rendering commands, as well as CPU-to-VRAM and VRAM-to-VRAM transfer commands (where it acts on the separate halfwords, ie. as for 15bit textures). However, Mask does NOT affect the Fill-VRAM command.
This setting is used in games such as Metal Gear Solid and Silent Hill.
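The effect of the two mask bits on a single 16bit framebuffer write can be sketched as follows. The function name and its boolean parameters are hypothetical stand-ins for GP0(E6h).0 and GP0(E6h).1.

```python
def write_pixel(old, new, set_mask, check_mask):
    """Return the halfword that ends up in VRAM when drawing new over old."""
    if check_mask and (old & 0x8000):
        return old            # old pixel has bit15=1: write-protected
    if set_mask:
        new |= 0x8000         # force bit15 of the written pixel to 1
    return new & 0xFFFF
```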
Note
GP0(E3h..E5h) do not take up space in the FIFO, so they are probably executed
immediately (even if there're still other commands in the FIFO). Best use them
only if you are sure that the FIFO is empty (otherwise the new Drawing Area
settings might accidentally affect older Rendering Commands in the FIFO).
GPU Memory Transfer Commands
The next three commands are selected when the high 3 bits are set to the values 4 (100), 5 (101), and 6 (110). For them, the remaining 29 bits are ignored, and can be set to any arbitrary value.
VRAM to VRAM blitting - command 4 (100)
1st Command
2nd Source Coord (YyyyXxxxh) ;Xpos counted in halfwords
3rd Destination Coord (YyyyXxxxh) ;Xpos counted in halfwords
4th Width+Height (YsizXsizh) ;Xsiz counted in halfwords
CPU to VRAM blitting - command 5 (101)
1st Command
2nd Destination Coord (YyyyXxxxh) ;Xpos counted in halfwords
3rd Width+Height (YsizXsizh) ;Xsiz counted in halfwords
... Data (...) <--- usually transferred via DMA
VRAM to CPU blitting - command 6 (110)
1st Command ;\
2nd Source Coord (YyyyXxxxh) ; write to GP0 port (as usually)
3rd Width+Height (YsizXsizh) ;/
... Data (...) ;<--- read from GPUREAD port (or via DMA)
Masking and Rounding for FILL Command parameters
Xpos=(Xpos AND 3F0h) ;range 0..3F0h, in steps of 10h
Ypos=(Ypos AND 1FFh) ;range 0..1FFh
Xsiz=((Xsiz AND 3FFh)+0Fh) AND (NOT 0Fh) ;range 0..400h, in steps of 10h
Ysiz=((Ysiz AND 1FFh)) ;range 0..1FFh
Note that because of the height (Ysiz) masking, a maximum of 511 rows can be filled in a single command. Calling a fill with a full VRAM height of 512 rows will be ineffective as the height will be masked to 0.
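The fill masking rules translate directly into code. The following sketch uses a hypothetical function name and returns the effective parameters after masking and rounding.

```python
def mask_fill_params(xpos, ypos, xsiz, ysiz):
    """Apply the FILL command masking/rounding to raw parameter values."""
    xpos &= 0x3F0                              # 0..3F0h, in steps of 10h
    ypos &= 0x1FF                              # 0..1FFh
    xsiz = ((xsiz & 0x3FF) + 0x0F) & ~0x0F     # 0..400h, rounded up to 10h steps
    ysiz &= 0x1FF                              # 0..1FFh
    return xpos, ypos, xsiz, ysiz
```

Passing a height of 512 yields an effective height of 0, which illustrates why a full-height fill has no effect.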
Masking for COPY Commands parameters
Xpos=(Xpos AND 3FFh) ;range 0..3FFh
Ypos=(Ypos AND 1FFh) ;range 0..1FFh
Xsiz=((Xsiz-1) AND 3FFh)+1 ;range 1..400h
Ysiz=((Ysiz-1) AND 1FFh)+1 ;range 1..200h
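The copy masking differs from the fill masking in that a size of 0 wraps around to the maximum size. A sketch with a hypothetical function name:

```python
def mask_copy_params(xpos, ypos, xsiz, ysiz):
    """Apply the COPY command masking to raw parameter values."""
    xpos &= 0x3FF                        # 0..3FFh
    ypos &= 0x1FF                        # 0..1FFh
    xsiz = ((xsiz - 1) & 0x3FF) + 1      # 1..400h
    ysiz = ((ysiz - 1) & 0x1FF) + 1      # 1..200h
    return xpos, ypos, xsiz, ysiz
```

A requested size of 0x0 thus becomes a full 400h x 200h transfer.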
Notes
The coordinates for the above VRAM transfer commands are absolute framebuffer
addresses (not relative to Draw Offset, and not clipped to Draw Area).
Non-DMA transfers seem to be working at any time, but GPU-DMA Transfers seem to
be working ONLY during V-Blank (outside of V-Blank, portions of the data appear
to be skipped, and the following words arrive at wrong addresses), unknown if
it's possible to change that by whatever configuration settings...? That
problem appears ONLY for continuous DMA aka VRAM transfers (linked-list DMA aka
Ordering Table works even outside V-Blank).
Wrapping
If the Source/Dest starting points plus the width/height value exceed the
1024x512 pixel VRAM size, then the Copy/Fill operations wrap to the opposite
memory edge (without any carry-out from X to Y, nor from Y to X).
GPU Other Commands
GP0(1Fh) - Interrupt Request (IRQ1)
1st Command (Cc000000h) ;GPUSTAT.24
Note: The command is used by Blaze'n'Blade, but the game doesn't have IRQ1 enabled, and the written value (1F801810h) looks more like an I/O address, rather than like a command, so not sure if it's done intentionally, or if it is just a bug.
GP0(03h) - Unknown?
Unknown. Doesn't seem to be used by any games. Unlike the "NOP" commands,
GP0(03h) does take up space in FIFO, so it is apparently not a NOP.
GP0(00h) - NOP (?)
This command doesn't take up space in the FIFO (eg. even if a VRAM-to-VRAM
transfer is still busy, one can send dozens of GP0(00h) commands, without the
command FIFO becoming full). So, either the command is ignored, or, if it has a
function, it is executed immediately (even while the transfer is busy).
...
GP0(00h) unknown, used with parameter = 08A16Ch... or rather 08FDBCh ... the
written value seems to be a bios/ram memory address, anded with 00FFFFFFh...
maybe a bios bug?
GP0(00h) seems to be often inserted between Texpage and Rectangle commands,
maybe it acts as a NOP, which may be required between those commands, for timing
reasons...?
GP0(04h..1Eh,E0h,E7h..EFh) - Mirrors of GP0(00h) - NOP (?)
Like GP0(00h), these commands don't take up space in the FIFO. So, maybe, they
are same as GP0(00h), however, the Drawing Area/Offset commands GP0(E3h..E5h)
don't take up FIFO space either, so not taking up FIFO space doesn't
necessarily mean that the command has no function.
GPU Display Control Commands (GP1)
GP1 Display Control Commands are sent by writing the 8bit Command number
(MSBs), and 24bit parameter (LSBs) to Port 1F801814h. Unlike GP0 commands, GP1
commands are passed directly to the GPU (ie. they can be sent even when the
FIFO is full).
GP1(00h) - Reset GPU
0-23 Not used (zero)
GP1(01h) ;clear fifo
GP1(02h) ;ack irq (0)
GP1(03h) ;display off (1)
GP1(04h) ;dma off (0)
GP1(05h) ;display address (0)
GP1(06h) ;display x1,x2 (x1=200h, x2=200h+256*10)
GP1(07h) ;display y1,y2 (y1=010h, y2=010h+240)
GP1(08h) ;display mode 256x240 NTSC (0)
GP0(E1h..E6h) ;rendering attributes (0)
GP1(01h) - Reset Command Buffer
0-23 Not used (zero)
GP1(02h) - Acknowledge GPU Interrupt (IRQ1)
0-23 Not used (zero) ;GPUSTAT.24
GP1(03h) - Display Enable
0 Display On/Off (0=On, 1=Off) ;GPUSTAT.23
1-23 Not used (zero)
The "Off" setting displays a black picture (and still sends /SYNC signals to the television set). (Unknown if it still generates vblank IRQs though?)
GP1(04h) - DMA Direction / Data Request
0-1 DMA Direction (0=Off, 1=FIFO, 2=CPUtoGP0, 3=GPUREADtoCPU) ;GPUSTAT.29-30
2-23 Not used (zero)
Display start/end
Specifies where the display area is positioned on the screen, and how much data
gets sent to the screen. The screen sizes of the display area are valid only if
the horizontal/vertical start/end values are default. By changing these you can
get bigger/smaller display screens. On most TV's there is some black around the
edge, which can be utilised by setting the start of the screen earlier and the
end later. The size of the pixels is NOT changed with these settings, the GPU
simply sends more data to the screen. Some monitors/TVs have a smaller display
area and the extended size might not be visible on those sets. "(Mine is
capable of about 330 pixels horizontal, and 272 vertical in 320*240 mode)"
GP1(05h) - Start of Display area (in VRAM)
0-9 X (0-1023) (halfword address in VRAM) (relative to begin of VRAM)
10-18 Y (0-511) (scanline number in VRAM) (relative to begin of VRAM)
19-23 Not used (zero)
Unknown if using Y values in 512-1023 range is supported (with 2 MB VRAM).
GP1(06h) - Horizontal Display range (on Screen)
0-11 X1 (260h+0) ;12bit ;\counted in video clock units,
12-23 X2 (260h+320*8) ;12bit ;/relative to HSYNC
The number of displayed pixels per line is "(((X2-X1)/cycles_per_pix)+2) AND NOT 3" (ie. the hardware is rounding the width up/down to a multiple of 4 pixels).
Most games are using a width equal to the horizontal resolution (ie. 256, 320, 368, 512, 640 pixels). A few games are using slightly smaller widths (probably due to programming bugs). Pandemonium 2 is using a bigger "overscan" width (ensuring an intact picture without borders even on mis-calibrated TV sets).
The 260h value is the first visible pixel on normal TV Sets, this value is used by MOST NTSC games, and SOME PAL games (see below notes on Mis-Centered PAL games).
Video clock unit used depends on console region, regardless of NTSC/PAL video mode set by GP1(08h).3; see section on nominal video clocks for values.
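The width formula can be evaluated as in the sketch below. The cycles-per-pixel table is an assumption not stated in this section (10, 8, 7, 5 and 4 video clock units for the 256, 320, 368, 512 and 640 pixel modes), and the function name is hypothetical.

```python
# Assumed video clock units per pixel for each horizontal resolution.
CYCLES_PER_PIXEL = {256: 10, 320: 8, 368: 7, 512: 5, 640: 4}

def displayed_width(x1, x2, horizontal_resolution):
    """Return the number of displayed pixels per line for GP1(06h) X1/X2."""
    cycles = CYCLES_PER_PIXEL[horizontal_resolution]
    # Adding 2 and clearing the low 2 bits rounds to a multiple of 4 pixels.
    return (((x2 - x1) // cycles) + 2) & ~3
```

With X1=260h and X2=260h+320*8 in the 320 pixel mode, this gives a width of 320 pixels.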
GP1(07h) - Vertical Display range (on Screen)
0-9 Y1 (NTSC=88h-(240/2), PAL=A3h-(288/2)) ;\scanline numbers on screen,
10-19 Y2 (NTSC=88h+(240/2), PAL=A3h+(288/2)) ;/relative to VSYNC
20-23 Not used (zero)
The 88h/A3h values are the middle-scanlines on normal TV Sets, these values are used by MOST NTSC games, and SOME PAL games (see below notes on Mis-Centered PAL games).
The 240/288 values are for fullscreen pictures. Many NTSC games display 240 lines, but on most analog television sets, only 224 lines are visible (8 lines of overscan on top and 8 lines of overscan on bottom). Many PAL games display only 256 lines (underscan with black borders).
Some games such as Chrono Cross will occasionally adjust these values to create a screen shake effect, so proper emulation of this command is necessary for those particular cases.
GP1(08h) - Display mode
0-1 Horizontal Resolution 1 (0=256, 1=320, 2=512, 3=640) ;GPUSTAT.17-18
2 Vertical Resolution (0=240, 1=480, when Bit5=1) ;GPUSTAT.19
3 Video Mode (0=NTSC/60Hz, 1=PAL/50Hz) ;GPUSTAT.20
4 Display Area Color Depth (0=15bit, 1=24bit) ;GPUSTAT.21
5 Vertical Interlace (0=Off, 1=On) ;GPUSTAT.22
6 Horizontal Resolution 2 (0=256/320/512/640, 1=368) ;GPUSTAT.16
7 Flip screen horizontally (0=Off, 1=On, v1 only) ;GPUSTAT.14
8-23 Not used (zero)
Bit 7 is known as "reverseflag" and can reportedly be used on (v1?) arcade/prototype GPUs to flip the screen horizontally. On a v2 GPU setting this bit corrupts the display output, possibly due to leftovers of the v1 GPU's screen flipping circuitry still being present.
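The display mode bits can be decoded as sketched here. The function name and dictionary keys are hypothetical; the logic follows the bit list above, including the rule that the 480 line setting only applies when interlace is on.

```python
def decode_display_mode(value):
    """Decode the parameter bits of a GP1(08h) display mode command."""
    if value & (1 << 6):
        width = 368                                   # Horizontal Resolution 2
    else:
        width = (256, 320, 512, 640)[value & 3]       # Horizontal Resolution 1
    interlace = bool(value & (1 << 5))
    height = 480 if (value & (1 << 2)) and interlace else 240
    return {
        "width": width,
        "height": height,
        "pal": bool(value & (1 << 3)),
        "color_depth": 24 if value & (1 << 4) else 15,
        "interlace": interlace,
    }
```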
GP1(10h) - Read GPU internal register
GP1(11h..1Fh) - Mirrors of GP1(10h), Read GPU internal register
After sending the command, the result can be read (immediately) from GPUREAD
register (there's no NOP or other delay required) (namely GPUSTAT.Bit27 is used
only for VRAM reads, but NOT for register reads, so do not try to wait for that
flag).
0-23 Register index (via following GPUREAD)
On v0 GPUs, the following indices can be selected:
00h-01h = Returns Nothing (old value in GPUREAD remains unchanged)
02h = Read Texture Window setting ;GP0(E2h) ;20bit/MSBs=Nothing
03h = Read Draw area top left ;GP0(E3h) ;19bit/MSBs=Nothing
04h = Read Draw area bottom right ;GP0(E4h) ;19bit/MSBs=Nothing
05h = Read Draw offset ;GP0(E5h) ;22bit
06h-07h = Returns Nothing (old value in GPUREAD remains unchanged)
08h-FFFFFFh = Mirrors of 00h..07h
On newer (v1/v2) GPUs, the following indices can be selected:
00h-01h = Returns Nothing (old value in GPUREAD remains unchanged)
02h = Read Texture Window setting ;GP0(E2h) ;20bit/MSBs=Nothing
03h = Read Draw area top left ;GP0(E3h) ;20bit/MSBs=Nothing
04h = Read Draw area bottom right ;GP0(E4h) ;20bit/MSBs=Nothing
05h = Read Draw offset ;GP0(E5h) ;22bit
06h = Returns Nothing (old value in GPUREAD remains unchanged)
07h = Read GPU version (1 or 2)
08h = Unknown (Returns 00000000h) (lightgun? VRAM size set via GP1(09h)?)
09h-0Fh = Returns Nothing (old value in GPUREAD remains unchanged)
10h-FFFFFFh = Mirrors of 00h..0Fh
GP1(09h) - Set VRAM size (v2)
0 Allow Y coordinates in 512-1023 range (0=No/wrap to 0-511, 1=Yes)
1-23 Unknown (seems to have no effect)
GP1(09h) is supported only on v2 GPUs; v0 GPUs don't support 2 MB VRAM at all and v1 seems to use command GP1(20h) instead.
GP1(20h) - Set VRAM size (v1)
0-23 Unknown (501h=1 MB, 504h=2 MB, or so?)
GP1(0Bh) - Unknown/Internal?
0-10 Unknown (GPU crashes after a while when set to 274h..7FFh)
11-23 Unknown (seems to have no effect)
GP1(0Ah,0Ch..0Fh,21h..3Fh) - N/A
Not used?
GP1(40h..FFh) - N/A (Mirrors)
Mirrors of GP1(00h..3Fh).
Mis-Centered PAL Games (wrong GP1(06h)/GP1(07h) settings)
NTSC games are typically well centered (using X1=260h, and Y1/Y2=88h+/-N).
PAL games should be centered as X1=260h, and Y1/Y2=A3h+/-N - these values
would be looking well on a Philips Philetta TV Set, and do also match up with
other common picture positions (eg. as used by Nintendo's SNES console).
However, most PAL games are using completely different "random" centering
values (maybe caused by different developers trying to match the centering to
the different TV Sets) (although it looks more as if the PAL developers just
went amok: Many PAL games are even using different centerings for their Intro,
Movie, and actual Game sequences).
As a result, most PAL games look like crap when played on a real
PSX. For PSX emulators it may be recommended to ignore the GP1(06h)/GP1(07h)
centering, and instead, apply auto-centering to PAL games.
For PAL game developers, it may be recommended to add a screen centering option
(as found in Tomb Raider 3, for example). Unknown if this is really required...
or if X1=260h, and Y1/Y2=A3h+/-N would work fine on most or all PAL TV Sets?
GPU Status Register
1F801814h - GPUSTAT - GPU Status Register (R)
0-3 Texture page X Base (N*64) ;GP0(E1h).0-3
4 Texture page Y Base 1 (N*256) (ie. 0, 256, 512 or 768) ;GP0(E1h).4
5-6 Semi-transparency (0=B/2+F/2, 1=B+F, 2=B-F, 3=B+F/4) ;GP0(E1h).5-6
7-8 Texture page colors (0=4bit, 1=8bit, 2=15bit, 3=Reserved) ;GP0(E1h).7-8
9 Dither 24bit to 15bit (0=Off/strip LSBs, 1=Dither Enabled);GP0(E1h).9
10 Drawing to display area (0=Prohibited, 1=Allowed) ;GP0(E1h).10
11 Set Mask-bit when drawing pixels (0=No, 1=Yes/Mask) ;GP0(E6h).0
12 Draw Pixels (0=Always, 1=Not to Masked areas) ;GP0(E6h).1
13 Interlace Field (or, always 1 when GP1(08h).5=0)
14 Flip screen horizontally (0=Off, 1=On, v1 only) ;GP1(08h).7
15 Texture page Y Base 2 (N*512) (only for 2 MB VRAM) ;GP0(E1h).11
16 Horizontal Resolution 2 (0=256/320/512/640, 1=368) ;GP1(08h).6
17-18 Horizontal Resolution 1 (0=256, 1=320, 2=512, 3=640) ;GP1(08h).0-1
19 Vertical Resolution (0=240, 1=480, when Bit22=1) ;GP1(08h).2
20 Video Mode (0=NTSC/60Hz, 1=PAL/50Hz) ;GP1(08h).3
21 Display Area Color Depth (0=15bit, 1=24bit) ;GP1(08h).4
22 Vertical Interlace (0=Off, 1=On) ;GP1(08h).5
23 Display Enable (0=Enabled, 1=Disabled) ;GP1(03h).0
24 Interrupt Request (IRQ1) (0=Off, 1=IRQ) ;GP0(1Fh)/GP1(02h)
25 DMA / Data Request, meaning depends on GP1(04h) DMA Direction:
When GP1(04h)=0 ---> Always zero (0)
When GP1(04h)=1 ---> FIFO State (0=Full, 1=Not Full)
When GP1(04h)=2 ---> Same as GPUSTAT.28
When GP1(04h)=3 ---> Same as GPUSTAT.27
26 Ready to receive Cmd Word (0=No, 1=Ready) ;GP0(...) ;via GP0
27 Ready to send VRAM to CPU (0=No, 1=Ready) ;GP0(C0h) ;via GPUREAD
28 Ready to receive DMA Block (0=No, 1=Ready) ;GP0(...) ;via GP0
29-30 DMA Direction (0=Off, 1=FIFO, 2=CPUtoGP0, 3=GPUREADtoCPU) ;GP1(04h).0-1
31 Drawing even/odd lines in interlace mode (0=Even or Vblank, 1=Odd)
Note
Further GPU status information can be retrieved via GP1(10h) and GP0(C0h).
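A few of the status bits most relevant for synchronization can be extracted as sketched below. The function name and dictionary keys are hypothetical, and the value used in the usage note is just a sample bit pattern for illustration.

```python
def decode_gpustat(stat):
    """Extract selected flags from a 32bit GPUSTAT value."""
    return {
        "display_disabled": bool(stat & (1 << 23)),
        "irq": bool(stat & (1 << 24)),
        "dma_request": bool(stat & (1 << 25)),
        "ready_for_command": bool(stat & (1 << 26)),
        "ready_to_send_vram": bool(stat & (1 << 27)),
        "ready_for_dma_block": bool(stat & (1 << 28)),
        "dma_direction": (stat >> 29) & 3,
        "odd_line": bool(stat & (1 << 31)),
    }
```

For the sample value 14802000h, the sketch reports the display as disabled, the GPU as ready for a command word and a DMA block, and the DMA direction as 0 (Off).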
Ready Bits
Bit28: Normally, this bit gets cleared when the command execution is busy (ie.
once when the command and all of its parameters are received), however, for
Polygon and Line Rendering commands, the bit gets cleared immediately after
receiving the command word (ie. before receiving the vertex parameters). The
bit is used as DMA request in DMA Mode 2, accordingly, the DMA would probably
hang if the Polygon/Line parameters are transferred in a separate DMA block
(ie. the DMA probably starts ONLY on command words).
Bit27: Gets set after sending GP0(C0h) and its parameters, and stays set until
all data words are received; used as DMA request in DMA Mode 3.
Bit26: Gets set when the GPU wants to receive a command. If the bit is cleared,
then the GPU does either want to receive data, or it is busy with a command
execution (and doesn't want to receive anything).
Bit25: This is the DMA Request bit, however, the bit is also useful for non-DMA
transfers, especially in the FIFO State mode.
GPU Versions
Summary of GPU Differences
Differences... v0 (160-pin) v1 (208-pin prototype) v2 (208-pin)
GPU Chip CXD8514Q CXD8538Q CXD8561Q/BQ/CQ/CXD9500Q
Mainboard EARLY-PU-8 and below Arcade boards only LATE-PU-8 and up
Memory Type Dual-ported VRAM Dual-ported VRAM? Normal DRAM
GPUSTAT.13 when interlace=off always 0 unknown always 1
GPUSTAT.14 always 0 screen flip nonfunctional screen flip
GPUSTAT.15 always 0 always 0? bit1 of texpage Y base
GP1(10h:index3..4) 19-bit (1 MB VRAM) 22-bit (2 MB VRAM) 20-bit (2 MB VRAM)
GP1(10h:index7) N/A 00000001h version 00000002h version
GP1(10h:index8) mirror of index0 00000000h zero 00000000h zero
GP1(10h:index9..F) mirror of index1..7 unknown N/A
GP1(09h) N/A N/A VRAM size
GP1(20h) N/A VRAM size/settings N/A
GP0(E1h).bit11 N/A N/A bit1 of texpage Y base
GP0(E1h).bit12/13 without x/y-flip without x/y-flip with x/y-flip
GP0(03h) N/A (not stored in fifo) unknown unknown/unused command
Shaded Textures ((color/8)*texel)/2 unknown (color*texel)/16
GP0(02h) FillVram xpos.bit0-3=0Fh=bugged unknown xpos.bit0-3=ignored
dma-to-vram: doesn't work with blksiz>10h (v2 gpu works with blksiz=8C0h!)
dma-to-vram: MAYBE also needs extra software-handshake to confirm DMA done?
320*224 pix = 11800h pix = 8C00h words
There is another arcade-only GPU revision, the CXD8654Q (v2b). It seems to use the same commands as regular v2 GPUs, but the differences between v2b and v2 are currently unknown.
Shaded Textures
The v0 GPU crops 8:8:8 bit gouraud shading color to 5:5:5 bit before multiplying
it with the texture color, resulting in rather poor graphics. For example, the
snow scene in the first level of Tomb Raider I looks a lot smoother on v2 GPUs.
This bug was presumably already fixed on the v1 prototype GPU (unconfirmed).
The cropped colors are looking a bit as if dithering would be disabled
(although, technically dithering works fine, but due to the crippled color
input, it's always using the same dither pattern per 8 intensities, instead of
using 8 different dither patterns).
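The effect can be illustrated with the two formulas from the summary table. The following is only a minimal sketch, assuming an 8bit shading color, a 5bit texel, integer division, and saturation of the result to FFh (the saturation and the function names are assumptions of this sketch, not documented behaviour):

```python
def shade_v0(color8, texel5):
    """v0 formula from the summary table: ((color/8)*texel)/2."""
    return min(((color8 // 8) * texel5) // 2, 0xFF)  # saturation is assumed

def shade_v2(color8, texel5):
    """v2 formula from the summary table: (color*texel)/16."""
    return min((color8 * texel5) // 16, 0xFF)        # saturation is assumed

# Eight neighbouring shading intensities collapse into one result on v0,
# but stay distinct on v2 (texel intensity 16 is chosen to avoid saturation).
v0_steps = {shade_v0(c, 16) for c in range(128, 136)}
v2_steps = {shade_v2(c, 16) for c in range(128, 136)}
print(len(v0_steps), len(v2_steps))
```

With a neutral shading color of 128, both formulas yield the same result; the difference only shows up in the intermediate intensities.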
Memory/Rendering Timings
The v0 GPU uses two Dual-ported VRAM chips (each with two 16bit databusses,
one for CPU/DMA/rendering access, and one for output to the video DAC). The v2
GPU uses a normal DRAM chip (with a single 32bit databus).
The exact timing differences are unknown, but the different memory types should
result in quite different timings:
The v0 GPU might perform better on non-32bit aligned accesses, and on memory
accesses performed simultaneously with DAC output.
On the other hand, the v2 GPU's DRAM seems to be faster in some cases (for
example, during Vblank, it's fast enough to perform DMAs with blksiz>10h,
which exceeds the GPU's FIFO size, and causes lost data on v0 GPUs).
X/Y-Flip and PSone 2 MB VRAM
The X/Y-flipping feature may be used by arcade games (provided that the arcade
board is fitted with v2 GPUs). The flipping feature also works on retail
consoles with v2 GPUs, but PSX games should never use that feature (to
maintain compatibility with older PSX consoles).
Some PSone consoles seem to be fitted with 2 MB VRAM chips (maybe because
smaller chips had not been in production anymore), but only the first 1 MB
region is accessible. However, as all PSone models use a v2 GPU which supports
2 MB VRAM, it should be possible to rewire the chip selects to make the upper
half accessible.
GPU Detection (and optional VRAM size switching)
Below is a slightly customized GPU Detection function taken from Perfect
Assassin (the index7 latching works ONLY on v1/v2 GPUs, whilst v0 GPUs leave the
latched value unchanged; as a workaround, the index4 latching is used to ensure
that the latch won't contain 000002h on v0 GPUs, assuming that index4 is never
set to 000002h).
[1F801814h]=10000004h ;GP1(10h).index4 (latch draw area bottom right)
[1F801814h]=10000007h ;GP1(10h).index7 (latch GPU version, if any)
if ([1F801810h] AND 00FFFFFFh)=00000002h then goto @@gpu_v2
[1F801810h]=([1F801814h] AND 3FFFh) OR E1001000h ;change GPUSTAT via GP0(E1h)
dummy=[1F801810h] ;dummy read (unknown purpose)
if ([1F801814h] AND 00001000h) then goto @@gpu_v1 else goto @@gpu_v0
;---
@@gpu_v0:
return 0
;---
@@gpu_v1:
if want_2mb_vram then [1F801814h]=20000504h ;GP1(20h)
return 1
;---
@@gpu_v2:
if want_2mb_vram then [1F801814h]=09000001h ;GP1(09h)
return 2
GP0(02h) FillVram
The FillVram command normally ignores the lower 4 bits of the x-coordinate
(and software should always set those bits to zero). However, if all 4 bits are
set, the old v0 GPU writes every 2nd pixel to the wrong memory address.
For example, a 32x4 pixel fill produces the following results for x=0..1Fh:
0h 10h 20h 30h 40h
| | | | |
################################ ;\x=00h..0Eh
################################ ; and, x=0Fh
################################ ; on v2 GPU
################################ ;/
# # # # # # # ################## # # # # # # # ;\
# # # # # # # ################## # # # # # # # ; x=0Fh
# # # # # # # ################## # # # # # # # ; on v0 GPU
# # # # # # # ################## # # # # # # # ;/
################################ ;\x=10h..1Eh
################################ ; and, x=1Fh
################################ ; on v2 GPU
################################ ;/
# # # # # # # ################## # # # # # # # ;\
# # # # # # # ################## # # # # # # # ; x=1Fh
# # # # # # # ################## # # # # # # # ; on v0 GPU
# # # # # # # ################## # # # # # # # ;/
GPU Depth Ordering
Absent Depth Buffer
The PlayStation's GPU stores only RGB colors in the framebuffer (ie. unlike
modern 3D processors, it's NOT buffering Depth values; leaving aside the Mask
bit, which could be considered as a tiny 1bit "Depth" or "Priority" value). In
fact, the GPU supports only X,Y coordinates, and it's totally unaware of Z
coordinates. So, when rendering a polygon, the hardware CANNOT determine which
of the new pixels are in front of or behind the old pixels in the buffer.
Simple Ordering
Rendering simply takes place in the order in which the data is sent to the GPU
(ie. the most distant objects should be sent first). For 2D graphics, it's
fairly easy to follow that order (eg. even multi-layer 2D graphics can use
DMA2-continuous mode).
Depth Ordering Table (OT)
For 3D graphics, the ordering of the polygons may change more or less randomly
(eg. when rotating/moving the camera). To solve that problem, the whole
rendering data is usually first stored in a Depth Ordering Table (OT) in Main
RAM, and, once all polygons have been stored in the OT, the OT is sent to
the GPU via "DMA2-linked-list" mode.
Initializing an empty OT (via DMA6)
DMA channel 6 can be used to set up an empty linked list, in which each entry
points to the previous:
DPCR - enable bits ;Example=x8xxxxxxh
D6_MADR - pointer to the LAST table entry ;Example=8012300Ch
D6_BCR - number of list entries ;Example=00000004h
D6_CHCR - control bits (should be 11000002h) ;Example=11000002h
[80123000h]=00FFFFFFh ;1st entry, points to end code (xxFFFFFFh)
[80123004h]=00123000h ;2nd entry, points to 1st entry
[80123008h]=00123004h ;3rd entry, points to 2nd entry
[8012300Ch]=00123008h ;last entry, points to 3rd entry (table entrypoint)
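The effect of the DMA6 transfer can be emulated in software. The following is a minimal sketch that models Main RAM as a Python dict (the dict model and the name clear_ot are assumptions for illustration, not part of any library) and reproduces the four example values above:

```python
END_CODE = 0x00FFFFFF  # end-of-list marker (xxFFFFFFh)

def clear_ot(ram, madr, count):
    """Emulate the DMA6 fill: start at the LAST entry and work downwards."""
    addr = madr
    for remaining in range(count, 0, -1):
        if remaining == 1:
            ram[addr] = END_CODE               # 1st entry terminates the list
        else:
            ram[addr] = (addr - 4) & 0xFFFFFF  # link to the previous entry
        addr -= 4
    return ram

# Same parameters as the example: D6_MADR=8012300Ch, D6_BCR=4
ram = clear_ot({}, 0x8012300C, 4)
for addr in sorted(ram):
    print(f"[{addr:08X}h]={ram[addr]:08X}h")
```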
Inserting Entries (Passing GTE data to the OT) (by software)
The GTE commands AVSZ3 and AVSZ4 can be used to calculate the Average Z
coordinate of a polygon (based on its three or four Z coordinates). The result
is returned as a 16bit Z value in GTE register OTZ. The commands also allow
dividing the result to make it less than 16bit: the full 16bit range would
require an OT of 256 KBytes for the EMPTY table alone, which would waste memory
and slow down the DMA2/DMA6 operations. On the other hand, a smaller table
means less depth resolution.
[PacketAddr+0] = [80123000h+OTZ*4] + (N SHL 24) <--internal link chain
[PacketAddr+4..N*4] = GP0 Command(s) and Parameters <--data (send to GP0)
[80123000h+OTZ*4] = PacketAddr AND FFFFFFh <--internal link chain
Not sure if the packet size must be limited to max N=16 words (ie. as for the DMA2-continuous block size, due to GP0 FIFO size limits)?
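The three assignments above can be sketched as follows, again with Main RAM modelled as a dict. The packet address 80100000h, the placeholder GP0 words, and the name insert_packet are assumptions made up for this example. The sketch masks the old link to 24 bits, which is equivalent to the plain addition above as long as the size byte of the OT entry is zero:

```python
OT_BASE = 0x80123000  # example table address used in this chapter

def insert_packet(ram, packet_addr, otz, words):
    """Link a packet of GP0 words into the OT slot selected by OTZ."""
    slot = OT_BASE + otz * 4
    n = len(words)
    ram[packet_addr] = (ram[slot] & 0xFFFFFF) | (n << 24)  # header: old link + size
    for i, word in enumerate(words, start=1):
        ram[packet_addr + 4 * i] = word                     # GP0 command/parameters
    ram[slot] = packet_addr & 0xFFFFFF                      # slot now points to packet

# Empty 4-entry OT (as initialized via DMA6 in the example above)
ram = {0x80123000: 0x00FFFFFF, 0x80123004: 0x00123000,
       0x80123008: 0x00123004, 0x8012300C: 0x00123008}

# Hypothetical 4-word packet (placeholder values), inserted at depth OTZ=2
insert_packet(ram, 0x80100000, 2, [0x20FFFFFF, 0x00000000, 0x00000010, 0x00100000])
print(f"{ram[0x80100000]:08X}h {ram[0x80123008]:08X}h")
```

Inserting a second packet at the same OTZ would place it in front of the first one in the chain, since the slot always points to the most recently inserted packet.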
Sending the OT to the GPU (via DMA2-linked-list mode)
1 - Wait until GPU is ready to receive commands ;GPUSTAT.28
2 - Enable DMA channel 2 ;DPCR
3 - Set GPU to DMA cpu->gpu mode ;[GP1]=04000002h aka GP1(04h)
4 - Set D2_MADR to the start of the list ;(LAST Entry) ;Example=8012300Ch
5 - Set D2_BCR to zero ;(length unused, end at END-CODE)
6 - Set D2_CHCR to link mode, mem->GPU and dma enable ;=01000401h
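What the DMA controller then does with the list can be sketched in software. This is a simplified model only: RAM is a dict keyed by 24bit addresses, the end code is tested as an exact FFFFFFh match, and the function name and the placeholder packet words are assumptions of this example:

```python
def walk_linked_list(ram, madr):
    """Follow the list like DMA2 linked-list mode; return the GP0 words in order."""
    sent = []
    addr = madr & 0xFFFFFF
    while True:
        header = ram[addr]
        size = header >> 24                 # number of words following the header
        for i in range(1, size + 1):
            sent.append(ram[addr + 4 * i])  # these words would be written to GP0
        addr = header & 0xFFFFFF            # follow the link
        if addr == 0xFFFFFF:                # end code reached
            return sent

# 4-entry OT at 123000h with one 2-word packet linked into the 3rd entry
ram = {
    0x123000: 0x00FFFFFF, 0x123004: 0x00123000,
    0x123008: 0x00100000, 0x12300C: 0x00123008,
    0x100000: 0x02123004, 0x100004: 0x11111111, 0x100008: 0x22222222,
}
print([f"{w:08X}h" for w in walk_linked_list(ram, 0x8012300C)])
```

Because the walk starts at the LAST table entry, packets with larger OTZ values are sent (and thus drawn) first.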
GPU Video Memory (VRAM)
Framebuffer
The framebuffer contains the image that is to be output to the Television Set.
The GPU supports 10 resolutions, with 16bit or 24bit per pixel.
Resolution 16bit 24bit | Resolution 16bit 24bit
256x240 120Kbytes 180Kbytes | 256x480 240Kbytes 360Kbytes
320x240 150Kbytes 225Kbytes | 320x480 300Kbytes 450Kbytes
368x240 xx0Kbytes xx0Kbytes | 368x480 xx0Kbytes xx0Kbytes
512x240 240Kbytes 360Kbytes | 512x480 480Kbytes 720Kbytes
640x240 300Kbytes 450Kbytes | 640x480 600Kbytes 900Kbytes
15bit Direct Display (default) (works with polygons, lines, rectangles)
0-4 Red (0..31)
5-9 Green (0..31)
10-14 Blue (0..31)
15 Mask flag (0=Normal, 1=Do not allow to overwrite this pixel)
24bit Direct Display (works ONLY with direct vram transfers)
0-7 Red (0..255)
8-15 Green (0..255)
16-23 Blue (0..255)
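The two pixel layouts above can be expressed as small packing helpers. This is just an illustrative sketch; the function names are made up for this example:

```python
def pack_rgb555(r, g, b, mask=0):
    """Build a 15bit direct pixel: bits 0-4 red, 5-9 green, 10-14 blue, 15 mask."""
    return (r & 31) | ((g & 31) << 5) | ((b & 31) << 10) | ((mask & 1) << 15)

def unpack_rgb555(pixel):
    """Split a 16bit VRAM halfword into (red, green, blue, mask)."""
    return pixel & 31, (pixel >> 5) & 31, (pixel >> 10) & 31, (pixel >> 15) & 1

def pack_rgb888(r, g, b):
    """Build a 24bit direct pixel: bits 0-7 red, 8-15 green, 16-23 blue."""
    return (r & 255) | ((g & 255) << 8) | ((b & 255) << 16)

print(f"{pack_rgb555(31, 0, 0):04X}h {pack_rgb888(255, 0, 0):06X}h")
```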
Texture Bitmaps
A texture is an image put on a polygon or sprite. The data of a texture can be
stored in 3 different modes:
16bit Texture (Direct Color) ;(One 256x256 page = 128Kbytes)
0-4 Red (0..31) ;\Color 0000h = Fully-transparent
5-9 Green (0..31) ; Color 0001h..7FFFh = Non-transparent
10-14 Blue (0..31) ; Color 8000h..FFFFh = Semi-transparent (*)
15 Semi-transparency Flag ;/(*) or Non-transparent for opaque commands
8bit Texture (256 Color Palette) ;(One 256x256 page = 64Kbytes)
0-7 Palette index for 1st pixel (left)
8-15 Palette index for 2nd pixel (right)
4bit Texture (16 Color Palette) ;(One 256x256 page = 32Kbytes)
0-3 Palette index for 1st pixel (left)
4-7 Palette index for 2nd pixel (middle/left)
8-11 Palette index for 3rd pixel (middle/right)
12-15 Palette index for 4th pixel (right)
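As a sketch of the paletted layouts above, the following helpers split one 16bit VRAM halfword into its palette indexes, ordered from the leftmost pixel to the rightmost (function names are assumptions of this example):

```python
def unpack_8bit(halfword):
    """Two 8bit palette indexes per halfword: [left, right]."""
    return [halfword & 0xFF, (halfword >> 8) & 0xFF]

def unpack_4bit(halfword):
    """Four 4bit palette indexes per halfword: [left, mid-left, mid-right, right]."""
    return [(halfword >> shift) & 0xF for shift in (0, 4, 8, 12)]

print(unpack_8bit(0x2211), unpack_4bit(0x4321))
```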
The GP0(E2h) Texture Window (aka Texture Repeat) command can be used to reduce the texture size to less than 256x256 texels.
The Texture Pages can be located in the frame buffer on X multiples of 64 halfwords and Y multiples of 256 lines.
Texture Palettes - CLUT (Color Lookup Table)
The CLUT is the table that stores the colors for the image data in the
CLUT modes. The pixels of those images are used as indexes into this table. The
CLUT is arranged in the frame buffer as a 256x1 image for the 8bit CLUT mode,
and as a 16x1 image for the 4bit CLUT mode.
0-4 Red (0..31) ;\Color 0000h = Fully-transparent
5-9 Green (0..31) ; Color 0001h..7FFFh = Non-transparent
10-14 Blue (0..31) ; Color 8000h..FFFFh = Semi-transparent (*)
15 Semi-transparency Flag ;/(*) or Non-transparent for opaque commands
Texture Color Black Limitations
On the PSX, texture color 0000h is fully-transparent, that means textures
cannot contain Black pixels. However, in some cases, Color 8000h (Black with
semi-transparent flag) can be used, depending on the rendering command:
opaque command, eg. GP0(24h) --> 8000h = Non-Transparent Black
semi-transp command, eg. GP0(26h) --> 8000h = Semi-Transparent Black
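The rules for color 0000h and bit15 can be summarized in a small classifier. This is a sketch of the rules as described above, not of the hardware logic; the function name and the returned strings are assumptions of this example:

```python
def texel_mode(color, semi_transparent_command):
    """Classify a 16bit texel/CLUT color for a given kind of rendering command."""
    if color == 0x0000:
        return "transparent"           # never drawn, regardless of the command
    if (color & 0x8000) and semi_transparent_command:
        return "semi-transparent"      # bit15 only matters for semi-transp commands
    return "opaque"

for color in (0x0000, 0x8000, 0x7FFF):
    print(f"{color:04X}h", texel_mode(color, False), texel_mode(color, True))
```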
GPU Texture Caching
The GPU has a 2 Kbyte Texture Cache.
There is also a CLUT cache that is preserved between GPU drawing commands. The
CLUT cache is invalidated when different CLUT index values are used or when
GP0(01h) is issued.
When textured polygons are drawn, the GPU needs to read the texture data from
the frame buffer. This slows down the drawing process, and thus reduces the
number of polygons that can be drawn in a given timespan. To speed this up, the
GPU is equipped with a texture cache, so a given piece of texture need not be
read multiple times in succession.
The texture cache size (in texels) depends on the color mode used for the
textures: in 4bit CLUT mode it is 64x64, in 8bit CLUT mode it is 32x64, and in
15bit direct mode it is 32x32. A general speed up can be achieved by setting up
textures according to these sizes. For further speed gain, more precise
knowledge of how the cache works is necessary.
Cache blocks
The texture page is divided into non-overlapping cache blocks, each of a unit
size according to color mode. These cache blocks are tiled within the texture
page.
+-----+-----+-----+--
|cache| | |
|block| |
| 0| 1 | 2 ..
+-----+-----+--
|.. | |
Cache entries
Each cache block is divided into 256 cache entries, which are numbered
sequentially, and are 8 bytes wide. So a cache entry holds 16 4bit clut pixels,
8 8bit clut pixels, or 4 15bitdirect pixels.
4bit and 8bit clut: 15bitdirect:
+----+----+----+----+ +----+----+----+----+----+----+----+----+
| 0| 1| 2| 3| | 0| 1| 2| 3| 4| 5| 6| 7|
+----+----+----+----+ +----+----+----+----+----+----+----+----+
| 4| 5| 6| 7| | 8| 9| a| b| c| d| e| f|
+----+----+----+----+ +----+----+----+----+----+----+----+----+
| 8| 9| .. | 10| 11| ..
+----+----+-- +----+----+--
| c| ..| | 18| ..|
+----+-- +----+--
| .. | ..
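The block and entry numbering can be sketched as a lookup from texture coordinates. This is a reading of the sizes and diagrams above (entries numbered left to right, then top to bottom, within a block), not a verified model of the hardware; the names are assumptions of this example:

```python
# (block width, block height, texels per 8-byte entry) for each color mode
CACHE_GEOMETRY = {4: (64, 64, 16), 8: (32, 64, 8), 15: (32, 32, 4)}

def cache_location(u, v, bits):
    """Return ((block_x, block_y), entry) for texel (u, v) within a texture page."""
    width, height, per_entry = CACHE_GEOMETRY[bits]
    entries_per_row = width // per_entry           # 4 for clut modes, 8 for 15bit
    block = (u // width, v // height)
    entry = (v % height) * entries_per_row + (u % width) // per_entry
    return block, entry

print(cache_location(16, 1, 4), cache_location(4, 1, 15))
```

In every mode the last texel of a block maps to entry 255, matching the 256 entries per block stated above.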
GPU Timings
Nominal Video Clock
NTSC video clock = 53.693175 MHz
PAL video clock = 53.203425 MHz
Vertical Video Timings
263 scanlines per field for NTSC non-interlaced
262.5 scanlines per field for NTSC interlaced
314 scanlines per field for PAL non-interlaced
312.5 scanlines per field for PAL interlaced
Vertical Refresh Rates
NTSC mode on NTSC video clock
Interlaced: 59.940 Hz
Non-interlaced: 59.826 Hz
PAL mode on PAL video clock
Interlaced: 50.000 Hz
Non-interlaced: 49.761 Hz
NTSC mode on PAL video clock
Interlaced: 59.393 Hz
Non-interlaced: 59.280 Hz
PAL mode on NTSC video clock
Interlaced: 50.460 Hz
Non-interlaced: 50.219 Hz
TODO: Derivations for vertical refresh rates; horizontal timing notes
Nocash's original GPU Timings notes:
Video Clock
The PSone/PAL video clock is the cpu clock multiplied by 11/7.
CPU Clock = 33.868800MHz (44100Hz*300h)
Video Clock = 53.222400MHz (44100Hz*300h*11/7)
Pinouts - CLK Pinouts
Vertical Timings
PAL: 314 scanlines per frame (13Ah)
NTSC: 263 scanlines per frame (107h)
Horizontal Timings
PAL: 3406 video cycles per scanline (or 3406.1 or so?)
NTSC: 3413 video cycles per scanline (or 3413.6 or so?)
PSX.256-pix Dotclock = 5.322240MHz (44100Hz*300h*11/7/10)
PSX.320-pix Dotclock = 6.652800MHz (44100Hz*300h*11/7/8)
PSX.368-pix Dotclock = 7.603200MHz (44100Hz*300h*11/7/7)
PSX.512-pix Dotclock = 10.644480MHz (44100Hz*300h*11/7/5)
PSX.640-pix Dotclock = 13.305600MHz (44100Hz*300h*11/7/4)
Namco GunCon 385-pix = 8.000000MHz (from 8.00MHz on lightgun PCB)
320pix/PAL: 3406/8 = 425.75 dots 320pix/NTSC: 3413/8 = 426.625 dots
640pix/PAL: 3406/4 = 851.5 dots 640pix/NTSC: 3413/4 = 853.25 dots
256pix/PAL: 3406/10 = 340.6 dots 256pix/NTSC: 3413/10 = 341.3 dots
512pix/PAL: 3406/5 = 681.2 dots 512pix/NTSC: 3413/5 = 682.6 dots
368pix/PAL: 3406/7 = 486.5714 dots 368pix/NTSC: 3413/7 = 487.5714 dots
Frame Rates
PAL: 53.222400MHz/314/3406 = ca. 49.76 Hz (ie. almost 50Hz)
NTSC: 53.222400MHz/263/3413 = ca. 59.29 Hz (ie. almost 60Hz)
Note
Above values include "hidden" dots and scanlines (during horizontal and
vertical blanking/retrace).
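The values in Nocash's notes can be reproduced from the 44100Hz base frequency. The sketch below uses only the figures listed above (integer cycle counts per scanline, dividers 10/8/7/5/4); the constant and function names are assumptions of this example:

```python
CPU_CLOCK = 44100 * 0x300            # 33868800 Hz
VIDEO_CLOCK = CPU_CLOCK * 11 // 7    # 53222400 Hz

DIVIDERS = {256: 10, 320: 8, 368: 7, 512: 5, 640: 4}   # video cycles per dot
CYCLES_PER_LINE = {"PAL": 3406, "NTSC": 3413}
LINES_PER_FRAME = {"PAL": 314, "NTSC": 263}

def dotclock_hz(width):
    """Dot clock for a given horizontal resolution."""
    return VIDEO_CLOCK / DIVIDERS[width]

def dots_per_line(width, standard):
    """Dots per scanline, including the hidden (blanking) dots."""
    return CYCLES_PER_LINE[standard] / DIVIDERS[width]

def frame_rate_hz(standard):
    """Frame rate from video clock, scanlines per frame and cycles per scanline."""
    return VIDEO_CLOCK / LINES_PER_FRAME[standard] / CYCLES_PER_LINE[standard]

for std in ("PAL", "NTSC"):
    print(std, round(frame_rate_hz(std), 2), dots_per_line(320, std))
```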
GPU (MISC)
GP0(20h..7Fh) - Render Command Bits
0-23 Color for (first) Vertex (Not for Raw-Texture)
24 Texture Mode (0=Blended, 1=Raw) (Textured-Polygon/Rect only)
25 Semi-transparency (0=Off, 1=On) (All Render Types)
26 Texture Mapping (0=Off, 1=On) (Polygon/Rectangle only)
27-28 Rect Size (0=Var, 1=1x1, 2=8x8, 3=16x16) (Rectangle only)
27 Num Vertices (0=Triple, 1=Quad) (Polygon only)
27 Num Lines (0=Single, 1=Poly) (Line only)
28 Shading (0=Flat, 1=Gouraud) (Polygon/Line only)
29-31 Primitive Type (1=Polygon, 2=Line, 3=Rectangle)
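The bit assignments above can be decoded as shown in the following sketch. The function name and the dictionary keys are assumptions of this example; fields that do not apply to a primitive type are simply omitted:

```python
def decode_render_command(word):
    """Decode the first word of a GP0(20h..7Fh) render command."""
    prim = (word >> 29) & 7
    info = {
        "primitive": {1: "polygon", 2: "line", 3: "rectangle"}.get(prim, "other"),
        "color": word & 0xFFFFFF,
        "semi_transparent": bool(word & (1 << 25)),
    }
    if prim in (1, 3):                       # polygon/rectangle only
        info["textured"] = bool(word & (1 << 26))
        info["raw_texture"] = bool(word & (1 << 24))
    if prim == 1:                            # polygon only
        info["vertices"] = 4 if word & (1 << 27) else 3
    if prim == 2:                            # line only
        info["polyline"] = bool(word & (1 << 27))
    if prim in (1, 2):                       # polygon/line only
        info["gouraud"] = bool(word & (1 << 28))
    if prim == 3:                            # rectangle only
        info["rect_size"] = ("variable", "1x1", "8x8", "16x16")[(word >> 27) & 3]
    return info

print(decode_render_command(0x2C808080))
```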
Perspective (in-)correct Rendering
The PSX doesn't support perspective correct rendering: Assume a polygon to be
rotated so that its right half becomes more distant to the camera, and its
left half becomes closer. Due to the GTE's perspective division, the right half
should appear smaller than the left half.
The GPU supports only linear interpolations for rendering - that is correct
concerning the X and Y screen coordinates (which are still linear to each
other, even after perspective division, since both are divided by the same
value).
However, texture coordinates (and Gouraud shaded colors) are NOT linear to the
screen coordinates, so the linearly interpolated PSX graphics often look rather
distorted, especially for textures that contain straight lines. For color
shading the problem is less obvious (since shading is kinda blurry anyways).
Perspective correct Rendering
For perspective correct rendering, the polygon's Z-coordinates would need to be
passed from the GTE to the GPU, and the GPU would then need to use those
Z-coordinates to "undo" the perspective division for each pixel (that'd require
some additional memory, and especially a powerful division unit, which isn't
implemented in the hardware).
As a workaround, you can try to reduce the size of your polygons (the
interpolation errors increase in the center region of larger polygons).
Reducing the size would be only required for polygons that occupy a larger
screen region (which may vary depending on the distance to the camera).
Ie. you may check the size AFTER perspective division, if it's too large, then
break it into smaller parts (using the original coordinates, NOT the screen
coordinates), and then pass the fragments to the GTE another time.
Again, perspective correction would be relevant only for certain textures (not
for randomly dithered textures like sand, water, fire, grass, and not for
untextured polygons, and of course not for 2D graphics, so you may exclude
those from size reduction).
24bit RGB to 15bit RGB Dithering (enabled in Texpage attribute)
For dithering, VRAM is divided into 4x4 pixel blocks. Depending on the pixel's
location within its 4x4 block, the corresponding dither offset is added to the
8bit R/G/B values, the result is saturated to +00h..+FFh, and then divided by 8,
resulting in the final 5bit R/G/B values.
-4 +0 -3 +1 ;\dither offsets for first two scanlines
+2 -2 +3 -1 ;/
-3 +1 -4 +0 ;\dither offsets for next two scanlines
+3 -1 +2 -2 ;/(same as above, but shifted two pixels horizontally)
LINEs are dithered (no matter if they are mono or do use gouraud shading).
RECTs are NOT dithered (no matter if they do use modulation or not).
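The dither step can be sketched as follows, using the offset table above (one row per scanline, one column per pixel within the 4x4 block). The function names are assumptions of this example:

```python
DITHER = [
    [-4, +0, -3, +1],
    [+2, -2, +3, -1],
    [-3, +1, -4, +0],
    [+3, -1, +2, -2],
]

def dither_channel(value8, x, y):
    """Add the dither offset for VRAM position (x, y), saturate, reduce to 5bit."""
    offset = DITHER[y & 3][x & 3]
    saturated = min(max(value8 + offset, 0x00), 0xFF)
    return saturated >> 3                  # divide by 8

def dither_rgb(r, g, b, x, y):
    """Dither an 8:8:8 color to 5:5:5 at the given VRAM position."""
    return tuple(dither_channel(c, x, y) for c in (r, g, b))

# The same 8bit intensity can end up as two different 5bit values
print(dither_channel(5, 0, 0), dither_channel(5, 2, 1))
```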
Shading
The GPU has a shading function, which will scale the color of a primitive to a
specified brightness. There are 2 shading modes: Flat shading, and gouraud
shading. Flat shading is the mode in which one brightness value is specified
for the entire primitive. In Gouraud shading mode, a different brightness value
can be given for each vertex of a primitive, and the brightness between these
points is automatically interpolated.
Semi-transparency
When semi-transparency is set for a pixel, the GPU first reads the pixel it
wants to write to, and then calculates the color it will write from the 2
pixels according to the semi-transparency mode selected. Processing speed is
lower in this mode because additional reading and calculating are necessary.
There are 4 semi-transparency modes in the GPU.
B=Back (the old pixel read from the frame buffer)
F=Front (the new semi-transparent pixel)
* 0.5 x B + 0.5 x F ;aka B/2+F/2
* 1.0 x B + 1.0 x F ;aka B+F
* 1.0 x B - 1.0 x F ;aka B-F
* 1.0 x B +0.25 x F ;aka B+F/4
When using additive blending, if a channel's intensity is greater than 255, it gets clamped to 255 rather than being masked. Similarly, if using subtractive blending and a channel's intensity ends up being < 0, it's clamped to 0.
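The four formulas and the clamping can be sketched per color channel as below. Numbering the modes 0..3 in the order listed, working on 8bit channel values, and halving each term by integer division are assumptions of this sketch:

```python
def blend_channel(back, front, mode):
    """Blend one channel of the old (back) and new (front) pixel."""
    if mode == 0:
        result = back // 2 + front // 2    # B/2+F/2
    elif mode == 1:
        result = back + front              # B+F
    elif mode == 2:
        result = back - front              # B-F
    elif mode == 3:
        result = back + front // 4         # B+F/4
    else:
        raise ValueError("mode must be 0..3")
    return min(max(result, 0), 255)        # clamp instead of wrapping

print([blend_channel(200, 100, m) for m in range(4)])
```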
Modulation (also known as Texture Blending)
Modulation is a colour effect that can be applied to textured primitives.
For each pixel of the primitive it combines every colour channel of the fetched
texel with the corresponding channel of the interpolated vertex colour according
to this formula (Assuming all channels are 8-bit).
finalChannel.rgb = (texel.rgb * vertexColour.rgb) / vec3(128.0)
Using a vertex colour of 0x808080 (ie all channels set to 128) is equivalent to not applying modulation to the primitive, as shown by the above formula.
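The formula can be tried out with a short sketch. Integer division and saturation of results above 255 are assumptions of this example (the formula itself does not say what happens for vertex colours brighter than 128):

```python
def modulate(texel, vertex_colour):
    """Apply (texel * vertexColour) / 128 to each 8-bit channel."""
    return tuple(min(t * c // 128, 255) for t, c in zip(texel, vertex_colour))

texel = (10, 200, 255)
print(modulate(texel, (128, 128, 128)))  # neutral vertex colour, texel unchanged
print(modulate(texel, (64, 64, 64)))     # half brightness
```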
"Texture blending" should not be confused with normal blending, ie. an operation that merges the backbuffer colour with the incoming pixel and draws the resulting colour to the backbuffer. The PS1 has this capability to an extent, using semi-transparency.
Draw to display enable
This will enable/disable any drawing to the area that is currently displayed.
Not sure yet WHY one should want to disable that?
Also not sure HOW and IF it works... the SIZE of the display area is implied by
the screen size - which is horizontally counted in CLOCK CYCLES, so, to obtain
the size in PIXELS, the hardware would need to divide that value by the
number of cycles per pixel, depending on the current resolution...?