Partial-Word Writes to MMIO
Overview
Methodology
On-die MMIO behavior
SBUS behavior
Summary table
Practical implications
Caveats
Overview
The MIPS R3000A asserts byte-enable signals on sb/sh/swl/swr
stores, but the PS1's bus fabric does not honor them as a programmer
would expect on RAM. The result is that partial-word writes to
hardware registers behave very differently from partial-word writes
to RAM, and the exact behavior depends on which bus the target
register sits on:
- On-die MMIO (IRQ control at 0x1F801070-77h, DMA control registers
  at 0x1F8010F0-F7h, GPU control, MDEC, etc.) sits on the 32-bit data
  fabric internal to the CPU package. Byte enables are ignored;
  partial-word stores latch a shifted source word in full.
- SBUS (SPU, CD-ROM, BIOS ROM, Expansion 1/2) is an off-die
  system bus whose data path width is per-device, configured via
  bit 12 of each device's Delay/Size register (see
  Memory Control). CD-ROM and BIOS ROM are
  configured as 8-bit; SPU is configured as 16-bit; expansion
  ports can use either. For a 32-bit CPU access, the bus interface
  unit decomposes the access into either 2 (16-bit device) or 4
  (8-bit device) consecutive transactions, keeping !CS active and
  stepping the address. The SPU specifically latches halfwords
  whenever SBUS./WR0 strobes; SBUS./WR1 alone is ignored.
The practical upshot is that idioms like "write one byte of a
register" do not work for hardware registers, and an sw to a
16-bit SPU register also writes the neighboring register.
All numbers below are hardware-verified on a single SCPH-5501. The SBUS rules below were measured against SPU registers only; behavior at CD-ROM and BIOS ROM (8-bit-configured) is expected to differ in the number of transactions per access and has not been measured here.
Methodology
For each (target register, op, byte offset, baseline pattern, source pattern) tuple, the test runs:

    set_baseline(target, baseline)    ; full-width store
    $a2 = source                      ; distinguishable bytes
    <op> $a2, off(target_addr)        ; verbatim asm trampoline
    result = readback(target)         ; full-width load

Three baselines (0x00000000, 0xFFFFFFFF, 0x11223344) and
two source patterns (0xAABBCCDD, 0xFEDCBA98) are used per (target,
op, offset) triple. The asm trampolines pin the exact opcode,
which the C compiler might otherwise legalize into a different
instruction sequence. Misaligned sh traps are caught by an
installed AdES handler that records the trap and skips the
faulting instruction.
Targets:
- IMASK (0x1F801074): on-die, 32-bit, low 11 bits valid.
- DPCR (0x1F8010F0): on-die, 32-bit, all bits R/W. The cleanest readback target.
- SPU_VOL_MAIN_LEFT (0x1F801D80): SBUS, 16-bit, R/W after muting the SPU. Neighbor is SPU_VOL_MAIN_RIGHT at +2.
- Voice 0 left volume (0x1F801C00): SBUS, 16-bit. Confirms the behavior is a property of the bus path, not of any individual SPU register.
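The per-case procedure can be sketched on a host machine as follows. This is only an illustration of the matrix, not the original test code: `target_ops`, `run_case`, `baseline_independent`, and the two fake targets are hypothetical names, and on hardware the `partial_store` callback would be the asm trampoline rather than a C function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callbacks standing in for the real harness pieces. */
typedef struct {
    void     (*set_baseline)(uint32_t baseline);            /* full-width store */
    void     (*partial_store)(uint32_t src, unsigned off);  /* op under test    */
    uint32_t (*readback)(void);                             /* full-width load  */
} target_ops;

static const uint32_t BASELINES[] = { 0x00000000u, 0xFFFFFFFFu, 0x11223344u };
static const uint32_t SOURCES[]   = { 0xAABBCCDDu, 0xFEDCBA98u };

/* One cell of the matrix: set baseline, do the store, read back. */
static uint32_t run_case(const target_ops *t, uint32_t baseline,
                         uint32_t src, unsigned off)
{
    t->set_baseline(baseline);
    t->partial_store(src, off);
    return t->readback();
}

/* True if, for each source pattern, every baseline yields the same readback. */
static bool baseline_independent(const target_ops *t, unsigned off)
{
    for (size_t s = 0; s < sizeof SOURCES / sizeof SOURCES[0]; s++) {
        uint32_t first = run_case(t, BASELINES[0], SOURCES[s], off);
        for (size_t b = 1; b < sizeof BASELINES / sizeof BASELINES[0]; b++) {
            if (run_case(t, BASELINES[b], SOURCES[s], off) != first)
                return false;
        }
    }
    return true;
}

/* Fake targets (assumptions for illustration only). */
static uint32_t fake_reg;
static void     fake_set(uint32_t v) { fake_reg = v; }
static uint32_t fake_read(void)      { return fake_reg; }

/* RAM-like sb: only the addressed byte changes. */
static void fake_sb_masked(uint32_t src, unsigned off)
{
    assert(off < 4);
    uint32_t mask = 0xFFu << (off * 8);
    fake_reg = (fake_reg & ~mask) | ((src << (off * 8)) & mask);
}

/* Latch-everything sb: the whole register takes the shifted source. */
static void fake_sb_full(uint32_t src, unsigned off)
{
    assert(off < 4);
    fake_reg = src << (off * 8);
}

static const target_ops FAKE_RAM  = { fake_set, fake_sb_masked, fake_read };
static const target_ops FAKE_MMIO = { fake_set, fake_sb_full,   fake_read };
```

Using several baselines is what separates the two models: a byte-masked target gives a baseline-dependent readback, while a target that latches the whole bus value does not.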
On-die MMIO behavior
For 32-bit on-die registers the CPU drives a shifted source word on the 32-bit data bus and the decoder latches the entire word, regardless of which byte enables are asserted.
The shift is determined by the op and the byte offset:
| op | bus value driven on D[31:0] |
|---|---|
| sw | src |
| sb +N | src << (N * 8) |
| sh +0 | src |
| sh +2 | src << 16 |
| swl +N | src >> ((3 - N) * 8) |
| swr +N | src << (N * 8) |
Example with src = 0xAABBCCDD against DPCR:
| op | offset | got after readback |
|---|---|---|
| sw | +0 | 0xAABBCCDD |
| sb | +0 | 0xAABBCCDD |
| sb | +1 | 0xBBCCDD00 |
| sb | +2 | 0xCCDD0000 |
| sb | +3 | 0xDD000000 |
| sh | +0 | 0xAABBCCDD |
| sh | +2 | 0xCCDD0000 |
| swl | +0 | 0x000000AA |
| swl | +1 | 0x0000AABB |
| swl | +2 | 0x00AABBCC |
| swl | +3 | 0xAABBCCDD |
| swr | +0 | 0xAABBCCDD |
| swr | +1 | 0xBBCCDD00 |
| swr | +2 | 0xCCDD0000 |
| swr | +3 | 0xDD000000 |
In every case the register is fully overwritten with the bus
value. The pre-write contents do not contribute - results are
identical with baselines 0x00000000, 0xFFFFFFFF, and
0x11223344. Byte enables do not gate the latch.
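The shift rule and the "latch everything" behavior can be written down as a small host-side model. This is a sketch under the rules stated in this section, not emulator or hardware code; `store_op`, `bus_value`, and `ondie_store` are names introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_SW, OP_SB, OP_SH, OP_SWL, OP_SWR } store_op;

/* Value driven on D[31:0] for a store of src at byte offset off (0..3). */
static uint32_t bus_value(store_op op, unsigned off, uint32_t src)
{
    assert(off < 4);
    switch (op) {
    case OP_SW:
        assert(off == 0);            /* word stores are aligned */
        return src;
    case OP_SH:
        assert((off & 1u) == 0);     /* +1/+3 would trap with AdES instead */
        return src << (off * 8);
    case OP_SB:
    case OP_SWR:
        return src << (off * 8);
    case OP_SWL:
        return src >> ((3 - off) * 8);
    }
    return src;                      /* not reached for valid ops */
}

/* On-die latch: the old register contents never contribute. */
static uint32_t ondie_store(uint32_t old, store_op op, unsigned off, uint32_t src)
{
    (void)old;
    return bus_value(op, off, src);
}
```

For instance, `ondie_store(0x11223344, OP_SB, 1, 0xAABBCCDD)` evaluates to `0xBBCCDD00`, matching the `sb +1` row above for any baseline.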
IMASK follows the same pattern in its 11 valid bits. The
upper 21 bits read as 0xBF800xxx: the upper half of the
IMASK address itself echoes on open-bus lanes during the
read phase.
Misaligned sh (+1 and +3) traps with cause=0x10000014
(ExcCode 5, AdES). The exception fires before the bus
transaction; the register is unchanged.
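Decoding the recorded cause value is a matter of extracting the ExcCode field (bits 6..2 of the R3000A Cause register). The helpers below are a minimal sketch; the function names are hypothetical, and `resume_pc_after_skip` only shows one way a handler could skip the faulting store, assuming the fault is not in a branch delay slot.

```c
#include <assert.h>
#include <stdint.h>

enum { EXC_ADES = 5 };   /* address error on store */

/* ExcCode occupies bits 6..2 of Cause. */
static unsigned cause_exccode(uint32_t cause)
{
    return (cause >> 2) & 0x1Fu;
}

static int is_ades(uint32_t cause)
{
    return cause_exccode(cause) == EXC_ADES;
}

/* Skip the faulting instruction by resuming at EPC + 4.
 * Assumption: BD (Cause bit 31) is clear. If the store sat in a branch
 * delay slot, EPC points at the branch and a plain +4 would be wrong. */
static uint32_t resume_pc_after_skip(uint32_t epc, uint32_t cause)
{
    assert((cause >> 31) == 0);
    return epc + 4;
}
```

With the value reported above, `cause_exccode(0x10000014)` is 5 and the BD bit is clear.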
SBUS behavior
The SBUS is the off-die system bus shared by SPU
(SBUS./CS4), CD-ROM (/CS5), BIOS ROM (/CS2), and
Expansion 1/2 (/CS0, /CS3). Address lines are
SBUS.A[23:0]. Data lines are SBUS.D[15:0], but each
device is independently configured as 8-bit or 16-bit
via bit 12 of its Delay/Size register (see
Memory Control): an 8-bit device uses
only SBUS.D[7:0]. Write strobes are SBUS./WR0 (lower
byte of halfword) and SBUS./WR1 (upper byte).
SBUS./WR1 is wired only to the expansion port and not
to other SBUS devices; the SPU and the 8-bit devices do
not consume it.
Because the SBUS data path is at most 16 bits wide, a
32-bit CPU access cannot fit in one bus cycle. The bus
interface unit (BIU) decomposes each CPU access into a
sequence of consecutive transactions: 2 transactions for
a 32-bit access on a 16-bit device, 4 transactions for
the same on an 8-bit device, with !CS held active and
the address stepped between transactions.
The rules in this section were measured against SPU
registers (16-bit configured). The 16-bit-bus dispatch
rules apply unchanged for any 16-bit-configured SBUS
device that latches on /WR0. The 8-bit case (CD-ROM,
BIOS ROM) was not measured here and would be a separate
matrix.
BIU dispatch rules
Number of transactions:
- 4-byte CPU access (sw, swl +3, swr +0): TWO transactions, one per halfword.
- 1-, 2-, or 3-byte access (sb, sh, all other swl/swr): ONE transaction, at the halfword containing the lowest enabled byte. Bytes outside that halfword are silently dropped.
Strobe assertion per issued halfword:
- 1-byte access (sb, swl +0, swr +3): asserts only /WR0 or /WR1 per the addressed lane (lane 0 or lane 2 -> /WR0; lane 1 or lane 3 -> /WR1).
- 2-or-more-byte access: asserts BOTH /WR0 and /WR1 on each issued halfword, regardless of which CPU-side lanes were enabled within it.
SPU latch rule: the SPU latches the full 16 bits of SBUS.D
into the addressed register whenever /WR0 strobes. /WR1
strobing alone does nothing.
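The dispatch, strobe, and latch rules above can be combined into one host-side model of a pair of adjacent 16-bit SPU registers. This is a sketch of the rules as stated here, assuming the data lanes carry the same shifted source word as on the on-die path; `lanes`, `bus_value`, and `sbus16_store` are illustrative names, not part of any real API.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_SW, OP_SB, OP_SH, OP_SWL, OP_SWR } store_op;

/* CPU-side byte-enable mask; bit n set means lane n is enabled. */
static unsigned lanes(store_op op, unsigned off)
{
    assert(off < 4);
    switch (op) {
    case OP_SW:  return 0xFu;
    case OP_SH:  return 0x3u << off;            /* off is 0 or 2 */
    case OP_SB:  return 1u << off;
    case OP_SWL: return (2u << off) - 1u;       /* lanes 0..off  */
    case OP_SWR: return (0xFu << off) & 0xFu;   /* lanes off..3  */
    }
    return 0;
}

/* Shifted source word on the 32-bit data lanes. */
static uint32_t bus_value(store_op op, unsigned off, uint32_t src)
{
    return op == OP_SWL ? src >> ((3 - off) * 8) : src << (off * 8);
}

/* pair[0] is the register at +0, pair[1] its neighbor at +2. */
static void sbus16_store(uint16_t pair[2], store_op op, unsigned off, uint32_t src)
{
    unsigned en = lanes(op, off);
    uint32_t bus = bus_value(op, off, src);
    unsigned count = 0, lowest = 4;

    for (unsigned lane = 0; lane < 4; lane++) {
        if (en & (1u << lane)) {
            count++;
            if (lowest == 4)
                lowest = lane;
        }
    }

    /* 4-byte access: both halfwords. Otherwise: the halfword holding the
     * lowest enabled lane; bytes outside it are dropped. */
    unsigned first_hw = (count == 4) ? 0 : lowest >> 1;
    unsigned last_hw  = (count == 4) ? 1 : first_hw;

    for (unsigned hw = first_hw; hw <= last_hw; hw++) {
        /* 1-byte access strobes /WR0 only on even lanes; wider accesses
         * strobe both. The SPU latches all 16 bits whenever /WR0 strobes. */
        int wr0 = (count > 1) || ((lowest & 1u) == 0);
        if (wr0)
            pair[hw] = (uint16_t)(bus >> (16 * hw));
    }
}
```

Starting from `{0xFFFF, 0xFFFF}`, an `sb +0` of `0xAABBCCDD` leaves `{0xCCDD, 0xFFFF}` in this model, and an `sb +1` leaves both registers untouched.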
SBUS writes to MAINVOL_L
Test target: SPU_VOL_MAIN_LEFT at 0x1F801D80. Neighbor:
SPU_VOL_MAIN_RIGHT at 0x1F801D82. Source 0xAABBCCDD,
baseline 0x0000 for both registers.
| op | bytes | issued halfword | strobe | bus halfword | reg got | neighbor got |
|---|---|---|---|---|---|---|
| sh +0 | 2 | low (MAINVOL_L) | both | 0xCCDD | 0xCCDD | 0x0000 |
| sb +0 | 1 | low | /WR0 only | 0xCCDD | 0xCCDD | 0x0000 |
| sb +1 | 1 | low | /WR1 only | 0xDD00 | 0x0000 | 0x0000 |
| sb +2 | 1 | high (MAINVOL_R) | /WR0 only | 0xCCDD | 0x0000 | 0xCCDD |
| sb +3 | 1 | high | /WR1 only | 0xDD00 | 0x0000 | 0x0000 |
| swl +0 | 1 | low | /WR0 only | 0x00AA | 0x00AA | 0x0000 |
| swl +1 | 2 | low | both | 0xAABB | 0xAABB | 0x0000 |
| swl +2 | 3 | low | both | 0xBBCC | 0xBBCC | 0x0000 |
| swl +3 | 4 | both | both x2 | both | 0xCCDD | 0xAABB |
| swr +0 | 4 | both | both x2 | both | 0xCCDD | 0xAABB |
| swr +1 | 3 | low | both | 0xDD00 | 0xDD00 | 0x0000 |
| swr +2 | 2 | high | both | 0xCCDD | 0x0000 | 0xCCDD |
| swr +3 | 1 | high | /WR1 only | 0xDD00 | 0x0000 | 0x0000 |
Reading the table:
- sb +1 and sb +3 are silent no-ops because their single transaction asserts only /WR1, which the SPU does not consume. The register is unchanged regardless of baseline.
- sb +2 lands in MAINVOL_R, not MAINVOL_L. A "byte 2" address decodes to the high halfword on SBUS, so the transaction targets MAINVOL_R.
- swl +2 writes 3 bytes (bytes 0+1+2). The lowest enabled byte is byte 0, so the transaction issues at the low halfword. The byte 2 portion that would have spanned into MAINVOL_R is silently dropped.
- swr +1 symmetrically writes 3 bytes (bytes 1+2+3) and issues at the low halfword. The byte 2+3 portion that would have spanned into MAINVOL_R is dropped.
- swr +0 (full 32-bit access) splits into two transactions and hits both registers. This is why an sw to a 16-bit SPU register also clobbers its neighbor.
The behavior is identical when the target is voice 0 left volume
at 0x1F801C00. It is a property of the SBUS path, not of the
specific register.
When /WR0 strobes, the entire 16-bit SBUS data halfword lands
in the addressed register, even on sb. With baseline 0xFFFF
and source 0xAABBCCDD, an sb +0 produces got 0xCCDD, not
0xFFDD - the high byte of the register is overwritten by the
high byte of the bus data, even though the byte-enable was only
on lane 0. The SPU does not byte-mask within a halfword.
Summary table
Per-op behavior at every byte offset, for the two bus types. "Latched" means the register is overwritten with the full bus payload (word or halfword) regardless of byte enables. "Drop" means the register is unchanged.
| op | offset | on-die 32-bit MMIO | SBUS 16-bit-configured (e.g. SPU) |
|---|---|---|---|
| sw | +0 | full word latched | both halfwords latched (incl. neighbor) |
| sh | +0 | full word, hi half from bus | low halfword latched |
| sh | +2 | full word, lo half from bus | high halfword latched (in neighbor) |
| sh | +1/+3 | AdES | AdES |
| sb | +0 | full word latched | low halfword latched |
| sb | +1 | full word latched | drop |
| sb | +2 | full word latched | high halfword latched (in neighbor) |
| sb | +3 | full word latched | drop |
| swl | +0 | full word latched | low halfword latched |
| swl | +1 | full word latched | low halfword latched |
| swl | +2 | full word latched | low halfword latched (byte 2 dropped) |
| swl | +3 | full word latched | both halfwords latched |
| swr | +0 | full word latched | both halfwords latched |
| swr | +1 | full word latched | low halfword latched (bytes 2+3 dropped) |
| swr | +2 | full word latched | high halfword latched (in neighbor) |
| swr | +3 | full word latched | drop |
Practical implications
- Setting one byte of a 32-bit MMIO register via sb does not work. The whole register is overwritten.
- Read-modify-write idioms like *reg |= mask work as expected when reg is typed as volatile uint32_t * for an on-die 32-bit register, or volatile uint16_t * for an SPU register: the compiler emits a matching-width load and store, and the partial-word rules above never come into play. The hazard is aliasing an MMIO register through a different-width pointer (e.g., a volatile uint8_t * to a 32-bit register) - the load is harmless but the store becomes an sb and clobbers the entire register per the rules above.
- sw to a 16-bit SPU register also writes the neighboring register. Use sh for single-register writes.
- sb to the high byte of a halfword in any SBUS region (lane 1 or lane 3 within the addressed word) is a silent no-op on the SPU. Other SBUS devices (CD-ROM, BIOS ROM) may behave differently; only the SPU was tested directly.
- swl and swr to MMIO are well-defined but not particularly useful for register access: they latch a shifted source identical to sb/sh patterns, and on SBUS they drop bytes that span halfword boundaries.
- Misaligned sh (+1/+3) AdES-traps cleanly without modifying the target register.
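One way to update a single byte of a register without ever issuing a partial-width store is a full-width read-modify-write. The helpers below are a hedged sketch: the names are hypothetical, and they assume the register reads back what was last written and that reading it has no side effects, which does not hold for every hardware register.

```c
#include <assert.h>
#include <stdint.h>

/* Replace one byte of a 32-bit register using only word accesses. */
static void reg32_set_byte(volatile uint32_t *reg, unsigned byte, uint8_t value)
{
    assert(byte < 4);
    unsigned shift = byte * 8;
    uint32_t v = *reg;                                   /* word load  */
    v = (v & ~(0xFFu << shift)) | ((uint32_t)value << shift);
    *reg = v;                                            /* word store */
}

/* Replace the high byte of a 16-bit register using only halfword accesses. */
static void reg16_set_high_byte(volatile uint16_t *reg, uint8_t value)
{
    uint16_t v = *reg;                                   /* halfword load  */
    *reg = (uint16_t)((v & 0x00FFu) | ((unsigned)value << 8));  /* halfword store */
}
```

Because both pointer types match the register width, the compiler has no reason to emit sb, so the full-word latch and the dropped /WR1-only transaction described above are avoided.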
Caveats
- Hardware-verified on a single retail SCPH-5501. Other PSX revisions have not been measured. The on-die MMIO behavior is expected to be identical across revisions; the SBUS dispatch rules are properties of the bus interface unit on the CPU die and should also be revision-stable, but the per-strobe SPU latch behavior may differ on SPU revisions shipped in other consoles.
- Only IMASK, DPCR, and SPU registers (MAINVOL_L and voice 0 volumes) were tested directly. The on-die behavior is expected to apply uniformly to all on-die MMIO; the SBUS behavior is expected to apply to other 16-bit-configured SBUS devices but has not been verified there. CD-ROM and BIOS ROM are 8-bit-configured (Delay/Size bit 12 = 0) so the number of transactions per CPU access differs - their write semantics were not measured.
- GPU GP0/GP1 (VBUS, 32-bit, single decoded address bit, no byte masking) and timer mode registers (R/W with side effects) were not tested.