Chapter 3 - Assembly and Z80 Emission
Assembly turns source items into facts, bytes, fixups and source segments. This chapter combines the assembler-time fact model with the Z80 emission path because those two stages are tightly coupled: planning decides addresses and sizes, then emission writes bytes using those facts.
The central files are:
src/assembly/address-planning.tssrc/assembly/placement.tssrc/assembly/program-emission.tssrc/assembly/fixup-emission.tssrc/assembly/assemble-program.tssrc/semantics/expression-evaluation.tssrc/z80/parse-instruction.tssrc/z80/instruction.tssrc/z80/encode.tssrc/z80/effects.ts
Assembly Orchestration
src/assembly/assemble-program.ts is the entry point:
export function assembleProgram(items: readonly SourceItem[]): AssembleProgramResult
It builds the address state, emits the program image and returns diagnostics, symbols, byte maps and source segments. The function is intentionally small. Detailed decisions live in the modules below it.
Address Planning
src/assembly/address-planning.ts walks source items and builds the facts
needed to place and evaluate the program:
- labels and their addresses
.equconstants and enum members- record and union layouts
- type aliases
.orgplacement state- sizes for data, storage and instructions
The returned AddressState contains symbol facts, equate records, layout
records, enum names, diagnostics and placement information. This is where parsed
syntax becomes assembler knowledge. A label item becomes an address. A type
declaration becomes a layout record. A .ds directive becomes a byte count.
Placement
src/assembly/placement.ts owns placement state. .org sets the active
address. Instructions advance by encoded size. .db advances by emitted byte
count. .dw advances by two bytes per expression. .ds advances by calculated
storage size. .align advances to the next aligned address.
Placement is separate from byte emission so the assembler can resolve symbols before every byte has been written. A branch to a later label can be encoded after address planning has seen the label definition.
Symbols and Layouts
Labels define addresses. .equ declarations define assembler-time values.
Enums define qualified constants. defineLabel(), defineEquate() and
defineEnumMembers() enforce duplicate rules and record spans for diagnostics
and output metadata.
Record and union declarations become layout records. A record field advances the offset by its byte size. A union field starts at offset zero and the union size is the largest field size.
For a record:
Sprite .type
x .field byte
y .field byte
tile .field byte
flags .field byte
.endtype
the layout record stores x = 0, y = 1, tile = 2, flags = 3 and total
size 4.
Type aliases bind a name to another type expression:
SpriteArray .typealias Sprite[16]
sizeof(SpriteArray) resolves through the alias to the size of Sprite[16],
with the same field paths as the underlying array expression.
Expression Evaluation
src/semantics/expression-evaluation.ts evaluates expression trees against the
assembler-time environment. It owns the rules for literal arithmetic, symbol
lookup, sizeof(...), offset(...), LSB(...), MSB(...) and layout casts
that fold to constant addresses.
Expression evaluation is context-sensitive. A symbol in an instruction operand
may be a label or constant. A type expression in .ds Sprite[4] resolves to a
byte count. A layout cast such as <SpriteArray>Sprites[3].flags resolves to an
address when the base address, type alias, index and field path are all known at
assembly time.
Data and Storage Size
Address planning needs the byte length of each directive. It contains size
helpers for .db, .dw, .ds, strings and alignment. The same string
directive byte rules are reused during emission so planning and output stay
aligned.
Initialized data writes bytes through .db, .dw, .cstr, .pstr and
.istr. .ds reserves space calculated from numbers or layout type
expressions. Layout type expressions are byte-size expressions in storage and
field-size positions.
Program Emission
src/assembly/program-emission.ts owns byte writing. It walks source items in
order and writes emitted bytes to a map keyed by absolute address. It also
records source segments for the D8 map writer.
The emitted program shape is:
export interface EmittedProgram {
readonly image: ReadonlyMap<number, number>;
readonly sourceSegments: readonly EmittedSourceSegment[];
readonly initializedAddresses: readonly number[];
}
The byte map stores final addresses. If source emits one byte at $0100 and
another at $8000, the map has two entries. A sparse map fits Z80 programs that
place code, data and ROM vectors at separate origins.
Emission Walkthrough
For this source:
.org $0100
Start:
ld a,42
jp Start
address planning records Start = $0100. The instruction parser represents
ld a,42 as an immediate load and jp Start as an absolute branch. The encoder
returns literal opcode bytes plus a 16-bit expression fragment for Start.
Fixup emission evaluates Start to $0100 and writes the little-endian operand
bytes.
The final byte map contains:
$0100: 3E
$0101: 2A
$0102: C3
$0103: 00
$0104: 01
The D8 source segment records that those bytes came from the source lines that emitted them.
Z80 Instruction Model
src/z80/instruction.ts defines the instruction and operand types.
src/z80/parse-instruction.ts parses instruction text into that model.
src/z80/encode.ts turns the model into encoded fragments.
src/z80/effects.ts describes register and flag effects for register care.
The parser and encoder work as a pair:
| File | Question |
|---|---|
parse-instruction.ts |
Which instruction form did the source request? |
instruction.ts |
How is that form represented as TypeScript data? |
encode.ts |
Which bytes and fixup fragments represent that form? |
effects.ts |
Which registers and flags does that form read or write? |
The instruction model keeps overloaded Z80 mnemonics manageable. ld has many
forms, but the parser classifies operands before encoding. The encoder then
selects a specific opcode family from typed operands.
Fixups and Relative Branches
The Z80 encoder returns fragments: literal bytes, 8-bit immediates, 16-bit
immediates and relative branch targets. src/assembly/fixup-emission.ts
evaluates the expression attached to each fragment and writes the final byte or
word.
Relative branches such as jr and djnz emit an 8-bit displacement from the
next instruction. The parser recognises jr nz,Loop. The encoder knows the
opcode and that the operand is relative. Fixup emission knows the current
address and final target address.
The same principle applies to absolute branches and immediate operands. The encoder chooses the fragment width. Fixup emission evaluates the expression and checks that the value fits that width.
Source Segments
Every emitted byte range can carry source provenance. program-emission.ts
adds EmittedSourceSegment records with start address, end address, source
file, line and segment kind. writeD8m() later groups these segments by file
for Debug80.
Source segments classify emitted ranges as code, data, directive output, label context or unknown output. Debug80 uses this metadata to connect source lines to addresses. A byte-perfect assembler can still give a poor debugging experience when source segments are too broad, absent or attached to a different file.
Changing Assembly or Encoding
Instruction changes usually touch parse-instruction.ts, instruction.ts,
encode.ts and effects.ts. Assembly changes usually touch
address-planning.ts, program-emission.ts, fixup-emission.ts or
placement.ts, depending on whether the change affects facts, emitted bytes,
symbolic references or address movement.