Chapter 11 — Subroutine Conventions
Chapter 10’s two subroutines worked. find_max received HL and B, returned A. count_above received HL, B and C, returned A. Both got the right answers. But count_above used D as an internal running counter and clobbered it on exit — and nothing in the code said so. A caller that had a value in D before the call would find it gone afterward, with no warning and no error.
That is the problem this chapter names: in flat Z80 assembly, subroutines communicate their interface and side effects only through discipline and comments. Nothing else exists. This chapter describes that discipline — the conventions that make subroutines safe to call, read and modify.
The register-passing convention
Z80 subroutines pass arguments in registers. There is no other mechanism at the machine level. The convention for which registers carry which kinds of values is informal, but widely followed:
- HL carries a 16-bit address or pointer — the start of a table, a buffer, a string.
- BC carries a 16-bit count or value — loop counts, word quantities.
- B alone carries an 8-bit count when only one is needed.
- C carries a single-byte argument when something other than the count is needed.
- DE carries a second address — most commonly a destination when HL is the source.
- A carries a single byte that needs a fast path — a byte value, a flag, a character.
Return values follow a matching convention:
- A carries a byte result.
- HL carries a 16-bit result — an address, a computed word.
These are not enforced by the assembler. They are agreements between the writer of a subroutine and the writers of its callers. When everyone follows the same convention, reading a call site tells you what is going in and what is coming out. When the convention is violated or misunderstood, the caller gets garbage.
Callee-save and caller-save registers
Every subroutine touches at least a few registers. The question is whether the caller can rely on those registers being unchanged after the call.
The convention divides registers into two groups.
Caller-save registers are registers the caller accepts may be destroyed by the call. A, F and any register the caller explicitly passes as an argument fall into this category. The caller is responsible for saving anything in those registers that it still needs — before the call, not after.
Callee-save registers are registers the subroutine must restore before it returns, if it uses them internally. BC, DE, HL, IX and IY are callee-save. If a subroutine uses any of those as scratch storage, it must push them at entry and pop them before returning.
The mechanism is push and pop:
my_routine:
push bc
push de
; ... body that uses BC and DE internally ...
pop de
pop bc
ret
The pops mirror the pushes in reverse order. The stack is LIFO — last in, first out — so the last register pushed must be the first popped. Getting this order wrong swaps the values back into the wrong registers. The assembler does not catch it.
find_max from Chapter 10 is clean on this front: it only uses HL and B, both of which are its inputs. Nothing else gets touched. But count_above uses D internally as the running counter. D is callee-save. A caller that kept something in D before calling count_above would lose it.
The fix: push and pop DE around the body.
count_above:
push de ; save caller's DE (D used internally as counter)
ld d, 0 ; D = running count
CountAboveLoop:
ld a, (hl)
cp c
jr c, CountAboveSkip ; A < threshold: skip
jr z, CountAboveSkip ; A = threshold: skip (strictly above only)
inc d
CountAboveSkip:
inc hl
djnz CountAboveLoop
ld a, d ; return count in A
pop de ; restore caller's DE
ret
The push at the top saves whatever the caller had in DE. The pop at the bottom restores it. The caller’s D and E values are the same after the call as before. The fact that count_above used D internally is invisible to the caller.
One timing issue: the pop must appear on every return path. A subroutine that has multiple exit points needs a pop on each one. Missing a pop on one path leaves the stack misaligned, and the eventual ret will jump to whatever garbage value ended up at the stack pointer.
The IX frame for local storage
Register passing works for a small number of arguments. When a subroutine needs more temporary storage than the remaining free registers can provide, the stack is the answer.
The technique uses IX as a base pointer into the stack. The subroutine allocates a block of bytes on the stack at entry, accesses them through IX-relative addressing and deallocates the block before returning.
The prologue establishes the frame:
my_routine:
push ix ; save caller's IX
ld ix, 0
add ix, sp ; IX now points to the frame base (top of stack)
After these three instructions, IX holds the current stack pointer. The two bookkeeping entries are already on the stack:
higher addresses
┌────────────────────────────────────┐
│ saved IX high byte IX+1 │
│ saved IX low byte IX+0 ← IX │ frame base
├────────────────────────────────────┤
│ return address high IX+3 │ pushed by CALL
│ return address low IX+2 │ pushed by CALL
│ ... (caller's stack below) │
└────────────────────────────────────┘
lower addresses
If the caller pushed arguments onto the stack before the call, they sit above the return address:
│ arg high byte IX+5 │ ← pushed by caller
│ arg low byte IX+4 │ ← pushed by caller
│ return address high IX+3 │
│ return address low IX+2 │
│ saved IX high IX+1 │
│ saved IX low IX+0 ← IX │ frame base
Arguments pushed by the caller appear at IX+4 and above. You never read IX+0 through IX+3 directly — those slots belong to the bookkeeping.
To allocate local storage, decrement SP once per byte needed:
dec sp
dec sp ; allocate 2 bytes of local storage
The two bytes are now at IX−1 and IX−2. Access them with indexed addressing:
ld (ix-1), a ; write first local
ld a, (ix-2) ; read second local
The epilogue undoes both steps and restores IX for the caller:
ld sp, ix ; restore SP to frame base (discards locals)
pop ix ; restore caller's IX
ret
The ld sp, ix line removes all local storage in one instruction, regardless of how many bytes were allocated. No matching inc sp sequence is needed.
This is the same IX-relative addressing you learned for table indexing. Inside a framed subroutine, IX holds the frame base instead of a table base. The instruction form is identical; only the purpose changes.
A caution: the index displacement in (ix+d) is a signed 8-bit value. For locals, d is negative (−1 through −128). For caller-pushed args, d is positive (4 through 127). The maximum frame size is 128 bytes of locals and 124 bytes of arguments. For most subroutines this is more than enough.
Register documentation
The only way to communicate a subroutine’s register interface in plain assembly is a comment block. Nothing else runs at assembly time.
The comment block lives immediately before the subroutine label and declares every input, every output and every register the subroutine leaves changed:
; find_max: scan a byte table and return the largest value
; In: HL = pointer to first byte of table
; B = number of bytes to scan
; Out: A = maximum value found
; Clobbers: B (reaches 0 after the loop), HL (advances past last byte)
find_max:
ld a, 0
FindMaxLoop:
cp (hl)
jr nc, FindMaxSkip
ld a, (hl)
FindMaxSkip:
inc hl
djnz FindMaxLoop
ret
Clobbers lists every register the caller should not rely on after the call. find_max destroys both B and HL in normal operation — B counts down to zero via djnz, and HL walks through the table. Any caller that needs the original B or HL after the call must save them first.
The comment block for count_above with push/pop discipline:
; count_above: count bytes in a table that are strictly above a threshold
; In: HL = pointer to first byte of table
; B = number of bytes to scan
; C = threshold value
; Out: A = count of bytes where (byte > threshold)
; Clobbers: B (reaches 0), HL (advances past last byte)
; Preserves: C, D, E (DE saved via push/pop)
count_above:
push de
ld d, 0
CountAboveLoop:
ld a, (hl)
cp c
jr c, CountAboveSkip
jr z, CountAboveSkip
inc d
CountAboveSkip:
inc hl
djnz CountAboveLoop
ld a, d
pop de
ret
Preserves lists registers the subroutine explicitly restores. Declaring Preserves: C, D, E tells callers that DE is safe across the call even though count_above uses D internally.
The problem is that these comments have no enforcement. A wrong comment, a callee that was modified after the comment was written, a caller that misread the convention — all fail silently. The assembler passes the code. The CPU runs it. The bug appears at runtime, sometimes far from its origin.
Chapter 12 shows what AZM provides beyond comments: a structured declaration syntax that the register-care analyzer can read and verify.
A worked example: the complete pair
Here are both subroutines from Chapter 10 with full push/pop discipline and complete comment blocks.
; find_max: scan a byte table and return the largest value
; In: HL = pointer to first byte
; B = count (number of bytes to scan)
; Out: A = maximum value found
; Clobbers: B (reaches 0 after djnz), HL (points past last byte)
; Preserves: C, D, E, IX, IY
find_max:
ld a, 0
FindMaxLoop:
cp (hl)
jr nc, FindMaxSkip
ld a, (hl)
FindMaxSkip:
inc hl
djnz FindMaxLoop
ret
find_max uses only its input registers and A. Nothing else is touched, so nothing else needs push/pop. The clobber list accurately reflects what the caller loses.
; count_above: count bytes in a table strictly above a threshold
; In: HL = pointer to first byte
; B = count (number of bytes to scan)
; C = threshold value (bytes must be strictly greater to count)
; Out: A = number of bytes where byte > threshold
; Clobbers: B (reaches 0 after djnz), HL (points past last byte)
; Preserves: C, D, E (DE saved via push/pop)
count_above:
push de ; D used as counter; save caller's DE
ld d, 0
CountAboveLoop:
ld a, (hl)
cp c ; compare byte against threshold
jr c, CountAboveSkip ; A < C: skip (carry set = unsigned less-than)
jr z, CountAboveSkip ; A = C: skip (zero set = equal, not above)
inc d ; A > C: increment counter
CountAboveSkip:
inc hl
djnz CountAboveLoop
ld a, d ; move count from D into A for return
pop de ; restore caller's DE before returning
ret
The structure is: save anything the caller might need, do the work, restore before returning. The caller of count_above can keep a value in DE across the call and trust it will be intact — as long as the comment is correct.
The main sequence that calls both:
main:
ld hl, values
ld b, 8
call find_max
ld (max_val), a
ld hl, values ; reload HL — find_max walked it to the end
ld b, 8 ; reload B — find_max consumed it
ld c, 64
call count_above
ld (above_64), a
ret
The two reloads before count_above are not optional. find_max clobbered HL and B — the comment says so, and the code confirms it. Every caller of find_max must either not need HL and B afterward, or reload them.
Summary
- The informal Z80 calling convention passes addresses in HL, counts in B or BC, single bytes in A or C and a second address in DE. Byte results return in A; word results return in HL.
- Callee-save registers (BC, DE, HL, IX, IY) must be pushed at entry and popped before return if the subroutine uses them as scratch storage. A and F are caller-save.
- Pops must mirror pushes in reverse order. Every return path needs the matching pop sequence, or the stack alignment breaks and
retjumps to the wrong address. - The IX frame provides local storage on the stack. The prologue saves IX, sets IX = SP and allocates bytes with
dec sp. Locals sit at negative IX offsets. The epilogue restores SP withld sp, ixand pops IX. - If the caller pushes arguments onto the stack before the
call, they sit at IX+4 and above after the prologue. - A comment block declaring inputs, outputs, clobbers and preserved registers is the only documentation mechanism in plain assembly. Nothing verifies it.
Exercises
1. Trace push/pop order. A subroutine has this entry sequence:
push bc
push hl
push af
Write the correct epilogue (three pops in the right order). Then explain what happens if the order is reversed.
2. Identify what to save. A subroutine receives HL as an input table pointer and B as a byte count. Internally, it uses C and D as scratch and E as a second counter. Which registers need push/pop discipline? Which do not? Write the push sequence at entry and the matching pop sequence at exit.
3. Build an IX frame. Write the prologue and epilogue for a subroutine that needs four bytes of local storage. Use (ix-1) through (ix-4) for the locals. Then write the two instructions that write the value 42 into the first local and read it back into A.
4. Spot the bug. The following subroutine has a return path that misses a pop:
sum_bytes:
push bc
ld c, 0 ; C = running sum
SumBytesLoop:
ld a, (hl)
add a, c
ld c, a
inc hl
djnz SumBytesLoop
ld a, c
pop bc
ret
If b is loaded with 0 before the call, djnz will execute 256 times (the Z80’s zero-count behaviour). Suppose instead that a separate error path is added that returns early when a zero byte is found:
ld a, (hl)
or a
jr z, SumEarlyExit ; found zero, abort
add a, c
ld c, a
inc hl
djnz SumBytesLoop
ld a, c
pop bc
ret
SumEarlyExit:
ld a, 0
ret ; BUG: missing pop
Explain exactly what happens to the caller’s BC and to the stack when the early exit fires. Write the corrected version.