Chapter 8 — Stack and Subroutines

Every program you have written so far is a single block of instructions that runs from top to bottom. That works for small tasks. But larger programs need to reuse logic — the same comparison, the same output routine, the same byte-copying sequence — called from a dozen different places. Without a way to jump away and come back, you would copy those instructions everywhere you needed them and maintain every copy separately.

call and ret solve this. This chapter explains how they work, how the hardware stack supports them and how to write subroutines that receive inputs through registers and hand results back to the caller.


Subroutines

Before the mechanics: a useful mental model. The Z80 has no hardware concept of a subroutine beyond call and ret. There is no special mode the CPU enters, no register tracking call depth, no difference between bytes at a call site and bytes inside a subroutine body. call label does two things — pushes an address, jumps. ret does one thing — pops that address back into PC. The subroutine abstraction is entirely a matter of discipline; the hardware enforces nothing.

That matters because any failure of discipline — an unbalanced push/pop, a missing ret, a jump that skips the cleanup — produces a silent wrong result with nothing to indicate where it went wrong. The mechanics below are what make that discipline reliable.


The call instruction

call label is a push of the return address followed by a jump — two operations in one opcode. Concretely:

  1. Pushes the address of the instruction following the call onto the hardware stack (this is the return address).
  2. Jumps to label.

The instruction is always 3 bytes long, so the return address pushed is always the address of the byte immediately after the call instruction.

The hardware stack is a region of RAM used as a last-in-first-out buffer. The stack pointer SP always holds the address of the most recently pushed value. When call pushes a word onto the stack, SP decreases by two (the stack grows downward in memory on the Z80). The return address is stored at the new SP.

After the call, the CPU is executing instructions inside the subroutine. The subroutine does not know which call site reached it.


The ret instruction

ret is equivalent to pop pc — if such an instruction existed. The CPU reads the top two bytes of the stack into the program counter, increments SP by two and execution resumes at the instruction after the original call.

If ret runs when the stack does not contain a valid return address — because of a push/pop mismatch, for example — the CPU jumps to whatever bytes are at the top of the stack, which is almost certainly not a valid instruction address.


The hardware stack

The stack is a region of RAM. You decide where it lives by loading SP with a starting address before the program uses any call, push or pop instructions. A common choice is the top of available RAM: ld sp, $BFFF (or whichever address marks the last byte of RAM on your target).

The stack grows downward. Each push decreases SP by two and writes a 16-bit value. Each pop reads two bytes and increases SP by two. SP always points to the most recent data pushed.

An AZM program must initialize SP before the first call. The standard first instruction in main is ld sp, top_of_ram for whatever address marks the last byte of available RAM on your board.


Passing values through registers

The Z80 has no hardware-enforced calling convention. Any register can carry any value into a subroutine or back out. The conventions used in these chapters are:

  • A carries a single byte result or input value.
  • HL carries a 16-bit result or input value.
  • BC and DE carry secondary input values.

Every subroutine should document which registers it reads on entry and which it modifies on exit. The assembler enforces nothing; the comment header is the contract between the subroutine and its caller.

; add_bytes: add two byte values.
; In:  B = first byte, C = second byte
; Out: A = B + C
; Preserves: BC, DE, HL
; Clobbers: AF
add_bytes:
  ld a, b
  add a, c
  ret

Preserves means those registers hold the same values after the call that they held before. Clobbers means the caller must not rely on those registers after the call. The only way to verify this contract is to read the subroutine body and check.

Every subroutine in AZM is a plain label followed by instructions, ending with ret. AZM inserts nothing automatically. If you forget ret, control falls through into whatever bytes follow the last instruction, which is almost always wrong.


push and pop: saving and restoring registers

push and pop are most clearly described in terms of virtual ld instructions.

push hl: SP is decremented by two, then the contents of HL are written to the two bytes at the new SP address — as if ld (sp), hl were an instruction. (ld (sp), hl is not an actual Z80 opcode, but that description captures the behaviour exactly.)

pop hl is the inverse: two bytes are read from the address in SP into HL, then SP is incremented by two — as if ld hl, (sp) were an instruction.

The operand can be any of AF, BC, DE, HL, IX or IY. Every push writes two bytes to RAM at SP; every pop reads them back. The stack tracks only position — not what the bytes mean or where they came from. All register pairs are saved to the same area of memory — the bytes at and below SP. The stack is the same RAM where your program and variables reside.

A subroutine uses push / pop to preserve registers it needs to modify internally. The pattern:

example:
  push bc          ; save caller's BC on entry
  ; ... use BC for internal work ...
  pop bc           ; restore caller's BC before returning
  ret

The critical rule: every push in a subroutine must have exactly one matching pop before the subroutine returns. If a subroutine pushes twice and pops once, the stack has an extra word on it when ret runs. ret will then read that extra word as the return address and jump to garbage.

Stack depth discipline: count your pushes and pops. They must be balanced.

Cross-register moves through the stack

You can push one pair and pop into a different pair — the stack has no memory of which register supplied the bytes it holds. This lets you perform register transfers that ld cannot express:

  push af         ; save AF onto the stack
  push bc         ; save BC onto the stack
  pop de          ; DE ← what was in BC
  pop hl          ; HL ← what was in AF

After these four instructions, DE holds the original value of BC and HL holds the original value of AF. The second transfer — AF into HL — is particularly useful, because there is no ld l, f instruction. The flags register F cannot appear in any ld combination, but it can be moved through the stack by pushing AF and popping into another pair. This is one of the few ways to inspect or transfer the flags register.

The stack is last-in-first-out: the pair pushed last is popped first. If you swap the pop order above, DE gets AF and HL gets BC — the reverse of what you might expect if you read the code top-to-bottom without thinking about the stack.


Shadow registers: saving state without the stack

In a tight interrupt handler or innermost loop, saving BC, DE and HL via push and pop costs six instructions — three pushes, three pops — and takes six bytes of stack space. exx does the same job in a single instruction: it swaps BC, DE and HL with a second hidden set of registers (BC′, DE′, HL′) simultaneously. A second instruction, ex af, af′, swaps A and F with their shadow counterparts.

These are the shadow registers — a second, hidden copy of A, F, B, C, D, E, H and L. You cannot use them directly in instructions; exx and ex af, af′ are the only way in.

The trade-off is that there is only one shadow set. If both your main code and an interrupt handler rely on exx, the interrupt can silently destroy the values the main code stored. push and pop work at any nesting depth; shadow registers do not. Use them when speed matters and you can guarantee that only one context uses them at a time.


Conditional return: ret cc

The Z80 also provides conditional return instructions: ret z, ret nz, ret c, ret nc and so on. ret z pops the return address and returns only if Z is set; otherwise it falls through to the next instruction.

This is useful for early-exit patterns:

check_nonzero:
  or a          ; test A for zero
  ret z         ; return immediately if A is zero
  ; ... rest of the subroutine runs only when A != 0 ...
  ret

ret z here is an early exit — it returns before the normal ret at the end. Both the early ret z and the final ret are explicit; AZM inserts nothing.

When using ret cc, the stack must be balanced at the conditional return point just as it must be at the final return. If the subroutine pushed a register before the test, it must pop before ret z as well.


Nested calls and stack depth

A subroutine can itself call another subroutine. Each call pushes another return address; each ret pops one. As long as every call is matched with a return, the stack remains balanced and execution returns correctly through each level.

The only limit is the size of the RAM region allocated to the stack. A program that calls too many levels deep — or forgets to pop before returning — will overwrite RAM used for other purposes. The Z80 has no hardware guard against stack overflow.


The example: learning/book1/examples/06_subroutines.asm

.org $8000
result_add: .db 0
result_max: .dw 0

The program has a main entry point and two helper subroutines.

add_bytes: the simplest subroutine.

; add_bytes: add two byte values.
; In:  B = first byte, C = second byte
; Out: A = B + C
; Preserves: BC, DE, HL
; Clobbers: AF
add_bytes:
  ld a, b
  add a, c
  ret

add_bytes reads B and C, adds them and leaves the result in A. The comment header documents that AF is clobbered and that A carries the result; the caller reads A after the call. The subroutine modifies only A, so BC, DE and HL are naturally preserved. The caller passes 20 in B and 10 in C:

  ld b, $14
  ld c, $0A
  call add_bytes        ; A = 30
  ld (result_add), a

After the call, result_add holds 30 ($1E).

max_word: push/pop for preservation.

; max_word: return the larger of two 16-bit values.
; In:  HL = first value, DE = second value
; Out: HL = larger value
; Preserves: DE
; Clobbers: AF
max_word:
  push de
  or a
  sbc hl, de
  pop de
  jr c, max_is_de
  add hl, de
  ret                ; HL held the original (larger) value
max_is_de:
  ex de, hl          ; HL < DE: put DE (the larger value) into HL
  ret

max_word receives two 16-bit values in HL and DE and returns the larger one in HL. The subroutine preserves DE with an explicit push/pop. AF is clobbered by or a and sbc hl, de; the comment header documents this.

The or a before sbc hl, de clears the carry flag. sbc hl, de subtracts DE from HL including the carry bit, so carry must be clear before the instruction for a pure 16-bit subtraction.

After sbc hl, de, the carry flag indicates the comparison result:

  • Carry clear — HL was greater than or equal to DE (no unsigned borrow). HL now holds original_HL - DE, which is not the result we want. add hl, de restores HL to its original value and the subroutine returns with that value.
  • Carry set — HL was less than DE (unsigned borrow occurred). DE is the larger value. ex de, hl puts DE into HL and the second ret returns.

The or a / sbc hl, de / add hl, de sequence is how you do an unsigned 16-bit comparison when you need the original HL back after the test. sbc hl, de is destructive; add hl, de undoes the subtraction when HL was the larger value.

The caller passes 80 ($0050) in HL and 200 ($00C8) in DE:

  ld hl, $0050
  ld de, $00C8
  call max_word         ; HL = $00C8
  ld (result_max), hl

After the call, result_max holds 200.

Stack balance in max_word. The subroutine has one push de and one pop de. The pop occurs before jr c, max_is_de, which means DE is restored regardless of which branch the conditional takes. The stack is clean for both ret paths.


An advanced trick: reading the program counter

The Z80 has no instruction to read PC directly. But because call pushes the address of the next instruction onto the stack, you can read PC with a trick:

  call next_instr       ; pushes address of next_instr onto the stack
next_instr:
  pop hl                ; HL = address of this instruction

call next_instr jumps to the very next instruction — it does nothing except push the return address. pop hl retrieves that address. HL now holds its own address in memory, which is the value PC had when pop hl was fetched.

No ret appears here, and that is fine. The only thing the stack requires is balance: call pushed one word, pop hl consumed it. The stack is clean. This trick demonstrates the freedom assembly gives you: call and ret are not ceremonial pairs that must always appear together. They are stack operations, and you can use them however the stack discipline permits.


Summary

  • call label pushes the return address onto the stack and jumps to label.
  • ret pops the return address and jumps to it, returning to the call site.
  • The hardware stack is a region of RAM. SP points to the most recently pushed word; the stack grows downward.
  • Initialize SP before the first call. The standard first instruction in an AZM program is ld sp, top_of_ram.
  • Pass values into a subroutine in registers: A for a single byte, HL for a 16-bit word, BC and DE for secondary values. Document which registers carry inputs, which carry outputs and which are clobbered.
  • Every subroutine in AZM must end with an explicit ret. AZM inserts nothing.
  • push rr saves a register pair; pop rr restores it. Every push must have a matching pop before the subroutine returns.
  • Unbalanced push/pop causes ret to jump to garbage, because the wrong bytes are at the top of the stack when ret reads the return address.
  • ret cc returns conditionally; the stack must be balanced at that point too.
  • The shadow registers (A′–L′) provide one-instruction save/restore via exx and ex af, af′, but only one context can use them safely at a time.
  • Subroutines can call other subroutines. Each call pushes a return address; each ret pops one. The stack depth grows with each nested call.

What Comes Next

Every program so far has been self-contained — loads constants, processes data in memory, stores a result. Chapter 9 breaks that boundary. The Z80 has a separate address space for hardware peripherals, and two instructions — in and out — are the only way to cross it. The subroutine structure from this chapter is exactly what peripheral drivers are built from.


Exercises

1. Stack trace. Work through these four instructions by hand, tracking the stack and register values at each step. Assume SP starts at $C000 and that the values in the registers before the sequence are: AF = $1234, BC = $5678.

push af
push bc
pop de
pop hl

After all four instructions: what is in DE? What is in HL? What is SP? (Remember: the stack is last-in-first-out — the pair pushed last is the first to be popped.)

2. Spot the push/pop mismatch. This subroutine has a stack-balance bug. Identify it and explain precisely what will happen when ret executes:

count_nonzero:
  push bc
  push de
  ld b, $08
  ld c, 0
count_loop:
  ld a, (hl)
  or a
  jr z, skip
  inc c
skip:
  inc hl
  djnz count_loop
  ld a, c
  pop bc
  ret

Write the corrected version.

3. Write a subroutine. Write a subroutine called double_byte that receives a byte value in B and returns B × 2 in A. Include a comment header that documents the inputs, outputs and which registers are clobbered. Then write the three lines in main that pass the value 15 to the subroutine, call it and store the result in a variable named doubled.

4. The or a / sbc hl, de pattern. The max_word subroutine uses or a immediately before sbc hl, de. Explain what or a does to the carry flag and why omitting it would produce wrong results. Then explain the add hl, de that follows on the carry-clear path — why is it needed and what does HL hold after sbc hl, de on that path?