Chapter 2 - Source Loading and Parsing
Source loading and parsing turn entry files into typed source items. This chapter follows the path from a filename to the structured data that assembly, tooling and register care consume.
The loading boundary lives in src/node/source-host.ts. The parser is
orchestrated by parseNextSourceItems() in src/core/compile.ts, with
single-line parsing in src/syntax/parse-line.ts and expression parsing in
src/syntax/parse-expression.ts.
Entry Files and Source Text
The public tooling and compile APIs enter loading through loadProgramNext() in
src/tooling/api.ts. That function calls expandSourceForTooling() and then
passes the expanded logical lines to parseNextSourceItems().
expandSourceForTooling() accepts:
export interface LoadProgramNextOptions {
  readonly entryFile: string;
  readonly includeDirs?: readonly string[];
  readonly directiveAliasFiles?: readonly string[];
  readonly preloadedText?: string;
  readonly signal?: AbortSignal;
}
The entry file is normalised and checked for a source extension. AZM source
entries use .asm or .z80. preloadedText lets editor integrations parse an
unsaved buffer for the entry file while included files still come from disk.
signal lets an editor cancel stale work when a newer buffer arrives.
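As a rough illustration of the two rules above, the sketch below checks the entry extension and prefers the unsaved buffer for the entry file. The helper names hasSourceExtension and textFor are hypothetical, and case-insensitive extension matching is an assumption; the real loader may differ on both points.

```typescript
// Hypothetical helpers for illustration; not the loader's actual API.
const SOURCE_EXTENSIONS: readonly string[] = [".asm", ".z80"];

// True when the entry file ends in a recognised source extension.
// Case-insensitive matching is an assumption of this sketch.
function hasSourceExtension(entryFile: string): boolean {
  const lower = entryFile.toLowerCase();
  return SOURCE_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Picks the text for one file: the unsaved buffer wins for the entry file,
// every other file goes through the supplied reader.
function textFor(
  file: string,
  entryFile: string,
  preloadedText: string | undefined,
  readFromDisk: (path: string) => string,
): string {
  if (file === entryFile && preloadedText !== undefined) {
    return preloadedText;
  }
  return readFromDisk(file);
}
```

Passing the reader as a parameter keeps the sketch free of file-system calls, so the buffer-versus-disk decision can be exercised on its own.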
The loader keeps the full text of every loaded source file in sourceTexts.
Later stages use parsed source items for compiler logic, but several features
need original text:
- register-care annotation rewrites exact source lines
- tooling reads source text for diagnostics and code actions
- D8 map generation needs file names and line provenance
- case-style linting inspects original token case
Logical lines drive parsing. Source texts support tools that need to point back into the user’s files.
Include Expansion
.include is textual inclusion. The loader reads the entry file, scans it into
logical lines and recursively expands include directives. Include paths resolve
relative to the including source file first, then through configured include
directories.
That rule keeps library files portable. A library can include a sibling file and still assemble when the entry file lives in another directory. Include directories then act as project-level search paths for shared headers and vendor source.
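The lookup order can be sketched as a candidate list: the including file's directory first, then each include directory in turn. The function names below are hypothetical, and the sketch assumes POSIX-style paths with no normalisation of ".." segments.

```typescript
// Hypothetical sketch of the include search order; POSIX-style paths only.
function dirOf(file: string): string {
  const slash = file.lastIndexOf("/");
  return slash < 0 ? "." : file.slice(0, slash);
}

// Builds the ordered list of paths to try for one include directive.
function includeCandidates(
  includePath: string,
  includingFile: string,
  includeDirs: readonly string[],
): string[] {
  // The including file's directory is always tried first.
  return [dirOf(includingFile), ...includeDirs].map((dir) => `${dir}/${includePath}`);
}

// Returns the first candidate that exists, or undefined when none does.
function resolveInclude(
  includePath: string,
  includingFile: string,
  includeDirs: readonly string[],
  exists: (path: string) => boolean,
): string | undefined {
  return includeCandidates(includePath, includingFile, includeDirs).find(exists);
}
```

Because the sibling directory comes first, a library that ships its own tables.asm keeps resolving to that copy even when a project include directory contains a file with the same name.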
The loader returns:
export interface ExpandedNextSource {
  readonly entryFile: string;
  readonly lines: readonly LogicalLine[];
  readonly sourceTexts: ReadonlyMap<string, string>;
  readonly sourceLineComments: ReadonlyMap<string, ReadonlyMap<number, string>>;
}
lines is the flattened source stream for parsing. sourceTexts keeps the
original file text. sourceLineComments keeps comments indexed by file and line
so register care can reconstruct AZMDoc contract blocks after routines have
been identified.
Logical Lines and Comments
src/source/logical-lines.ts scans a SourceFile into LogicalLine objects. A
logical line records the source name, line number and original text. This thin
structure gives every later diagnostic a stable location.
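A minimal sketch of that scan, assuming a record with three fields and 1-based line numbers. The field names and the scanLogicalLines function are illustrative and may not match the real LogicalLine type.

```typescript
// Hypothetical shape; the real LogicalLine may use different field names.
interface LogicalLineSketch {
  readonly sourceName: string;
  readonly lineNumber: number; // 1-based, an assumption of this sketch
  readonly text: string;
}

// Splits source text into one record per line, accepting LF and CRLF endings.
function scanLogicalLines(sourceName: string, text: string): LogicalLineSketch[] {
  return text.split(/\r?\n/).map((lineText, index) => ({
    sourceName,
    lineNumber: index + 1,
    text: lineText,
  }));
}
```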
The source helpers are small and important:
| File | Role |
|---|---|
| source-file.ts | Wraps source text with a source name. |
| logical-lines.ts | Splits text into line records. |
| source-span.ts | Defines the common span shape. |
| strip-line-comment.ts | Removes semicolon comments while respecting quotes. |
strip-line-comment.ts is used by include recognition, layout parsing,
conditional assembly and single-line parsing. Shared comment handling prevents
each stage from inventing a slightly different rule for semicolons inside
strings and character literals.
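The quote-aware rule can be sketched as a single left-to-right scan. This is not the code in strip-line-comment.ts: the function name is hypothetical, escape sequences are ignored, and a bare apostrophe outside a literal (as in the Z80 shadow register af') would be misread as an opening quote, which a real implementation has to handle.

```typescript
// Hypothetical sketch: removes a trailing ';' comment unless the semicolon
// sits inside a double-quoted string or single-quoted character literal.
function stripLineCommentSketch(line: string): string {
  let quote: string | undefined; // the quote character we are inside, if any
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (quote !== undefined) {
      // Inside a literal: only the matching quote character ends it.
      if (ch === quote) quote = undefined;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === ";") {
      // First semicolon outside any literal starts the comment.
      return line.slice(0, i).trimEnd();
    }
  }
  return line;
}
```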
Directive Aliases
Directive aliases are loaded during loadProgramNext():
const directiveAliasProfiles = await Promise.all(
  (options.directiveAliasFiles ?? []).map((path) => readDirectiveAliasProfile(path)),
);
const directiveAliasPolicy = buildDirectiveAliasPolicy(directiveAliasProfiles);
src/syntax/directive-aliases.ts owns the alias policy. Built-in aliases and
project alias files are normalised before line parsing. The parser then
receives canonical directive forms and emits canonical source items.
Aliases are a syntax boundary: they affect only how directives are recognised, and they are resolved before line parsing. The assembler-time model therefore sees only canonical source items.
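One way to picture the policy is as a lookup from alias spelling to canonical directive. Everything in this sketch is assumed for illustration: the profile shape, the defb and defw alias names, case-insensitive matching and the rule that later profiles override earlier ones are not taken from directive-aliases.ts.

```typescript
// Hypothetical profile shape: alias spelling -> canonical directive.
type AliasProfileSketch = Readonly<Record<string, string>>;

// Merges profiles into one lookup. Later profiles override earlier ones,
// and aliases match case-insensitively; both are assumptions of this sketch.
function buildAliasPolicySketch(profiles: readonly AliasProfileSketch[]): Map<string, string> {
  const policy = new Map<string, string>();
  for (const profile of profiles) {
    for (const [alias, canonical] of Object.entries(profile)) {
      policy.set(alias.toLowerCase(), canonical);
    }
  }
  return policy;
}

// Returns the canonical form, or the directive unchanged when no alias applies.
function canonicalDirective(policy: ReadonlyMap<string, string>, directive: string): string {
  return policy.get(directive.toLowerCase()) ?? directive;
}
```

Resolving aliases through one function means every later stage can compare against canonical names only.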
Source Items
The parser is the first place where AZM source becomes compiler data. Before this point, a line is text with a file name and line number. After this point, a line is a label, instruction, directive, layout declaration or comment item.
src/model/source-item.ts defines the parser output. The model includes:
- labels
- .org, .equ, .db, .dw, .ds, .align, string directives and .end
- instructions
- record and union layout declarations
- type aliases
- enums
- op-expanded items
- comments
Each item carries a source span where appropriate. Assembly uses item kind to decide size and emission. Register care uses instruction, label and comment items to build routines. D8 map output uses spans to connect emitted bytes back to files and lines.
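Consumers that branch on item kind fit naturally onto a discriminated union. The shapes below are a simplified, hypothetical subset and are not the definitions in source-item.ts; they only show how a stage can switch on kind while carrying the span along.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface SpanSketch {
  readonly file: string;
  readonly line: number;
}

type SourceItemSketch =
  | { kind: "label"; name: string; span: SpanSketch }
  | { kind: "instruction"; mnemonic: string; operands: string[]; span: SpanSketch }
  | { kind: "equ"; name: string; expression: string; span: SpanSketch }
  | { kind: "comment"; text: string; span: SpanSketch };

// Each stage narrows on `kind`; the compiler checks that every case is handled.
function describeItem(item: SourceItemSketch): string {
  switch (item.kind) {
    case "label":
      return `label ${item.name} at ${item.span.file}:${item.span.line}`;
    case "instruction":
      return `instruction ${item.mnemonic} at ${item.span.file}:${item.span.line}`;
    case "equ":
      return `equate ${item.name} at ${item.span.file}:${item.span.line}`;
    case "comment":
      return `comment at ${item.span.file}:${item.span.line}`;
  }
}
```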
Top-Level Parse Order
parseNextSourceItems() handles structural forms before ordinary line parsing:
- Conditional assembly filters the logical line stream.
- collectOps() records top-level op definitions and marks their body lines.
- Name-left .typealias declarations are parsed.
- Record and union headers collect .field declarations until .endtype or .endunion.
- Visible op invocations expand into ordinary source items.
- parseLogicalLine() handles single-line labels, directives, data and instructions.
This order matters. Ops must be collected before invocation expansion. Layout declarations must collect their body lines as one source item. Ordinary instruction parsing should see the lines that remain after those structural forms have been handled.
Layout and Declaration Parsing
Name-left layout syntax is parsed in parseNextSourceItems() because a record
or union body spans multiple lines:
Sprite .type
x .field byte
y .field byte
tile .field byte
flags .field byte
.endtype
Fields are parsed as LayoutField values. Each field has a name and a type
expression. The parser checks declaration shape. address-planning.ts later
checks duplicate field names, layout size and type references.
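The multi-line collection can be sketched as follows for the record case. The type and function names are hypothetical, the type expression is kept as raw text rather than parsed, and, like the parser described above, the sketch checks only declaration shape and leaves duplicate names and sizes to a later stage.

```typescript
// Hypothetical shapes; the real LayoutField stores a parsed type expression.
interface LayoutFieldSketch {
  readonly name: string;
  readonly typeExpr: string;
}

interface RecordSketch {
  readonly name: string;
  readonly fields: LayoutFieldSketch[];
}

// Collects '<name> .field <type>' lines after a '<name> .type' header
// until '.endtype' closes the record.
function parseRecordSketch(lines: readonly string[]): RecordSketch {
  const header = /^(\w+)\s+\.type$/.exec((lines[0] ?? "").trim());
  if (!header) throw new Error("expected '<name> .type' header");
  const fields: LayoutFieldSketch[] = [];
  for (const line of lines.slice(1)) {
    const trimmed = line.trim();
    if (trimmed === ".endtype") return { name: header[1], fields };
    const field = /^(\w+)\s+\.field\s+(.+)$/.exec(trimmed);
    if (!field) throw new Error(`malformed field declaration: ${trimmed}`);
    fields.push({ name: field[1], typeExpr: field[2] });
  }
  throw new Error("missing .endtype");
}
```

Returning one record value for the whole body mirrors the requirement that a layout declaration becomes a single source item.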
Type aliases are parsed as named bindings:
SpriteArray .typealias Sprite[16]
The parser stores the alias target as a type expression. Assembly resolves the target against scalar layout names, record names, union names and other type aliases.
The parser also distinguishes address labels from declarations. An address label uses a colon and becomes a label item. Name-left declarations become equate, enum, type, union or type-alias items.
Start:
ret
COUNT .equ 8
A label contributes an address based on placement. An equate contributes an assembler-time value based on expression evaluation.
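The distinction can be sketched as a small classifier over trimmed line text. The patterns and names are assumptions for illustration: the sketch covers only a colon-terminated label on its own line and a name-left .equ, and it treats everything else as an ordinary line.

```typescript
// Hypothetical classification result for illustration.
type DeclSketch =
  | { kind: "label"; name: string }
  | { kind: "equ"; name: string; expression: string }
  | { kind: "other"; text: string };

function classifyLine(line: string): DeclSketch {
  const text = line.trim();
  // 'Start:' is an address label: its value comes from placement.
  const label = /^([A-Za-z_]\w*):$/.exec(text);
  if (label) return { kind: "label", name: label[1] };
  // 'COUNT .equ 8' is a name-left declaration: its value comes from the expression.
  const equ = /^([A-Za-z_]\w*)\s+\.equ\s+(.+)$/.exec(text);
  if (equ) return { kind: "equ", name: equ[1], expression: equ[2] };
  return { kind: "other", text };
}
```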
Expressions and Conditionals
src/syntax/parse-expression.ts parses numeric expressions, names, unary and
binary operators, function calls, layout casts and type expressions. It is used
by .equ, data directives, instruction operands, layout functions, .ds and
layout fields.
The parser produces expression trees from src/model/expression.ts.
src/semantics/expression-evaluation.ts evaluates those trees when the
assembler-time environment is available.
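The split between building a tree and evaluating it later can be sketched with a tiny expression type. The node shapes here are hypothetical and far smaller than the model in expression.ts; returning undefined for an unknown name is an assumption about how "not yet resolvable" might be signalled.

```typescript
// Hypothetical, minimal expression tree for illustration.
type ExprSketch =
  | { kind: "number"; value: number }
  | { kind: "name"; name: string }
  | { kind: "unary"; op: "-"; operand: ExprSketch }
  | { kind: "binary"; op: "+" | "-" | "*"; left: ExprSketch; right: ExprSketch };

// Evaluates a tree against the known names; undefined means "not resolvable yet".
function evaluateSketch(expr: ExprSketch, env: ReadonlyMap<string, number>): number | undefined {
  switch (expr.kind) {
    case "number":
      return expr.value;
    case "name":
      return env.get(expr.name);
    case "unary": {
      const value = evaluateSketch(expr.operand, env);
      return value === undefined ? undefined : -value;
    }
    case "binary": {
      const left = evaluateSketch(expr.left, env);
      const right = evaluateSketch(expr.right, env);
      if (left === undefined || right === undefined) return undefined;
      switch (expr.op) {
        case "+":
          return left + right;
        case "-":
          return left - right;
        case "*":
          return left * right;
      }
    }
  }
}
```

Keeping the tree separate from evaluation lets the parser run before any symbol values exist, which is why the same tree can be evaluated again once the environment is complete.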
Conditional assembly is handled before final line parsing. The conditional pass keeps the active lines and removes inactive branches from the stream seen by later stages. Ordinary parsing then receives one effective source program.
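A filtering pass of this kind can be sketched with a stack of branch states. The directive spellings .if, .else and .endif are assumptions made for the example, not confirmed AZM syntax, and condition evaluation is delegated to a caller-supplied predicate.

```typescript
// Hypothetical sketch: keeps only the lines in active conditional branches.
function filterActiveLines(
  lines: readonly string[],
  isTrue: (condition: string) => boolean,
): string[] {
  const active: string[] = [];
  // One frame per open conditional: was the enclosing context active,
  // and was the first branch taken?
  const stack: { parentActive: boolean; branchActive: boolean }[] = [];
  let emitting = true;
  for (const line of lines) {
    const text = line.trim();
    if (text.startsWith(".if ")) {
      // A nested condition is only evaluated when its parent is active.
      const branchActive = emitting && isTrue(text.slice(4).trim());
      stack.push({ parentActive: emitting, branchActive });
      emitting = branchActive;
    } else if (text === ".else") {
      const frame = stack[stack.length - 1];
      if (!frame) throw new Error(".else without .if");
      emitting = frame.parentActive && !frame.branchActive;
    } else if (text === ".endif") {
      const frame = stack.pop();
      if (!frame) throw new Error(".endif without .if");
      emitting = frame.parentActive;
    } else if (emitting) {
      active.push(line);
    }
  }
  return active;
}
```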
Parse Diagnostics
src/syntax/parse-diagnostics.ts contains shared helpers for syntax errors.
Diagnostic IDs come from src/model/diagnostic.ts. Use those helpers when
adding parse failures so source positions, severity and code shape stay
consistent.
Parser recovery matters for editor tooling. A user may have a half-written line while typing. Tooling still needs symbols, diagnostics and register-care hints for surrounding source, so parse errors should usually report a diagnostic and let parsing continue.
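The report-and-continue pattern can be sketched as a loop that records a diagnostic for a bad line and moves on. All names here are hypothetical, the stand-in line parser accepts only three mnemonics, and using exceptions to signal a failure is an assumption of the sketch rather than a description of parse-diagnostics.ts.

```typescript
// Hypothetical shapes for illustration.
interface DiagnosticSketch {
  readonly line: number;
  readonly message: string;
}

interface ParseResultSketch {
  readonly items: string[];
  readonly diagnostics: DiagnosticSketch[];
}

// Stand-in single-line parser: accepts a tiny set of mnemonics, throws otherwise.
function parseLineSketch(text: string): string {
  const mnemonic = text.trim().split(/\s+/)[0] ?? "";
  if (!["nop", "ret", "ld"].includes(mnemonic)) {
    throw new Error(`unknown mnemonic '${mnemonic}'`);
  }
  return mnemonic;
}

// A failing line yields a diagnostic; parsing continues with the next line.
function parseWithRecovery(lines: readonly string[]): ParseResultSketch {
  const items: string[] = [];
  const diagnostics: DiagnosticSketch[] = [];
  lines.forEach((text, index) => {
    try {
      items.push(parseLineSketch(text));
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      diagnostics.push({ line: index + 1, message });
    }
  });
  return { items, diagnostics };
}
```

With this shape, a half-typed line costs one diagnostic while the items on either side of it stay available to tooling.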