
Compiler Architecture

wirespec is a hand-written compiler implemented in Rust. There are no parser generators, no template engines, and no external dependencies beyond standard Rust crates.

Pipeline Overview

Source (.wspec)
  → AST              (wirespec-syntax)
  → Semantic IR      (wirespec-sema)
  → Layout IR        (wirespec-layout)
  → Codec IR         (wirespec-codec)
  → C Code           (wirespec-backend-c, -t c)
    or Rust Code     (wirespec-backend-rust, -t rust)

Each stage transforms the representation into something progressively closer to machine output. Backends (C, Rust) consume Codec IR — they never touch the AST directly.

Crate Structure

wirespec/
├── Cargo.toml                  # Workspace root
├── crates/
│   ├── wirespec-syntax/        # Lexer + Parser → AST
│   ├── wirespec-sema/          # Semantic analysis → Semantic IR
│   ├── wirespec-layout/        # Wire shape → Layout IR
│   ├── wirespec-codec/         # Parse/serialize strategy → Codec IR
│   ├── wirespec-backend-api/   # Backend trait + contracts
│   ├── wirespec-backend-c/     # C code generation
│   ├── wirespec-backend-rust/  # Rust code generation
│   └── wirespec-driver/        # Module resolver + CLI
├── examples/                   # .wspec protocol definitions
├── docs/                       # Design documents and plans
└── protospec/                  # Python reference implementation (legacy)
| Crate | Description |
|---|---|
| wirespec-syntax | Hand-written lexer + recursive descent parser, AST node types, span tracking |
| wirespec-sema | Semantic analysis: name resolution, type checking, validation rules |
| wirespec-layout | Layout lowering: wire field ordering, bit group packing, endianness |
| wirespec-codec | Codec lowering: parse/serialize strategies, zero-copy decisions, capacity checks |
| wirespec-backend-api | Backend trait definitions (Backend, BackendDyn, ArtifactSink, checksum bindings) |
| wirespec-backend-c | C code generator: header + source, bitgroup shift/mask, checksum verify/compute |
| wirespec-backend-rust | Rust code generator: single .rs file, lifetime tracking, Rust enums for frames |
| wirespec-driver | Compilation driver: module resolution, dependency graph, multi-module pipeline, CLI binary |

Stage Descriptions

Parser (wirespec-syntax)

The lexer and recursive descent parser are implemented as a single crate. Each grammar production maps to one parse method. The result is an AST — a direct structural echo of the source text. No name resolution happens here. Field references in expressions are just name strings at this stage.
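To illustrate the one-production-per-method pattern, here is a minimal sketch of a recursive descent parser in C for a toy expression grammar. The grammar, the node layout, and every name below (parser_t, node_t, parse_expr, parse_term) are hypothetical; they are not taken from wirespec-syntax, which is written in Rust. Note that a NAME node only records a slice of the source text: as in the AST described above, nothing is resolved yet.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy grammar for illustration only (NOT wirespec's grammar).
 * One parse function per production, no whitespace handling:
 *   expr := term ('+' term)*
 *   term := NUMBER | NAME
 */
typedef enum { NODE_NUM, NODE_NAME, NODE_ADD } node_kind_t;

typedef struct {
    node_kind_t kind;
    long num;              /* NODE_NUM: literal value */
    const char *name;      /* NODE_NAME: unresolved, points into the source */
    size_t name_len;
    int lhs, rhs;          /* NODE_ADD: child indices into the node pool */
} node_t;

#define MAX_NODES 64

typedef struct {
    const char *src;
    size_t pos;
    node_t nodes[MAX_NODES];  /* fixed pool keeps the sketch allocation-free */
    int count;
    int error;                /* set on the first syntax error or pool overflow */
} parser_t;

static int new_node(parser_t *p, node_kind_t kind) {
    if (p->count == MAX_NODES) { p->error = 1; return -1; }
    p->nodes[p->count] = (node_t){ .kind = kind };
    return p->count++;
}

/* term := NUMBER | NAME */
static int parse_term(parser_t *p) {
    const char *s = p->src + p->pos;
    if (isdigit((unsigned char)*s)) {
        int id = new_node(p, NODE_NUM);
        if (id < 0) return -1;
        while (isdigit((unsigned char)p->src[p->pos]))
            p->nodes[id].num = p->nodes[id].num * 10 + (p->src[p->pos++] - '0');
        return id;
    }
    if (isalpha((unsigned char)*s) || *s == '_') {
        int id = new_node(p, NODE_NAME);
        if (id < 0) return -1;
        p->nodes[id].name = s;    /* keep the raw name; no lookup here */
        while (isalnum((unsigned char)p->src[p->pos]) || p->src[p->pos] == '_')
            p->pos++;
        p->nodes[id].name_len = (size_t)(p->src + p->pos - s);
        return id;
    }
    p->error = 1;
    return -1;
}

/* expr := term ('+' term)*  (builds a left-associative ADD chain) */
static int parse_expr(parser_t *p) {
    int lhs = parse_term(p);
    while (!p->error && p->src[p->pos] == '+') {
        p->pos++;
        int rhs = parse_term(p);
        if (p->error) return -1;
        int id = new_node(p, NODE_ADD);
        if (id < 0) return -1;
        p->nodes[id].lhs = lhs;
        p->nodes[id].rhs = rhs;
        lhs = id;
    }
    return p->error ? -1 : lhs;
}
```

Parsing "length+2" yields an ADD node whose left child is an unresolved NAME node ("length") and whose right child is NUM 2; deciding what "length" refers to is left to semantic analysis.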

Semantic Analyzer (wirespec-sema)

This is the largest crate in the compiler. It takes an AST and produces a SemanticModule with:

  • All names resolved to their definitions
  • Option[T] applied to conditional fields
  • Wire types assigned to every field
  • Typed semantic expressions replacing raw AST expression nodes
  • State machine validation (reachability, complete action coverage)
  • require constraints checked for type correctness

This is the last stage that understands the language's meaning. Downstream passes treat it as ground truth.

Layout Pass (wirespec-layout)

Translates semantic types into wire byte geometry:

  • Endianness resolved per field (from a type suffix such as u16le, an @endian module annotation, or a type alias chain)
  • Consecutive bits[N] fields grouped into a single read operation — a BitGroup that is then shift-and-masked into individual fields
  • No code generation logic here — purely descriptive
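The following C sketch shows what such a layout implies at parse time. It assumes a hypothetical header of three consecutive bit fields (bits[3], bits[5], bits[8]) packed MSB-first into one big-endian 16-bit group; the bit order, field names, and helper functions are illustrative assumptions, not wirespec's documented layout rules.

```c
#include <stdint.h>

/* Hypothetical .wspec fields (names and widths are illustrative):
 *   flags:  bits[3]
 *   kind:   bits[5]
 *   length: bits[8]
 * Assumed: the three consecutive bits[N] fields form one 16-bit
 * BitGroup, read big-endian and packed MSB-first. */
typedef struct {
    uint8_t flags;
    uint8_t kind;
    uint8_t length;
} header_bits_t;

/* Primitive u16 reads; the byte order is fixed per field. */
static uint16_t read_u16be(const uint8_t *p) {
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static uint16_t read_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* One read for the whole group, then shift/mask into each field. */
static header_bits_t unpack_group(const uint8_t *p) {
    uint16_t group = read_u16be(p);
    header_bits_t h;
    h.flags  = (uint8_t)((group >> 13) & 0x07u);  /* top 3 bits  */
    h.kind   = (uint8_t)((group >> 8) & 0x1Fu);   /* next 5 bits */
    h.length = (uint8_t)(group & 0xFFu);          /* low 8 bits  */
    return h;
}
```

For the bytes {0xA5, 0x3C}, unpack_group returns flags = 5, kind = 5, length = 60, while the same two bytes read as u16be and u16le give 0xA53C and 0x3CA5.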

Codec Pass (wirespec-codec)

Assigns a field strategy to every field:

  • Primitive — read N bytes, apply endian conversion
  • VarInt — prefix-match variable-length integer
  • ContVarInt — continuation-bit variable-length integer (MQTT-style)
  • BytesFixed — zero-copy byte slice
  • BytesLength — length-prefixed byte slice
  • BytesRemaining — consume scope remainder
  • Array — loop over element count from prior field
  • BitGroup — single read + shift/mask
  • Struct — nested struct parse call
  • Conditional — if COND { T } optional field
  • Checksum — verify on parse, compute on serialize
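The two variable-length integer strategies differ mainly in how the encoded length is discovered. The sketch below assumes VarInt follows QUIC's encoding (RFC 9000, Section 16) and ContVarInt follows the MQTT Variable Byte Integer; the function names and the sk_result_t codes are hypothetical and do not reflect wirespec's generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch only. */
typedef enum { SK_OK = 0, SK_ERR_SHORT, SK_ERR_INVALID } sk_result_t;

/* VarInt (prefix-match), QUIC style: the top two bits of the first
 * byte select a 1, 2, 4 or 8 byte encoding; the remaining bits hold
 * the value in big-endian order. */
static sk_result_t parse_prefix_varint(const uint8_t *buf, size_t len,
                                       uint64_t *out, size_t *consumed) {
    if (len < 1) return SK_ERR_SHORT;
    size_t n = (size_t)1 << (buf[0] >> 6);   /* 1, 2, 4 or 8 bytes */
    if (len < n) return SK_ERR_SHORT;        /* one bounds check up front */
    uint64_t v = buf[0] & 0x3Fu;             /* strip the 2-bit prefix */
    for (size_t i = 1; i < n; i++)
        v = (v << 8) | buf[i];
    *out = v;
    *consumed = n;
    return SK_OK;
}

/* ContVarInt, MQTT style: 7 value bits per byte, least significant
 * group first; a set high bit means another byte follows. At most
 * 4 bytes are allowed. */
static sk_result_t parse_cont_varint(const uint8_t *buf, size_t len,
                                     uint32_t *out, size_t *consumed) {
    uint32_t v = 0;
    for (size_t i = 0; i < 4; i++) {
        if (i >= len) return SK_ERR_SHORT;   /* bounds check per byte */
        v |= (uint32_t)(buf[i] & 0x7Fu) << (7 * i);
        if ((buf[i] & 0x80u) == 0) {
            *out = v;
            *consumed = i + 1;
            return SK_OK;
        }
    }
    return SK_ERR_INVALID;   /* continuation bit still set after 4 bytes */
}
```

With RFC 9000's sample encodings, {0x7b, 0xbd} decodes to 15293 in two bytes; with MQTT's, {0xC1, 0x02} decodes to 321. The prefix form knows its full length after the first byte, so the bounds check is a single comparison, while the continuation form has to check the buffer on every byte.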

C Code Generator (wirespec-backend-c)

Consumes CodecModule and emits .h + .c files. Every generated function follows the same contract:

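```c
/* Parse from buf (len bytes) into *out; *consumed receives the
   number of bytes read. */
wirespec_result_t PREFIX_parse(
    const uint8_t *buf, size_t len,
    PREFIX_t *out, size_t *consumed);

/* Serialize *in into buf (cap bytes available); *written receives
   the number of bytes written. */
wirespec_result_t PREFIX_serialize(
    const PREFIX_t *in,
    uint8_t *buf, size_t cap, size_t *written);

/* Number of bytes needed to serialize *in. */
size_t PREFIX_serialized_len(const PREFIX_t *in);
```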

No heap allocation. All buffers are caller-provided. This invariant is enforced during code review, and the tests compile the generated code with -Wall -Wextra -Werror.
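As a hand-written illustration of that contract, here is a sketch for a hypothetical struct with PREFIX = demo, holding a big-endian u16 port followed by a u8 ttl. The demo_result_t codes stand in for wirespec_result_t, whose actual values are not listed on this page; the real generator's output will differ in naming and detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for wirespec_result_t. */
typedef enum {
    DEMO_OK = 0,
    DEMO_ERR_SHORT,     /* input ended before the struct did */
    DEMO_ERR_CAPACITY   /* output buffer too small */
} demo_result_t;

/* Hypothetical struct: port (u16, big-endian) then ttl (u8). */
typedef struct {
    uint16_t port;
    uint8_t  ttl;
} demo_t;

#define DEMO_WIRE_SIZE 3u

size_t demo_serialized_len(const demo_t *in) {
    (void)in;   /* fixed-size struct: the length never depends on values */
    return DEMO_WIRE_SIZE;
}

demo_result_t demo_parse(const uint8_t *buf, size_t len,
                         demo_t *out, size_t *consumed) {
    if (len < DEMO_WIRE_SIZE) return DEMO_ERR_SHORT;   /* bounds check first */
    out->port = (uint16_t)(((uint16_t)buf[0] << 8) | buf[1]);
    out->ttl  = buf[2];
    *consumed = DEMO_WIRE_SIZE;
    return DEMO_OK;
}

demo_result_t demo_serialize(const demo_t *in,
                             uint8_t *buf, size_t cap, size_t *written) {
    size_t need = demo_serialized_len(in);
    if (cap < need) return DEMO_ERR_CAPACITY;          /* capacity check first */
    buf[0] = (uint8_t)(in->port >> 8);
    buf[1] = (uint8_t)(in->port & 0xFFu);
    buf[2] = in->ttl;
    *written = need;
    return DEMO_OK;
}
```

Serializing {port = 0x1F90, ttl = 64} writes the three bytes 1F 90 40, and parsing them back yields the same values. Both directions touch only the caller's buffers.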

Rust Code Generator (wirespec-backend-rust)

Consumes CodecModule and emits a single .rs file. Uses the same structured emitter approach as the C backend. The Rust backend is fully implemented and covered by codegen and end-to-end tests.

Module Resolver (wirespec-driver)

Handles multi-file compilation:

  1. Parse the entry .wspec file, collect import declarations
  2. Locate each imported module on disk (respects -I include paths)
  3. Recursively parse imported modules
  4. Detect cycles (error if found)
  5. Return modules in topological order (dependencies first)

The compiler then processes modules in that order, collecting exported types for use in downstream modules.
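Steps 3 to 5 amount to a depth-first traversal with cycle detection. Below is a minimal C sketch over an in-memory import graph. It assumes the modules have already been located and parsed and are identified by index; import_graph_t, topo_order, and MAX_MODULES are hypothetical names, and the actual driver is written in Rust.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_MODULES 16

/* Hypothetical import graph: deps[i][j] is true when module i imports module j. */
typedef struct {
    size_t count;
    bool deps[MAX_MODULES][MAX_MODULES];
} import_graph_t;

enum { UNVISITED = 0, VISITING, DONE };

/* Depth-first visit: a module is appended only after all of its
 * imports, so `order` ends up dependencies-first.
 * Returns false if an import cycle is found. */
static bool visit(const import_graph_t *g, size_t m, int *state,
                  size_t *order, size_t *n_out) {
    if (state[m] == DONE) return true;
    if (state[m] == VISITING) return false;   /* back edge: import cycle */
    state[m] = VISITING;
    for (size_t d = 0; d < g->count; d++) {
        if (g->deps[m][d] && !visit(g, d, state, order, n_out))
            return false;
    }
    state[m] = DONE;
    order[(*n_out)++] = m;
    return true;
}

/* Topologically sort every module reachable from `entry`. */
static bool topo_order(const import_graph_t *g, size_t entry,
                       size_t *order, size_t *n_out) {
    int state[MAX_MODULES] = {0};   /* all UNVISITED */
    *n_out = 0;
    return visit(g, entry, state, order, n_out);
}
```

If module 0 (say quic.frames) imports module 1 (quic.varint), the order comes out as [1, 0], so quic.varint is processed first. Adding an import from module 1 back to module 0 makes topo_order return false.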

Multi-Module Compilation Flow

Entry .wspec file
  └─ wirespec-driver: find + parse all imports (depth-first, topo sorted)
       └─ For each module (dependencies first):
            wirespec-syntax    → AST
            wirespec-sema      → Semantic IR (with imported types injected)
            wirespec-layout    → Layout IR
            wirespec-codec     → Codec IR
            wirespec-backend-c → .h + .c        (-t c)
            wirespec-backend-rust → .rs          (-t rust)

Downstream modules receive the exported types from upstream modules before their own semantic analysis runs. This is how import quic.varint.VarInt makes VarInt available as a named type in quic.frames.

Running Tests

```bash
# All tests (933+ across 8 crates)
cargo test --workspace

# Run tests for a specific crate
cargo test -p wirespec-sema

# Build the compiler
cargo build --release

# Compile a .wspec file
./target/release/wirespec compile examples/quic/varint.wspec -t c -o build/

# Compile and test generated C
cd build && gcc -Wall -Wextra -Werror -O2 -std=c11 \
    -o test_varint quic_varint.c tests/test_varint.c && ./test_varint
```

Design Principles

  • No heap allocation in generated C. All buffers are caller-provided; generated code uses only stack storage and zero-copy views into those buffers.
  • Generated code compiles warning-free under gcc -Wall -Wextra -Werror -std=c11.
  • Backends consume Codec IR only — all name resolution and type checking is complete before code generation.
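To make the zero-copy principle concrete, here is a sketch of a length-prefixed bytes field parsed as a view into the caller's buffer. It assumes a 1-byte length prefix for brevity; bytes_view_t and parse_len_prefixed are hypothetical names, not wirespec's generated API.

```c
#include <stddef.h>
#include <stdint.h>

/* A view into the caller's buffer: nothing is copied or allocated. */
typedef struct {
    const uint8_t *ptr;
    size_t len;
} bytes_view_t;

/* Hypothetical length-prefixed field: a 1-byte length followed by
 * that many payload bytes. Returns 0 on success, -1 if the buffer
 * is too short. */
static int parse_len_prefixed(const uint8_t *buf, size_t len,
                              bytes_view_t *out, size_t *consumed) {
    if (len < 1) return -1;
    size_t n = buf[0];
    if (len - 1 < n) return -1;   /* payload would run past the buffer */
    out->ptr = buf + 1;           /* points into buf; no memcpy */
    out->len = n;
    *consumed = 1 + n;
    return 0;
}
```

The view is valid only as long as the caller's buffer is, which is the usual trade-off for avoiding copies and heap allocation.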