
The 4-Stage IR Pipeline

The wirespec compiler uses four intermediate representations (IRs), each closer to the generated code than the one before it. This separation keeps language semantics independent from code generation, making it possible to add new backends without touching the parser or type checker.

This page is for contributors who want to understand or modify compiler internals.

Why Four Stages?

Each stage lowers the one before it and answers a single question about the program:

  • AST = what the programmer wrote (syntax)
  • Semantic IR = what it means (semantics, backend-agnostic)
  • Layout IR = how bytes are arranged on the wire
  • Codec IR = how to parse/serialize those bytes (still backend-agnostic; consumed by every backend)

A Rust backend and a C backend share all four stages and consume the same Codec IR. They diverge only inside the backend crates, where target type names differ (uint32_t vs u32) and backend-specific idioms apply.

Data Flow

```
AST ─── wirespec-sema ──→ Semantic IR
                               │
                        wirespec-layout
                               │
                               ▼
                           Layout IR
                               │
                        wirespec-codec
                               │
                               ▼
                           Codec IR
                               │
               ┌───────────────┴───────────────┐
      wirespec-backend-c             wirespec-backend-rust
               │                               │
               ▼                               ▼
     .h + .c output files               .rs output file
```

Stage 1: AST

Produced by: wirespec-syntax crate
Key types: WireFile, PacketDef, FrameDef, CapsuleDef, TypeDef, StateMachineDef, Expr

The AST is a direct structural representation of the source text. Nothing is resolved — field references in expressions are bare name strings, type references are unresolved identifiers.

Key types:

| Type | Description |
|---|---|
| WireFile | Top-level container: module decl, imports, definitions |
| PacketDef | packet Foo { ... } definition |
| FrameDef | frame Foo = match tag: T { ... } definition |
| CapsuleDef | capsule Foo { ... within ... } definition |
| TypeDef | type Foo = ... (alias or computed type) |
| StateMachineDef | state machine Foo { ... } |
| Expr | Expression AST nodes: NameExpr, LiteralExpr, BinaryExpr, CoalesceExpr, etc. |

The AST is syntax-directed, and backends must not consume it directly. Anything that needs resolved names or types must work from Semantic IR or a later stage.

Stage 2: Semantic IR

Produced by: wirespec-sema crate
Key types: SemanticModule, SemanticStruct, SemanticFrame, StateMachine, SemanticVarInt, SemField

This is the first stage where the program's meaning is fully established. Every name is resolved. Every type is known.

WireType Enum

Every field gets a WireType variant:

| Variant | Meaning |
|---|---|
| U8, U16, U24, U32, U64 | Unsigned integers |
| I8, I16, I32, I64 | Signed integers |
| VARINT | Prefix-match variable-length integer |
| CONT_VARINT | Continuation-bit variable-length integer |
| BOOL | Semantic boolean (derived fields / guards only) |
| BYTES | Byte sequence (fixed, length-prefixed, or remaining) |
| BITS | Sub-byte field (bits[N]) |
| BIT | Single-bit field |
| ARRAY | Homogeneous array [T; count] |
| STRUCT | Named packet/frame/capsule reference |
| ENUM | Named enum reference |
| FLAGS | Named flags reference |

SemExpr Replaces AST Expr

Raw Expr nodes are replaced by typed SemExpr variants:

| SemExpr | Description |
|---|---|
| SemFieldRef | Field reference with resolved type |
| SemLiteral | Integer, string, or bool literal |
| SemBinaryOp | Binary operation with typed operands |
| SemCoalesce | ?? coalesce (Option[T] + default) |
| SemInState | in_state(S) predicate for state machine guards |
| SemAll | all(collection, predicate) quantifier |
| SemSlice | field[start..end] half-open slice |
| SemSubscript | field[index] array subscript |
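To make the AST-to-Semantic-IR step concrete, here is a minimal Rust sketch of name resolution. The Expr, SemExpr, and WireType types below are hypothetical, heavily simplified stand-ins, not the real wirespec-sema definitions, and resolve is an illustrative function name. It looks each bare name up in a table of fields in scope, attaches the resolved type, and fails on unknown names.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified stand-ins for the real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WireType {
    U8,
    U16,
    U32,
}

// AST: field references are bare strings; nothing is resolved yet.
enum Expr {
    Name(String),
    Literal(u64),
    Binary(Box<Expr>, char, Box<Expr>),
}

// Semantic IR: every field reference carries its resolved WireType.
#[derive(Debug, PartialEq)]
enum SemExpr {
    FieldRef { name: String, ty: WireType },
    Literal(u64),
    BinaryOp(Box<SemExpr>, char, Box<SemExpr>),
}

/// Lowers an AST expression, failing on names that are not fields in scope.
fn resolve(expr: &Expr, fields: &HashMap<String, WireType>) -> Result<SemExpr, String> {
    match expr {
        Expr::Name(name) => fields
            .get(name)
            .map(|&ty| SemExpr::FieldRef { name: name.clone(), ty })
            .ok_or_else(|| format!("unknown field `{name}`")),
        Expr::Literal(v) => Ok(SemExpr::Literal(*v)),
        Expr::Binary(lhs, op, rhs) => Ok(SemExpr::BinaryOp(
            Box::new(resolve(lhs, fields)?),
            *op,
            Box::new(resolve(rhs, fields)?),
        )),
    }
}
```

Resolution is recursive, so an unknown name anywhere inside a larger expression rejects the whole expression.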

Option[T] Tracking

Conditional fields (if COND { T }) are typed as Option[T] in the Semantic IR. Any expression that references such a field must either use ?? (coalesce) to supply a default or appear inside a guard; this stage enforces that rule.
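A rough sketch of how this rule could be checked. It assumes a simplified, hypothetical SemExpr with only four variants, and it models guards by leaving fields whose presence a guard proves out of the optional set. The function name find_unguarded is illustrative and not part of wirespec-sema.

```rust
use std::collections::HashSet;

// Hypothetical, simplified expression type for illustration.
enum SemExpr {
    FieldRef(String),
    Literal(u64),
    /// `value ?? default`
    Coalesce(Box<SemExpr>, Box<SemExpr>),
    BinaryOp(Box<SemExpr>, Box<SemExpr>),
}

/// Returns the first reference to an Option[T] field that is not wrapped in `??`.
/// `optional` holds conditional fields whose presence no enclosing guard proves.
fn find_unguarded<'a>(expr: &'a SemExpr, optional: &HashSet<&str>) -> Option<&'a str> {
    match expr {
        SemExpr::FieldRef(name) if optional.contains(name.as_str()) => Some(name.as_str()),
        SemExpr::FieldRef(_) | SemExpr::Literal(_) => None,
        // The left side of `??` may be optional; only the default is checked.
        SemExpr::Coalesce(_, default) => find_unguarded(default, optional),
        SemExpr::BinaryOp(lhs, rhs) => {
            find_unguarded(lhs, optional).or_else(|| find_unguarded(rhs, optional))
        }
    }
}
```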

State Machine Validation

State machine definitions are validated in this stage:

  • All states referenced in transitions exist
  • All events handled in on clauses have consistent parameter types
  • action blocks fully initialize dst fields that have no default value
  • delegate and action do not coexist in the same transition
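The first and last checks in this list could look roughly like the following. Machine, Transition, and validate are hypothetical shapes invented for illustration; the real StateMachine type carries more information. The sketch does not cover the event-parameter and dst-initialization checks.

```rust
use std::collections::HashSet;

// Hypothetical, simplified shapes for illustration only.
struct Transition {
    from: String,
    to: String,
    has_action: bool,
    has_delegate: bool,
}

struct Machine {
    states: Vec<String>,
    transitions: Vec<Transition>,
}

/// Collects every error instead of stopping at the first one.
fn validate(sm: &Machine) -> Vec<String> {
    let known: HashSet<&str> = sm.states.iter().map(String::as_str).collect();
    let mut errors = Vec::new();
    for t in &sm.transitions {
        // Every state named by a transition must be declared.
        for s in [&t.from, &t.to] {
            if !known.contains(s.as_str()) {
                errors.push(format!("transition references unknown state `{s}`"));
            }
        }
        // delegate and action are mutually exclusive.
        if t.has_action && t.has_delegate {
            errors.push(format!(
                "transition {} -> {} has both action and delegate",
                t.from, t.to
            ));
        }
    }
    errors
}
```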

Key Rust Types

```rust
SemanticModule     // One per .wspec file: structs, frames, state machines, imports
SemanticStruct     // A packet or frame branch: name, fields, constraints
SemanticFrame      // A tagged union: tag field + list of SemanticStruct branches
StateMachine       // State machine: states, transitions, initial state
SemanticVarInt     // Computed type (prefix-match or continuation-bit)
SemField           // A single field with WireType, name, optional SemExpr condition
```

Stage 3: Layout IR

Produced by: wirespec-layout crate
Key types: LayoutModule, LayoutField, BitGroup, Endianness

Layout IR describes wire byte geometry. It answers: in what order do the bytes appear, and how are bits packed?

Endianness Resolution

Each field gets an Endianness value (Big, Little, None):

  1. An explicit type suffix wins: u16le resolves to Little
  2. Module-level @endian annotation is the fallback
  3. Type aliases are chased to their underlying type
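A minimal sketch of these rules in Rust. It assumes aliases arrive as a simple name-to-definition map with no cycles and that single-byte or non-integer types resolve to None. resolve_endianness, is_multi_byte, and MULTI_BYTE are hypothetical names, and the real wirespec-layout logic may differ.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endianness {
    Big,
    Little,
    None,
}

// Hypothetical list of integer types whose byte order matters.
const MULTI_BYTE: [&str; 7] = ["u16", "u24", "u32", "u64", "i16", "i32", "i64"];

fn is_multi_byte(ty: &str) -> bool {
    MULTI_BYTE.iter().any(|&m| m == ty)
}

/// Resolves the byte order of a field whose type is written as `ty`.
/// `aliases` maps alias names to their definitions (assumed acyclic).
fn resolve_endianness(
    ty: &str,
    aliases: &HashMap<&str, &str>,
    module_default: Endianness,
) -> Endianness {
    // Rule 3: chase aliases down to the underlying type.
    let mut base = ty;
    while let Some(&next) = aliases.get(base) {
        base = next;
    }
    // Rule 1: an explicit le/be suffix wins.
    if let Some(int) = base.strip_suffix("le") {
        if is_multi_byte(int) {
            return Endianness::Little;
        }
    }
    if let Some(int) = base.strip_suffix("be") {
        if is_multi_byte(int) {
            return Endianness::Big;
        }
    }
    // Rule 2: unsuffixed multi-byte integers fall back to the module's @endian.
    if is_multi_byte(base) {
        module_default
    } else {
        // Assumption: u8, bits, bytes, etc. have no byte order.
        Endianness::None
    }
}
```

Here an alias to u16le stays Little even in a big-endian module, because the suffix is found on the underlying type.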

BitGroup Collapsing

Consecutive bits[N] fields (and bit fields) in a packet, frame branch, or capsule body are collapsed into a single BitGroup. For example:

```wire
packet IPv4Header {
    version: bits[4],
    ihl: bits[4],
    dscp: bits[6],
    ecn: bits[2],
    ...
}
```

The first two fields (8 bits in total) become one BitGroup reading 1 byte, and the next two (another 8 bits) become a second BitGroup reading 1 byte. For each group, the C backend then emits a single uint8_t read followed by shift/mask extraction of the individual fields.
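For the IPv4 example, the extraction is equivalent to the following Rust sketch. It assumes fields are packed most-significant bit first, which is the IPv4 convention. Ipv4Prefix and read_bit_groups are hypothetical names, and the sketch mirrors the shape of the C output described above rather than reproducing it.

```rust
/// The fields of the first two IPv4 header bytes, unpacked from two BitGroups.
#[derive(Debug, PartialEq)]
struct Ipv4Prefix {
    version: u8, // bits[4]
    ihl: u8,     // bits[4]
    dscp: u8,    // bits[6]
    ecn: u8,     // bits[2]
}

fn read_bit_groups(buf: &[u8]) -> Option<Ipv4Prefix> {
    // One read per BitGroup; each group here is exactly one byte wide.
    let g0 = *buf.first()?;
    let g1 = *buf.get(1)?;
    Some(Ipv4Prefix {
        // Most-significant bits come first, so earlier fields need larger shifts.
        version: g0 >> 4,
        ihl: g0 & 0x0F,
        dscp: g1 >> 2,
        ecn: g1 & 0x03,
    })
}
```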

Key Rust Types

```rust
LayoutModule      // Layout wrapper for a full module
LayoutField       // Single field with Endianness
BitGroup          // Grouped consecutive bits[N] fields + total byte width
Endianness        // Big | Little | None
```

Layout IR is deliberately free of code generation concerns. It says nothing about uint32_t or function signatures; those are decided further down the pipeline.

Stage 4: Codec IR

Produced by: wirespec-codec crate
Key types: CodecModule, CodecStruct, CodecFrame, CodecField, FieldStrategy

Codec IR is the final backend-agnostic representation. It assigns a FieldStrategy and target type to every field.

FieldStrategy

| Strategy | What it represents |
|---|---|
| Primitive | Read N bytes, apply endian conversion |
| VarInt | Decode prefix-match variable-length integer |
| ContVarInt | Decode continuation-bit variable-length integer |
| BytesFixed | Zero-copy byte slice (pointer + length) |
| BytesLength | Length-prefixed byte slice |
| BytesRemaining | Consume all remaining bytes in current scope |
| Array | Loop: read count field, parse N elements |
| BitGroup | Single read + per-field shift/mask |
| Struct | Nested struct parse/serialize call |
| Conditional | Evaluate condition, parse if true |
| Checksum | Verify on parse, auto-compute on serialize |
| Derived | let field: computed from other fields, not on the wire |
| Constraint | require expression: runtime check only |
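The two variable-length integer strategies can be sketched as follows. The prefix-match decoder follows QUIC's encoding (RFC 9000, Section 16), the standard example of a prefix-match varint. The continuation-bit decoder assumes LEB128 ordering (lowest 7-bit group first), which is a common convention but not necessarily the one every wirespec type uses. Both function names are hypothetical, and both return (value, bytes consumed).

```rust
/// Prefix-match varint in the QUIC style: the top two bits of the first byte
/// select a total length of 1, 2, 4 or 8 bytes; the remaining bits are big-endian.
fn decode_prefix_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let first = *buf.first()?;
    let len = 1usize << (first >> 6);
    let bytes = buf.get(..len)?; // None if the input is truncated
    let mut value = u64::from(first & 0x3F);
    for &b in &bytes[1..] {
        value = (value << 8) | u64::from(b);
    }
    Some((value, len))
}

/// Continuation-bit varint, assuming LEB128 order: 7 payload bits per byte,
/// least-significant group first, high bit set on every byte except the last.
/// Over-long encodings are not rejected in this sketch.
fn decode_cont_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in buf.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7F) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated input, or no terminating byte within 10 bytes
}
```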

MemoryTier

The three-tier memory model:

| Tier | Example | Strategy |
|---|---|---|
| A | bytes[length] | Zero-copy: pointer + length view into the input buffer |
| B | [u16le; N] | Materialized: memcpy + byte-swap into fixed array |
| C | [AckRange; N] | Materialized: parse each element into pre-allocated struct array |
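Tiers A and B can be contrasted in a short Rust sketch. bytes_view and read_u16le_array are hypothetical helpers, not generated code. The first returns a slice that borrows from the input buffer without copying. The second copies N little-endian u16 values into an owned array, with u16::from_le_bytes performing the byte swap on big-endian hosts. Tier C would follow the same materializing pattern, calling a per-element struct parser instead of from_le_bytes.

```rust
/// Tier A: zero-copy view. Returns (field bytes, rest of input), both borrowed.
fn bytes_view(buf: &[u8], length: usize) -> Option<(&[u8], &[u8])> {
    if buf.len() < length {
        return None;
    }
    Some(buf.split_at(length))
}

/// Tier B: materialize N little-endian u16 values into a fixed array.
fn read_u16le_array<const N: usize>(buf: &[u8]) -> Option<[u16; N]> {
    let bytes = buf.get(..N * 2)?;
    let mut out = [0u16; N];
    for (slot, chunk) in out.iter_mut().zip(bytes.chunks_exact(2)) {
        *slot = u16::from_le_bytes([chunk[0], chunk[1]]);
    }
    Some(out)
}
```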

Key Rust Types

```rust
CodecModule       // Full module codec representation
CodecStruct       // One struct (packet or frame branch) with codec fields
CodecFrame        // Tagged union with tag codec + branch list
CodecField        // One field: strategy, wire_type, layout ref, semantic ref
FieldStrategy     // Enum of parse/serialize strategies (see above)
```

Using the Pipeline Programmatically

The wirespec-driver crate provides the entry point for full-pipeline compilation:

```rust
use wirespec_driver::{compile, CompileRequest};

fn main() {
    // Compile the entry file together with everything it imports.
    let result = compile(&CompileRequest {
        entry: "examples/quic/varint.wspec".into(),
        include_paths: vec!["examples/".into()],
        profile: wirespec_sema::ComplianceProfile::default(),
    });

    match result {
        Ok(output) => {
            for module in &output.modules {
                // module.codec is the CodecModule ready for backend consumption
                println!("Module: {}", module.module_name);
            }
        }
        Err(e) => eprintln!("error: {e}"),
    }
}
```

For single-module compilation without import resolution:

```rust
use wirespec_driver::compile_module;

fn main() {
    let source = std::fs::read_to_string("examples/net/udp.wspec")
        .expect("failed to read udp.wspec");
    let compiled = compile_module(
        &source,
        wirespec_sema::ComplianceProfile::default(),
        &Default::default(),
    )
    .expect("compilation failed");
    // compiled.codec is the CodecModule
}
```

Debugging the Pipeline

To inspect each stage during development:

```rust
fn main() {
    let source = std::fs::read_to_string("examples/quic/varint.wspec")
        .expect("failed to read varint.wspec");

    // Stage 1: AST
    let ast = wirespec_syntax::parse(&source).expect("parse failed");
    dbg!(&ast); // Inspect AST nodes

    // Stage 2: Semantic IR
    let sem = wirespec_sema::analyze(
        &ast,
        wirespec_sema::ComplianceProfile::default(),
        &Default::default(),
    )
    .expect("semantic analysis failed");
    dbg!(&sem); // Inspect SemanticModule: sem.packets, sem.frames, etc.

    // Stage 3: Layout IR
    let layout = wirespec_layout::lower(&sem).expect("layout lowering failed");
    dbg!(&layout); // Inspect LayoutModule

    // Stage 4: Codec IR
    let codec = wirespec_codec::lower(&layout).expect("codec lowering failed");
    dbg!(&codec); // Inspect CodecModule: codec.packets, codec.frames, etc.
}
```

Each IR is a plain Rust struct that implements Debug, so any stage can be dumped with dbg! or {:#?} for inspection.