Patternia Performance - v0.8.2¶

Published: 2026-03-03

Focus:

establish the lowering engine as the main optimization architecture
push static literal dispatch closer to the switch baseline
separate dispatch quality from matcher-construction overhead

Summary¶

v0.8.2 moves Patternia's performance work from isolated fast paths to a general lowering engine.

This version focuses on one practical target:

make keyed pattern chains behave as closely as possible to hand-written switch code
keep the optimization scalable beyond one special-case pattern family
reduce the matcher-construction overhead that appears in match(x) | on(...)

The result is a two-part improvement:

the compiler-visible static literal path is now close to the switch baseline
the pipeline syntax can reuse a cached matcher through static_on(...) or PTN_ON(...)

The benchmark discussion in this note uses the 128-way literal scenario because it is large enough to expose both sides of the problem:

how close lowering gets to hand-written control flow
how expensive raw inline matcher construction still is

Benchmark Snapshot¶

Raw pipeline form vs `switch`¶

This chart answers the simple question: what happens if you write the direct pipeline form and call it in a hot loop?

LiteralMatch128 raw on vs switch

Cached matcher forms vs `switch`¶

This chart isolates the same 128-way static-literal workload after matcher construction is moved out of the hot path.

LiteralMatch128 cached matcher vs switch

Problems We Needed To Solve¶

Problem 1: Runtime literal factories hide the dispatch key¶

lit(value) is a general runtime factory.

That is the right API for dynamic values and strings, but it means the lowering stage cannot see the literal as a compile-time key:

match(x) | on(
  lit(1) >> 1,
  lit(2) >> 2,
  __ >> 0
);

For this shape, Patternia can still build a specialized linear literal path, but it cannot reliably target switch-like lowering.

Problem 2: `match(x) | on(...)` pays matcher construction cost every call¶

Even when the match logic itself lowers well, the inline pipeline form still constructs the whole matcher object on every evaluation:

match(x) | on(
  lit<1>() >> 1,
  lit<2>() >> 2,
  __ >> 0
);

For large case sets, this construction cost dominates the dispatch cost.

Problem 3: Optimization logic did not scale¶

Before v0.8.2, performance logic was drifting toward isolated special cases in optimize.hpp and eval.hpp.

That approach does not scale for future keyed scenarios such as:

static literals
simple variant alternative dispatch
guarded variant cases that still expose a primary key

Patternia needed a shared lowering model, not one more fast path.

Initial Solution¶

The first step was to expose a compile-time literal form:

lit<42>()

This gives the lowering stage a stable key that can move into type-level metadata and plan selection.

The second step was to validate the idea with a narrow benchmark:

128 static literal cases
one trailing wildcard default
direct comparison against a hand-written switch

That experiment confirmed two separate facts:

static keyed lowering works
raw match(x) | on(...) still loses badly if matcher construction remains in the hot path

That is why v0.8.2 treats lowering and matcher reuse as separate concerns.

Lowering Engine Architecture¶

v0.8.2 organizes the optimizer around a small internal pipeline:

case DSL
  -> case_analysis
  -> case_sequence_ir
  -> lowering_analysis
  -> dispatch_plan
  -> eval_cases_with_dispatch_plan(...)

The main pieces are:

Component	Responsibility
`case_analysis`	Analyze one case into discriminator, residual work, and binding shape
`case_sequence_ir`	Aggregate the full case chain and decide whether lowering is legal
`lowering_analysis`	Expose legality and plan-family signals to the planner
`dispatch_plan`	Carry the concrete execution shape selected for the evaluator
`eval_cases_with_dispatch_plan(...)`	Execute the chosen plan without re-running planner logic

The design is built around three legality grades:

Rule	Meaning	Current result
`full`	The key alone can drive the dispatch path	direct lowering
`bucketed`	The key narrows the search, but some residual replay remains	per-key bucket replay
`none`	The planner cannot prove a safe keyed path	sequential evaluation

In code, this lives in:

Algorithms and Principles¶

1. Dense static-literal lowering¶

This is the main new full path in v0.8.2.

Patternia first checks whether the case chain is a good dense-key candidate. The current heuristics are:

subject type is integral or enum
every case is a zero-bind static literal or the trailing wildcard default
case count is at most 256
key span is at most 512
key span is dense enough relative to case count

If those checks pass, the planner selects:

dispatch_plan_kind::static_literal_dense

The metadata computes:

min_value
max_value
range_size
default_case_index
compile-time case_index_table

At runtime, the hot path becomes:

value
  -> range check
  -> offset = value - min_value
  -> switch(offset)
  -> selected case entry or default

Example: sparse values inside a small span¶

For:

match(x) | on(
  lit<1>()  >> 1,
  lit<2>()  >> 2,
  lit<10>() >> 10,
  __        >> 0
);

Patternia computes:

min_value = 1
max_value = 10
range_size = 10

The dense metadata conceptually maps offsets like this:

Value	Offset	Selected branch
`1`	`0`	case `lit<1>()`
`2`	`1`	case `lit<2>()`
`3..9`	`2..8`	default
`10`	`9`	case `lit<10>()`

This is why the generated path is much closer to switch than a generic pattern replay loop.

2. Variant keyed dispatch with bucket replay¶

The lowering engine is not literal-only.

Variant dispatch is now expressed through the same planner structure:

variant_simple
variant_alt_bucketed

The full case is straightforward:

match(v) | on(
  type::alt<0>() >> 1,
  type::alt<1>() >> 2,
  __ >> 0
);

The active alternative is already the dispatch key, so no residual work is needed.

The bucketed case appears when the alternative is still useful, but one key maps to multiple residual checks:

match(v) | on(
  type::is<int>()[_ > 100] >> 10,
  type::is<int>()          >> 1,
  type::is<std::string>()  >> 2,
  __                       >> 0
);

Here, the active variant alternative narrows the search immediately:

int -> replay only the int bucket
std::string -> replay only the string bucket

This preserves first-match semantics while avoiding a full-chain scan.

3. Matcher reuse for pipeline syntax¶

Lowering alone is not enough if the matcher object is rebuilt every time.

That is why v0.8.2 also adds a reusable pipeline form:

match(x) | PTN_ON(
  lit<1>() >> 1,
  lit<2>() >> 2,
  __ >> 0
);

PTN_ON(...) is a convenience wrapper around:

match(x) | static_on([] {
  return on(
    lit<1>() >> 1,
    lit<2>() >> 2,
    __ >> 0
  );
});

This keeps the pipeline syntax while moving matcher construction out of the hot path.

Worked Example: Why Raw `on(...)` Is Still Slow¶

Consider the three 128-way forms used by the benchmark:

Inline matcher¶

match(x) | on(
  lit<1>() >> 1,
  // ...
  lit<128>() >> 128,
  __ >> 0
);

This form pays for:

case object construction
value-handler object construction
on(...) tuple construction

on every call.

Manually cached matcher¶

static auto cases = on(
  lit<1>() >> 1,
  // ...
  lit<128>() >> 128,
  __ >> 0
);

return match(x) | cases;

This keeps the lowering path identical, but it removes repeated matcher construction.

Cached pipeline matcher¶

match(x) | PTN_ON(
  lit<1>() >> 1,
  // ...
  lit<128>() >> 128,
  __ >> 0
);

This is the same reuse idea packaged for pipeline syntax.

Final Results¶

The following numbers come from the current local run of:

.\build\bench\ptn_bench_lit.exe --benchmark_filter="LiteralMatch128" --benchmark_min_time=0.5s

Benchmark	Meaning	CPU time
`BM_PatterniaPipe_LiteralMatch128StaticCases`	prebuilt matcher object	`1.74 ns`
`BM_PatterniaPipe_LiteralMatch128On`	raw inline `on(...)`	`17.9 ns`
`BM_PatterniaPipe_LiteralMatch128OnMacro`	cached pipeline matcher via `PTN_ON(...)`	`2.09 ns`
`BM_Switch_LiteralMatch128`	hand-written `switch` baseline	`1.52 ns`

What these numbers mean¶

The static literal lowering itself is close to the switch baseline.
The dominant remaining gap in the raw pipeline form is matcher construction, not keyed dispatch.
PTN_ON(...) recovers most of the lost performance while keeping the pipeline syntax.

In practice, the current result is:

StaticCases is about 1.14x the switch baseline
PTN_ON(...) is about 1.37x the switch baseline
raw on(...) is roughly 11.8x slower than the switch baseline in this benchmark because it rebuilds the matcher every call

Outcome¶

v0.8.2 establishes the lowering engine as an internal architecture, not just a collection of local optimizations.

It delivers five concrete outcomes:

lit<value>() now has a real switch-oriented lowering path.
The engine has a shared legality model: full, bucketed, and none.
Variant dispatch and static literal dispatch now live under the same plan model.
PTN_ON(...) provides a practical pipeline form that avoids repeated matcher construction.
The performance gap to hand-written switch is now small enough that future work can focus on fewer, more specific remaining costs.

For API-level release notes, see v0.8.2.