Categorical Deep Learning

Categorical Deep Learning and Universal Numbers

August 10, 2025·SpeakEZ

A Confession and a Vision

A personal note from the founder of SpeakEZ Technolgies, Houston Haynes

I must admit something upfront: when I began design of the Fidelity framework in 2020, I was driven by practical engineering frustrations, particularly with AI development. The limitations of a managed runtime, the endless battle with numeric precision, machine learning framework quirks, constant bug chasing; these weren’t just inconveniences, they felt like fundamental architectural flaws. So I started building something different, guided more by engineering intuition than mathematical theory. Then I recently encountered the position paper “Categorical Deep Learning is an Algebraic Theory of All Architectures” by Gavranović et al., and experienced that rare moment of recognition: the mathematical foundations for what I have been building already existed.

Like many other of the recent advances and discoveries I’ve made, a significant credit is owed to Paul Snively, for his polyglot perspective that led to many of the connections made as formalism has taken a greater role in the framework. You can see more about him here Paul Snively on Programming Languages, Reliable Code and Good Taste in Software Engineering. While I may have eventually connected the dots on my own, Paul’s decades of lived experience with the practicalities of functional programming, and more specifically with formal verification has been a force multiplier for the speed at which my grasp of the domain has expanded over the past few months. A great deal of credit for this synthesis rests with him, while mistakes and omissions remain my own.

As for the reading, the authors of the CDL paper go through their process (as I currently understand it) of formalizing neural networks as morphisms in a 2-category. This provides a condensed theoretical underpinning I had been assembling piece by piece in the Fidelity framework. It was both humbling and exhilarating; humbling because I had been unknowingly fumbling in the dark where category theory sheds light, but it’s also exhilarating because it validated my framework’s architectural direction in a way that was completely unexpected.

This document represents my attempt to synthesize years of practical framework development with these theoretical underpinnings. It’s forward-looking and aspirational, and shouldn’t be taken as making light of the many technical hurdles still to overcome. I believe it points toward the convergence of classical systems with High-Performance Computing, Artificial Intelligence and Quantum as a single, mathematically unified paradigm.

This would, in effect, provide a coherent framework to explore them all from a single, hardware-aware software platform. I’m sharing a summation of what I’ve learned and the future directions I see.

The Journey So Far

Over the past several years, SpeakEZ has been designing components that form the foundation for a unified vision. Each piece on its own was solving a technical problem, and each was removing a source of computational inefficiency:

The exploration of matmul-free architectures demonstrated that the industry’s obsession with matrix multiplication was more historical accident than mathematical necessity. It showed how ternary quantization and sub-quadratic models could achieve comparable performance with dramatically lower computational requirements, often 10-100x more efficient.

And work on ternary models and heterogeneous computing speculated on how AMD’s unified memory architecture could enable new paradigms for distributed AI inference, with BAREWire providing the zero-copy substrate for efficient model orchestration, eliminating the memory bandwidth bottleneck that consumes up to 90% of cycle wait times in current systems.

The investigation into discriminated unions for post-transformer AI revealed how type-safe heterogeneous representations could better capture the diverse computational patterns emerging in modern architectures, reducing the “representation overhead” that forces complex patterns through inappropriate abstractions.

Insights into hypergraph architecture showed how preserving multi-way relationships enables data-flow computation that can be orders of magnitude more efficient than control-flow paradigms.

Our patent-pending proof-aware compilation design demonstrated that verification doesn’t add overhead; it removes it by enabling aggressive optimizations impossible without formal guarantees.

And the early vision of Fidelity as an AI Refinery established the framework’s role in transforming raw computational capabilities into efficient intelligent systems.

Each of these efforts was solving a specific problem, but looking back, they were all converging on the same observation:

The artificial separation among classical, HPC and AI is holding the industry back.

It also makes each technical domain less efficient than it could be.

The Current Crisis: Divergent Paths

Modern computing faces a long-standing schism. High-Performance Computing (HPC) and Artificial Intelligence (AI) have evolved along divergent paths, each developing its own tools, techniques, and staffing constituencies:

HPC’s Challenges

  • Verification Burden: Safety-critical simulations require formal proofs
  • Numerical Precision: IEEE-754 limitations cause accumulation errors
  • Scalability Walls: Traditional methods hit complexity barriers
  • Rigid Models: Physics equations can’t adapt to real-world complexity

AI’s Challenges

  • Black Box Problem: No proofs about model behavior
  • Numerical Instability: Gradient underflow, training irreproducibility
  • Semantic Loss: Meaning disappears in tensor operations

These aren’t separate problems; they’re symptoms of the same underlying issue: the lack of a unified mathematical foundation for computation.

The Solution: Categorical Deep Learning

As articulated in the paper by Gavranović et al., neural networks are not just computational graphs; they are morphisms in a 2-category. This insight provides the bridge between HPC’s rigorous mathematics and AI’s adaptive learning.

In mathematical terms, they show that a neural network is a morphism f:PLf: \mathcal{P} \to \mathcal{L} where P\mathcal{P} is the parameter space and L\mathcal{L} is the learner category. Backpropagation itself is the canonical 2-cell:

Para(P)forwardLbackwardPara(P)\text{Para}(\mathcal{P}) \xrightarrow{\text{forward}} \mathcal{L} \xrightarrow{\text{backward}} \text{Para}(\mathcal{P})

Where Para\text{Para} is the parameterized category construction that enables gradient flow. This is the structure our Fidelity framework is designed to express through the Clef language’s type system.

Why Clef Is a Natural Choice for This Domain

Clef is designed to express these higher-order mathematical structures directly. The functional foundation comes by way of F# and OCaml; the type-safety lineage runs through languages like Rust. Clef carries capabilities aimed specifically at categorical deep learning:

Computation Expressions: Native Categorical Structures

Clef’s computation expressions are a direct encoding of monadic and categorical patterns. Where other languages require extensive type encoding to express categorical operations, Clef expresses them directly:

// A 2-categorical morphism expressed naturally in Clef
type NeuralMorphism<'Input, 'Hidden, 'Output> =
    categorical {
        // Computation expressions model the 2-category structure
        let! layer1 = Morphism<'Input, 'Hidden>
        let! layer2 = Morphism<'Hidden, 'Output>

        // Horizontal composition (functor composition): (g ∘ f)
        let! forward = compose layer1 layer2

        // Vertical composition (natural transformations): α ∙ β
        let! transform = naturalTransform forward

        // The 2-cell (modification between natural transformations)
        return! modification transform
    }

This directly implements the mathematical structure from the CDL paper where neural networks form a 2-category Learn\mathbf{Learn} with:

  • Objects: Parameterized types (our Clef types with measures)
  • 1-morphisms: Learners (our typed functions)
  • 2-morphisms: Updates/reparameterizations (our gradient transformations)

This requires extensive encoding in OCaml, and the ownership model in Rust constrains the composition that category theory relies on.

Units of Measure: Dimensional Analysis for Free

Clef goes beyond OCaml with zero-cost units of measure that naturally express the dimensional analysis inherent in physical simulations and neural architectures:

// Dimensional correctness in neural architectures
[<Measure>] type neuron
[<Measure>] type layer
[<Measure>] type activation

type CategoricalLayer<[<Measure>] 'input, [<Measure>] 'output> = {
    Weights: Matrix<float<'output/neuron>, float<'input/neuron>>
    Transform: Morphism<'input, 'output>
    Adjoint: ContravariantFunctor<'output, 'input>
}

This dimensional typing keeps our categorical structures tied to physical and mathematical meaning at compile time, which neither OCaml nor Rust expresses as directly.

Type Providers: Bridging Abstract and Concrete

Clef’s type providers generate categorical structures from external schemas, connecting the mathematics directly to real-world data:

// Type provider generates categorical structure from neural architecture
type NeuralArchitecture = JsonProvider<"model.json">

let model = NeuralArchitecture.Load("transformer.json")

// Automatically derived categorical morphisms from architecture
let categoricalModel =
    model.Layers
    |> Seq.map (fun layer ->
        Morphism.fromStructure layer.Type layer.Parameters)
    |> Morphism.compose2Category

This capability extends to emerging standards like the Hypergraph Interchange Format (HIF), which provides a unified JSON schema for higher-order network data.

Our implementation of Clef’s type providers is designed to ingest type-safe representations from HIF-compliant datasets, integrating relational data from co-authorship networks, chemical reactions, or biological interactions directly into HPC simulations and AI training pipelines. This could ease the interchange of data and concepts across academic disciplines and industry verticals.

Active Patterns: Recognizing Categorical Structures

Clef’s active patterns let us recognize and destructure categorical patterns in ways that would require verbose visitor patterns in other languages:

// Recognize categorical patterns in neural networks
let (|Functor|Monad|Adjunction|) morphism =
    match morphism with
    | HasLeftAdjoint adj -> Adjunction(morphism, adj)
    | HasBindOperation bind -> Monad(morphism, bind)
    | _ -> Functor(morphism)

// Use pattern matching to optimize based on categorical structure
let optimize = function
    | Adjunction(f, g) ->
        // Exploit adjunction for perfect backpropagation
        optimizeAdjoint f g
    | Monad(m, bind) ->
        // Use monadic structure for sequential optimization
        optimizeMonadic m bind
    | Functor f ->
        // Standard functorial optimization
        optimizeFunctor f

Quotations: Preserving Mathematical Semantics

Clef’s quotations preserve the mathematical structure of expressions, enabling us to analyze and transform categorical operations at compile time:

// Quotations preserve categorical structure for analysis
let neuralOperation =
    <@ fun (input: Tensor<'n, 'd>) (weights: Morphism<'d, 'h>) ->
        categorical {
            let! forward = weights.Apply input
            let! backward = weights.Adjoint forward
            return (forward, backward)
        } @>

// Analyze the categorical structure at compile time
let structure = analyzeCategoricalStructure neuralOperation
// Generates optimized MLIR based on mathematical properties
let optimizedMLIR = compileToMLIR structure

Beyond Functional: The Engineering Bridge

What sets Clef apart is its pragmatic bridge to software engineering reality. We carefully and selectively extend that with specific design choices in the Fidelity framework:

  1. Shared Edges with .NET: We have gone to great lengths to preserve F# idioms in our framework. By extension this will offer many “shared edges” with .NET based solutions, allowing teams currently using F# for machine learning and HPC workloads a gradual transition path with manageable source modifications, including pathways for implementing classical compute with higher integrity and efficiency.

  2. Mutable Optimization: When needed, Clef allows controlled mutation which Fidelity framework and Composer compiler leverages for performance-critical sections without breaking the categorical abstraction. This hybrid approach, detailed in our reactive framework design (now absorbed into CCS), presents developers with pure, immutable interfaces while allowing the compiler to selectively introduce mutation based on scope analysis in the computation graph. This “immutability at design time, verified mutation at runtime” strategy means the categorical abstractions remain pure for reasoning and composition, while achieving the same performance as hand-optimized imperative code. The compiler’s scope analysis ensures mutations only occur when mathematically equivalent to the pure version, preserving all categorical properties while eliminating allocation overhead.

  3. True Concurrency & Parallelism: Clef’s async expressions naturally model the parallel structure of categorical compositions, but as we explored in The Full Frosty Experience, this goes far beyond traditional managed runtime implementations. Through delimited continuations, Frosty transforms async computations into explicit categorical morphisms that can be verified, traced, and compiled to platform-native code without runtime overhead. The delimited continuations make the “rest of the computation” a first-class value that can be inspected, transformed, and verified, turning what was once managed runtime magic into compile-time certainty.

  4. Interop: The FFI in Rust is verbose and the ecosystem in OCaml is narrower, where Clef provides direct interop with C/C++ libraries, extended through our Farscape CLI tool. As detailed in Farscape’s Modular Entry Points, this goes beyond simple bindings; we can generate drop-in replacements for tools like OpenSSL that maintain API compatibility while adding type safety. The established ecosystem of HPC libraries, from PETSc for scientific computing to FFTW for signal processing, becomes available with Clef’s type safety. The categorical structures we’re implementing integrate with numerical libraries that have been optimized over decades. We have found no other functional language that pairs this combination in the standing literature we have reviewed: expressing 2-categorical morphisms while calling directly into established HPC kernels, with compile-time type safety and no additional overhead.

Clef is a strong fit for implementing categorical deep learning: it combines the expressiveness to represent 2-categories with the engineering capabilities to deploy them at scale. OCaml carries the theory with a narrower ecosystem. Rust carries the performance with friction against the abstractions. Haskell carries the categories with a harder interop story. Through close alignment to F# idioms and our Composer compiler, the Fidelity framework is designed to bridge these worlds.

Quantum Computing: The Natural Beneficiary

As we explored in our quantum optionality analysis, this 2-categorical foundation is not limited to classical computation. Quantum computing emerges as a beneficiary of the same mathematical framework, and the alignment follows from the mathematics rather than from forcing a fit.

Quantum computations are categorical. Quantum circuits are morphisms, quantum gates are natural transformations, and quantum mechanics is expressed in the language of monoidal categories. The same 2-categorical structure that unifies HPC and AI extends to quantum:

// Quantum operations ARE 2-categorical morphisms
type QuantumMorphism<'Input, 'Output> =
    | Unitary of U: UnitaryOperator<'Input, 'Output> * U: Adjoint<'Output, 'Input>
    | Measurement of Projector<'Input, Classical<'Output>>

// The SAME categorical structure works for quantum
let quantumCategorical = categorical {
    // Prepare quantum state (functor)
    let! prepared = QuantumFunctor.prepare classicalData

    // Apply quantum circuit (morphism composition)
    let! evolved = QuantumMorphism.compose [
        Hadamard
        CNOT
        PhaseShift(π/4)
    ]

    // The adjoint is built-in (2-category structure)
    let! adjoint = evolved.Adjoint

    // Measurement (natural transformation to classical)
    return! Measurement.collapse evolved
}

This is not shoe-horning quantum into our framework; the mathematical foundations of quantum mechanics are categorical. Unitarity describes morphisms that compose with their adjoints to give identity. Entanglement describes the monoidal structure of tensor products. The language of quantum mechanics is the language of category theory.

Our categorical foundation for unifying HPC and AI is designed to encompass quantum without modification. The same Clef computation expressions that model neural network training are meant to model quantum circuit evolution. The same proof systems that verify conservation laws are meant to verify quantum unitarity. The same Universal numbers that handle HPC precision are meant to handle quantum amplitudes. Microsoft Research’s own work to create the Q# language from F# points to that alignment.

While others are building separate classical and quantum stacks, hoping to integrate them later, our categorical approach is designed as a single framework that covers all four paradigms.

Organizations using Fidelity platform won’t need to adopt new abstractions or rewrite their systems for quantum-classical hybrid workloads.

In our design, the Fidelity framework would offer that degree of freedom by adding another backend to the same software semantics.

Implementing the Core Insight

With those Clef capabilities in place, we can express the unified view directly:

// The unified view: All computation as categorical morphisms
type UnifiedComputation<'Input, 'Output> = {
    // Forward computation (HPC simulation OR AI inference)
    Forward: Functor<'Input, 'Output>

    // Backward computation (Adjoint methods OR backpropagation)
    Backward: ContravariantFunctor<'Output, 'Input>

    // the duality: Forward ⊣ Backward
    Adjunction: AdjointPair<'Input, 'Output>

    // Preserved invariants (Conservation laws OR learned constraints)
    Invariants: Set<CategoricalProperty>
}

and AdjointPair<'Input, 'Output> = {
    // unit of the adjunction: η: Id → G∘F
    Unit: 'Input -> 'Input

    // counit of the adjunction: ε: F∘G → Id
    Counit: 'Output -> 'Output

    // triangle identities, verifier-discharged
    Certificate: Z3Certificate
}

// Clef supports custom operators for the mathematical notation
let inline () forward backward =
    { Unit = fun x -> backward.Apply(forward.Apply x)
      Counit = fun y -> forward.Apply(backward.Apply y)
      Certificate = checkAdjunction forward backward }  // Z3 discharges the triangle identities

let computation = {
    Forward = myForwardFunctor
    Backward = myBackwardFunctor
    Adjunction = myForwardFunctor  myBackwardFunctor  // Optional operator syntax
    Invariants = Set.ofList [EnergyConservation; MomentumConservation]
}

This implements the fundamental theorem from CDL: every differentiable function f:ABf: A \to B gives rise to an adjunction:

FwdfBwdf:Para(A)Para(B)\text{Fwd}_f \dashv \text{Bwd}_f : \text{Para}(A) \rightleftarrows \text{Para}(B)

Where the forward pass Fwdf\text{Fwd}_f and backward pass Bwdf\text{Bwd}_f form an adjoint pair. In HPC, this manifests as the adjoint method for sensitivity analysis. In AI, it’s backpropagation. In quantum computing, it’s the unitary conjugate.

The mathematics are identical. Only the substrate differs.

This is a blueprint for unification that Clef is designed to express through its computation expressions, type providers, and quotation system. As we explored in our Beyond Transformers work, the shift away from matrix multiplication opens the door to other representations. Category theory provides that representation, and Clef provides the engineering vehicle.

Key CDL Principles Applied to HPC+AI

The CDL paper establishes four principles that our Fidelity framework is designed to implement:

  1. Compositional Structure: Both simulations and neural networks compose functorially

    f:AB,g:BCgf:ACf: A \to B, \quad g: B \to C \quad \Rightarrow \quad g \circ f: A \to C
  2. Duality: Every forward computation has a dual (adjoints in HPC, gradients in AI)

    Forward:CDBackward:DC\text{Forward}: \mathcal{C} \to \mathcal{D} \quad \dashv \quad \text{Backward}: \mathcal{D} \to \mathcal{C}
  3. Algebraic Properties: Conservation laws and weight tying emerge from the same structures

    Invariant(fg)=Invariant(f)Invariant(g)\text{Invariant}(f \circ g) = \text{Invariant}(f) \wedge \text{Invariant}(g)
  4. Equivariance: Symmetries in physics and neural architectures share mathematical roots

    ρ(g)f(x)=f(ρ(g)x)for all gG\rho(g) \cdot f(x) = f(\rho(g) \cdot x) \quad \text{for all } g \in G

These aren’t separate implementations in Fidelity; they’re different views of the same categorical structure encoded in our Clef type system.

Universal Numbers: Solving the Numerical Problem

The Universal Numbers Library provides the numerical foundation that makes both HPC and AI practical at scale. This addresses one of the core challenges we identified in our ternary models exploration: maintaining numerical fidelity across heterogeneous computing environments.

Posit Arithmetic: The Best of Both Worlds

While Fidelity is rooted in Clef, the Universal Numbers library exists as optimized C++ code that we integrate through our compilation pipeline. MLIR itself is implemented in C++, and when our Clef code lowers through the compilation stack, it interfaces with these numerical primitives:

// C++ Universal posit - perfect for both domains
template<unsigned nbits, unsigned es>
class posit {
    // Tapered accuracy: More precision near 1.0 (AI's operating point)
    // Wide dynamic range: Handles HPC's extreme scales
    // Exact accumulation: Via quire for reproducibility
};

Think of this as the numerical “engine” that drives our type-safe Clef abstractions. Just as you don’t need to understand the assembly instructions your CPU executes, you interact with posits through Clef’s type system while the C++ implementation handles the bit-level arithmetic. The MLIR lowering strategy is designed so that the C++ posit operations become direct hardware instructions rather than library calls.

Clef Type-Safe Integration

Building on our work with discriminated unions, we can create type-safe numerical representations that respect the categorical structure:

// Type-safe wrapper preserves semantics
type Posit<[<Literal>] nbits: int, [<Literal>] es: int> =
    private | PositValue of uint64

    // For HPC: Preserve conservation laws
    static member ConservationSum (values: Posit<'n,'e> array) =
        use quire = Quire<'n, 512>.Zero
        for v in values do quire.Add(v)  // Exact!
        quire.ToPosit()  // Single rounding

    // For AI: Stable gradients
    static member StableGradient (loss: Posit<32,2>) =
        // Tapered accuracy prevents underflow
        Gradient.compute loss

This connects to the CDL paper’s requirement for a symmetric monoidal category with biproducts. The Universal numbers provide the numerical semiring (R,+,×)(\mathbb{R}, +, \times) with the crucial property that our morphisms preserve the algebraic structure:

HomPosit(AB,C)HomPosit(A,C)×HomPosit(B,C)\text{Hom}_{\text{Posit}}(A \oplus B, C) \cong \text{Hom}_{\text{Posit}}(A, C) \times \text{Hom}_{\text{Posit}}(B, C)

This biproduct structure is essential for gradient decomposition and is automatically preserved by our type-safe implementation.

Formal Verification: Provable Numerics

The Missing Link

F* and Z3 provide the formal verification layer for the unified framework. The point often missed is that proofs don’t just ensure correctness; they inform optimization patterns that can be up to 100x more efficient. This extends the work we outlined in Transforming AI Efficiency where we show that proofs are also lowered in MLIR to execute through SMTLIB.

Proof-Carrying Code in the Hypergraph

This abstract should be considered an advanced example, something that would be opt-in, going above and beyond the MISRA-C-class proofs that would ride along with most classical Clef code in this framework. But it’s an example of how extensible the framework can be when the use case calls for this level of specialization.

// Hypergraph edges carry both algorithmic and numerical proofs
type ProofHyperedge =
    // HPC proofs
    | ConservationProof of system: Node * law: ConservationLaw * cert: Z3Certificate
    | StabilityProof of solver: Node * condition: StabilityCondition * cert: SMTProof

    // AI proofs
    | ConvergenceProof of training: Node * bound: ConvergenceBound * cert: SMTProof
    | RobustnessProof of model: Node * perturbation: Epsilon * cert: Z3Certificate

    // Unified proofs
    | NumericalExactnessProof of computation: Node * error: ErrorBound * cert: Universal
    | CompositionCorrectnessProof of components: Node list * property: Property * cert: SMTProof

Beyond Moore’s Law: The Data-Flow Advantage

Traditional Von Neumann and Modified Harvard architectures force a control-flow paradigm where computation and memory are separated, creating endless cycles of fetch-decode-execute with associated wait states and heat dissipation. As we explored in our hypergraph architecture, data-flow representations can be significantly more efficient because computation happens where the data lives.

This isn’t theoretical. I’ve witnessed the inverse cost of this firsthand in digital audio engineering, where 16-bit representation for Compact Disc authoring required heroic efforts to “dither” floating point truncation. The computational gymnastics needed to mask quantization noise consumed more engineering time than the actual audio post-production flow. We’d spend 90% of our cycles compensating for representation limitations. The parallel with current AI/HPC is stark: we waste enormous computational resources managing the impedance mismatch between our mathematical intentions and our computational substrates.

Proofs as Optimization Catalysts

As detailed in our proof-aware compilation work, mathematical proofs don’t just verify correctness; they reveal optimization opportunities invisible to traditional compilers:

// Traditional approach: defensive programming with runtime checks
let traditional_matrix_multiply A B =
    // Runtime dimension checks
    assert (A.Cols = B.Rows)
    // Bounds checking on every access
    for i in 0..A.Rows-1 do
        for j in 0..B.Cols-1 do
            for k in 0..A.Cols-1 do
                // Each access checks bounds
                result.[i,j] <- result.[i,j] + A.[i,k] * B.[k,j]

// Categorical approach with proofs
let categorical_matrix_multiply
    (A: Matrix<'n, 'm>)
    (B: Matrix<'m, 'p>)
    : Matrix<'n, 'p> =
    // dimensions checked at compile time; proofs remove the runtime bounds checks
    categorical {
        return! matmul A B
    }

The proofs give the compiler permission to optimize aggressively. This connects to the CDL paper’s observation that gradient flow is a natural transformation η:IdT\eta: \text{Id} \Rightarrow T where TT is the gradient operator. When we prove properties about η\eta, we’re proving properties about the available optimizations:

Optimize(f)=giffηf=ηg and Invariants(f)=Invariants(g)\text{Optimize}(f) = g \quad \text{iff} \quad \eta_f = \eta_g \text{ and } \text{Invariants}(f) = \text{Invariants}(g)

Safety and performance line up here as aspects of the same mathematical correctness. When the framework aligns with the structure of the computation, the engineering burden of the defensive patterns drops away. Those patterns turn out to be the cost of working against the underlying mathematics rather than with it.

Calendar Time: The Hidden Multiplier

The efficiency gains won’t just be found during training and inference; they will also be multiplied by shifts in development time. Current AI/HPC development follows assumptions around costly patterns:

  1. Write initial code (teams, for weeks)
  2. Debug tensor shape errors (teams, for weeks)
  3. Track down numerical instabilities (separate team, for weeks)
  4. Optimize bottlenecks (separate team, for weeks)
  5. Re-verify after optimization (days)
  6. Deploy and remediate edge cases (ongoing)

With Fidelity’s proposed approach:

  1. Write type-safe categorical code (one team, for weeks)
  2. Compiler catches all shape/dimension errors ( – )
  3. Universal numbers prevent instabilities ( – )
  4. Proofs guide optimizations automatically ( – )
  5. Verification is built-in ( – )
  6. Edge cases caught at compile time ( – )

The same functionality would ship in weeks instead of months. Smaller, more tightly aligned teams could iterate faster, with more room to adapt the code and the data it’s trained on, exploring solution spaces that were previously infeasible to reach.

The Compounding Effect

These gains compound across their domains rather than adding. A system that’s 10x more efficient at runtime, developed 6x faster, with 10x fewer bugs, running on hardware that dissipates 10x less heat, multiplies its advantage at each layer. Modern AI/HPC systems carry a great deal of “computational dithering,” cycles spent managing accidental complexity rather than solving the problem at hand:

  • Memory management overhead: Copying data between CPU/GPU
  • Synchronization waste: Barriers and locks for incorrect abstractions
  • Precision management: Converting between float16/32/64
  • Framework translation: PyTorch -> ONNX -> TensorRT -> Hardware

Each layer adds overhead, heat, and opportunity for error, all while leadership is watching sand drop through the hourglass. The categorical approach with Universal numbers eliminates these layers:

// Direct path from mathematics to hardware
let QCE molecule =
    // Mathematics expressed directly
    let hamiltonian = constructHamiltonian molecule

    // Universal numbers handle precision automatically
    let groundState = Posit<32,2>.DiagonalizeExactly hamiltonian

    // Category theory ensures correct composition
    let correlation = categorical {
        let! hartreeFock = HF.compute molecule
        let! correction = Neural.correlationEnergy molecule
        return hartreeFock + correction
    }

    // lowers straight to hardware through MLIR, no intermediate copies or conversions
    Fidelity.compile correlation |> Execute.on GPU

And rather than waiting on a new silicon process node, these gains are within reach on today’s hardware through mathematical and architectural work.

Everything Fidelity framework establishes for the next generation of hardware can also target the current generation of hardware with some in-compiler adjustments to target appropriate hardware.

Our goal is to make the design-time experience substantially the same in either case.

Fidelity Framework: Unifying Implementation

The Fidelity framework is designed to carry these concepts into working engineering, building on all our previous work:

Coeffect Analysis for Unified Optimization

As we explored in our ternary models work, different computational patterns require different execution strategies:

// Coeffects determine optimal execution strategy
type UnifiedCoeffects =
    | SimulationPattern of timesteps: int * conservation: Laws
    | LearningPattern of epochs: int * gradients: Flow
    | HybridPattern of physics: Simulation * correction: Learning

    member this.CompilationStrategy =
        match this with
        | SimulationPattern(_, laws) when laws.AreLinear ->
            InteractionNets  // Parallel physical simulation
        | LearningPattern(_, flow) when flow.IsSparse ->
            DelimitedContinuations  // Sequential gradient flow
        | HybridPattern(physics, learning) ->
            HeterogeneousExecution(physics.OnHPC(), learning.OnGPU())

Proof-Aware Compilation Through Hypergraphs

The hypergraph architecture is designed to optimize while preserving proofs:

Layer 1: Hypergraph Optimization with Proofs

// Optimize while maintaining verified properties
let optimizeWithProofs (graph: Hypergraph) =
    // Extract proof obligations
    let proofs = graph.Hyperedges |> filterProofs

    // For HPC: Maintain conservation laws
    let physicsProofs = proofs |> filterPhysics
    let optimizedPhysics =
        graph
        |> fuseOperations physicsProofs
        |> parallelizeTimeSteps physicsProofs

    // For AI: maintain convergence bounds
    let learningProofs = proofs |> filterLearning
    let optimizedLearning =
        graph
        |> fuseGradients learningProofs
        |> quantizeWeights learningProofs

    // Unified: Both optimized with proofs preserved
    combineOptimized optimizedPhysics optimizedLearning proofs

Layer 2: MLIR with Constraint Preservation

  graph TD
    A[Clef Source with Proofs] --> B[Alex AST Transform]
    B --> C[High-Level MLIR]
    C --> D[Proof-Preserving Lowering]
    D --> E[Hardware-Specific MLIR]
    E --> F[Verified Machine Code]

    G[Z3 Verification] --> D
    H[SMT Proofs] --> D
    I[Universal Numerics] --> D

At the MLIR level, proof obligations once satisfied transform into optimization constraints using standard MLIR infrastructure. Rather than custom dialects, we leverage MLIR’s existing attribute system and transformation framework, including the SMT dialect for encoding verification conditions that can be checked by Z3 during lowering. Proof metadata travels as function and operation attributes that standard MLIR passes respect but don’t need to understand.

For example, when lowering HPC simulations, we use standard linalg and affine dialects for the computation, with satisfied proof obligations encoded as attributes that prevent unsafe transformations. The affine dialect’s polyhedral model naturally preserves loop invariants that correspond to conservation laws. The SMT dialect encodes these invariants as assertions that can be verified at compile time. Similarly, AI operations lower through tensor and linalg dialects with attributes marking gradient-critical paths that must maintain numerical stability.

MLIR’s pass infrastructure already supports preserving unknown attributes through transformations. These attributes flow all the way through to LLVM as metadata and function attributes that constrain backend optimizations. For instance, a conservation law verified by Z3 becomes both an affine constraint in MLIR and a llvm.loop.invariant metadata node in LLVM IR. A convergence bound becomes both a barrier to certain MLIR transformations and an llvm.assume intrinsic that enables safe optimizations while preventing unsafe ones. The mathematical properties guide the lowering without requiring MLIR or LLVM to understand the proofs themselves. They respect the constraints that our PHG-guided proofs impose, based on their satisfaction through MLIR and F*.

Layer 3: Hardware-Specific Verified Code

// Generate verified code for specific hardware
let generateVerifiedCode (target: HardwareTarget) (graph: OptimizedGraph) =
    match target with
    | CPU ->
        // HPC kernels with vector intrinsics
        generateCPUWithProofs graph "AVX-512" "OpenMP"
    | GPU ->
        // AI kernels with tensor cores
        generateGPUWithProofs graph "CUDA" "TensorRT"
    | FPGA ->
        // Custom datapaths for both
        generateFPGAWithProofs graph "Verilog" "HLS"
    | Heterogeneous(cpu, gpu) ->
        // Split verified computation
        let hpcPart = extractHPC graph |> generateCPU
        let aiPart = extractAI graph |> generateGPU
        combineWithProofs hpcPart aiPart

The HPC-AI Convergence

Why Convergence Follows

HPC and AI are discovering they need each other, and the CDL mathematics shows why. Both domains are working with the same underlying structure:

HPC:PhysFCompFPhys\mathbf{HPC}: \text{Phys} \xrightarrow{F} \text{Comp} \xrightarrow{F^*} \text{Phys}

AI:DataNLatentNData\mathbf{AI}: \text{Data} \xrightarrow{N} \text{Latent} \xrightarrow{N^*} \text{Data}

Where FF^* and NN^* are the adjoints (sensitivity analysis and backpropagation respectively). The convergence occurs when we recognize these are the same pattern:

Unified:AΦBΦA\mathbf{Unified}: \mathcal{A} \xrightarrow{\Phi} \mathcal{B} \xrightarrow{\Phi^{\dagger}} \mathcal{A}

New Computational Primitives

The convergence creates new primitives that transcend the HPC/AI divide:

// Differentiable simulation - gradients through physics
type DifferentiableSimulation<'State> = {
    // Forward: HPC simulation
    Simulate: 'State -> 'State

    // Backward: Automatic differentiation
    Gradient: 'State -> Gradient<'State>

    // Certificates: Z3 discharges energy conservation and gradient correctness
    ForwardCertificate: Z3Certificate
    BackwardCertificate: Z3Certificate

    // Numerics: Unified representation
    Arithmetic: Posit<32,2>
}

// Verified neural operators - learning with proofs
type VerifiedNeuralOperator<'Domain> = {
    // Learn: AI optimization
    Train: Dataset<'Domain> -> Model<'Domain>

    // Verify: Z3 discharges the safety property and returns a certificate
    Certify: Model<'Domain> -> Z3Certificate

    // Execute: HPC performance
    Run: 'Domain -> 'Domain

    // Numerics: Same substrate
    Computation: Universal.Quire<32, 1024>
}

Unified Applications

Digital Twins with Verified Learning

Building on our exploration of heterogeneous computing, we can create unified digital twins:

module JetEngineDigitalTwin =
    // Clef implementation with SMT verification attributes
    [<SMT Requires("valid_state(engine)")>]
    [<SMT Ensures("energy_conserved(result)")>]
    [<SMT Ensures("wear_monotonic(result)")>]
    [<SMT Ensures("safe_operating_envelope(result)")>]
    let implementation (engine: EngineState) (sensorData: SensorStream) =
        // HPC: Computational fluid dynamics
        let cfd = VerifiedCFD<Posit<64,3>>(
            proof = ConservationProof.Energy
        )

        // AI: Learn degradation patterns
        let degradation = LearnedDegradation<Posit<32,2>>(
            proof = MonotonicityProof.Wear
        )

        // Unified: Compose with verification
        let unified = categorical {
            let! flow = cfd.Simulate
            let! wear = degradation.Predict

            // Z3 verifies composition preserves properties
            let! composed = verifyComposition flow wear

            return composed
        }

        // Compile to efficient implementation
        unified
        |> Fidelity.BuildHypergraph
        |> Fidelity.AttachProofs [cfd.Proof; degradation.Proof]
        |> Fidelity.OptimizeWithProofs
        |> Universal.GenerateCode

Climate Modeling with Physics-Informed Learning

module ClimateModel =
    // Verification specification
    type ClimateInvariants = {
        EnergyBalance: bool  // RadiationIn = RadiationOut + Storage
        MassConservation: bool  // TotalMass = Constant
        ThermodynamicLaws: bool  // EntropyNonDecreasing
    }

    // Hybrid implementation with SMT verification
    [<SMT Requires("valid_initial_state(atmosphere)")>]
    [<SMT Ensures("energy_balance(result)")>]
    [<SMT Ensures("mass_conserved(result)")>]
    [<SMT Ensures("entropy_nondecreasing(result)")>]
    let climatePredictor (initialState: ClimateState) =
        // HPC: Atmospheric dynamics
        [<SMT Invariant("conserves_energy(atmospheric)")>]
        [<SMT Invariant("conserves_angular_momentum(atmospheric)")>]
        let atmospheric = NavierStokesOnSphere<Posit<32,2>>()

        // AI: Sub-grid phenomena
        [<SMT Ensures("preserves_mass_balance(subgrid)")>]
        [<SMT Ensures("bounded_output(subgrid, -100.0, 100.0)")>]
        let subgrid = NeuralParameterization<Posit<16,1>>()

        // Verification through composition
        [<SMT Assert("no_overflow(atmospheric.Compute)")>]
        [<SMT Assert("no_underflow(subgrid.Compute)")>]
        [<SMT Assert("composition_preserves_invariants(atmospheric, subgrid)")>]
        let verifiedComposition =
            {
                Atmospheric = atmospheric
                SubGrid = subgrid
                Invariants = {
                    EnergyBalance = true
                    MassConservation = true
                    ThermodynamicLaws = true
                }
            }

        // Execute with verified invariants
        VerifiedExecution(verifiedComposition)

Autonomous Systems with Certified Safety

As we explored in our post-transformer architectures, safety-critical systems benefit from unified verification:

module AutonomousVehicle =
    // Clef implementation with SMT verification annotations
    [<SMT Requires("perception.Accuracy > 0.99 && perception.Latency < 10ms")>]
    [<SMT Ensures("forall obstacle in scene.Obstacles.
                  distance(cmd.Trajectory, obstacle) > safety_margin")>]
    [<SMT Ensures("cmd.Velocity <= speed_limit && cmd.Acceleration <= comfort_threshold")>]
    let safe_trajectory (perception: SceneUnderstanding)
                       (planning: TrajectoryOptimization)
                       : VehicleCommand =

        // AI: perception with verified detection bounds
        let verifiedPerception =
            [<SMT Invariant("forall obj. obj.IsObstacle => detected(obj)")>]
            let transformer = VerifiedVisionTransformer<Posit<16,1>>()
            transformer.Configure({
                MinDetectionConfidence = 0.99
                MaxLatency = 10<ms>
            })
            transformer

        // HPC: Real-time trajectory optimization
        [<SMT Ensures("trajectory.IsDynamicallyFeasible()")>]
        [<SMT Ensures("forall t. trajectory.MinDistance(t) > margin")>]
        let optimizeTrajectory (scene: Scene) =
            let controller = OptimalControl<Posit<32,2>>()
            controller.SetConstraints({
                VehicleDynamics = getVehicleModel()
                SafetyMargin = 2.0<m>
                ComfortLimits = getComfortProfile()
            })
            controller.Optimize(scene)

        // Unified: Compose with safety proof
        let integrated = categorical {
            // Process sensor data
            let! scene = verifiedPerception.Process(perception.SensorData)

            // Plan trajectory with verification
            let! path = optimizeTrajectory scene

            // End-to-end safety verification
            [<SMT Assert("collision_free(path) && traffic_compliant(path)")>]
            let! verifiedPath = validatePath scene path

            return VehicleCommand(verifiedPath, ProofCertificate = true)
        }

        integrated |> Async.RunSynchronously

Advanced Implementation Examples

The following examples showcase the depth of verification possible within the Fidelity framework, though we emphasize these are forward-looking implementations that will evolve as we refine the integration between Clef, F*, and our compilation pipeline. Importantly:

The extensive proof annotations shown here represent the maximum verification depth available, not the minimum required.

The Fidelity framework embraces a “verification by choice” philosophy: most Clef developers will write standard Clef code with optional type safety features, adding verification attributes only where their domain demands it. A web application might use no proofs at all, a financial system might verify key invariants, while safety-critical aerospace systems could leverage the full depth shown below. This graduated approach lets teams adopt formal methods incrementally, starting with type safety and adding verification where the business value justifies the effort. The same framework spans casual scripting through stringent certification requirements.

Verified Fluid-Structure Interaction

module FluidStructureInteraction =
    // The problem: Coupling fluid dynamics with structural mechanics
    // Traditional: Separate solvers with weak coupling
    // Unified: Single verified computation

    // Define numerical representation
    type FluidNumber = Posit<32, 2>
    type StructureNumber = Posit<64, 3>  // Higher precision for structure

    // Clef implementation with SMT verification annotations
    [<SMT Requires("compatible_interface(fluid, structure)")>]
    [<SMT Ensures("momentum_conserved(result.Fluid, result.Structure)")>]
    [<SMT Ensures("energy_conserved(result.Fluid, result.Structure)")>]
    [<SMT Ensures("stable_coupling(result.Fluid, result.Structure)")>]
    let coupled_system (fluid: FluidState) (structure: StructuralState) =

        // HPC: Fluid solver with conservation verification
        [<SMT Invariant("conserves_momentum(state)")>]
        [<SMT Invariant("conserves_mass(state)")>]
        let solveFluid (initialState: FluidState) = categorical {
            // Use quire for exact pressure accumulation
            use pressureAccumulator = Quire<32, 1024>.Zero

            // Solve Navier-Stokes with verified conservation
            let! solution = solveNavierStokes<FluidNumber> initialState

            // Assert conservation properties
            [<SMT Assert("momentum(solution) = momentum(initialState)")>]
            let! verified = verifyConservation solution

            return verified
        }

        // HPC: Structure solver with AI-enhanced fatigue prediction
        [<SMT Ensures("stress.MaxValue < yield_strength")>]
        [<SMT Ensures("fatigue.Cycles > design_life")>]
        let solveStructure (load: StructuralLoad) = categorical {
            // High-precision stress computation
            let! stress = computeStress<StructureNumber> load

            // AI: Learn fatigue from data with bounded prediction
            [<SMT Ensures("fatigue.Confidence > 0.95")>]
            let! fatigue = neuralFatigueModel.Predict stress

            // Verify stress-fatigue relationship
            [<SMT Assert("fatigue_valid(stress, fatigue)")>]
            let! verified = validateFatigue stress fatigue

            return (stress, fatigue)
        }

        // Coupled system with interface verification
        [<SMT Invariant("interface_continuity(fluid, structure)")>]
        let coupled = categorical {
            // Solve fluid domain
            let! fluidSolution = solveFluid fluid

            // Transfer loads at interface
            [<SMT Assert("load_balance(fluidSolution.Pressure, structureLoad)")>]
            let structureLoad = extractInterfaceLoad fluidSolution

            // Solve structure with transferred loads
            let! (stress, fatigue) = solveStructure structureLoad

            // Update fluid boundary from structural deformation
            [<SMT Assert("geometric_compatibility(fluidSolution, deformation)")>]
            let deformation = computeDeformation stress
            let! updatedFluid = updateFluidBoundary fluidSolution deformation

            // Verify coupling maintains all properties
            [<SMT Assert("energy(updatedFluid) + energy(stress) = energy(fluid) + energy(structure)")>]
            let! verifiedResult = {
                Fluid = updatedFluid
                Structure = { Stress = stress; Fatigue = fatigue }
                InterfaceForces = structureLoad
                CouplingStable = true
            }

            return verifiedResult
        }

        // Execute with automatic proof generation
        coupled |> Categorical.RunWithProofs

Quantum Chemistry with Neural Corrections

This shows how quantum chemical calculations integrate SMT verification as attributes, ensuring physical constraints (Hermiticity, normalization, variational principle) are maintained throughout the computation while combining HPC (Hartree-Fock) with AI (neural correlation) methods.

module QuantumChemistry =
    // The challenge: Ab initio calculations are exponentially expensive
    // Solution: Verified neural corrections to approximate methods

    // Quantum invariants as type constraints
    type QuantumInvariants = {
        Hermiticity: bool  // H = H†
        Normalization: bool  // ⟨ψ|ψ⟩ = 1
        Variational: bool  // E_approx ≥ E_exact
    }

    // Clef implementation with SMT verification
    [<SMT Requires("molecule.IsValid && molecule.Electrons > 0")>]
    [<SMT Ensures("result.Energy >= exact_ground_state_energy(molecule)")>]
    [<SMT Ensures("abs(result.Energy - exact_energy) < 1e-6<Hartree>")>]
    [<SMT Ensures("result.Wavefunction.IsNormalized()")>]
    let molecular_energy (molecule: Molecule) =

        // HPC: Hartree-Fock baseline with variational guarantee
        [<SMT Ensures("energy >= exact_ground_state")>]
        [<SMT Invariant("hamiltonian.IsHermitian()")>]
        let computeHartreeFock (mol: Molecule) =
            let hf = HartreeFock<Posit<64,4>>()
            hf.Configure({
                BasisSet = "cc-pVTZ"
                ConvergenceThreshold = 1e-10<Hartree>
                MaxIterations = 100
            })

            // Self-consistent field iteration
            [<SMT Invariant("density_matrix.IsIdempotent()")>]
            let energy = hf.SolveSCF(mol)

            // Verify variational principle
            [<SMT Assert("energy >= exact_ground_state_energy(mol)")>]
            { Energy = energy; Orbitals = hf.Orbitals }

        // AI: Neural correction with bounded error
        [<SMT Ensures("abs(correction) < max_correlation_energy")>]
        [<SMT Ensures("sign(correction) = -1")>]  // Correlation always lowers energy
        let computeCorrelation (hfResult: HartreeFockResult) =
            let nn = NeuralCorrelation<Posit<32,2>>()
            nn.Configure({
                Architecture = "EquivariantGNN"  // Respects molecular symmetries
                TrainedOn = "CCSD(T)_G4_dataset"
                ErrorBound = 1.0<kcal/mol>
            })

            // Predict with uncertainty quantification
            [<SMT Invariant("nn.PreservesSymmetry(molecule.PointGroup)")>]
            let (correction, uncertainty) = nn.PredictWithUncertainty(hfResult)

            // Verify correction is physically reasonable
            [<SMT Assert("correction <= 0.0<Hartree>")>]  // Always negative
            [<SMT Assert("abs(correction) < 0.5 * hfResult.Energy")>]  // Bounded

            { Correction = correction; Uncertainty = uncertainty }

        // Combine with verified accuracy bounds
        [<SMT Ensures("result.TotalEnergy = hf.Energy + correlation.Correction")>]
        [<SMT Ensures("result.ErrorBound < 1.0<kcal/mol>")>]
        let combineWithProof (hf: HartreeFockResult) (corr: CorrelationResult) =
            // Total energy with SMT-discharged error bound
            let totalEnergy = hf.Energy + corr.Correction

            // Error propagation
            [<SMT Assert("total_error = sqrt(hf_error^2 + correlation_error^2)")>]
            let errorBound =
                sqrt(hf.ConvergenceError ** 2.0 + corr.Uncertainty ** 2.0)

            // Package with quantum invariants verified
            {
                TotalEnergy = totalEnergy
                HartreeFock = hf
                Correlation = corr
                ErrorBound = errorBound
                QuantumInvariants = {
                    Hermiticity = true  // Guaranteed by HF
                    Normalization = true  // Maintained throughout
                    Variational = true  // HF+correction ≥ exact
                }
            }

        // Execute computation with all verifications
        let hfResult = computeHartreeFock molecule
        let correlation = computeCorrelation hfResult
        let verified = combineWithProof hfResult correlation

        verified

Verified Kalman Filter with Learning

Building on our discriminated unions exploration, we can create verified filters with learned components:

// Classic algorithm enhanced with verified learning
module VerifiedKalmanFilter =
    open Universal.Posit

    // Clef implementation with SMT verification attributes
    [<SMT Requires("positive_definite(state.Covariance)")>]
    [<SMT Ensures("positive_definite(result.Covariance)")>]
    [<SMT Ensures("result.Error <= state.Error + measurement.Noise")>]
    [<SMT Ensures("numerically_stable(result)")>]
    let kalman_update (state: KalmanState<Posit<32,2>>)
                     (measurement: Measurement<Posit<16,1>>)
                     : KalmanState<Posit<32,2>> =

        // Predict step with covariance propagation
        [<SMT Invariant("positive_definite(predicted.Covariance)")>]
        let predictState (current: KalmanState<Posit<32,2>>) =
            let F = current.TransitionMatrix
            let Q = current.ProcessNoise

            // State prediction: x̂ₖ₊₁|ₖ = F * x̂ₖ|ₖ
            let predictedState = F * current.State

            // Covariance prediction: Pₖ₊₁|ₖ = F * Pₖ|ₖ * F' + Q
            [<SMT Assert("symmetric(predictedCovariance)")>]
            let predictedCovariance = F * current.Covariance * F.Transpose + Q

            { State = predictedState
              Covariance = predictedCovariance
              TransitionMatrix = F
              ProcessNoise = Q
              Error = current.Error }

        // AI: Learn adaptive measurement noise
        [<SMT Ensures("noise.Mean > 0.0 && noise.Variance < max_variance")>]
        [<SMT Ensures("noise.IsStationary || noise.AdaptsSlowly")>]
        let learnNoisePattern (history: Measurement<Posit<16,1>> array) =
            let nn = NoiseEstimator<Posit<16,1>>()
            nn.Configure({
                WindowSize = 100
                Architecture = "LSTM"
                MaxVariance = 1.0<unit^2>
            })

            // Learn noise characteristics with bounds
            [<SMT Invariant("forall t. noise(t) > 0")>]
            let noiseModel = nn.EstimateNoise(history)

            // Verify learned model is physically reasonable
            [<SMT Assert("noise.AutoCorrelation < 0.95")>]  // Not perfectly correlated
            noiseModel

        // Update step with numerical stability
        [<SMT Ensures("result.Covariance = (I - K*H) * predicted.Covariance")>]
        [<SMT Invariant("no_overflow(computation)")>]
        let computeUpdate (predicted: KalmanState<Posit<32,2>>)
                         (meas: Measurement<Posit<16,1>>)
                         (noise: NoiseModel<Posit<16,1>>) =

            // Use quire for exact accumulation in critical computation
            use covarianceAccumulator = Quire<32, 2048>.Zero

            let H = meas.ObservationMatrix
            let R = noise.CovarianceMatrix

            // Innovation covariance: S = H * P * H' + R
            [<SMT Assert("positive_definite(S)")>]
            let S = H * predicted.Covariance * H.Transpose + R

            // Kalman gain: K = P * H' * S⁻¹
            [<SMT Assert("well_conditioned(S)")>]  // Invertibility check
            let K = predicted.Covariance * H.Transpose * S.Inverse

            // State update: x̂ₖ₊₁ = x̂ₖ₊₁|ₖ + K * (z - H * x̂ₖ₊₁|ₖ)
            let innovation = meas.Value - H * predicted.State
            let updatedState = predicted.State + K * innovation

            // Covariance update (Joseph form for numerical stability)
            [<SMT Assert("positive_definite(updatedCovariance)")>]
            let I_KH = Matrix.Identity - K * H
            let updatedCovariance =
                I_KH * predicted.Covariance * I_KH.Transpose + K * R * K.Transpose

            // Error bound propagation
            [<SMT Assert("updatedError <= predicted.Error + noise.Bound")>]
            let updatedError = sqrt(predicted.Error ** 2.0 + noise.Bound ** 2.0)

            { State = updatedState
              Covariance = updatedCovariance
              TransitionMatrix = predicted.TransitionMatrix
              ProcessNoise = predicted.ProcessNoise
              Error = updatedError }

        // Main implementation with all verifications
        categorical {
            // Prediction step
            let! predicted = predictState state

            // Learn measurement noise from recent history
            let! noiseModel = learnNoisePattern measurement.History

            // Update with learned noise model
            let! updated = computeUpdate predicted measurement noiseModel

            // Verify all properties maintained
            [<SMT Assert("positive_definite(updated.Covariance)")>]
            [<SMT Assert("updated.Error <= state.Error + measurement.Noise")>]
            [<SMT Assert("numerically_stable(updated)")>]

            return updated
        }

This implementation shows how the Kalman filter’s mathematical properties (positive definiteness, error bounds, numerical stability) are verified through SMT attributes that also* transfer to inline proofs in MLIR, integrating learned noise models and using Universal numbers for exact computation in critical sections.

The Unified Computational Future

The Convergence Timeline

  graph TB
    B --> C[2027-2028: Production Systems]
    C --> D[2029+: Unified Paradigm]
    A[2024-2025: Separate Worlds] --> B[2025-2026: Early Field Tests]

    A -->|"HPC: Case Studies"| A1[FPGA + GPU]
    A -->|"AI: LLVM<br>CUDA & SPIR-V"| A2[BFloat16,8,4]

    B -->|"Universal Numbers"| B1[Research on Result<br>Equivalence & Supremecy]
    B -->|"Fidelity Prototypes"| B2[Hybird Hardare<br>Experiments]

    C -->|"Unified Clusters"| C1[New Hardware Platforms]
    C -->|"Amortizable Value"| C2[Extant/Hybrid Systems]

    D -->|"No HPC vs AI vs Quantum<br>vs Conventional"| D1[Full Platform Convergence]

A Natural Path to General Quantum Compute

As we explored in our quantum optionality analysis, this categorical foundation does more than unify classical HPC and AI; it provides an algorithmic bridge to quantum computing. The same categorical morphisms that describe neural networks and physical simulations also describe quantum circuits.

The alignment is algorithmic rather than aspirational. Quantum mechanics was categorical before computer scientists discovered category theory. When Heisenberg developed matrix mechanics and Schrödinger wave mechanics in the 1920s, they were unknowingly working with functors between categories. When Dirac showed these were equivalent formulations, he was proving a categorical equivalence:

// All backends preserve the same categorical properties
[<SMT Assert("forall backend. factorize(n).Result.p * factorize(n).Result.q = n")>]
type UniversalComputation<'Input, 'Output> =
    | Classical of CPUComputation<'Input, 'Output>
    | Parallel of GPUComputation<'Input, 'Output>
    | Quantum of QuantumCircuit<'Input, 'Output>
    | Hybrid of (Classical * Quantum)<'Input, 'Output>

    // All share the same categorical structure
    [<SMT Ensures("result.Forward.Domain = this.InputType")>]
    [<SMT Ensures("result.Forward.Codomain = this.OutputType")>]
    [<SMT Ensures("adjoint(result.Forward) = result.Backward")>]
    member this.AsCategorical() =
        categorical {
            // Forward morphism
            let! forward = this.Forward

            // Adjoint (backward/inverse)
            let! adjoint = this.Adjoint

            // adjunction laws discharged by the verifier
            [<SMT Assert("compose(forward, adjoint) = identity(codomain(forward))")>]
            [<SMT Assert("compose(adjoint, forward) = identity(domain(forward))")>]
            return Morphism(forward, adjoint)
        }

// Concrete example: Prime factorization with automatic backend selection
[<SMT Requires("n > 1 && not isPrime(n)")>]
[<SMT Ensures("result.p * result.q = n")>]
[<SMT Ensures("isPrime(result.p) && isPrime(result.q)")>]
let factorize (n: bigint) : UniversalComputation<bigint, Factor * Factor> =
    match n with
    | SmallNumber when n < 10_000I ->
        // Classical trial division for small numbers
        Classical (CPUComputation.trialDivision n)
    | MediumNumber when n < 1_000_000_000I ->
        // Parallel Pollard's rho for medium numbers
        Parallel (GPUComputation.pollardRho n)
    | LargeNumber when quantumAvailable() && n.BitLength > 128 ->
        // Shor's algorithm for cryptographic-scale numbers
        Quantum (QuantumCircuit.shor n)
    | _ ->
        // Hybrid: classical preprocessing + quantum period finding
        Hybrid (Classical.reduce n, Quantum.periodFind)

We’re not adapting our framework to include quantum computing; quantum computing was already part of this mathematical structure. The 2-categorical framework that unifies HPC and AI is designed to encompass quantum without extension. As quantum hardware reaches practical maturity, our design intends for it to slot into the same categorical infrastructure as another instance of the patterns that already cover HPC and AI.

The categorical framework captures the structure of computation across substrates. As quantum hardware matures, we intend our systems to be ready through the same principles that unify HPC and AI.

Technical Hurdles and Open Questions

While this vision is compelling, we must be honest about the challenges ahead:

Mathematical Foundations: Translating category theory into efficient implementations remains an active research area. The gap between mathematical elegance and machine level computational efficiency is non-trivial.

Tool Maturity: F*, Z3, and MLIR are capable and in many cases well aligned, but integrating them cleanly for production use will require concerted effort. Our Fidelity framework implementation is still evolving to meet the challenges that each “corner case” will present.

Performance Validation: Theoretical advantages don’t always translate to equal speedups in material implementation. Bottlenecks emerge in unexpected places when targeting complex hardware. Extensive benchmarking across diverse workloads on extant and emerging hardware architectures will be necessary.

Ecosystem Integration: The machine learning community has massive investment in Python and adjacent ecosystems. Creating migration paths and interoperability layers is essential for adoption. The impedence mismatches between Python and principled use of the Fidelity framework will create challenges and opportunities for professional training, code conversion and migration tooling.

Hardware Co-design: To fully embody this vision will ultimately require new hardware architectures that natively support categorical operations and data flow based execution. While we see many vendors that show promise in this direction, it’s a growing field that will require considerable coordination, evaluations, prototyping and testing.

Despite these challenges, we believe the convergence is achievable. The inefficiencies of maintaining separate HPC and AI stacks, combined with the increasing demand for verified, efficient computation, will drive this unification.

The Unified Future is Now

The convergence of Categorical Deep Learning, Universal Numbers, and low-burden formal verification in the Fidelity framework points toward a unified way to build verified, efficient systems. Its underlying elements are available today:

  • CDL provides the theoretical underpinnings
  • Universal addresses the numerical challenges
  • F*/Z3 provides formal verification
  • Fidelity unifies the implementation through Clef

Our journey toward this unified vision wasn’t planned; it emerged from solving real engineering problems. That these solutions align with established mathematical principles gives us confidence in the direction.

What remains is the engineering effort to realize this integration, working from the position that classical compute, AI, HPC, and quantum are complementary aspects of a single computational science.

The Path Forward

We see the next generation of software platforms as a unified, verified, numerically correct landscape that combines algorithmic integration with hardware-aware engineering. For SpeakEZ Technologies, that direction begins with the design summarized in this document.

The open question is not whether quantum, classical, HPC, and AI will merge, but how quickly we can build the infrastructure to support a unified paradigm. Organizations that work in this convergence stand to compute faster, adapt more readily, and build on a verified foundation: digital twins held to physical conservation laws, autonomous systems with certified safety properties, and scientific tools that learn under formal constraints.

This convergence offers a path to efficiency improvements without waiting for Moore’s Law.

Moving from control-flow to data-flow can yield order-of-magnitude improvements, compile-time verification removes weeks of debugging, and Universal numbers cut the computational “dithering” that wastes development cycles. The direction we see is not more transistors and more engineers, but correctness in the algorithms delivering efficiency at scales that matter.

This is the work we are building toward in the Fidelity framework through Categorical Deep Learning, Universal Numbers, and formal verification: computation that is rigorous and adaptive, verified and efficient. The standing trade-off between systems that compute exactly but cannot learn and systems that learn but cannot prove is the one we are setting out to close.

I’ll keep developing this design at SpeakEZ as the framework matures, and I expect the work to keep finding confirmation in the mathematics, the same way the CDL paper did. That’s where my interest lies as the work continues.