Making Systems Better

AI Performance Engineering

We work on performance engineering, AI infrastructure, and business automation, from low-level optimization to production pipelines and strategic advising. We make systems faster, more reliable, and more cost-effective.

We build and optimize AI systems, working on everything from GPU kernels to distributed training.

Latest Work

→ Porting CUDA FFT to Mojo: Achieving Bit-Exact Precision
→ Optimizing AlphaFold's Triangle Multiplicative Update: A First Look at GPU Performance Engineering
→ Multi-GPU Programming with AMD's Iris Framework for Triton
→ Gluon: When Triton Isn't Low-Level Enough
→ The Hidden Math Bug That Makes AI Unpredictable
→ Building Agents for Small Language Models: A Deep Dive into Lightweight AI
→ AMD GPU Support in Triton Gluon Framework
→ RustBPE: High-Performance BPE Tokenizer Training in Rust