Blog

Engineering notes on custom kernels, local inference, and hardware design.

RTX 3090 + eGPU dock + MacBook — running NVIDIA on macOS over USB4

The eGPU Myth: Why a ~$300 Dock Won't Turn Your GPU Into an AI Workstation

tinygrad wrote an NVIDIA driver from scratch. We ran real models on an RTX 3090 over USB4. The engineering is brilliant. The numbers aren't there yet. Full benchmarks and profiling.

RTX 3090 — the GPU behind the megakernel

Megakernel: Matching Apple Silicon Efficiency at 2x the Throughput on an RTX 3090

The first megakernel for hybrid DeltaNet/Attention LLMs. All 24 layers fused into a single CUDA dispatch. 1.87 tok/J, matching M5 Max efficiency at 1.8x the throughput on a 2020 GPU.