Welcome to the AI Terminal

This blog is a space for going deep on AI/ML: the engineering, the research, and everything in between. If you're here, you probably care about how things actually work under the hood, not just the hype cycle.

1. Who Is This For

This is going to get geeky. If you're looking for high-level LinkedIn thought leadership about how "AI will change everything," you're in the wrong terminal.

This blog is for technical people: engineers, researchers, and practitioners who want to understand the details. People who read papers, write training loops, debug CUDA kernels, and argue about attention mechanisms.

Expect content that goes both wide and deep.

Whether you're a seasoned ML engineer or someone ramping up and wanting to go beyond the tutorials, there should be something here for you.

2. Written by a Human, Enhanced by LLM

Let's talk about how this blog gets written.

Every post starts with me: my experience, my opinions, my mistakes. I use LLMs as a writing tool the same way I use them for code: to accelerate, to refine, to catch what I miss. But the ideas, the technical depth, and the perspective are mine.

This is not LLM slop. You won't find generic "10 Things You Need to Know About Transformers" content here. If I'm writing about something, it's because I've actually built it, broken it, spent too many hours debugging it, or am genuinely curious about it. Authenticity over polish.

The bar is simple: every post should teach you something you couldn't get from a quick ChatGPT prompt.

3. Code-First

I believe the best way to understand ML is to look at real code, run real commands, and work through real math. Posts here will be code-first: not pseudocode hand-waving, but actual implementations you can read, run, and learn from.

Here's the kind of thing you'll see. A PyTorch implementation of scaled dot-product attention:
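(A minimal sketch using plain `torch` ops; the function signature, tensor shapes, and masking convention here are illustrative, not an optimized implementation.)

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    d_k = q.size(-1)
    # Similarity scores, scaled by sqrt(d_k) to keep softmax gradients stable
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from attention
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```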

Real training commands, not toy examples:
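Something along these lines — `torchrun` and its `--nproc_per_node` flag are PyTorch's real distributed launcher, but the script name and its arguments below are hypothetical placeholders:

```shell
# Launch a 4-GPU fine-tuning run with torchrun.
# train.py and its flags are illustrative stand-ins for a real training script.
torchrun --nproc_per_node=4 train.py \
  --batch-size 32 \
  --grad-accum-steps 4 \
  --lr 3e-4
```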

And the math that makes it all work. The attention formula behind that code:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$

The cross-entropy loss that drives language model training:

$$
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i})
$$
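A quick sanity check that this per-token average negative log-likelihood matches PyTorch's built-in loss (random logits here, just for illustration):

```python
import torch
import torch.nn.functional as F

# N=4 samples over a 10-class vocabulary
logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))

# Hand-written loss: mean of -log p(target) under the softmax distribution
manual = -F.log_softmax(logits, dim=-1)[torch.arange(4), targets].mean()
builtin = F.cross_entropy(logits, targets)  # default reduction="mean"
assert torch.allclose(manual, builtin)
```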

Inline math works too: a learning rate $\eta$, a batch size $B = 32$, gradient accumulation steps to simulate larger effective batches.
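Gradient accumulation itself fits in a few lines. A minimal sketch (the toy linear model, data, and hyperparameters are illustrative): micro-batches of size 8, accumulated over 4 steps, give an effective batch size of 32.

```python
import torch

# Toy model and optimizer, purely for illustration
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
accum_steps = 4

# Eight micro-batches of size 8 -> two optimizer steps at effective batch size 32
data = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data, start=1):
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale so accumulated gradients average over micro-batches
    (loss / accum_steps).backward()
    if step % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```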

Code, commands, and math. That's the language we'll be speaking here.

4. What's Coming

Topics in the pipeline include LoRA fine-tuning from scratch, KV cache optimization, building evaluation harnesses for LLMs, and deep-dives into recent papers. Stay tuned.


Originally published on AI Terminal.

Tags: blog, jekyll