~/reading

cat reading_list.txt

Papers, posts, repos, and people I keep coming back to. Updated as I read.

ls papers/

ls repos/

  • repo karpathy/microgpt.py Andrej Karpathy

    A full GPT in ~200 lines of pure Python with zero dependencies. Proof that the algorithm is simple; everything else is efficiency.

  • repo karpathy/makemore Andrej Karpathy

    Character-level language modeling from bigrams to transformers. The best incremental teaching sequence I've seen for neural nets.
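
The first rung of that teaching sequence fits in a few lines: a count-based character bigram model. A minimal sketch — the toy corpus and helper names here are mine, not makemore's:

```python
import random

# Toy corpus standing in for makemore's names.txt (hypothetical sample).
names = ["emma", "olivia", "ava", "isabella", "sophia", "mia"]

# Count character bigrams, using "." as a start/end-of-name marker.
counts = {}
for name in names:
    chars = ["."] + list(name) + ["."]
    for a, b in zip(chars, chars[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1

def sample_name(rng):
    """Sample one name by walking the bigram distribution."""
    out, ch = [], "."
    while True:
        # Candidate next chars and their counts, conditioned on ch.
        cands = [(b, n) for (a, b), n in counts.items() if a == ch]
        chars_, weights = zip(*cands)
        ch = rng.choices(chars_, weights=weights)[0]
        if ch == ".":
            return "".join(out)
        out.append(ch)

print(sample_name(random.Random(0)))
```

From here the repo swaps the count table for a trained linear layer, then an MLP, then a transformer — same interface each time.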

  • repo bigcode/tiny_starcoder_py BigCode

    A 164M-parameter code model with fill-in-the-middle (FIM) support. Small enough to prototype on a laptop, large enough to produce real completions.
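
FIM prompting is just string assembly around sentinel tokens. A sketch assuming the StarCoder family's `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` convention — verify against the model's tokenizer config before relying on it:

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    """Build a prefix-suffix-middle (PSM) fill-in-the-middle prompt.

    Sentinel token names follow the StarCoder family convention;
    other FIM-trained models may use different sentinels.
    """
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# The model is asked to generate the span between the two visible halves.
prompt = fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n\nprint(add(1, 2))\n",
)
print(prompt)
```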

  • repo karpathy/nanoGPT Andrej Karpathy

    The next step up from microgpt.py. Reproduces GPT-2 training in ~300 lines with real performance on a single GPU.

  • repo rasbt/LLMs-from-scratch Sebastian Raschka

    A full book building LLMs from scratch in PyTorch. Same "understand by implementing" spirit as my GPT-in-pure-Python post, but taken much further.

  • repo vllm-project/vllm vLLM Team

    PagedAttention for inference serving. This is what you'd actually use to deploy the fine-tuned code models from my Copilot post.
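
The core bookkeeping is easy to sketch: the KV cache is split into fixed-size blocks, and each sequence keeps a block table mapping logical block index to physical block id, so memory is allocated on demand instead of reserved up front. A toy sketch with invented names — nothing like vLLM's real API:

```python
BLOCK_SIZE = 16  # tokens per physical KV block (vLLM's default size)

class PagedKVCache:
    """Toy block-table bookkeeping in the spirit of PagedAttention."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))   # pool of physical block ids
        self.tables = {}                      # seq_id -> [physical block ids]

    def append_token(self, seq_id: int, pos: int):
        """Reserve a slot for the KV vectors of token `pos` of a sequence."""
        table = self.tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE >= len(table):   # current block is full
            table.append(self.free.pop())     # grab a fresh physical block
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE  # (block, slot)

cache = PagedKVCache(num_blocks=8)
for pos in range(20):                         # 20 tokens -> 2 blocks
    block, slot = cache.append_token(seq_id=0, pos=pos)
print(cache.tables[0])
```

Because blocks are allocated lazily and can be shared or freed per block, fragmentation drops and batch sizes go up — that's the whole throughput win.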

  • repo ggerganov/llama.cpp Georgi Gerganov

    LLM inference in C++ with quantization for CPU/Metal/CUDA. The reason you can run a 7B model on a laptop.
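
The quantization idea in its simplest form: store one float scale per block of weights plus small integers. A rough sketch of the Q8_0-style scheme — the real ggml formats differ in block size, layout, and detail:

```python
def quantize_block(ws):
    """Quantize a block of floats to (scale, int8 values in [-127, 127])."""
    scale = max(abs(w) for w in ws) / 127 or 1.0  # `or` guards all-zero blocks
    q = [round(w / scale) for w in ws]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate floats from one quantized block."""
    return [scale * v for v in q]

weights = [0.11, -0.52, 0.03, 0.97, -1.24, 0.48, 0.0, -0.09]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs error: {err:.4f}")   # rounding error is bounded by scale/2
```

Each weight shrinks from 4 bytes to 1 byte plus a shared scale — that, times a few more tricks, is how 7B parameters fit in laptop RAM.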

  • repo unslothai/unsloth Unsloth AI

    2x faster LoRA fine-tuning with 70% less VRAM using custom Triton kernels. Drop-in replacement for the training loop in my Copilot post.
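
The arithmetic behind why LoRA is cheap, independent of any library: instead of updating a d_out x d_in weight matrix W, you train two low-rank factors B (d_out x r) and A (r x d_in) and apply W + (alpha / r) * B @ A at forward time. A back-of-envelope count at typical 7B-class dimensions:

```python
# Trainable-parameter count: full fine-tune vs. rank-16 LoRA adapters
# on a single 4096x4096 projection (dimensions chosen for illustration).
d_out, d_in, r = 4096, 4096, 16

full = d_out * d_in           # every entry of W is trainable
lora = r * (d_out + d_in)     # only B (d_out*r) and A (r*d_in) are trainable
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")  # 128x
```

Fewer trainable parameters means smaller optimizer state and gradients, which is where most of the VRAM saving comes from; unsloth's kernels then speed up the remaining work.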

  • repo pytorch/torchtune PyTorch Team

    PyTorch-native fine-tuning with SFT, DPO, and GRPO recipes. Cleaner alternative to trl for production training pipelines.

  • repo Dao-AILab/flash-attention Tri Dao

    The actual implementation behind the FlashAttention paper I covered. Reading the CUDA kernels teaches you more about GPU programming than any tutorial.
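
Before the CUDA, the numerical trick is worth internalizing: softmax over a row computed blockwise with a running max and running sum, so the full score row never has to be materialized. A pure-Python sketch of that online softmax:

```python
import math

def online_softmax(scores, block=4):
    """Streaming softmax: one pass over blocks, O(block) working memory."""
    m, s = float("-inf"), 0.0          # running max, running normalizer
    for i in range(0, len(scores), block):
        chunk = scores[i:i + block]
        m_new = max(m, max(chunk))
        # Rescale the old sum into the new max's frame, then add the chunk.
        s = s * math.exp(m - m_new) + sum(math.exp(x - m_new) for x in chunk)
        m = m_new
    return [math.exp(x - m) / s for x in scores]

scores = [0.3, 2.0, -1.2, 0.7, 1.5, -0.4, 0.0, 3.1, 0.9]
probs = online_softmax(scores)
print(sum(probs))   # ~1.0
```

FlashAttention applies the same rescaling to the running weighted sum of values, which is what lets it tile attention through SRAM without ever writing the full score matrix to HBM.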

  • repo state-spaces/mamba Albert Gu & Tri Dao

    Reference implementation of the main challenger to transformers. Linear complexity with selective state spaces instead of attention.

  • repo guidance-ai/guidance Guidance AI

    Constrained generation with regex, CFGs, and JSON schema enforcement. Useful for getting structured code output from the NL2Code pipeline.
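
The mechanism is simple to mock up: at each decoding step, mask the model's distribution down to the tokens the grammar permits, then pick from what's left. A toy with a fixed, made-up logit table and a "one to three digits" grammar — guidance's real machinery is far more general, but the masking idea is the same:

```python
import re

vocab = ["0", "1", "2", "7", "a", "{", "<eos>"]
# Hypothetical "model": fixed logits, deliberately preferring illegal tokens.
fake_logits = {"0": 0.1, "1": 0.4, "2": 0.2, "7": 0.9,
               "a": 2.0, "{": 1.5, "<eos>": 0.3}

def allowed(prefix: str, tok: str) -> bool:
    """Grammar check: output must stay a match of one to three digits."""
    if tok == "<eos>":
        return re.fullmatch(r"[0-9]{1,3}", prefix) is not None
    return re.fullmatch(r"[0-9]{1,3}", prefix + tok) is not None

out = ""
while True:
    # Mask: keep only grammar-legal tokens, then take the argmax.
    legal = [t for t in vocab if allowed(out, t)]
    tok = max(legal, key=lambda t: fake_logits[t])
    if tok == "<eos>":
        break
    out += tok

print(out)   # "777" — the illegal high-logit tokens never get picked
```

Because the mask is applied to the distribution rather than the output, the result is guaranteed valid by construction — no retry loop, no post-hoc parsing.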

  • repo ml-explore/mlx Apple

    Apple's ML framework for unified memory on Apple Silicon. I develop on a Mac, so this is how I run local inference without fighting CUDA.

ls specs/

  • spec Agent Skills Specification Open Standard

    The open spec for portable agent skills. A markdown file replacing a microservice felt crazy until I tried it.

  • spec A2A (Agent-to-Agent) Protocol Google / Linux Foundation

    Peer-to-peer agent coordination with discovery, task lifecycle, and streaming. The missing complement to MCP's tool layer.

ls blogs/

  • blog Lil'Log Lilian Weng

    The gold standard for technical ML writeups. Her posts on agents, attention, and RLHF are the ones I re-read when I need to actually understand something.

  • blog Sebastian Raschka's Blog Sebastian Raschka

    His LLM implementation walkthroughs and KV cache deep-dives hit the same "build it to understand it" philosophy I try to follow.

  • blog Simon Willison's Blog Simon Willison

    Nobody ships more practical LLM experiments per week. His posts on tool use and agentic patterns are consistently ahead of the curve.

  • blog Hamel Husain's Blog Hamel Husain

    The best writing I've found on LLM evaluation and production fine-tuning. His "Your AI Product Needs Evals" post should be required reading.

  • blog Eugene Yan's Blog Eugene Yan

    Production ML systems at Amazon scale. His patterns for LLM applications and retrieval systems are battle-tested, not theoretical.

  • blog Deep Learning Focus Cameron Wolfe

    Turns dense papers into readable technical breakdowns. His posts on SFT, transformers, and reasoning are how I stay current without reading 30 papers a week.

  • blog Latent Space swyx & Alessio

    The podcast and newsletter that defined "AI engineer" as a role. Their interviews with researchers at OpenAI, Anthropic, and Meta are primary sources.

ls social/

  • social @karpathy Andrej Karpathy

    His posts are primary sources for half the things I write about. When he drops a gist or a thread, I stop what I'm doing and read it.

  • social @simonw Simon Willison

    Posts daily LLM experiments with working code. Best signal-to-noise ratio on AI Twitter for practical engineering.

  • social @rasbt Sebastian Raschka

    Threads breaking down training recipes and implementation details that don't make it into papers.

  • social @eugeneyan Eugene Yan

    Production ML patterns from someone who actually ships these systems at scale.

  • social @swyx Shawn Wang

    Coined the "AI engineer" framing. His takes on the ecosystem and where things are heading tend to age well.

  • social @cwolferesearch Cameron Wolfe

    Posts concise technical breakdowns of new papers within days of release. My early warning system for what's worth reading.

EOF (2026-03-04)