Documentation
What this is
AI agents from scratch is a 14-lesson Node.js course on building AI agents from first principles. An AI agent is any system where a language model can take actions — call tools, read memory, loop until done — rather than just returning a single response.
The course uses local LLMs (via node-llama-cpp) and hosted APIs (OpenAI). Every pattern is implemented without frameworks so the mechanics are transparent. Once you understand how a ReAct loop or DAG executor works from the source up, using LangChain or LlamaIndex becomes a choice rather than a dependency.
How it works
Three stages, 14 lessons, ~6 hours total. Each lesson is a focused concept paired with a minimal working implementation.
Stage 01 (lessons 1–6)
Raw inference → hosted APIs → system prompts → reasoning → batch → streaming
Stage 02 — Agent patterns (lessons 7–10)
Function calling → memory → ReAct → AoT / DAG planning
Stage 03 — Advanced reasoning (lessons 11–14)
Error handling → tree of thought → graph of thought → chain of thought
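To make the Stage 02 patterns concrete, here is a minimal sketch of a ReAct-style loop with function calling. It is not the course's code: `tools`, `scriptedModel`, `runAgent`, and the message shapes are hypothetical names chosen for illustration, and the model is a scripted stand-in so the example runs without an LLM.

```javascript
// Hypothetical tool registry: tool name -> implementation.
const tools = {
  add: ({ a, b }) => a + b,
};

// Stand-in for an LLM call (assumption): it requests the `add` tool once,
// then answers using the observation it finds in the history.
async function scriptedModel(history) {
  const observation = history.find((m) => m.role === "tool");
  if (!observation) {
    return { thought: "I need to add the numbers.", tool: "add", args: { a: 2, b: 3 } };
  }
  return { thought: "I have the result.", answer: `The sum is ${observation.content}` };
}

// ReAct-style loop: ask the model, run the requested tool, feed the
// observation back, and stop on a final answer or after maxSteps.
async function runAgent(model, question, maxSteps = 5) {
  const history = [{ role: "user", content: question }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await model(history);
    if (reply.answer !== undefined) return reply.answer;
    const tool = Object.hasOwn(tools, reply.tool) ? tools[reply.tool] : undefined;
    const result = tool ? tool(reply.args) : `unknown tool: ${reply.tool}`;
    history.push({ role: "assistant", content: JSON.stringify(reply) });
    history.push({ role: "tool", content: String(result) });
  }
  throw new Error(`No answer after ${maxSteps} steps`);
}
```

Calling `runAgent(scriptedModel, "What is 2 + 3?")` resolves once the stub returns an `answer`. The step limit matters in practice: without it, a model that keeps requesting tools would loop forever.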
Prerequisites
Node.js 18+ and npm. Working knowledge of async/await and Promises.
You should know what a REST API is and be comfortable reading JSON. No prior AI or ML experience is required — the course explains the model primitives from scratch.
For local LLM lessons: a GGUF-format model file (4–8 GB). A GPU is optional but speeds things up. For hosted API lessons: an OpenAI API key or any OpenAI-compatible endpoint.
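These prerequisites can be checked from a short script. The sketch below is illustrative only: `checkPrerequisites` is a hypothetical helper, and `OPENAI_API_KEY` is assumed as the environment variable name (the conventional one, though the course text does not name it).

```javascript
// Hypothetical helper: returns a list of problems, empty when all is fine.
function checkPrerequisites(nodeVersion, env) {
  const problems = [];
  // process.version looks like "v20.11.0"; keep only the major number.
  const major = Number(nodeVersion.replace(/^v/, "").split(".")[0]);
  if (!(major >= 18)) {
    problems.push(`Node.js 18+ required, found ${nodeVersion}`);
  }
  if (!env.OPENAI_API_KEY) {
    problems.push("OPENAI_API_KEY is not set (needed only for hosted API lessons)");
  }
  return problems;
}

// Report any problems for the current process.
for (const problem of checkPrerequisites(process.version, process.env)) {
  console.warn(problem);
}
```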
Glossary
context window: The maximum number of tokens (input plus output) a model can process in a single inference call. It caps how much prompt, conversation history, and generated text fit in one call.
KV cache: The cached key and value tensors that the attention layers computed for tokens already processed. Reusing them avoids re-processing those tokens, which speeds up multi-turn inference.
function calling: A protocol where the model outputs a structured JSON call to a named tool rather than natural language. The runtime executes the tool and feeds the result back.
ReAct: Reason + Act. An agent pattern where the model cycles through generating a thought, picking an action, and observing the result, iterating until done.
DAG: Directed Acyclic Graph. A way to represent task dependencies with no cycles. Steps whose dependencies are resolved can run in parallel.
GGUF: A binary file format for LLM weights, typically quantized. Designed for CPU/GPU inference with llama.cpp and tools built on it, such as node-llama-cpp.
system prompt: Instructions prepended to every conversation that define the model's role, constraints, and output format. In effect, the agent's identity.
token: The smallest unit of text the model operates on, roughly a word piece. A rough rule: 1 token ≈ 0.75 words in English.
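The DAG entry above notes that steps with resolved dependencies can run in parallel. A minimal sketch of such an executor follows, assuming a hypothetical task shape of `{ deps, run }`; `runDag` is an illustrative name, not the course's implementation.

```javascript
// Run tasks in dependency order. `tasks` maps a task name to
// { deps: [names], run: async (depResults) => value }. (Hypothetical shape.)
async function runDag(tasks) {
  const results = new Map();
  const pending = new Set(Object.keys(tasks));
  while (pending.size > 0) {
    // A task is ready once every dependency has a result.
    const ready = [...pending].filter((name) =>
      tasks[name].deps.every((dep) => results.has(dep))
    );
    if (ready.length === 0) {
      throw new Error("Cycle or missing dependency: " + [...pending].join(", "));
    }
    // Ready tasks do not depend on each other, so run them concurrently.
    await Promise.all(
      ready.map(async (name) => {
        const inputs = tasks[name].deps.map((dep) => results.get(dep));
        results.set(name, await tasks[name].run(inputs));
        pending.delete(name);
      })
    );
  }
  return Object.fromEntries(results);
}
```

This version runs in waves: it waits for every ready task to finish before looking for the next batch. That keeps the code short at the cost of some idle time when one task in a wave is much slower than the others.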
Lesson map
System prompts: Specialize an agent with system prompts and enforce structured output formats.
Error handling: Typed errors, retry with backoff, graceful degradation, and correlation IDs.
Tree of thought: Generate multiple reasoning paths, score them, prune to the best branches.
Graph of thought: Extract from parallel sources, resolve conflicts, generate from a unified context.
Chain of thought: Sequential reasoning phases to prevent bias and produce full audit trails.
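The lesson map mentions typed errors and retry with backoff. As a rough sketch of how the two fit together, the example below retries only errors of a specific type and doubles the delay after each failure. `RetryableError`, `withRetry`, and the option names are hypothetical, not taken from the course.

```javascript
// Hypothetical typed error: marks failures that are worth retrying.
class RetryableError extends Error {}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `fn` on RetryableError, doubling the delay after each failed attempt.
// `retries` is the number of extra attempts allowed after the first one.
async function withRetry(fn, { retries = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      const canRetry = err instanceof RetryableError && attempt < retries;
      if (!canRetry) throw err; // other error types and exhausted retries propagate
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Wrapping a model or tool call looks like `await withRetry(() => callModel(prompt))`, where `callModel` stands for whatever function performs the request. Errors that are not `RetryableError` fail immediately, which keeps bugs such as malformed requests from being retried pointlessly.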