technical reference

Documentation

What this is

AI agents from scratch is a 14-lesson Node.js course on building AI agents from first principles. An AI agent is any system where a language model can take actions — call tools, read memory, loop until done — rather than just returning a single response.

The course uses local LLMs (via node-llama-cpp) and hosted APIs (OpenAI). Every pattern is implemented without frameworks so the mechanics stay visible. Once you understand how a ReAct loop or DAG executor works from the ground up, using LangChain or LlamaIndex becomes a choice rather than a dependency.

How it works

Three stages, 14 lessons, ~6 hours total. Each lesson is a focused concept paired with a minimal working implementation.

Stage 01 — Fundamentals (lessons 1–6)
  Raw inference → hosted APIs → system prompts → reasoning → batch → streaming

Stage 02 — Agent patterns (lessons 7–10)
  Function calling → memory → ReAct → AoT / DAG planning

Stage 03 — Advanced reasoning (lessons 11–14)
  Error handling → tree of thought → graph of thought → chain of thought

Prerequisites

Node.js 18+ and npm. Working knowledge of async/await and Promises.

You should know what a REST API is and be comfortable reading JSON. No prior AI or ML experience is required — the course explains the model primitives from scratch.

For local LLM lessons: a GGUF-format model file (4–8 GB). A GPU is optional but speeds things up. For hosted API lessons: an OpenAI API key or any OpenAI-compatible endpoint.

Glossary

context window

The maximum number of tokens (input + output) a model can process in a single inference call. It caps how much system prompt, conversation history, and generated output can fit into one call.
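Because input and output share one window, a chat agent has to decide what to drop as history grows. Below is a minimal sketch of one common strategy, dropping the oldest turns first. The `Message` shape is an assumption, and `countTokens` is a hypothetical stand-in for the model's real tokenizer.

```typescript
// Hypothetical chat message shape (common role convention).
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

/**
 * Drops the oldest non-system messages until the prompt plus a reserved
 * output budget fits the window (input + output <= contextSize).
 * System messages are always kept and placed first.
 * `countTokens` is a stand-in for a real tokenizer.
 */
function fitToContext(
  messages: Message[],
  contextSize: number,
  reservedForOutput: number,
  countTokens: (text: string) => number,
): Message[] {
  const budget = contextSize - reservedForOutput;
  const system = messages.filter((m) => m.role === "system");
  const history = messages.filter((m) => m.role !== "system");
  const total = (msgs: Message[]) =>
    msgs.reduce((sum, m) => sum + countTokens(m.content), 0);

  // Remove from the front (oldest turns) while the prompt is over budget.
  while (history.length > 0 && total(system) + total(history) > budget) {
    history.shift();
  }
  return [...system, ...history];
}
```

Summarizing old turns instead of dropping them is an alternative with the same budget arithmetic.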

KV cache

A cache of the attention keys and values already computed for earlier tokens. Reusing it avoids re-processing tokens the model has already seen, which speeds up multi-turn inference.
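Runtimes in the llama.cpp family can reuse cache entries for the longest prefix a new prompt shares with the tokens already evaluated, so only the remaining suffix needs a forward pass. The following simplified sketch shows that bookkeeping; plain number arrays stand in for token IDs, and no real model or tokenizer is involved.

```typescript
// Illustrative token IDs, not output of a real tokenizer.
type TokenId = number;

/** Length of the prefix shared by the cached tokens and the new prompt. */
function sharedPrefixLength(cached: TokenId[], prompt: TokenId[]): number {
  let i = 0;
  while (i < cached.length && i < prompt.length && cached[i] === prompt[i]) {
    i++;
  }
  return i;
}

/** Splits a prompt into the cached part and the part still to evaluate. */
function planEvaluation(cached: TokenId[], prompt: TokenId[]) {
  const reused = sharedPrefixLength(cached, prompt);
  return {
    reusedTokens: reused,
    // Cache entries past the shared prefix no longer match and are discarded.
    discardedTokens: cached.length - reused,
    tokensToEvaluate: prompt.slice(reused),
  };
}
```

In a multi-turn chat each new prompt usually extends the previous one, so the shared prefix covers nearly everything and only the new turn is evaluated.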

function calling

A protocol where the model outputs a structured JSON call to a named tool rather than natural language. The runtime executes the tool and feeds the result back.
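On the runtime side, this protocol reduces to a small dispatcher: parse the model's JSON, look up the named tool, run it, and wrap the result as a message for the next model call. This is a minimal sketch. The `{ "name": ..., "arguments": {...} }` shape and the `getWeather` tool are hypothetical. Real APIs use their own response fields; OpenAI's Chat Completions API, for example, returns `tool_calls` whose `function.arguments` is a JSON string.

```typescript
type Tool = (args: Record<string, unknown>) => Promise<string>;

// Hypothetical tool registry. A Map avoids accidental matches on
// inherited object properties such as "toString".
const tools = new Map<string, Tool>([
  ["getWeather", async (args) => `Sunny in ${String(args.city)}`],
]);

interface ToolResultMessage {
  role: "tool";
  content: string;
}

/** Parses the model's structured output, runs the named tool, returns its result. */
async function executeToolCall(modelOutput: string): Promise<ToolResultMessage> {
  let call: unknown;
  try {
    call = JSON.parse(modelOutput);
  } catch {
    return { role: "tool", content: "Error: tool call was not valid JSON" };
  }
  const { name, arguments: args } = (call ?? {}) as { name?: unknown; arguments?: unknown };
  const tool = typeof name === "string" ? tools.get(name) : undefined;
  if (!tool) {
    return { role: "tool", content: `Error: unknown tool ${JSON.stringify(name)}` };
  }
  const safeArgs =
    typeof args === "object" && args !== null ? (args as Record<string, unknown>) : {};
  // The result goes back to the model as a new message.
  return { role: "tool", content: await tool(safeArgs) };
}
```

Returning errors as tool messages, rather than throwing, lets the model see what went wrong and try a corrected call.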

ReAct

Reason + Act. An agent pattern where the model repeatedly generates a thought, picks an action, and observes the result, iterating until the task is done.
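Here is a minimal sketch of that loop. It assumes a hypothetical `Model` function that returns either a tool action or a final answer; a real agent would parse this out of LLM output. The `maxSteps` cap is an added safeguard, not part of the pattern's definition, so that a model that never finishes cannot loop forever.

```typescript
// One model step: act with a tool, or finish with an answer.
type Step =
  | { thought: string; action: { tool: string; input: string } }
  | { thought: string; finalAnswer: string };

// Hypothetical model interface: reads the transcript, returns the next step.
type Model = (transcript: string[]) => Promise<Step>;
type Tools = Map<string, (input: string) => Promise<string>>;

async function reactLoop(
  question: string,
  model: Model,
  tools: Tools,
  maxSteps = 5,
): Promise<{ answer: string; transcript: string[] }> {
  const transcript = [`Question: ${question}`];
  for (let i = 0; i < maxSteps; i++) {
    const step = await model(transcript); // Reason
    transcript.push(`Thought: ${step.thought}`);
    if ("finalAnswer" in step) {
      return { answer: step.finalAnswer, transcript };
    }
    const { tool, input } = step.action; // Act
    const run = tools.get(tool);
    const observation = run ? await run(input) : `Unknown tool: ${tool}`;
    transcript.push(`Action: ${tool}(${input})`, `Observation: ${observation}`); // Observe
  }
  throw new Error(`No final answer after ${maxSteps} steps`);
}
```

Every observation is appended to the transcript, so the next reasoning step is conditioned on what the previous action actually returned.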

DAG

Directed Acyclic Graph. A way to represent task dependencies with no cycles. Steps whose dependencies have all completed can run in parallel.
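One simple way to execute such a graph is to repeatedly collect every step whose dependencies have finished and run that batch with `Promise.all`. The minimal sketch below follows that wave-by-wave approach; the `Task` shape is hypothetical, and the course's AoT executor may be structured differently. A step that can never become ready signals a cycle or a missing dependency.

```typescript
interface Task {
  id: string;
  deps: string[];
  // Receives the results of completed tasks, keyed by id.
  run: (results: Map<string, unknown>) => Promise<unknown>;
}

/** Executes tasks in dependency order, running each ready batch in parallel. */
async function runDag(tasks: Task[]): Promise<Map<string, unknown>> {
  const results = new Map<string, unknown>();
  const pending = new Map(tasks.map((t) => [t.id, t] as const));

  while (pending.size > 0) {
    // A task is ready once every dependency has produced a result.
    const ready = Array.from(pending.values()).filter((t) =>
      t.deps.every((d) => results.has(d)),
    );
    if (ready.length === 0) {
      const stuck = Array.from(pending.keys()).join(", ");
      throw new Error(`Cycle or missing dependency among: ${stuck}`);
    }
    const outputs = await Promise.all(ready.map((t) => t.run(results)));
    ready.forEach((t, i) => {
      results.set(t.id, outputs[i]);
      pending.delete(t.id);
    });
  }
  return results;
}
```

Wave-by-wave scheduling is easy to reason about but waits for the slowest task in each batch. Starting each task as soon as its own dependencies finish would extract more parallelism.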

GGUF

A binary file format for LLM weights (usually quantized) plus model metadata. It is the format llama.cpp loads, and node-llama-cpp, which wraps llama.cpp, uses it too.
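According to the GGUF specification in the ggml repository, a file begins with the 4-byte magic `GGUF` followed by the format version as a little-endian `uint32`. The small sketch below checks those header bytes, which is useful for rejecting a wrong or truncated download before handing the file to a loader. Reading the bytes from disk is left out.

```typescript
interface GgufHeader {
  version: number;
}

/** Returns the GGUF version if the bytes start with a GGUF header, else null. */
function readGgufHeader(bytes: Uint8Array): GgufHeader | null {
  if (bytes.length < 8) return null;
  // Magic bytes: ASCII "G", "G", "U", "F".
  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (magic !== "GGUF") return null;
  // Bytes 4..7: format version as a little-endian uint32.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { version: view.getUint32(4, true) };
}
```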

system prompt

Instructions prepended to every conversation that define the model's role, constraints, and output format. In effect, the system prompt sets the agent's identity.

token

The smallest unit of text the model operates on, typically a word or word fragment. Rule of thumb: 1 token ≈ 0.75 words of English text.
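Applied to that rule of thumb, a 300-word English prompt comes to roughly 300 / 0.75 = 400 tokens. The tiny sketch below turns the estimate into a quick check of whether a prompt plus a planned output length fits a context window. It is only a heuristic; exact counts require the model's own tokenizer.

```typescript
/** Rough token estimate from the rule of thumb 1 token ≈ 0.75 English words. */
function estimateTokens(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil(words / 0.75);
}

/** Input and output share the context window, so both count against it. */
function fitsInContext(prompt: string, contextSize: number, maxOutputTokens: number): boolean {
  return estimateTokens(prompt) + maxOutputTokens <= contextSize;
}
```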

Stack

node-llama-cpp: local LLM inference
openai: hosted API SDK
Next.js 14: web layer (this app)
Anthropic SDK: AI tutor backend

Lesson map

01 Fundamentals

Introduction (basic LLM)

Load a local LLM and run your first prompt/response cycle.

OpenAI intro (hosted LLMs)

Call GPT-4, control temperature, and track token usage.

Translation (system prompts)

Specialize an agent with system prompts and enforce structured output formats.

Think (reasoning)

Configure step-by-step reasoning and discover where pure LLM logic breaks down.

Batch (parallelism)

Run multiple LLM calls concurrently using context sequences.

Streaming (response control)

Stream tokens in real time and enforce hard token budgets.

02 Agent patterns

Simple agent (function calling)

Give the LLM tools it can call. Text generation becomes agency.

Memory agent (persistent state)

Store facts across sessions so the agent remembers context over time.

ReAct agent (reason + act)

Reason, act with a tool, observe the result. Repeat until solved.

AoT agent (atomic planning)

Plan the full task as a dependency graph, then execute deterministically.

03 Advanced reasoning

Error handling (resilience)

Typed errors, retry with backoff, graceful degradation, and correlation IDs.

Tree of thought (beam search)

Generate multiple reasoning paths, score them, prune to the best branches.

Graph of thought (DAG merge)

Extract from parallel sources, resolve conflicts, generate from a unified context.

Chain of thought (auditable reasoning)

Run reasoning in sequential phases to reduce bias and produce a full audit trail.
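The Translation lesson pairs system prompts with enforced output formats. One common enforcement approach is to validate the model's reply and retry, feeding the validation error back into the prompt. This is a minimal sketch of that approach. It assumes a hypothetical `generate` function; the `{ translation, language }` output shape is chosen for illustration and may not match the lesson's format.

```typescript
// Output shape chosen for illustration.
interface Translation {
  translation: string;
  language: string;
}

// Hypothetical LLM call: prompt in, raw text out.
type Generate = (prompt: string) => Promise<string>;

/** Parses the reply and checks its shape; throws a descriptive error otherwise. */
function parseTranslation(raw: string): Translation {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on non-JSON
  const obj = data as Partial<Translation> | null;
  if (
    typeof obj !== "object" || obj === null ||
    typeof obj.translation !== "string" || typeof obj.language !== "string"
  ) {
    throw new Error('expected {"translation": string, "language": string}');
  }
  return obj as Translation;
}

async function translateStructured(
  text: string,
  targetLanguage: string,
  generate: Generate,
  maxAttempts = 3,
): Promise<Translation> {
  let prompt =
    `Translate the text into ${targetLanguage}. Reply with JSON only, ` +
    `shaped like {"translation": "...", "language": "..."}.\n\nText: ${text}`;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(prompt);
    try {
      return parseTranslation(raw);
    } catch (err) {
      // Feed the validation error back so the next attempt can self-correct.
      prompt += `\n\nYour previous reply was invalid (${(err as Error).message}). Reply with JSON only.`;
    }
  }
  throw new Error(`no valid structured output after ${maxAttempts} attempts`);
}
```

Local runtimes such as node-llama-cpp can also constrain generation with a grammar. In that setup, a parse step like this one acts as a safety net rather than the main mechanism.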
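The Batch lesson relies on node-llama-cpp context sequences. The following sketch shows only the generic scheduling idea, not that API: it runs many prompts concurrently while capping how many are in flight at once. The worker would be a hypothetical async LLM call, and in practice the limit would match the number of sequences available.

```typescript
/** Runs `worker` over all items with at most `limit` calls in flight; keeps input order. */
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each runner claims the next unprocessed index until none remain.
  // JavaScript is single-threaded, so `next++` cannot race.
  const runner = async () => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  };
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results;
}
```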
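A hard token budget, as in the Streaming lesson, means stopping consumption once the limit is reached rather than trimming the text afterwards. The minimal sketch below uses a hypothetical async-iterable token stream. Leaving a `for await` loop early calls the iterator's `return()`, which gives well-behaved streams a chance to clean up.

```typescript
/** Forwards tokens to `onToken` until the stream ends or `maxTokens` is reached. */
async function streamWithBudget(
  stream: AsyncIterable<string>,
  maxTokens: number,
  onToken: (token: string) => void,
): Promise<{ text: string; hitBudget: boolean }> {
  if (maxTokens <= 0) return { text: "", hitBudget: true };
  let text = "";
  let count = 0;
  for await (const token of stream) {
    onToken(token); // e.g. write to the terminal as it arrives
    text += token;
    count++;
    if (count >= maxTokens) {
      // Budget reached: returning exits the loop and closes the iterator.
      return { text, hitBudget: true };
    }
  }
  return { text, hitBudget: false };
}
```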
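The Error handling lesson lists typed errors, retry with backoff, and correlation IDs. The following minimal sketch combines all three. The `RetryableError` class, the 500 ms base delay, and the correlation-ID format are assumptions for illustration. The `sleep` and `log` options exist so callers and tests can replace real waiting and logging.

```typescript
/** Marks failures worth retrying (e.g. timeouts, rate limits); others fail fast. */
class RetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RetryableError";
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
  }
}

interface RetryOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
  correlationId?: string;
  sleep?: (ms: number) => Promise<void>;
  log?: (line: string) => void;
}

async function withRetry<T>(operation: () => Promise<T>, options: RetryOptions = {}): Promise<T> {
  const {
    maxAttempts = 3,
    baseDelayMs = 500,
    // Illustrative ID format; every log line for this request carries it.
    correlationId = `req-${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 8)}`,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
    log = (line: string) => console.log(line),
  } = options;

  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      log(`[${correlationId}] attempt ${attempt} failed: ${(err as Error).message}`);
      if (!(err instanceof RetryableError) || attempt >= maxAttempts) throw err;
      // Exponential backoff: base, 2x base, 4x base, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Graceful degradation would sit one level up: the caller catches the final error and returns a cached or simplified answer instead.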
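The Tree of thought lesson describes a beam search over reasoning paths. Below is a model-free sketch. `expand` proposes next thoughts for a path and `score` rates a path; in an agent both would be LLM calls. Here they are toy functions chosen for illustration: reach a sum of 10 by adding 1, 3, or 4 at each of three steps.

```typescript
interface Scored<T> {
  path: T[];
  score: number;
}

/** Keeps the `beamWidth` best partial paths at each depth (beam search). */
async function beamSearch<T>(
  root: T[],
  expand: (path: T[]) => Promise<T[]>,
  score: (path: T[]) => Promise<number>,
  beamWidth: number,
  depth: number,
): Promise<Scored<T>> {
  const width = Math.max(1, beamWidth);
  let beam: Scored<T>[] = [{ path: root, score: await score(root) }];
  for (let level = 0; level < depth; level++) {
    // Generate: extend every path in the beam with each proposed thought.
    const candidates: T[][] = [];
    for (const { path } of beam) {
      for (const thought of await expand(path)) candidates.push([...path, thought]);
    }
    if (candidates.length === 0) break;
    // Score every candidate, then prune to the best `width` branches.
    const scored = await Promise.all(
      candidates.map(async (path) => ({ path, score: await score(path) })),
    );
    beam = scored.sort((a, b) => b.score - a.score).slice(0, width);
  }
  return beam[0];
}

// Toy problem: each step adds 1, 3, or 4; the score is closeness to 10.
const expandSteps = async (_path: number[]) => [1, 3, 4];
const scoreSum = async (path: number[]) =>
  -Math.abs(10 - path.reduce((sum, n) => sum + n, 0));
```

With `beamWidth` 1 the search degenerates to greedy: it picks 4 + 4 after two steps, which leaves no exact completion, and ends at a sum of 9. Keeping two branches preserves 4 + 3 and finds 4 + 3 + 3 = 10.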
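The Chain of thought lesson runs reasoning as sequential phases and keeps an audit trail. Here is a minimal sketch of that bookkeeping. It assumes each phase is an async function (an LLM call in practice) that receives the previous phase's output. The phase names and the audit record shape are hypothetical.

```typescript
interface Phase {
  name: string;
  run: (input: string) => Promise<string>;
}

interface AuditEntry {
  phase: string;
  input: string;
  output: string;
  startedAt: string; // ISO timestamp
  durationMs: number;
}

/** Runs phases in order, feeding each output forward and recording every step. */
async function runChain(
  phases: Phase[],
  initialInput: string,
): Promise<{ result: string; audit: AuditEntry[] }> {
  const audit: AuditEntry[] = [];
  let current = initialInput;
  for (const phase of phases) {
    const start = Date.now();
    const output = await phase.run(current);
    audit.push({
      phase: phase.name,
      input: current,
      output,
      startedAt: new Date(start).toISOString(),
      durationMs: Date.now() - start,
    });
    current = output;
  }
  return { result: current, audit };
}
```

Because every phase's input and output are stored, a reviewer can trace the final answer back through each intermediate step.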

FAQ

Do I need a GPU?

No. All local examples run on CPU-only machines via node-llama-cpp. A GPU speeds things up but isn't required for learning.

Which model should I use?

Llama-3.1-8B-Instruct.Q4_K_M.gguf is a good starting point: at about 5 GB it runs on most machines and follows instructions well.

Local vs hosted — which for production?

Hosted (OpenAI, Anthropic) for capability and scale. Local for privacy, cost control, or offline environments. Lessons 1–6 teach both.

Does this course use LangChain?

No. The entire course builds agents from scratch using raw model APIs. Frameworks like LangChain are abstractions over exactly what you build here.