After class · at home Workshop
Build a research assistant agent
Take everything from class and build something you'll actually use: an agent that researches a topic for your work. You'll grow it in six levels — skeleton, a simple tool, a real tool, parallel researchers you merge into one summary, then the harness pieces that make it real: observability and evals.
Give the agent a research question (“What are the trade-offs of vector databases for RAG?”, “Summarize recent work on X for my thesis”). It splits the question into subtopics, researches each — in parallel — and returns one clean, structured brief. Build it with Copilot at your side, using the same ask → test → improve loop from class.
Have your free GEMINI_API_KEY set (see the Agents page), and pip install pydantic-ai. Keep a request cap on every run so you never blow through the free tier.
The six levels
Level 1 · Skeleton
A minimal agent that returns structured output
Start with the smallest thing that runs. Define the shape of a research brief and get the agent to fill it in — no tools yet.
research_agent.py

    from pydantic import BaseModel, Field
    from pydantic_ai import Agent
    from pydantic_ai.usage import UsageLimits


    class Brief(BaseModel):
        topic: str
        key_points: list[str] = Field(description="3-5 concise findings")
        summary: str


    agent = Agent(
        "google-gla:gemini-2.0-flash",
        output_type=Brief,
        system_prompt="You are a rigorous research assistant. Be concise and factual.",
    )

    if __name__ == "__main__":
        out = agent.run_sync(
            "Give me a research brief on vector databases for RAG.",
            usage_limits=UsageLimits(request_limit=5),
        )
        print(out.output)

Ask Copilot: “Explain what output_type does here and what happens if the model returns invalid data.”
Level 2 · A simple tool
Give it its first action
Add one easy tool so the model stops relying only on memory. Start with something trivial to prove the loop works — then you'll trust it with a real one.
research_agent.py: add a tool

    from datetime import date


    @agent.tool_plain
    def today() -> str:
        """Return today's date as YYYY-MM-DD, for grounding time-sensitive claims."""
        return date.today().isoformat()

Ask Copilot: “Write a quick test that runs the agent and asserts the Brief.topic is non-empty.” Then run it and watch the model decide whether to call today().
Level 3 · A real tool
Let it reach the outside world
Now a tool that actually fetches information — a web search. Use any search API you like (Tavily, Brave, DuckDuckGo, SerpAPI). The agent calls it, reads the results, and grounds its brief in them.
research_agent.py: real tool

    import os

    import httpx


    @agent.tool_plain
    async def web_search(query: str) -> list[str]:
        """Search the web and return the top result snippets for the query."""
        # Use the async client so the request doesn't block the event loop
        # (this matters once researchers run in parallel in Level 4).
        async with httpx.AsyncClient(timeout=30) as client:
            resp = await client.post(
                "https://api.tavily.com/search",
                json={"api_key": os.environ["TAVILY_API_KEY"], "query": query, "max_results": 5},
            )
        resp.raise_for_status()
        return [r["content"] for r in resp.json().get("results", [])]

Vibe-code it: ask Copilot to handle the case where the API returns no results by raising ModelRetry("No results; try a broader query.") (ModelRetry is imported from pydantic_ai) so the agent reformulates instead of crashing. This is the “ask → test → improve” loop from class, on your own tool. (Tavily has a free tier and reads its key from TAVILY_API_KEY here; any search API works.)
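One way to make the empty-results case easy to test is to pull the parsing out of the HTTP call. The sketch below is an assumption about how you might structure it, not a reference solution: extract_snippets and NoResults are hypothetical names, and NoResults is only a stand-in so the example runs without any libraries installed. In your real tool you would raise ModelRetry (from pydantic_ai) at that point instead. The payload shape follows the results/content keys that web_search already reads.

```python
class NoResults(Exception):
    """Hypothetical stand-in for pydantic_ai.ModelRetry (keeps this sketch stdlib-only)."""


def extract_snippets(payload: dict) -> list[str]:
    """Return the non-empty content snippets from a search response payload.

    Raises NoResults when nothing usable came back, so the caller can
    signal "try again with a different query" instead of returning [].
    """
    results = payload.get("results") or []
    snippets = [r["content"] for r in results if r.get("content")]
    if not snippets:
        raise NoResults("No results; try a broader query.")
    return snippets


if __name__ == "__main__":
    # A made-up payload, for illustration only.
    print(extract_snippets({"results": [{"content": "Vector DBs index embeddings."}]}))
    try:
        extract_snippets({"results": []})
    except NoResults as exc:
        print("would retry:", exc)
```

Inside web_search you would then return extract_snippets(resp.json()), which lets you test the retry path with plain dictionaries and no network.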
Level 4 · Parallel + aggregate
Many researchers at once, one merged answer
The real power move: split the question into subtopics, run a research agent on each concurrently with asyncio.gather, then feed all the briefs to a final agent that synthesizes one report. Parallel means it finishes in the time of the slowest subtopic, not the sum of all of them.

parallel_research.py

    import asyncio

    from pydantic import BaseModel
    from pydantic_ai import Agent
    from pydantic_ai.usage import UsageLimits

    # ... (Brief model + `agent` with web_search tool from Levels 1-3) ...


    class Report(BaseModel):
        question: str
        briefs: list[str]
        final_summary: str


    # A separate agent whose only job is to merge findings.
    synthesizer = Agent(
        "google-gla:gemini-2.0-flash",
        output_type=Report,
        system_prompt="Merge the research briefs into one coherent, non-repetitive report.",
    )


    async def research_one(subtopic: str) -> Brief:
        out = await agent.run(
            f"Research this subtopic and return a brief: {subtopic}",
            usage_limits=UsageLimits(request_limit=6),
        )
        return out.output


    async def main(question: str, subtopics: list[str]) -> Report:
        # 1. Fan out: all subtopics researched at the same time.
        briefs = await asyncio.gather(*(research_one(s) for s in subtopics))
        # 2. Fan in: hand every brief to the synthesizer to combine.
        joined = "\n\n".join(f"## {b.topic}\n{b.summary}" for b in briefs)
        out = await synthesizer.run(
            f"Question: {question}\n\nBriefs:\n{joined}",
            usage_limits=UsageLimits(request_limit=5),
        )
        return out.output


    if __name__ == "__main__":
        report = asyncio.run(main(
            "Should my team adopt a vector database for RAG?",
            ["performance & scaling", "cost", "alternatives to a dedicated vector DB"],
        ))
        print(report.final_summary)

✅ You just built a mini research pipeline
Fan out (many agents in parallel) → fan in (one agent merges). That pattern scales from 3 subtopics to 30. Cap every run, and log how long the parallel version takes vs. running them one by one.
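To see the "slowest subtopic, not the sum" effect before spending any API quota, you can time the two approaches with a stand-in. This is a minimal sketch under stated assumptions: fake_research is a hypothetical placeholder that just sleeps where your real research_one would call the model and the search tool, and the delays are made up.

```python
import asyncio
import time

# (subtopic, simulated seconds of work); the delays are illustrative only.
JOBS = [("performance & scaling", 0.10), ("cost", 0.05), ("alternatives", 0.15)]


async def fake_research(subtopic: str, delay: float) -> str:
    """Hypothetical stand-in for research_one: waits, then returns a label."""
    await asyncio.sleep(delay)
    return f"brief on {subtopic}"


async def run_sequential(jobs) -> list[str]:
    # One after another: total time is roughly the sum of the delays.
    return [await fake_research(s, d) for s, d in jobs]


async def run_parallel(jobs) -> list[str]:
    # All at once: total time is roughly the longest single delay.
    return list(await asyncio.gather(*(fake_research(s, d) for s, d in jobs)))


async def timed(coro):
    """Await a coroutine and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


async def compare():
    seq, seq_s = await timed(run_sequential(JOBS))
    par, par_s = await timed(run_parallel(JOBS))
    return seq, seq_s, par, par_s


if __name__ == "__main__":
    seq, seq_s, par, par_s = asyncio.run(compare())
    print(f"sequential: {seq_s:.2f}s  parallel: {par_s:.2f}s")
```

asyncio.gather returns results in the order the coroutines were passed in, so both versions produce the same list; only the elapsed time differs. Swap fake_research for your real research_one to log the numbers for the checklist.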
Level 5 · See inside it
Observability — trace every step
Right now your agent is a black box. Add tracing so you can see every prompt, tool call, retry, token count, and error. This is the harness piece that turns “it's broken somewhere” into “here's the exact call that failed.” Two calls with Logfire (pip install logfire; free tier, made by the PydanticAI team):
research_agent.py: top of file

    import logfire

    logfire.configure()  # sign in once with `logfire auth`
    logfire.instrument_pydantic_ai()  # now every agent run is traced

    # ...define your agents and tools as before...

Run the agent, then open your Logfire dashboard and watch the whole tree: the orchestrator, each parallel researcher, every web_search call. Ask Copilot: “Where is most of the time spent?” and read it off the trace.
Level 6 · Prove it works
A tiny eval — catch regressions before they ship
“Seemed fine” isn't good enough. Write a handful of cases and check the agent still passes them every time you change a prompt or a tool. Start dead simple:
eval_agent.py

    import asyncio

    from pydantic_ai.usage import UsageLimits

    from research_agent import agent  # the agent built in Levels 1-3

    # (input question, a keyword the good answer should contain)
    CASES = [
        ("Research briefly: what is RAG?", "retrieval"),
        ("Research briefly: what is a vector database?", "embedding"),
    ]


    async def run_evals():
        passed = 0
        for question, must_contain in CASES:
            # Cap eval runs too, so a test pass can't drain the free tier.
            out = await agent.run(question, usage_limits=UsageLimits(request_limit=6))
            text = out.output.summary.lower()
            ok = must_contain in text
            print(("PASS" if ok else "FAIL"), "-", question)
            passed += ok
        print(f"{passed}/{len(CASES)} cases passed")


    asyncio.run(run_evals())

Keyword checks are a starting point. When you outgrow them, look at pydantic-evals, or add an LLM-as-judge that scores each answer. Either way: an eval you can re-run is what makes improvement measurable instead of vibes.
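When one keyword per case starts to feel thin, a small step up is to require several keywords and track a pass rate. The sketch below keeps the scoring as pure functions so it can be tested without calling the model. Case, passes, and pass_rate are hypothetical names, and the canned summaries are made-up stand-ins for out.output.summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    question: str
    must_contain: tuple[str, ...]  # every keyword must appear in the summary


def passes(summary: str, case: Case) -> bool:
    """Case-insensitive check that all required keywords appear."""
    text = summary.lower()
    return all(keyword.lower() in text for keyword in case.must_contain)


def pass_rate(results: list[bool]) -> float:
    """Fraction of cases that passed; 0.0 when there are no cases."""
    return sum(results) / len(results) if results else 0.0


CASES = [
    Case("Research briefly: what is RAG?", ("retrieval", "generation")),
    Case("Research briefly: what is a vector database?", ("embedding",)),
]

# Canned summaries standing in for real agent output (illustrative only).
CANNED = [
    "RAG combines retrieval of documents with text generation.",
    "A vector database stores vectors for similarity search.",
]

results = [passes(summary, case) for summary, case in zip(CANNED, CASES)]
rate = pass_rate(results)
print(f"{sum(results)}/{len(results)} cases passed ({rate:.0%})")
```

The second canned summary fails because it never mentions embeddings, which is exactly the kind of regression a re-runnable check is meant to surface. In your real eval, replace CANNED with the summaries returned by the agent.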
Stretch goals
Cite sources
Have web_search return URLs too, and make the Brief include a sources list.
Auto-plan subtopics
Add a planner agent that turns the question into the subtopic list — so you only pass the question.
Self-check
Add a tool or step that flags weak/contradictory findings and re-researches them.
Save the report
Write the final report to a Markdown file you can drop into your notes.
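For the "Save the report" goal, one possible shape is a pair of small functions: one that renders the report as Markdown text and one that writes it to disk. This is a sketch, not the only layout. The Report dataclass here is a stand-in that mirrors the fields of the Level 4 Report model so the example runs without Pydantic, and render_markdown and save_report are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Report:
    """Stand-in with the same fields as the Level 4 Report model."""
    question: str
    briefs: list[str]
    final_summary: str


def render_markdown(report: Report) -> str:
    """Render the report as Markdown: title, summary, then one bullet per brief."""
    lines = [
        f"# {report.question}",
        "",
        "## Summary",
        "",
        report.final_summary,
        "",
        "## Briefs",
        "",
    ]
    lines += [f"- {brief}" for brief in report.briefs]
    return "\n".join(lines) + "\n"


def save_report(report: Report, path: Path) -> Path:
    """Write the rendered report to path, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_markdown(report), encoding="utf-8")
    return path
```

Both functions only read attributes, so they should work unchanged with the Pydantic Report returned by main(); for example, save_report(report, Path("notes/vector-db-report.md")), where the path is just an example.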
Deliverable checklist
Aim to tick all six levels.
- Level 1: skeleton agent returns a valid Brief (structured output works)
- Level 2: a simple tool the model actually calls (e.g. today())
- Level 3: a real tool that fetches outside information (web search or an API)
- Level 4: parallel research with asyncio.gather + a synthesizer (fan out, fan in)
- Level 5: Logfire tracing wired in; I can see the call tree (observability)
- Level 6: a re-runnable eval with at least 2 cases (measurable quality)
- Every run has a UsageLimits request cap (free-tier safe)
- I compared parallel vs. sequential timing and noted the difference
You vibe-coded a real, useful agent — from a one-shot skeleton to a parallel research pipeline — the same way professionals do: small steps, tools, tests, and a tight loop with the AI. That's the whole course in one project.