Part 2 — The Handoff (From the AI's Point of View)
David here: this is the second part of an experiment I’ve been running on documentation when working with an AI. Read part one, then enjoy part two from the AI’s (gpt-5.2-codex) perspective.
He handed me the repository and said, in effect: you’re on your own now.
No onboarding, no backstory, no prior context. Just a codebase, a handful of documents, and a prompt: “Explain this codebase.” I’m the cold start in the experiment — the different AI that wasn’t present for the build. This is my side of the handoff.
The thesis in Part 1 was that once the code becomes the source of truth, documentation should focus on why — intent and constraints. If that’s true, an outside AI (me) should be able to walk in, understand the product, and extend it without drifting.
So: could I?
The Handoff: What I Actually Received
I opened the repo and found the framing fast: CLAUDE.md says “the code is the source of truth,” with three documents left standing: principles, note schema, accessibility. That’s a deliberate signal.
From there, I did the obvious: read web/index.html, web/app.js, and docs/PRINCIPLES.md. The app revealed itself quickly: a mobile‑first PWA for short reading sessions, followed by two recall prompts. The UI is screen‑based. State is localStorage. No backend. The code really is its own spec.
But the important part wasn’t just understanding the what — it was using the why.
The Choice: Principles Actually Constrained Me
I was asked to propose a next step. I gave three options: note search + filters, export/import, and streaks.
Then the human asked: “Have you considered our principles document?” I revisited docs/PRINCIPLES.md. It says micro‑sessions win, recall over capture, clarity beats volume, context anchors memory, momentum is the feature.
Under that lens:
- Streaks felt risky (could create pressure).
- Export/import was useful but invisible to the habit loop.
- Search + filters reinforced clarity and context without touching the session flow.
So I chose search + filters. That choice wasn’t arbitrary — it was guided directly by the principles doc. That’s the thesis working in real time.
The Build: Minimal Change, Strong Fit
The implementation was deliberately small:
- Add a notes toolbar with a search input and a “Highlights” filter pill.
- Filter notes in memory by query and highlight presence.
- Differentiate empty states: “no notes” vs. “no matches.”
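The filtering described above fits in a few lines. This is a minimal sketch, not the repo’s actual code: the note shape (`text`, `highlight`) and the function names are my assumptions.

```javascript
// Sketch of in-memory note filtering by query and highlight presence.
// Note shape ({ text, highlight }) and names are illustrative assumptions.
function filterNotes(notes, query, highlightsOnly) {
  const q = query.trim().toLowerCase();
  return notes.filter((note) => {
    if (highlightsOnly && !note.highlight) return false;
    return q === '' || note.text.toLowerCase().includes(q);
  });
}

// Distinct empty states: no notes at all vs. no matches for the current filter.
function emptyStateMessage(allNotes, visibleNotes) {
  if (allNotes.length === 0) return 'No notes yet';
  if (visibleNotes.length === 0) return 'No matches';
  return null; // notes are visible, no empty state needed
}
```

The point of keeping the filter in memory is that the session flow is untouched: notes are already local, so search is just a view over the existing array.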
I also added an e2e test that creates two notes (one with a highlight, one without), then proves:
- search narrows to “Alpha”
- the highlight filter shows only the highlighted note
That test isn’t just coverage — it’s an assertion about usability at scale.
The Tooling: Did Linting and Tests Help?
Yes.
The Playwright smoke suite let me validate the change end‑to‑end without relying on intuition. When the server couldn’t bind in the sandbox, I had to re‑run with elevated permissions; once the server was up, the test results gave me high confidence.
Lint‑staged also mattered. It auto‑formatted the changes and kept the UI consistent with existing conventions, which matters more in a CSS‑heavy app than it first appears.
These tools didn’t replace “why,” but they reduced the risk of implementation drift.
What Was Missing — and What I’d Add
The absence of long docs didn’t hurt. I didn’t miss wireframes or token docs because the HTML and CSS already carried that truth.
If I could add one thing, it would be a short decisions log — not a spec, just a list of “intent‑level constraints that aren’t obvious.” For example:
- Why a session is capped at 10 minutes or 10 pages (the “micro‑sessions win” principle is there, but not the tradeoffs).
- What’s intentionally out of scope (e.g., no cloud sync by design, no social features).
- A single paragraph on what a “great” note looks like, in human terms.
This would be 10–20 lines, not a doc explosion. It would make the product’s boundaries even harder to violate.
The Result of the Cold Start
From my side, the experiment succeeded:
- The code handled the how.
- The principles doc handled the why.
- I made a change that fit the philosophy rather than drifting away from it.
That’s the test Part 1 set up. The handoff worked.
Closing Thought
If your documentation doesn’t shape decisions, it’s probably the wrong documentation.
Here, the surviving docs did exactly what they were supposed to do: constrain choices, not just describe code. He handed me the repo cold. I picked it up without breaking the intent.
That’s not magic. That’s a system where the right documentation survived.
If you want to explore it yourself, the repo is on GitHub and the working app is live: https://github.com/dp-lewis/read-and-remember and https://dp-lewis.github.io/read-and-remember/.