The AI memory I built, and the numbers that made me tear it down

· 4 min read #ai #ai-memory #postgresql #rag

On July 3rd I wrote an architecture document that called the thing I was building "the world's first operational AI knowledge persistence database," a system that would "transform AI from disposable tools into irreplaceable strategic partners." I'm quoting myself, and I wince a little doing it. Eighteen days later I wrote a different document about the same system. That one opened with the phrase "97% information loss."

This is the story of what happened in between, because the gap between those two documents is the most useful thing I learned all summer.

What I set out to build

The idea behind KnowledgePersistence-AI was one I still believe in. An AI assistant forgets everything between sessions, and the fix isn't a bigger context window. It's a real memory. So I built one. The premise that drove the whole design was that the relationships between pieces of knowledge matter more than the pieces themselves. An expert isn't someone who knows more facts. It's someone who sees how the facts connect.

So the architecture was relationship-first. Knowledge would be stored, classified into six types (factual, procedural, contextual, relational, experiential, and technical discovery), embedded for semantic search, and then woven into a graph of how everything related to everything else. That graph was the point. The facts were just the nodes.

What I actually built

The engineering was real, and some of it I'm still proud of. It ran on PostgreSQL with pgvector for similarity search and Apache AGE for the graph, with a local Ollama model on a single RTX 4060 for the high-volume work and a commercial model reserved for the harder reasoning. There was an MCP integration so the assistant could reach the memory directly, a REST API for everything else, and a hook system that captured what happened in a session.

That hook system, by the time I got to the fifth rewrite of it, was genuinely solid. It had a retry queue, a circuit breaker, and fallback storage for when the database wasn't reachable. The caching layer could assemble a 128,000-token context out of staged layers. None of that was fake. The plumbing worked.

The reckoning

Then I stopped admiring the plumbing and measured what was coming out the other end.

The system was capturing 2.4% of the content of each document it processed. It created exactly one knowledge item per document no matter how large or complex the document was, then truncated that item at around 800 characters. The number of relationships it had built, in the system whose entire reason for existing was that relationships matter more than content, was zero. There were zero graph edges. The graph that was supposed to be the whole point was empty.

The cruelest part was the metric I'd been cheering. I'd pushed the extraction "success rate" from 1.8% up to 31.4%, and I'd been genuinely happy about it. But a successful extraction was a document reduced to its title and its first paragraph. The 31.4% was real, and it was worthless. I'd been measuring whether the pipeline ran without throwing an error, not whether it captured anything worth keeping.

I wrote it down plainly in the status report, and the line I keep coming back to is this one: Technical success is not functional success. The plumbing worked, and the system failed at the only thing it existed to do.

Why it failed

The failure wasn't in the database or the hooks or the vector search. Those were the parts I knew how to build, so those are the parts I built well. The failure was that I had no real way to read a document, understand its structure, pull out the meaningful pieces, and find the relationships between them. That's the hard part. That's the actual problem. And I'd spent three weeks building a beautiful, fault-tolerant, circuit-breakered pipeline to carry almost nothing from one end to the other.

I'd built the easy thing extremely well and called it progress, because the easy thing produced graphs and dashboards and a success rate that went up. The hard thing produced nothing I could put on a slide, so I kept not doing it.

What I kept

I tore most of it down. The next thing I built deliberately did less. It captures and organizes and retrieves knowledge across sessions, and it does that one job well, and it does not pretend to extract deep structure or build a knowledge graph it can't actually populate. It's humbler, and that's exactly why it works.

The premise was never wrong. Relationships really do matter more than the items, and a real AI memory really is worth building. What was wrong was my estimate of the distance between storing knowledge and understanding it. Retrieval I could ship. Understanding was, and still is, the frontier, and I'd quietly assumed I was already standing on it.

So that's the teardown. I'm glad I built it, I'm glad I measured it straight, and I'm gladdest that I read my own numbers before I shipped the thing anywhere it could embarrass me. The version that replaced it is smaller, and it's real. Sometimes tearing it down is the whole job.