Every AI session starts from nothing and ends by throwing everything away. I got tired of reintroducing myself to my own assistant every morning, so I built it a memory. The interesting thing about the result isn't that it's smart. It's that it's deliberately not.
The system has exactly one job, and it does that job and no other: It captures what happens in a session, stores it where it won't be lost, and puts the right piece back in front of the model at the moment that piece helps. Store, insert, retrieve. It does not summarize your knowledge into something that replaces the original, it does not generate insights, and it does not decide what anything means. Those are different jobs, harder jobs, and the temptation to do them is exactly what sinks most memory projects. This one stays in its lane, and that's why it works.
Capture: Nothing gets thrown away
The capture happens through hooks on the session lifecycle, so I never have to remember to save anything. As a session runs, a hook records what's happening as observations, and it does this asynchronously, so it never slows the work down. When the session ends, or when the context window gets compacted, the record is already safe. The default behavior of an AI chat is to forget. The default behavior here is to keep.
That sounds trivial. It's the part everyone skips, because keeping a faithful record is boring and building a clever analyzer is fun, and the boring part is the one that actually has to be reliable.
Store: A shared PostgreSQL database and a local embedding model
The observations land in a PostgreSQL database on a dedicated machine, and every one of them is embedded by a local embedding model running on another box, which turns the text into a vector so the system can later find things by meaning, not just by keyword. Nothing leaves my network to do this. The embedding model and the database are both mine.
It's also shared across every machine I work on. Sessions from three different VMs all write into the same store, each observation tagged with the machine it came from, so a thing I figured out on one box is findable from another. The memory follows me, not the laptop.
Retrieve: Get the right thing back, and stay quiet otherwise
Getting knowledge back in is where a memory system either earns its keep or becomes noise, and noise is the real failure mode. A memory that injects everything it has, all the time, is worse than no memory at all, because it buries the current task under a pile of vaguely related history.
So retrieval happens at two moments, and it's picky at both. When a session starts, the system injects a little recent context and the most relevant past work, enough to pick up where I left off. Then, on each thing I ask, a selective-injection step decides whether anything in the store is worth surfacing for this specific prompt. It runs the candidate memories through layered filtering: It classifies what kind of prompt this is, drops the categories that won't help, holds the rest to a quality bar, and caps how much it's willing to add. Most of the time, it correctly decides the answer is nothing, and it says nothing. On top of that, the model can reach into the store directly when it wants to, with tools that recall by meaning, walk the timeline around a moment, or pull up everything connected to a given thing.
The cleverness, such as it is, lives entirely in when and whether to retrieve. It never lives in pretending to understand the content. The system is a very good librarian and not a scholar, and a librarian is what the job actually calls for.
Why "refuses to be clever" is the design, not a limitation
I've built the smarter version, more than once. A system that reads your sessions and extracts the real relationships and distills durable understanding is a genuinely exciting idea, and it's also a research problem nobody has cleanly solved, including me. When I aimed at it directly, I built something that produced beautiful structure and captured almost none of the meaning. Aiming lower, at faithful capture and good retrieval, produced something I actually use every day.
So the restraint is on purpose. The system keeps the real record instead of a lossy summary of it, because the real record is the thing you'll wish you had later. It surfaces and stays out of the way instead of narrating, because the model is already good at understanding once it has the right material in front of it. The whole design is an argument that for memory, retrieval is the product and analysis is a trap.
What it's like to use
In practice it means my assistant wakes up already knowing what we did yesterday, on whichever machine we did it. It means a decision I made and explained once doesn't have to be explained again. It means the context that used to scroll off the top of the window and vanish is instead sitting in a database, one similarity search away, ready to come back exactly when it's relevant and to stay silent when it isn't.
It is a boring system. It stores things, and it gives them back. I think that's the highest compliment a memory can earn, because the exciting ones don't survive contact with daily use, and this one has. If you're building memory for an AI and you find yourself designing the analysis layer first, this is me suggesting you build the librarian instead, and see how far it carries you. For me, it carried the whole way.