I work with a language model most of every day, and the thing I can't get over is that it forgets me the moment I close the tab. We solve something hard together, we build up a whole shared understanding of a problem over three hours, and then the session ends and all of it's gone. The next morning (or the next new session 10 minutes later) I'm a stranger again. The model is brilliant, but it has the memory of a mayfly.
So I've been chasing a question for a while now: What would it actually take to give one of these things a memory that lasts? Not a bigger context window. A memory. And the more I sit with it, the more I keep coming back to the way the only known general intelligence we have, the one between your ears, does it. We run around all day doing stuff and processing what's right in front of us. Then we sleep, but the brain keeps going, and some things get kept. And we dream, which science tells us is how we somehow assimilate the day's memories into long-term storage.
I want to be upfront that this is an idea I'm exploring, not a result I'm announcing. I have a prototype that does some of it and a design doc full of things I have not built. But the three-part shape has been useful enough to think with that I want to write it down.
Wake is the easy part, and it's a trap
Wake is the model you already know. You open a chat, it has its current context, you work, it reasons over what is in front of it. This part is genuinely good and getting better, and that's exactly why it's a trap. The context window keeps growing, and every time it grows, the temptation is to call the memory problem solved. Just paste more in.
But a bigger window is still a window. It's the waking mind, holding what it can hold right now, and then letting go. Nothing waking does, on its own, survives the night. If memory is the goal, wake is the part we already have, and it is the part that can't get us there.
Sleep is refusing to throw the day away
The first real move is almost embarrassingly simple to state: Stop discarding the context when the session ends. Keep it. Write it down somewhere it can be found again.
If that sounds like retrieval-augmented generation, it is, and I think that's the right way to see RAG. Sleep is not a clever trick bolted onto the model. Sleep is the decision not to forget, and RAG is the machinery for acting on that decision: Store what happened, embed it, and pull the relevant pieces back when they matter.
The prototype I'm building stores its knowledge in Postgres, with pgvector for similarity and Apache AGE for the graph of how pieces relate. On top of the raw store I've been playing with a layered cache, on the theory that not all memory deserves the same shelf:
class CacheLayer(Enum):
L1_IMMEDIATE = ("L1_IMMEDIATE", 8000) # the current conversation
L2_SESSION = ("L2_SESSION", 24000) # this session's memory
L3_DOMAIN = ("L3_DOMAIN", 64000) # standing expertise in an area
L4_STRATEGIC = ("L4_STRATEGIC", 32000) # the long-horizon stuff
The idea is that a session warms the cache from these layers (at session start, by predicting which tools and knowledge the work will need, by matching against past patterns) so the model wakes up already knowing something. The token numbers are guesses, and the warming is crude. It works well enough to be encouraging and not well enough to show anyone. That's roughly where the whole project is.
Sleep, then, I more or less have. The harder and more interesting part is what happens to all that stored material while the lights are off.
Dream is where the idea actually lives
Here's the part I care about, and the part I'm least sure of.
Storing the context and retrieving it is useful, but it's shallow. The knowledge sits there as text and vectors, next to the model rather than inside it. When you retrieve a chunk and paste it back into the window, you're reminding the model of something, not teaching it anything. Tomorrow it has to be reminded again.
Dreaming, as I'm using the word, is the offline processing that makes the stored knowledge into something deeper than a pile of retrievable notes. In the design it starts modestly, as a batch job that runs while nothing else is happening:
async def offline_pattern_mining():
patterns = await mine_session_patterns(days=7)
communities = await detect_knowledge_communities() # graph clustering
clusters = await cluster_similar_contexts() # embedding clustering
scores = await evaluate_pattern_success()
await update_pattern_metadata(patterns, scores)
That's real, and it does something: It finds the patterns across many sessions that no single session could see, the way your own memory of "how this kind of problem usually goes" is built from a hundred specific times it went that way.
But that's the shallow end of dreaming, and I think the metaphor is pointing at something further out. The deep end is taking the knowledge all the way into the model's own form. Vectorize it, yes. Build the graph, yes. And then, at the far end, the idea is to actually adapt the weights, with LoRA or a real fine-tune, so that what the system learned stops being something it looks up and becomes something it knows. I want to be plain about this part: It's an idea, not a thing I've worked out. I haven't trained or fine-tuned anything. It's the direction the metaphor points, and the open question that keeps me up is how far along that spectrum, from "retrieve a note" to "it's now part of the model," it's even worth pursuing. I'm sure the answer has to vary, especially in this case.
The hard version of this also has to reckon with the model it's feeding. Knowledge shoved into a model in some tidy, human-invented taxonomy fights the model's own internal geometry. The version I would actually want preserves the shape the model already thinks in rather than imposing mine on it. That word, native, is doing a lot of work in my head and almost none on disk yet. I haven't built it. But I do have ideas to pursue.
Where this came from, and where it is
None of this arrived as a plan. It grew out of a cache-augmented-generation prototype I was poking at, which turned into KnowledgePersistence-AI, which is the thing that taught me the relationships between pieces of knowledge are worth more than the pieces. The wake-sleep-dream framing came after, as a way to make sense of what I was already building and what I kept wishing it did.
So to be clear about the state of it: Wake is just the model. Sleep, the persistence and retrieval, mostly works and is rough. Dream is one real batch job, a lot of design doc, and a conviction. I haven't done any weight-level training at all; the LoRA and fine-tuning ideas are exactly that, ideas I have not tried. The most important part is the least built.
What I don't know yet
- Whether the metaphor is transferable to this use case or just a nice story. It has been a good thinking tool. That's not the same as being true.
- How far down the dream spectrum is worth going. Retrieval is cheap and LoRA isn't, and I don't yet have a principled way to decide when a memory has earned a place in the weights.
- There must be different ways and grades of processing that are useful. The good thing is that I'm never limited to one technique or level of processing. In that I have the advantage over my brain (I think).
- What "native" really means in practice, and whether preserving a model's own geometry is a real technique or a feeling I have.
- How any of this degrades. Human memory forgets on purpose. I suspect a memory that only ever accumulates is a memory that eventually drowns, but I haven't built the forgetting yet either.
- I also haven't considered disproven or discredited or downgraded or negative proof or absent information. All of that's important too, and it's still a completely unexplored area.
I'm writing this down mostly to argue with myself, and a little in the hope that someone further along will tell me where I'm wrong or right or useful or not. It's an idea I am chasing, not a thing I've caught. I'll write more as I build more, including the parts where it doesn't work.