What a failed RAG experiment taught me about AI memory

Jun 10, 2026Note

For about three weeks, I ran a personal AI operating system — 632 Markdown files, daily protocols, project notes, the works. The obvious next step was adding memory: let the AI search my vault semantically instead of relying on me to point it to the right file every time.

I built the pipeline properly. Chunking, BGE-M3 embeddings, 3,456 vector chunks, incremental updates, query routing. Every component worked at production level. I was ready to be impressed.

The answers were worse than just reading the damn files.

Here's what happened. When I asked "what are my strongest AI product cases," the vector search returned career reflections, self-assessments, and capability summaries — things that *sounded like* they were about my product work. It completely missed the case studies, PRDs, eval scorecards, and demo links. The hard evidence. The stuff that actually proved anything.

The Memory Pack looked complete. That was the dangerous part. A sparse or obviously incomplete result would have sent me back to search manually. But a full-looking answer with plausible-sounding content gave the agent permission to stop looking.

I had optimized for semantic similarity. But semantic similarity retrieves what *sounds* relevant, not what *proves* relevant. Career reflections and capability summaries are semantically close to "what are my strongest cases" — they use the same vocabulary, the same themes. But they're the wrong evidence. They're what I told myself about my work, not the work itself.

The fix wasn't better embeddings or a different chunking strategy. It was recognizing that retrieval is a chain, not a single step. Semantic search is now the third path in a four-step sequence: direct read of known files → directory search → keyword grep → semantic retrieval → back to source files for verification. No single retrieval method is trusted alone.

The AME retrieval layer still runs. But it's positioned as a discovery tool, not an answering tool. It says "you might want to look here." It doesn't say "here's the answer." That distinction cost me a working pipeline to learn, and it was worth it.