File-based agent memory vs vector-DB approaches. The numbers, the architectures, when each fits. Sibyl Memory ranked #2 on LongMemEval Oracle (95.6%) without a vector database, embeddings, or external retrieval. This page explains why that result is structural, not coincidental.
500 questions, six categories, Claude Opus 4.6 as judge. The table below is the public leaderboard snapshot at the time of our run.
| Rank | System | Score | Architecture |
|---|---|---|---|
| 1 | agentmemory V4 | 96.2% | embedding-based |
| 2 | Sibyl Memory | 95.6% | file-based · zero vectors |
| 2 | Chronos (PwC) | 95.6% | embedding-based |
| 4 | Mastra Observational Memory | 94.9% | embedding-based |
| 5 | MemMachine | 93.0% | embedding-based |
| 6 | Hindsight (Vectorize) | 91.4% | vector DB |
Mem0, Zep, Supermemory, Emergence AI, and the Oracle baseline all score below the top tier.
Sibyl Memory placed second, 0.6% behind the leader, without a vector database, on 4 vCPU and 16 GB of RAM. Five of the six top systems (including the leader) use embeddings. We chose not to. The score is a side effect of optimizing for production efficiency first and retrieval recall second.
Full benchmark report (methodology, per-category scores, ablations) ↗

How agent memory systems are actually built. The trade-offs are structural, not preference-based.
Every memory write becomes an embedding stored in a vector index (Pinecone, Weaviate, Qdrant, pgvector, Chroma). Reads issue a query embedding and pull the top-k by cosine similarity. Structure is inferred at read time. Strong for fuzzy semantic recall ("did we ever talk about X?"). Weak for exact lookups ("what's the customer's Stripe ID?"), temporal reasoning ("what did we agree last Tuesday?"), and any query that needs a precise schema.
Examples: Hindsight (Vectorize), most LangChain memory implementations, RAG-flavored agent stacks.
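The vector-store read path above can be sketched in a few lines. This is a toy illustration only: `embed()` here is a stand-in bag-of-words model, not a real embedding model, and the memories are invented examples.

```python
import math

# Toy sketch of the vector-store read path: every memory is stored as
# an embedding; reads embed the query, rank by cosine similarity, and
# return the top-k. embed() is a stand-in, not a real embedding model.

def embed(text: str) -> dict[str, float]:
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

memories = [
    "customer reported a billing dispute in march",
    "customer subscription tier is pro",
    "customer stripe id is cus_9xyz",
]
index = [(m, embed(m)) for m in memories]

def recall(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda mv: cosine(q, mv[1]), reverse=True)
    return [m for m, _ in ranked[:k]]

# Fuzzy recall works: "billing" pulls the dispute memory to the top.
print(recall("any billing problems?"))
# -> ['customer reported a billing dispute in march']
```

Note what this read path does not give you: the result is whatever ranks highest, not a guaranteed exact match, which is why precise lookups and temporal queries are the weak spots.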
A hybrid: embeddings for semantic recall, plus a graph or relational layer for structured facts. Better than pure vectors. Pays the embedding cost on every write and the structure cost on every query. The complexity surface is two systems instead of one. Most top-tier benchmark entries (agentmemory V4, Chronos, Mastra, MemMachine) sit here.
Examples: Mem0 (vectors + graph), Zep (vectors + temporal graph), agentmemory V4, Chronos (PwC).
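The "two systems instead of one" cost can be made concrete with a hypothetical write path. All class and field names below are illustrative, not taken from any of the systems listed above.

```python
# Hypothetical sketch of a hybrid memory write path: every fact is
# written twice, once as an embedding and once as a structured record,
# and the two stores must be kept in sync (writes, updates, deletes).

class HybridMemory:
    def __init__(self):
        self.vector_index = []   # list of (memory_id, embedding, text)
        self.fact_store = {}     # memory_id -> structured fact

    def write(self, memory_id, text, fact, embed):
        # Cost 1: embedding latency on every write.
        self.vector_index.append((memory_id, embed(text), text))
        # Cost 2: a second store that can drift from the first.
        self.fact_store[memory_id] = fact

    def delete(self, memory_id):
        # Deletes must cascade to both stores, or memory drifts.
        self.vector_index = [e for e in self.vector_index if e[0] != memory_id]
        self.fact_store.pop(memory_id, None)

mem = HybridMemory()
mem.write("m1", "customer is on the pro tier", {"tier": "pro"},
          embed=lambda t: [0.0])  # placeholder embedding function
mem.delete("m1")
```

Every code path that mutates memory has to touch both stores; that synchronization burden is the operational cost the surrounding text describes.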
Memory writes go through a schema imposed at write time: priorities, journal, entities, relationships, scars, arc. Reads are standard SQL against indexed namespaces. No embedding cost, no vector index, no retrieval latency. The trade: you have to know the shape of memory ahead of time. The common assumption is that this is impossible for general agent memory. The Sibyl Memory architecture shows it isn't, hitting 95.6% on LongMemEval Oracle without a single embedding.
Example: Sibyl Memory. The architecture is documented at /memory#architecture and the benchmark at /memory#benchmark.
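A minimal sketch of the schema-first pattern, using SQLite. The table and column names are illustrative, not Sibyl Memory's actual schema; the point is that exact lookup and knowledge update are one indexed query each.

```python
import sqlite3

# Schema-first sketch: structure is imposed at write time, reads are
# plain indexed SQL. Schema below is illustrative, not Sibyl Memory's.

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE entities (
        tenant_id  TEXT NOT NULL,
        entity     TEXT NOT NULL,
        attribute  TEXT NOT NULL,
        value      TEXT NOT NULL,
        valid_from TEXT NOT NULL,  -- ISO date: models facts as they evolve
        PRIMARY KEY (tenant_id, entity, attribute, valid_from)
    )
""")

# Writes go through the schema: no embedding step, no vector index.
db.executemany(
    "INSERT INTO entities VALUES (?, ?, ?, ?, ?)",
    [
        ("t1", "customer:42", "tier", "free", "2024-01-10"),
        ("t1", "customer:42", "tier", "pro", "2024-03-02"),
        ("t1", "customer:42", "stripe_id", "cus_9xyz", "2024-01-10"),
    ],
)

# Exact lookup with knowledge update handled natively: the newest
# valid_from wins, so "what is the customer's current tier?" is one query.
row = db.execute(
    """SELECT value FROM entities
       WHERE tenant_id = ? AND entity = ? AND attribute = ?
       ORDER BY valid_from DESC LIMIT 1""",
    ("t1", "customer:42", "tier"),
).fetchone()
print(row[0])  # -> pro
```

Because time is a first-class column rather than something inferred from similarity, temporal questions ("what was the tier in January?") are ordinary `WHERE` clauses.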
LongMemEval Oracle tests six categories. Two of them (single-session-user, single-session-assistant) are about precise recall of structured facts within a session. We score 100% on both. Vector approaches lose accuracy on these because similarity isn't lookup. Two more (temporal reasoning, knowledge update) are about recalling facts as they evolve over time. Schema models temporal relationships natively; embeddings lose precision when the same entity appears at multiple time points.
The category scores: 100% / 100% / 96.2% / 93.3% / 93.2% / 92.3%. The places we score lower (single-session-pref, multi-session, knowledge-update) are also the places vector approaches score higher. Different tools, different curves. The point is: schema-first is competitive on the benchmark vector approaches were built for, and it ships in a fraction of the operational complexity.
Use-case patterns that fit each approach. Honest, including where vectors win.
Document Q&A over an unstructured corpus. RAG-flavored chatbots over knowledge bases. Recommendation systems where similarity is the signal. Long-form summarization where the agent needs to surface "topics related to X" rather than "the exact answer to X". Anywhere structure is impossible to define ahead of time and fuzzy is acceptable.
Don't fight the architecture: if your problem is semantic similarity, use a vector DB.
Customer support agents that need to recall both "anything related to billing issues" (fuzzy) and "the customer's exact subscription tier" (precise). Research agents that span semi-structured documents. Cases where you can pay for the operational complexity of two systems and have engineering capacity to maintain both.
The cost: two stores to keep in sync, two query paths, embedding latency on every write.
Autonomous agents that need to act on memory, not just describe it. Multi-tenant platforms where each user's memory must be isolated and auditable. Compliance-sensitive deployments (GDPR cascade delete, EU AI Act export, tamper-proof audit log). Situations where you need exact recall of structured facts (customer IDs, transaction history, decisions made on a specific date). Production systems where p50 latency matters more than fuzzy recall.
The trade: you have to design the schema. We've done the design work for you in Sibyl Memory's five operating shapes (operator, user-profile, conversational continuity, agent reputation, org memory).
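The compliance point above (GDPR cascade delete, tenant isolation) is mechanical when memory lives in tenant-scoped tables. A hedged sketch, with an illustrative schema rather than Sibyl Memory's actual one:

```python
import sqlite3

# Sketch of why schema-first makes erasure mechanical: a GDPR delete
# request is one tenant-scoped DELETE, not a vector-index rebuild.
# The schema is illustrative, not Sibyl Memory's actual tables.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memory (tenant_id TEXT, kind TEXT, body TEXT)")
db.executemany("INSERT INTO memory VALUES (?, ?, ?)", [
    ("t1", "journal", "discussed pricing"),
    ("t1", "entity", "stripe_id=cus_9xyz"),
    ("t2", "journal", "asked about exports"),
])

# Cascade delete for tenant t1: every row is isolated by tenant_id,
# so the other tenants' memory is untouched and the result is auditable.
db.execute("DELETE FROM memory WHERE tenant_id = ?", ("t1",))
remaining = db.execute("SELECT tenant_id FROM memory").fetchall()
print(remaining)  # -> [('t2',)]
```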
Most agent products use vectors because they were the first available primitive. That does not make them right. It makes them familiar. Schema-first works for the same reason relational databases worked for fifty years: structure is cheaper to maintain than to infer.
We've helped teams move from vector-DB-flavored memory to schema-first. The mechanical part is straightforward; the design part is where the value is.
Free tier (100 MAU) is live at /memory. For pilots above that, or to discuss your specific architecture trade-offs, reach out.
Moving from Mem0, Zep, agentmemory, pgvector, or a custom vector stack? Scoped engagement to map your current memory model into a schema-first design. Monthly retainer where appropriate.