The Origin Story: How We Landed on Video
Memvid was born from a real internal problem. While our team was building agentic systems for the healthcare industry, we ran into a foundational challenge: memory. We were responsible for building AI agents that could screen applicants and adapt to the unique, high-stakes requirements of healthcare staffing, reasoning over long histories of candidates, roles, facility requirements, and constantly changing constraints. The dataset wasn’t just large; it was mission-critical and evolving fast.
The Problem We Faced
For an AI agent to be useful in real-world staffing workflows, it needed to reliably remember people, conversations, decisions, and constraints over long periods of time. When someone asked:
- “What roles has this candidate applied for in the past six months?”
- “Have they worked night shifts before?”
- “What requirements did this facility specify last week?”
Each of those questions demanded exact recall, not plausible reconstruction. Every standard approach fell short:
| Approach | Problem |
|---|---|
| Feed everything to the LLM | Large language models have strict context limits. Real-world histories quickly exceeded those limits, making full recall impossible. |
| Fine-tune/pretrain a model | Slow, expensive, and brittle. Data changed constantly, and retraining for every update simply didn’t scale. |
| Traditional RAG with VectorDB | Vector search retrieves similar information, not exact information. Critical details were lost to semantic approximation. |
| Chunking strategies | Chunking fractured context. Ordering, dependencies, and timelines were easily broken, often in subtle, dangerous ways. |
What We Actually Needed
We needed a system that could:
- Store unbounded, growing histories
- Recall exact information, not semantic guesses
- Support real-time writes as new data arrived
- Run fully offline and on-prem
- Be portable and self-contained
- Minimize attack surface and infrastructure complexity
The Video Insight
What existing technology already handles massive, sequential data with random access, efficient compression, and decades of battle-tested reliability?
The answer was video. A two-hour film contains millions of frames, yet you can jump to any moment instantly. The file is self-contained: no database, no server, no external dependencies. Corrupted frames don’t invalidate the entire file. And decades of optimization have made video codecs extraordinarily efficient.
Real-world AI memory has the same shape. Information accumulates over time. Events are sequential. You need to jump to specific moments while preserving the full historical timeline. Memory must be incrementally writable, crash-safe, and portable across machines. Video codecs have spent 40+ years solving exactly these problems:
- Sequential data with random access: jump to any frame instantly
- Efficient compression: 100x compression ratios by exploiting redundancy
- Self-contained files: No external dependencies or infrastructure required
- Crash recovery: Corrupted frames are localized, not catastrophic
- Streaming support: Start processing data before the full file loads
From Video to Memory
So we tried something unconventional. We stored embeddings inside video frames. Each interaction becomes a frame. Each applicant update, requirement change, or decision is a frame. String them together, index them properly, and you get a “video” of operational memory that an AI system can query with exact, deterministic recall. That insight evolved into Memvid. We shipped it in production. It worked. And we quickly realized this wasn’t just a healthcare problem: every serious AI application faces the same challenge. So we open-sourced the solution.
Why Video Frames as Storage Units?
Traditional systems treat documents as isolated objects. Memvid treats information as frames in a continuous, evolving knowledge stream. That difference changes everything.
The Problem with Document-Centric Storage
When documents are stored as separate objects:
- No inherent ordering: When was doc 3 added relative to doc 1?
- No context continuity: What was the state of knowledge at time T?
- Fragmented storage: Metadata, vectors, and content in different places
- Sync complexity: Keeping everything consistent is error-prone
The Frame Solution
Frames provide:
| Benefit | How Frames Deliver It |
|---|---|
| Temporal ordering | Every frame has a position in the sequence |
| Point-in-time queries | “What did we know at frame 100?” |
| Atomic units | Each frame is self-contained with all metadata |
| Efficient deltas | Similar consecutive frames compress well |
| Single-file portability | Everything serializes to one .mv2 file |
How It Works
Here’s exactly how Memvid processes and stores your content. No black boxes.
Step 1: Frame Creation
When you call put(), Memvid creates a frame structure:
Header
- Frame ID: 42
- Timestamp: 1704067200
- URI: mv2://docs/meeting-notes
- Checksum: sha256:a1b2c3...

Metadata
- Title: “Q4 Meeting Notes”
- Labels: ["meeting", "q4"]
- Track: “notes”
- Custom: { author: "alice" }

Payload
- zstd-compressed content bytes

Embeddings (optional)
- 384-dim vector, quantized to int8; only if semantic search is needed
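As a rough illustration, here is that frame record sketched as a Python dataclass. The field names mirror the structure above; the classes themselves are assumptions for illustration, not Memvid’s internal types.

```python
from dataclasses import dataclass

@dataclass
class FrameHeader:
    frame_id: int    # position in the sequence, e.g. 42
    timestamp: int   # Unix epoch seconds, e.g. 1704067200
    uri: str         # e.g. "mv2://docs/meeting-notes"
    checksum: str    # e.g. "sha256:a1b2c3..."

@dataclass
class Frame:
    header: FrameHeader
    metadata: dict                   # title, labels, track, custom fields
    payload: bytes                   # zstd-compressed content bytes
    embedding: bytes | None = None   # 384-dim int8-quantized vector, if enabled

frame = Frame(
    header=FrameHeader(42, 1704067200, "mv2://docs/meeting-notes", "sha256:a1b2c3..."),
    metadata={"title": "Q4 Meeting Notes", "labels": ["meeting", "q4"],
              "track": "notes", "custom": {"author": "alice"}},
    payload=b"...",  # placeholder for the zstd-compressed content
)
```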
Step 2: Index Updates
After frame creation, multiple indexes (lexical, vector, and time) are updated atomically.
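A toy sketch of that update, using in-memory stand-ins for the three indexes (the real structures live inside the .mv2 file):

```python
from collections import defaultdict

# In-memory stand-ins for the three co-located indexes (illustrative only).
lexical_index = defaultdict(set)  # token -> {frame_id}
vector_index = {}                 # frame_id -> embedding
time_index = []                   # (timestamp, frame_id), in append order

def index_frame(frame_id: int, timestamp: int, text: str, embedding=None) -> None:
    """Sketch of Step 2: update the lexical, vector, and time indexes together.

    In Memvid all three live inside the same .mv2 file, so the update can be
    made durable as one atomic unit alongside the WAL commit in Step 3.
    """
    for token in text.lower().split():  # naive tokenizer, for illustration
        lexical_index[token].add(frame_id)
    if embedding is not None:           # the vector index is optional
        vector_index[frame_id] = embedding
    time_index.append((timestamp, frame_id))
```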
Step 3: WAL Commit
Before returning success, the frame is committed to the Write-Ahead Log (sketched after this list). If the process crashes mid-write, the WAL ensures:
- Committed frames are recovered on next open
- Incomplete frames are discarded cleanly
- No corruption propagates to existing data
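Here is a minimal sketch of the standard WAL pattern behind those guarantees. The record layout (length, CRC32, payload) and fsync discipline are illustrative assumptions, not Memvid’s on-disk format:

```python
import os
import struct
import zlib

def wal_append(wal_path: str, payload: bytes) -> None:
    """Append one record as [length][crc32][payload], fsynced before returning."""
    record = struct.pack("<II", len(payload), zlib.crc32(payload)) + payload
    with open(wal_path, "ab") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())  # durable on disk before put() reports success

def wal_recover(wal_path: str) -> list[bytes]:
    """Replay committed records; an incomplete or corrupt tail is discarded cleanly."""
    with open(wal_path, "rb") as f:
        data = f.read()
    records, offset = [], 0
    while offset + 8 <= len(data):
        length, crc = struct.unpack_from("<II", data, offset)
        payload = data[offset + 8 : offset + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # a crash left a torn record: stop, trust nothing after it
        records.append(payload)
        offset += 8 + length
    return records
```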
Step 4: Segment Compaction
Periodically, frames are grouped into segments for storage efficiency:
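A simplified sketch of that compaction pass, assuming a fixed segment-size target (the 64 MB figure and the grouping policy are illustrative, not Memvid’s actual policy):

```python
import zstandard as zstd  # frame payloads are zstd-compressed (see Step 1)

SEGMENT_TARGET = 64 * 1024 * 1024  # assumed target: ~64 MB of raw frames per segment

def compact(frame_payloads: list[bytes]) -> list[bytes]:
    """Group consecutive frames into segments, compressing each segment as one block.

    Compressing neighbors together lets zstd exploit vocabulary shared across
    similar frames. A real segment would also record per-frame offsets so
    individual frames stay randomly accessible.
    """
    compressor = zstd.ZstdCompressor(level=19)
    segments, current, size = [], [], 0
    for payload in frame_payloads:
        current.append(payload)
        size += len(payload)
        if size >= SEGMENT_TARGET:
            segments.append(compressor.compress(b"".join(current)))
            current, size = [], 0
    if current:  # flush the final partial segment
        segments.append(compressor.compress(b"".join(current)))
    return segments
```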
Traditional VectorDB: How They Store Context
To understand why frames matter, let’s see how traditional vector databases handle the same data.
The Traditional VectorDB Architecture
A typical deployment splits each document across three systems: a vector index for the embedding, a separate metadata store, and blob storage for the raw content, each with its own consistency and failure story.
Problems with This Approach
| Issue | Traditional VectorDB | Memvid Frames |
|---|---|---|
| Storage fragmentation | Vectors in one place, metadata in another, raw docs in a third | Everything in one frame, one file |
| Temporal amnesia | No concept of “when” something was added | Every frame has a timestamp and position |
| Point-in-time queries | Impossible or requires complex versioning | Built-in: as_of_frame=100 |
| Consistency | Distributed transactions across systems | Single-file atomic writes |
| Portability | Export/import across multiple systems | Copy one .mv2 file |
| Offline operation | Requires API access for embeddings | Local embeddings, fully offline |
| Crash recovery | Hope your 3 systems are all consistent | WAL ensures atomic recovery |
What Traditional VectorDBs Actually Store
When you insert a document into Pinecone, Weaviate, or ChromaDB:
- Original content? Often discarded or stored separately
- Temporal context? Not tracked
- Relationship to other docs? Only through vector similarity
- History? Non-existent
What Memvid Frames Store
Every frame keeps the full record from Step 1 together: header, metadata, compressed original content, and the optional embedding, so nothing has to be reassembled from separate systems at query time.
Performance Benefits of Frame Architecture
The frame architecture isn’t just conceptually cleaner. It’s faster.
Why Frames Are Fast
1. Locality of Reference
Traditional systems scatter data across storage layers. Frames keep related data together (see the sketch after the comparison):
Traditional: 3 Round Trips
- Query vector index (network)
- Fetch metadata (different server)
- Retrieve document (blob storage)
Memvid: 1 Seek
- Seek to frame offset in .mv2 file (all data co-located)
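To make “one seek” concrete, here is a sketch of a co-located frame read. The TOC shape and the read function are assumptions for illustration:

```python
import zstandard as zstd

def read_frame(mv2_path: str, toc: dict[int, tuple[int, int]], frame_id: int) -> bytes:
    """One seek, one read: the in-file TOC maps frame_id -> (offset, length)."""
    offset, length = toc[frame_id]   # TOC loaded once when the file is opened
    with open(mv2_path, "rb") as f:
        f.seek(offset)               # jump straight to the frame...
        raw = f.read(length)         # ...header, metadata, and payload together
    return zstd.ZstdDecompressor().decompress(raw)
```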
2. Segment-Based Caching
Frames group into segments that cache efficiently (sketched after these steps):
1. Time index lookup: the time index identifies that Segment 3 contains the Q4 frames.
2. Single read: Segment 3 is loaded into memory in one I/O operation.
3. Cache ready: all Q4 frames are now cached; subsequent queries are instant.
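A sketch of that behavior with an LRU cache of whole segments; the segment table and cache size are illustrative assumptions:

```python
from functools import lru_cache

# Hypothetical segment table: segment 3 starts at byte 1 MB and is 4 MB long.
SEGMENT_TOC = {3: (1_048_576, 4_194_304)}

@lru_cache(maxsize=8)  # keep a handful of hot segments resident
def load_segment(mv2_path: str, segment_id: int) -> bytes:
    """One I/O per segment: the first Q4 query pays the read, later ones hit cache."""
    offset, length = SEGMENT_TOC[segment_id]
    with open(mv2_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```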
3. Compression Efficiency
Similar frames compress dramatically when stored together:
| Content Type | Raw Size | Frame-Compressed | Savings |
|---|---|---|---|
| Chat history | 10 MB | 0.8 MB | 92% |
| Documentation | 50 MB | 4.2 MB | 91% |
| Mixed content | 100 MB | 12 MB | 88% |
- Sequential frames often share vocabulary (Zstd dictionary)
- Embeddings quantize to int8, a 75% vector size reduction (sketched after this list)
- Metadata schemas are consistent within segments
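The int8 step is where the 75% comes from: four-byte floats become one-byte integers. A minimal sketch of symmetric quantization (the exact scheme is an assumption; Memvid’s quantizer is not specified here):

```python
import numpy as np

def quantize_int8(vec: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric int8 quantization: one byte per dimension plus a single scale."""
    scale = max(float(np.abs(vec).max()), 1e-12) / 127.0
    q = np.clip(np.round(vec / scale), -127, 127).astype(np.int8)
    return q, scale

vec = np.random.randn(384).astype(np.float32)  # 384 dims x 4 bytes = 1536 bytes
q, scale = quantize_int8(vec)                  # 384 dims x 1 byte = 384 bytes
approx = q.astype(np.float32) * scale          # dequantize for similarity scoring
```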
4. Index Co-location
All indexes live in the same file, enabling compound queries without joins.
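For example, a hypothetical query sketch: the store object, method, and keyword names are assumptions, except as_of_frame, which is the documented point-in-time parameter:

```python
# mem = Memvid("memory.mv2")     # constructor name assumed
results = mem.search(
    "night shift requirements",  # lexical + vector lookup, same file
    labels=["q4"],               # metadata filter, same file
    as_of_frame=100,             # time-index bound, same file: no cross-system joins
)
```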
Benchmark: Frame vs Traditional
Real-world comparison on a 1M-document corpus:
| Operation | Pinecone | ChromaDB | Memvid |
|---|---|---|---|
| Insert 1K docs | 2.3s | 4.1s | 0.8s |
| Hybrid search | N/A | N/A | 8ms |
| Point-in-time query | N/A | N/A | 9ms |
| Export all data | 45min | 12min | 0.1s (copy file) |
| Cold start | 3.2s | 1.8s | 0.05s |
| Storage size | 2.1 GB | 1.8 GB | 0.4 GB |
Why so fast? Memvid doesn’t need network calls, distributed coordination, or multi-system consistency. It’s just reading from a well-organized file.
Frame Lifecycle
Creation
When you add content, Memvid:
- Generates a unique frame ID
- Extracts and indexes text content
- Computes embeddings (if enabled)
- Records timestamp in the time index
- Appends to the WAL for crash safety
- Assigns a URI (mv2://track/title)
Retrieval
When you search or view (a usage sketch follows this list):
- Query hits the appropriate index (lexical, vector, or time)
- Frame metadata is loaded from the TOC
- Payload is decompressed and returned
- Access is logged for analytics
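Put together, a hypothetical retrieval sketch; the method, argument, and field names here are assumptions:

```python
# mem = Memvid("memory.mv2")    # constructor name assumed
hits = mem.search("facility requirements", limit=5)  # 1. query hits an index
for hit in hits:
    print(hit.uri, hit.score)   # 2. metadata served from the in-file TOC
    text = mem.get(hit.uri)     # 3. payload decompressed on demand
                                # 4. the access itself is logged for analytics
```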
Deletion
“Deleted” frames aren’t physically removed. They’re tombstoned:
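Conceptually, a tombstone is one more appended record that marks an earlier frame as dead; readers skip it and compaction reclaims the space later. A generic sketch, not Memvid’s format (wal_append is from the Step 3 sketch above):

```python
tombstones: set[int] = set()  # frame IDs marked as deleted

def delete_frame(frame_id: int) -> None:
    """Append-only delete: write a tombstone record instead of mutating old data."""
    wal_append("memory.wal", f"tombstone:{frame_id}".encode())
    tombstones.add(frame_id)

def is_visible(frame_id: int) -> bool:
    """Readers filter tombstoned frames out; compaction reclaims them later."""
    return frame_id not in tombstones
```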
Frame IDs vs URIs
Every frame has two identifiers:
| Identifier | Format | Example | Use Case |
|---|---|---|---|
| Frame ID | Integer | 124 | Internal reference |
| URI | String | mv2://docs/api.md | Human-readable path |
Best Practices
Frame Sizing
- Small frames (under 4KB): Great for chat messages, notes
- Medium frames (4KB - 1MB): Documents, articles
- Large frames (over 1MB): PDFs, images, audio
Batch Ingestion
Use put_many() for bulk ingestion (100-200x faster):
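A hypothetical sketch: put_many() is the documented call, while the store object and the record shape are assumptions:

```python
# mem = Memvid("memory.mv2")   # constructor name assumed
docs = [{"title": f"note-{i}", "content": f"meeting notes {i}"} for i in range(1000)]

mem.put_many(docs)   # one batch: WAL commits and index updates are amortized
# vs. the slow path, one commit per document:
# for d in docs:
#     mem.put(d)
```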