One of Memvid’s core innovations is the Smart Frame, a storage primitive inspired by how video files encode information. Just as videos are composed of sequential frames that can be played, randomly seeked, or edited without rewriting the entire file, Memvid represents AI data as an append-only sequence of semantic frames. Each frame captures meaning at a point in time, enabling efficient retrieval, temporal navigation, and incremental growth without destructive updates.

The Origin Story: How We Landed on Video

Memvid was born from a real internal problem. While our team was building agentic systems for the healthcare industry, we ran into a foundational challenge: memory. We were responsible for building AI agents that could screen applicants and adapt to the unique, high-stakes requirements of healthcare staffing, reasoning over long histories of candidates, roles, facility requirements, and constantly changing constraints. The dataset wasn’t just large, it was mission-critical and evolving fast.

The Problem We Faced

For an AI agent to be useful in real-world staffing workflows, it needed to reliably remember people, conversations, decisions, and constraints over long periods of time. When someone asked:
  • “What roles has this candidate applied for in the past six months?”
  • “Have they worked night shifts before?”
  • “What requirements did this facility specify last week?”
The answer had to be exact. Not a summary. Not a best guess. Not a hallucination. That requirement exposed hard limits in existing AI memory systems. No matter which approach we tried, we ran into the same failures:
  • Feed everything to the LLM: large language models have strict context limits. Real-world histories quickly exceeded those limits, making full recall impossible.
  • Fine-tune or pretrain a model: slow, expensive, and brittle. Data changed constantly, and retraining for every update simply didn’t scale.
  • Traditional RAG with a VectorDB: vector search retrieves similar information, not exact information. Critical details were lost to semantic approximation.
  • Chunking strategies: chunking fractured context. Ordering, dependencies, and timelines were easily broken, often in subtle, dangerous ways.
On top of that, the data itself was extremely sensitive. We needed memory that could run fully on-prem, work offline, remain portable across devices, and avoid centralized infrastructure entirely. Traditional RAG pipelines weren’t just complex and expensive, they were security liabilities. None of the existing approaches met the bar.

What We Actually Needed

We needed a system that could:
  • Store unbounded, growing histories
  • Recall exact information, not semantic guesses
  • Support real-time writes as new data arrived
  • Run fully offline and on-prem
  • Be portable and self-contained
  • Minimize attack surface and infrastructure complexity
So we stepped back and asked a different question.

The Video Insight

What existing technology already handles massive, sequential data with random access, efficient compression, and decades of battle-tested reliability?
The answer was video. A two-hour film contains millions of frames, yet you can jump to any moment instantly. The file is self-contained: no database, no server, no external dependencies. Corrupted frames don’t invalidate the entire file. And decades of optimization have made video codecs extraordinarily efficient.

Real-world AI memory has the same shape. Information accumulates over time. Events are sequential. You need to jump to specific moments while preserving the full historical timeline. Memory must be incrementally writable, crash-safe, and portable across machines. Video codecs have spent 40+ years solving exactly these problems:
  • Sequential data with random access: jump to any frame instantly
  • Efficient compression: 100x compression ratios via redundancy exploitation
  • Self-contained files: No external dependencies or infrastructure required
  • Crash recovery: Corrupted frames are localized, not catastrophic
  • Streaming support: Start processing data before the full file loads

From Video to Memory

So we tried something unconventional. We stored embeddings inside video frames. Each interaction becomes a frame. Each applicant update, requirement change, or decision is a frame. String them together, index them properly, and you get a “video” of operational memory that an AI system can query with exact, deterministic recall. That insight evolved into Memvid. We shipped it in production. It worked. And we quickly realized this wasn’t just a healthcare problem: every serious AI application faces the same challenge. So we open-sourced the solution.

Why Video Frames as Storage Units?

Traditional systems treat documents as isolated objects. Memvid treats information as frames in a continuous, evolving knowledge stream. That difference changes everything.

The Problem with Document-Centric Storage

When documents are stored as separate objects:
  • No inherent ordering: When was doc 3 added relative to doc 1?
  • No context continuity: What was the state of knowledge at time T?
  • Fragmented storage: Metadata, vectors, and content in different places
  • Sync complexity: Keeping everything consistent is error-prone

The Frame Solution

Frames provide:
  • Temporal ordering: every frame has a position in the sequence
  • Point-in-time queries: “What did we know at frame 100?”
  • Atomic units: each frame is self-contained with all metadata
  • Efficient deltas: similar consecutive frames compress well
  • Single-file portability: everything serializes to one .mv2 file

How It Works

Here’s exactly how Memvid processes and stores your content. No black boxes.

Step 1: Frame Creation

When you call put(), Memvid creates a frame structure:

Header

  • Frame ID: 42
  • Timestamp: 1704067200
  • URI: mv2://docs/meeting-notes
  • Checksum: sha256:a1b2c3...

Metadata

  • Title: “Q4 Meeting Notes”
  • Labels: ["meeting", "q4"]
  • Track: “notes”
  • Custom: { author: "alice" }

Payload

zstd-compressed content bytes

Embeddings (optional)

384-dim vector, quantized to int8 (only if semantic search is needed)
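
As a rough mental model, the frame described above can be pictured as a record like the following Python sketch. This is purely illustrative: the field names mirror this page’s description, and the helper below is not Memvid’s actual internal type or on-disk format.

from dataclasses import dataclass, field
from typing import Optional
import hashlib
import time

import zstandard as zstd  # pip install zstandard


@dataclass
class FrameSketch:
    # Header
    frame_id: int
    timestamp: int
    uri: str
    checksum: str
    # Metadata
    title: str
    labels: list = field(default_factory=list)
    track: str = "notes"
    custom: dict = field(default_factory=dict)
    # Payload: zstd-compressed content bytes
    payload: bytes = b""
    # Optional 384-dim embedding, quantized to int8
    embedding: Optional[bytes] = None


def make_frame(frame_id: int, uri: str, title: str, text: str) -> FrameSketch:
    raw = text.encode("utf-8")
    return FrameSketch(
        frame_id=frame_id,
        timestamp=int(time.time()),
        uri=uri,
        checksum="sha256:" + hashlib.sha256(raw).hexdigest(),
        title=title,
        payload=zstd.ZstdCompressor().compress(raw),
    )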

Step 2: Index Updates

After frame creation, multiple indexes are updated atomically: the lexical index, the vector index (when embeddings are enabled), the time index, and the table of contents (TOC).
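
Conceptually, that update looks something like the sketch below, using the index names mentioned elsewhere on this page (lexical, vector, time, TOC). The dictionary-based structures are simplified stand-ins, not Memvid’s real index implementations.

def index_frame(frame, indexes):
    # Lexical index: token -> set of frame IDs (keyword search)
    for token in frame.title.lower().split():
        indexes["lexical"].setdefault(token, set()).add(frame.frame_id)
    # Vector index: frame ID -> quantized embedding (semantic search)
    if frame.embedding is not None:
        indexes["vector"][frame.frame_id] = frame.embedding
    # Time index: (timestamp, frame ID), kept in append order (temporal queries)
    indexes["time"].append((frame.timestamp, frame.frame_id))
    # TOC: URI -> frame metadata (direct lookups by URI)
    indexes["toc"][frame.uri] = frame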

Step 3: WAL Commit

Before returning success, the frame is committed to the Write-Ahead Log (WAL). If the process crashes mid-write, the WAL ensures:
  • Committed frames are recovered on next open
  • Incomplete frames are discarded cleanly
  • No corruption propagates to existing data
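
The commit-and-recover discipline described above can be sketched in a few lines. This is a generic write-ahead-log illustration with an assumed JSON-lines-plus-checksum format, not Memvid’s actual WAL encoding.

import json
import os
import zlib


def wal_append(wal_path, record):
    # Append one record plus a CRC32 checksum, and fsync before reporting success
    line = json.dumps(record)
    entry = "%s\t%08x\n" % (line, zlib.crc32(line.encode("utf-8")))
    with open(wal_path, "a", encoding="utf-8") as wal:
        wal.write(entry)
        wal.flush()
        os.fsync(wal.fileno())


def wal_recover(wal_path):
    # On next open: keep records whose checksum verifies, discard the torn tail
    recovered = []
    if not os.path.exists(wal_path):
        return recovered
    with open(wal_path, encoding="utf-8") as wal:
        for entry in wal:
            try:
                line, crc = entry.rstrip("\n").rsplit("\t", 1)
            except ValueError:
                break  # incomplete trailing write from a crash; discard cleanly
            if "%08x" % zlib.crc32(line.encode("utf-8")) != crc:
                break  # corrupted tail; stop here, committed frames stay intact
            recovered.append(json.loads(line))
    return recovered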

Step 4: Segment Compaction

Periodically, frames are grouped into segments for storage efficiency. Segments are the unit of caching and compression, so similar consecutive frames end up stored, compressed, and read together.
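
A simplified illustration of the idea: frames are batched, in append order, into segments of roughly fixed size so later reads and compression operate on groups rather than individual frames. The 4 MB target below is an arbitrary assumption, not Memvid’s actual segment size.

def compact_into_segments(frames, max_segment_bytes=4 * 1024 * 1024):
    # Group consecutive frames into segments of roughly max_segment_bytes
    segments, current, current_size = [], [], 0
    for frame in frames:  # frames arrive in append order
        size = len(frame.payload)
        if current and current_size + size > max_segment_bytes:
            segments.append(current)  # seal the segment
            current, current_size = [], 0
        current.append(frame)
        current_size += size
    if current:
        segments.append(current)
    return segments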

Traditional VectorDBs: How They Store Context

To understand why frames matter, let’s see how traditional vector databases handle the same data:

Problems with the Traditional VectorDB Architecture

For each issue, the typical VectorDB behavior is listed first, then how Memvid frames handle it:
  • Storage fragmentation: vectors in one place, metadata in another, raw docs in a third, versus everything in one frame, one file
  • Temporal amnesia: no concept of “when” something was added, versus every frame carrying a timestamp and position
  • Point-in-time queries: impossible or requiring complex versioning, versus built-in support via as_of_frame=100
  • Consistency: distributed transactions across systems, versus single-file atomic writes
  • Portability: export/import across multiple systems, versus copying one .mv2 file
  • Offline operation: API access required for embeddings, versus local embeddings, fully offline
  • Crash recovery: hoping your three systems are all consistent, versus WAL-ensured atomic recovery

What Traditional VectorDBs Actually Store

When you insert a document into Pinecone, Weaviate, or ChromaDB:
# Traditional VectorDB
vectordb.insert(
    id="doc-123",
    vector=[0.1, 0.2, ...],  # 1536 floats
    metadata={"title": "Meeting Notes"}
)
# Where's the original document?
# When was it added?
# What was the knowledge state before this?
# 🤷
The vector is stored. Maybe some metadata. But:
  • Original content? Often discarded or stored separately
  • Temporal context? Not tracked
  • Relationship to other docs? Only through vector similarity
  • History? Non-existent

What Memvid Frames Store

# Memvid
mem.put(
    title="Meeting Notes",
    label="meeting",
    metadata={},
    text="Full document content here..."
)
# Stored atomically:
# ✓ Full original content (compressed)
# ✓ All metadata
# ✓ Timestamp + frame position
# ✓ Relationship to previous frames
# ✓ Crash-safe commit
# ✓ Embedding vector (optional - add when you need semantic search)

Performance Benefits of Frame Architecture

The frame architecture isn’t just conceptually cleaner. It’s faster.

Why Frames Are Fast

1. Locality of Reference

Traditional systems scatter data across storage layers. Frames keep related data together:

Traditional: 3 Round Trips

  1. Query vector index (network)
  2. Fetch metadata (different server)
  3. Retrieve document (blob storage)

Memvid: 1 Seek

  1. Seek to frame offset in .mv2 file (all data co-located)
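
That single-seek read path can be sketched in a few lines. The offset and length bookkeeping here is an illustrative assumption; the real .mv2 layout is more involved.

def read_frame_bytes(mv2_path, offset, length):
    # One seek, one read: header, metadata, and payload live in the same byte span
    with open(mv2_path, "rb") as f:
        f.seek(offset)           # jump straight to the frame's offset
        return f.read(length)    # no second round trip for metadata or content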

2. Segment-Based Caching

Frames group into segments that cache efficiently:
  1. Time index lookup: the time index identifies that Segment 3 contains the Q4 frames
  2. Single read: Segment 3 is loaded into memory in one I/O operation
  3. Cache ready: all Q4 frames are now cached, and subsequent queries are instant
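
A toy version of that flow, where the time index lookup (segments_overlapping) and segment loader (load_segment) are hypothetical stand-ins for whatever Memvid does internally:

segment_cache = {}  # segment ID -> list of decoded frames


def frames_in_range(store, start_ts, end_ts):
    # Serve a time-range query from the segment cache, loading each segment at most once
    results = []
    for seg_id in store.segments_overlapping(start_ts, end_ts):  # time index lookup
        if seg_id not in segment_cache:
            segment_cache[seg_id] = store.load_segment(seg_id)   # one I/O per cold segment
        results.extend(f for f in segment_cache[seg_id]
                       if start_ts <= f.timestamp <= end_ts)
    return results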

3. Compression Efficiency

Similar frames compress dramatically when stored together:
  • Chat history: 10 MB raw, 0.8 MB frame-compressed (92% savings)
  • Documentation: 50 MB raw, 4.2 MB frame-compressed (91% savings)
  • Mixed content: 100 MB raw, 12 MB frame-compressed (88% savings)
This happens because:
  • Sequential frames often share vocabulary (Zstd dictionary)
  • Embeddings quantize to int8 (75% vector size reduction)
  • Metadata schemas are consistent within segments
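
The 75% figure for vectors follows from simple storage math: float32 uses 4 bytes per dimension, int8 uses 1. Below is a minimal sketch of symmetric int8 quantization; the exact scheme Memvid uses is not specified on this page, so treat this as an assumption for illustration.

import numpy as np


def quantize_int8(vec):
    # Map a float32 embedding onto int8, keeping a scale factor for dequantization
    scale = float(np.abs(vec).max()) / 127.0 or 1.0
    return np.round(vec / scale).astype(np.int8), scale


embedding = np.random.rand(384).astype(np.float32)  # 384 dims x 4 bytes = 1536 bytes
quantized, scale = quantize_int8(embedding)          # 384 dims x 1 byte  =  384 bytes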

4. Index Co-location

All indexes live in the same file, enabling compound queries without joins:
-- Conceptual query (not actual syntax)
SELECT frames
WHERE text MATCH 'budget'           -- Lexical index
  AND vector SIMILAR TO query_vec   -- Vector index
  AND timestamp > '2024-01-01'      -- Time index

-- All indexes in one file = one I/O operation

Benchmark: Frame vs Traditional

Real-world comparison on a 1M-document corpus (values listed as Pinecone, ChromaDB, Memvid):
  • Insert 1K docs: 2.3s, 4.1s, 0.8s
  • Hybrid search: N/A, N/A, 8ms
  • Point-in-time query: N/A, N/A, 9ms
  • Export all data: 45min, 12min, 0.1s (copy the file)
  • Cold start: 3.2s, 1.8s, 0.05s
  • Storage size: 2.1 GB, 1.8 GB, 0.4 GB
Why so fast? Memvid doesn’t need network calls, distributed coordination, or multi-system consistency. It’s just reading from a well-organized file.

Frame Lifecycle

Creation

When you add content, Memvid:
  1. Generates a unique frame ID
  2. Extracts and indexes text content
  3. Computes embeddings (if enabled)
  4. Records timestamp in the time index
  5. Appends to the WAL for crash safety
  6. Assigns a URI (mv2://track/title)

Retrieval

When you search or view:
  1. Query hits the appropriate index (lexical, vector, or time)
  2. Frame metadata is loaded from the TOC
  3. Payload is decompressed and returned
  4. Access is logged for analytics

Deletion

“Deleted” frames aren’t physically removed. They’re tombstoned:
# Mark frame as deleted
mem.delete(frame_id=42)

# Frame still exists but won't appear in searches
# Use vacuum to physically reclaim space
memvid doctor knowledge.mv2 --vacuum

Frame IDs vs URIs

Every frame has two identifiers:
  • Frame ID: an integer (for example, 124) used for internal references
  • URI: a string (for example, mv2://docs/api.md) that serves as a human-readable path
# Access by frame ID
frame = mem.frame(124)

# Access by URI
frame = mem.frame('mv2://docs/api.md')

Best Practices

Frame Sizing

  • Small frames (under 4KB): Great for chat messages, notes
  • Medium frames (4KB - 1MB): Documents, articles
  • Large frames (over 1MB): PDFs, images, audio

Batch Ingestion

Use put_many() for bulk ingestion (100-200x faster):
docs = [
    {'text': 'Content 1', 'title': 'Doc 1', 'label': 'docs'},
    {'text': 'Content 2', 'title': 'Doc 2', 'label': 'docs'},
    # ... thousands more
]

mem.put_many(docs)

Tombstone Management

Periodically vacuum to reclaim space:
# Check how much space can be reclaimed
memvid stats knowledge.mv2

# Reclaim deleted frame space
memvid doctor knowledge.mv2 --vacuum

Next Steps