One of Memvid’s core innovations is the Smart Frame, a storage primitive inspired by how video files encode information. Just as videos are composed of sequential frames that can be played, randomly seeked, or edited without rewriting the entire file, Memvid represents AI data as an append-only sequence of semantic frames. Each frame captures meaning at a point in time, enabling efficient retrieval, temporal navigation, and incremental growth without destructive updates.

The Origin Story: How We Landed on Video

Memvid was born from a real internal problem. While our team was building agentic systems for the healthcare industry, we ran into a foundational challenge: memory. We were responsible for building AI agents that could screen applicants and adapt to the unique, high-stakes requirements of healthcare staffing, reasoning over long histories of candidates, roles, facility requirements, and constantly changing constraints. The dataset wasn’t just large, it was mission-critical and evolving fast.

The Problem We Faced

For an AI agent to be useful in real-world staffing workflows, it needed to reliably remember people, conversations, decisions, and constraints over long periods of time. When someone asked:
  • “What roles has this candidate applied for in the past six months?”
  • “Have they worked night shifts before?”
  • “What requirements did this facility specify last week?”
The answer had to be exact. Not a summary. Not a best guess. Not a hallucination. That requirement exposed hard limits in existing AI memory systems. No matter which approach we tried, we ran into the same failures:
  • Feed everything to the LLM: large language models have strict context limits. Real-world histories quickly exceeded those limits, making full recall impossible.
  • Fine-tune or pretrain a model: slow, expensive, and brittle. Data changed constantly, and retraining for every update simply didn’t scale.
  • Traditional RAG with a VectorDB: vector search retrieves similar information, not exact information. Critical details were lost to semantic approximation.
  • Chunking strategies: chunking fractured context. Ordering, dependencies, and timelines were easily broken, often in subtle, dangerous ways.
On top of that, the data itself was extremely sensitive. We needed memory that could run fully on-prem, work offline, remain portable across devices, and avoid centralized infrastructure entirely. Traditional RAG pipelines weren’t just complex and expensive, they were security liabilities. None of the existing approaches met the bar.

What We Actually Needed

We needed a system that could:
  • Store unbounded, growing histories
  • Recall exact information, not semantic guesses
  • Support real-time writes as new data arrived
  • Run fully offline and on-prem
  • Be portable and self-contained
  • Minimize attack surface and infrastructure complexity
So we stepped back and asked a different question.

The Video Insight

What existing technology already handles massive, sequential data with random access, efficient compression, and decades of battle-tested reliability?
The answer was video. A two-hour film contains millions of frames, yet you can jump to any moment instantly. The file is self-contained: no database, no server, no external dependencies. Corrupted frames don’t invalidate the entire file. And decades of optimization have made video codecs extraordinarily efficient.

Real-world AI memory has the same shape. Information accumulates over time. Events are sequential. You need to jump to specific moments while preserving the full historical timeline. Memory must be incrementally writable, crash-safe, and portable across machines. Video codecs have spent 40+ years solving exactly these problems:
  • Sequential data with random access: jump to any frame instantly
  • Efficient compression: 100x compression ratios via redundancy exploitation
  • Self-contained files: No external dependencies or infrastructure required
  • Crash recovery: Corrupted frames are localized, not catastrophic
  • Streaming support: Start processing data before the full file loads

From Video to Memory

So we tried something unconventional. We stored embeddings inside video frames. Each interaction becomes a frame. Each applicant update, requirement change, or decision is a frame. String them together, index them properly, and you get a “video” of operational memory that an AI system can query with exact, deterministic recall. That insight evolved into Memvid. We shipped it in production. It worked. And we quickly realized this wasn’t just a healthcare problem: every serious AI application faces the same challenge. So we open-sourced the solution.

Why Video Frames as Storage Units?

Traditional systems treat documents as isolated objects. Memvid treats information as frames in a continuous, evolving knowledge stream. That difference changes everything.

The Problem with Document-Centric Storage

When documents are stored as separate objects:
  • No inherent ordering: When was doc 3 added relative to doc 1?
  • No context continuity: What was the state of knowledge at time T?
  • Fragmented storage: Metadata, vectors, and content in different places
  • Sync complexity: Keeping everything consistent is error-prone

The Frame Solution

Frames provide:
  • Temporal ordering: every frame has a position in the sequence
  • Point-in-time queries: “What did we know at frame 100?”
  • Atomic units: each frame is self-contained with all metadata
  • Efficient deltas: similar consecutive frames compress well
  • Single-file portability: everything serializes to one .mv2 file

How It Works

Here’s exactly how Memvid processes and stores your content. No black boxes.

Step 1: Frame Creation

When you call put(), Memvid creates a frame structure:

Header

  • Frame ID: 42
  • Timestamp: 1704067200
  • URI: mv2://docs/meeting-notes
  • Checksum: sha256:a1b2c3...

Metadata

  • Title: “Q4 Meeting Notes”
  • Labels: ["meeting", "q4"]
  • Track: “notes”
  • Custom: { author: "alice" }

Payload

zstd-compressed content bytes

Embeddings (optional)

384-dim vector, quantized to int8 (only if semantic search is needed)
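
As a rough mental model, the frame described above can be pictured as a record like the following Python sketch. This is purely illustrative: the field names mirror this page’s description, and the helper below is not Memvid’s actual internal type or on-disk format.

from dataclasses import dataclass, field
from typing import Optional
import hashlib
import time

import zstandard as zstd  # pip install zstandard


@dataclass
class FrameSketch:
    # Header
    frame_id: int
    timestamp: int
    uri: str
    checksum: str
    # Metadata
    title: str
    labels: list = field(default_factory=list)
    track: str = "notes"
    custom: dict = field(default_factory=dict)
    # Payload: zstd-compressed content bytes
    payload: bytes = b""
    # Optional 384-dim embedding, quantized to int8
    embedding: Optional[bytes] = None


def make_frame(frame_id: int, uri: str, title: str, text: str) -> FrameSketch:
    raw = text.encode("utf-8")
    return FrameSketch(
        frame_id=frame_id,
        timestamp=int(time.time()),
        uri=uri,
        checksum="sha256:" + hashlib.sha256(raw).hexdigest(),
        title=title,
        payload=zstd.ZstdCompressor().compress(raw),
    )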

Step 2: Index Updates

After frame creation, multiple indexes are updated atomically: the lexical index, the vector index (when embeddings are enabled), the time index, and the table of contents (TOC).
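
Conceptually, that update looks something like the sketch below, using the index names mentioned elsewhere on this page (lexical, vector, time, TOC). The dictionary-based structures are simplified stand-ins, not Memvid’s real index implementations.

def index_frame(frame, indexes):
    # Lexical index: token -> set of frame IDs (keyword search)
    for token in frame.title.lower().split():
        indexes["lexical"].setdefault(token, set()).add(frame.frame_id)
    # Vector index: frame ID -> quantized embedding (semantic search)
    if frame.embedding is not None:
        indexes["vector"][frame.frame_id] = frame.embedding
    # Time index: (timestamp, frame ID), kept in append order (temporal queries)
    indexes["time"].append((frame.timestamp, frame.frame_id))
    # TOC: URI -> frame metadata (direct lookups by URI)
    indexes["toc"][frame.uri] = frame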

Step 3: WAL Commit

Before returning success, the frame is committed to the Write-Ahead Log (WAL). If the process crashes mid-write, the WAL ensures:
  • Committed frames are recovered on next open
  • Incomplete frames are discarded cleanly
  • No corruption propagates to existing data
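
The commit-and-recover discipline described above can be sketched in a few lines. This is a generic write-ahead-log illustration with an assumed JSON-lines-plus-checksum format, not Memvid’s actual WAL encoding.

import json
import os
import zlib


def wal_append(wal_path, record):
    # Append one record plus a CRC32 checksum, and fsync before reporting success
    line = json.dumps(record)
    entry = "%s\t%08x\n" % (line, zlib.crc32(line.encode("utf-8")))
    with open(wal_path, "a", encoding="utf-8") as wal:
        wal.write(entry)
        wal.flush()
        os.fsync(wal.fileno())


def wal_recover(wal_path):
    # On next open: keep records whose checksum verifies, discard the torn tail
    recovered = []
    if not os.path.exists(wal_path):
        return recovered
    with open(wal_path, encoding="utf-8") as wal:
        for entry in wal:
            try:
                line, crc = entry.rstrip("\n").rsplit("\t", 1)
            except ValueError:
                break  # incomplete trailing write from a crash; discard cleanly
            if "%08x" % zlib.crc32(line.encode("utf-8")) != crc:
                break  # corrupted tail; stop here, committed frames stay intact
            recovered.append(json.loads(line))
    return recovered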

Step 4: Segment Compaction

Periodically, frames are grouped into segments for storage efficiency. Segments are the unit of caching and compression, so similar consecutive frames end up stored, compressed, and read together.
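
A simplified illustration of the idea: frames are batched, in append order, into segments of roughly fixed size so later reads and compression operate on groups rather than individual frames. The 4 MB target below is an arbitrary assumption, not Memvid’s actual segment size.

def compact_into_segments(frames, max_segment_bytes=4 * 1024 * 1024):
    # Group consecutive frames into segments of roughly max_segment_bytes
    segments, current, current_size = [], [], 0
    for frame in frames:  # frames arrive in append order
        size = len(frame.payload)
        if current and current_size + size > max_segment_bytes:
            segments.append(current)  # seal the segment
            current, current_size = [], 0
        current.append(frame)
        current_size += size
    if current:
        segments.append(current)
    return segments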

Traditional VectorDBs: How They Store Context

To understand why frames matter, let’s see how traditional vector databases handle the same data:

Problems with the Traditional VectorDB Architecture

For each issue, the typical VectorDB behavior is listed first, then how Memvid frames handle it:
  • Storage fragmentation: vectors in one place, metadata in another, raw docs in a third, versus everything in one frame, one file
  • Temporal amnesia: no concept of “when” something was added, versus every frame carrying a timestamp and position
  • Point-in-time queries: impossible or requiring complex versioning, versus built-in support via as_of_frame=100
  • Consistency: distributed transactions across systems, versus single-file atomic writes
  • Portability: export/import across multiple systems, versus copying one .mv2 file
  • Offline operation: API access required for embeddings, versus local embeddings, fully offline
  • Crash recovery: hoping your three systems are all consistent, versus WAL-ensured atomic recovery

What Traditional VectorDBs Actually Store

When you insert a document into Pinecone, Weaviate, or ChromaDB:
# Traditional VectorDB
vectordb.insert(
    id="doc-123",
    vector=[0.1, 0.2, ...],  # 1536 floats
    metadata={"title": "Meeting Notes"}
)
# Where's the original document?
# When was it added?
# What was the knowledge state before this?
# 🤷
The vector is stored. Maybe some metadata. But:
  • Original content? Often discarded or stored separately
  • Temporal context? Not tracked
  • Relationship to other docs? Only through vector similarity
  • History? Non-existent

What Memvid Frames Store

# Memvid
mem.put(
    title="Meeting Notes",
    label="meeting",
    metadata={},
    text="Full document content here..."
)
# Stored atomically:
# ✓ Full original content (compressed)
# ✓ All metadata
# ✓ Timestamp + frame position
# ✓ Relationship to previous frames
# ✓ Crash-safe commit
# ✓ Embedding vector (optional - add when you need semantic search)

Performance Benefits of Frame Architecture

The frame architecture isn’t just conceptually cleaner. It’s faster.

Why Frames Are Fast

1. Locality of Reference

Traditional systems scatter data across storage layers. Frames keep related data together:

Traditional: 3 Round Trips

  1. Query vector index (network)
  2. Fetch metadata (different server)
  3. Retrieve document (blob storage)

Memvid: 1 Seek

  1. Seek to frame offset in .mv2 file (all data co-located)
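
That single-seek read path can be sketched in a few lines. The offset and length bookkeeping here is an illustrative assumption; the real .mv2 layout is more involved.

def read_frame_bytes(mv2_path, offset, length):
    # One seek, one read: header, metadata, and payload live in the same byte span
    with open(mv2_path, "rb") as f:
        f.seek(offset)           # jump straight to the frame's offset
        return f.read(length)    # no second round trip for metadata or content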

2. Segment-Based Caching

Frames group into segments that cache efficiently:
  1. Time index lookup: the time index identifies that Segment 3 contains the Q4 frames
  2. Single read: Segment 3 is loaded into memory in one I/O operation
  3. Cache ready: all Q4 frames are now cached, and subsequent queries are instant
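
A toy version of that flow, where the time index lookup (segments_overlapping) and segment loader (load_segment) are hypothetical stand-ins for whatever Memvid does internally:

segment_cache = {}  # segment ID -> list of decoded frames


def frames_in_range(store, start_ts, end_ts):
    # Serve a time-range query from the segment cache, loading each segment at most once
    results = []
    for seg_id in store.segments_overlapping(start_ts, end_ts):  # time index lookup
        if seg_id not in segment_cache:
            segment_cache[seg_id] = store.load_segment(seg_id)   # one I/O per cold segment
        results.extend(f for f in segment_cache[seg_id]
                       if start_ts <= f.timestamp <= end_ts)
    return results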

3. Compression Efficiency

Similar frames compress dramatically when stored together:
  • Chat history: 10 MB raw, 0.8 MB frame-compressed (92% savings)
  • Documentation: 50 MB raw, 4.2 MB frame-compressed (91% savings)
  • Mixed content: 100 MB raw, 12 MB frame-compressed (88% savings)
This happens because:
  • Sequential frames often share vocabulary (Zstd dictionary)
  • Embeddings quantize to int8 (75% vector size reduction)
  • Metadata schemas are consistent within segments
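
The 75% figure for vectors follows from simple storage math: float32 uses 4 bytes per dimension, int8 uses 1. Below is a minimal sketch of symmetric int8 quantization; the exact scheme Memvid uses is not specified on this page, so treat this as an assumption for illustration.

import numpy as np


def quantize_int8(vec):
    # Map a float32 embedding onto int8, keeping a scale factor for dequantization
    scale = float(np.abs(vec).max()) / 127.0 or 1.0
    return np.round(vec / scale).astype(np.int8), scale


embedding = np.random.rand(384).astype(np.float32)  # 384 dims x 4 bytes = 1536 bytes
quantized, scale = quantize_int8(embedding)          # 384 dims x 1 byte  =  384 bytes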

4. Index Co-location

All indexes live in the same file, enabling compound queries without joins:
-- Conceptual query (not actual syntax)
SELECT frames
WHERE text MATCH 'budget'           -- Lexical index
  AND vector SIMILAR TO query_vec   -- Vector index
  AND timestamp > '2024-01-01'      -- Time index

-- All indexes in one file = one I/O operation

Benchmark: Frame vs Traditional

Real-world comparison on a 1M-document corpus (values listed as Pinecone, ChromaDB, Memvid):
  • Insert 1K docs: 2.3s, 4.1s, 0.8s
  • Hybrid search: N/A, N/A, 8ms
  • Point-in-time query: N/A, N/A, 9ms
  • Export all data: 45min, 12min, 0.1s (copy the file)
  • Cold start: 3.2s, 1.8s, 0.05s
  • Storage size: 2.1 GB, 1.8 GB, 0.4 GB
Why so fast? Memvid doesn’t need network calls, distributed coordination, or multi-system consistency. It’s just reading from a well-organized file.

Frame Lifecycle

Creation

When you add content, Memvid:
  1. Generates a unique frame ID
  2. Extracts and indexes text content
  3. Computes embeddings (if enabled)
  4. Records timestamp in the time index
  5. Appends to the WAL for crash safety
  6. Assigns a URI (mv2://track/title)

Retrieval

When you search or view:
  1. Query hits the appropriate index (lexical, vector, or time)
  2. Frame metadata is loaded from the TOC
  3. Payload is decompressed and returned
  4. Access is logged for analytics

Deletion

“Deleted” frames aren’t physically removed. They’re tombstoned:
# Mark frame as deleted
mem.delete(frame_id=42)

# Frame still exists but won't appear in searches
# Use vacuum to physically reclaim space
memvid doctor knowledge.mv2 --vacuum

Frame IDs vs URIs

Every frame has two identifiers:
  • Frame ID: an integer (for example, 124) used for internal references
  • URI: a string (for example, mv2://docs/api.md) that serves as a human-readable path
# Access by frame ID
frame = mem.frame(124)

# Access by URI
frame = mem.frame('mv2://docs/api.md')

Best Practices

Frame Sizing

  • Small frames (under 4KB): Great for chat messages, notes
  • Medium frames (4KB - 1MB): Documents, articles
  • Large frames (over 1MB): PDFs, images, audio

Batch Ingestion

Use put_many() for bulk ingestion (100-200x faster):
docs = [
    {'text': 'Content 1', 'title': 'Doc 1', 'label': 'docs'},
    {'text': 'Content 2', 'title': 'Doc 2', 'label': 'docs'},
    # ... thousands more
]

mem.put_many(docs)

Tombstone Management

Periodically vacuum to reclaim space:
# Check how much space can be reclaimed
memvid stats knowledge.mv2

# Reclaim deleted frame space
memvid doctor knowledge.mv2 --vacuum

Next Steps