Memory Architecture

Understanding how Memvid stores data helps you make better decisions about ingestion, search, and performance optimization.

File Structure

A .mv2 file is a single, self-contained binary with five main layers:

1. Header (4 KB)

The header contains:

Magic bytes: Identifies the file as .mv2 format
Version: File format version
WAL metadata: Position and size of write-ahead log
Footer offset: Points to the table of contents

2. Embedded WAL

The write-ahead log (WAL) ensures crash safety:

All mutations are written to WAL first
On recovery, uncommitted changes are replayed
Size scales with file capacity (1 MB to 64 MB)

3. Segments (Frames)

Your actual data lives in segments, which contain frames (the fundamental unit of storage):

Text segments: Document content and metadata stored as frames
Blob segments: Binary data (images, PDFs) as frames
Media segments: Audio and video content as frames
Vector segments: Embeddings for semantic search (optional)

Each frame contains payload, metadata, timestamp, URI, and checksum. Segments are written in deterministic order for reproducibility.
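
To make the frame fields concrete, here is a minimal sketch of a frame as a Python dataclass. The class name `Frame`, the field types, the example URI, and the choice of SHA-256 over the payload as the checksum are all assumptions for illustration; the on-disk encoding Memvid actually uses is not described here.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """Hypothetical in-memory view of a frame; not Memvid's actual type."""
    payload: bytes
    uri: str
    timestamp: int                       # explicit, caller-supplied (epoch seconds)
    metadata: dict = field(default_factory=dict)
    checksum: str = ""

    @staticmethod
    def build(payload: bytes, uri: str, timestamp: int,
              metadata: Optional[dict] = None) -> "Frame":
        # Assumption: the checksum covers the payload only.
        digest = hashlib.sha256(payload).hexdigest()
        return Frame(payload, uri, timestamp, metadata or {}, digest)

    def is_intact(self) -> bool:
        # Recompute the checksum and compare it with the stored value.
        return hashlib.sha256(self.payload).hexdigest() == self.checksum
```

A frame built with `Frame.build(b"hello", "example://docs/1", 1700000000)` reports `is_intact()` as true until its payload no longer matches the stored checksum.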

4. Indices

Memvid maintains multiple indices for fast search:

Lexical index (BM25): Full-text keyword search - works out of the box
Time index: Temporal ordering of frames
Vector index: Semantic similarity search - optional, add when needed

5. Table of Contents + Footer

The TOC maps everything:

Segment locations and sizes
Index offsets
Checksums for integrity verification

The footer contains a final checksum and magic trailer (MV2FOOT!).
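
As a quick sanity check, you can look for the magic trailer yourself. The sketch below assumes the trailer occupies the last 8 bytes of the file, which this page does not state explicitly; the helper name `has_footer_trailer` is hypothetical and not part of any Memvid SDK.

```python
import os

FOOTER_MAGIC = b"MV2FOOT!"  # trailer named in the text above


def has_footer_trailer(path: str) -> bool:
    """Return True if the file's final bytes equal the footer magic.

    Assumption: the trailer is the very last thing in the file.
    """
    if os.path.getsize(path) < len(FOOTER_MAGIC):
        return False
    with open(path, "rb") as fh:
        # Seek backwards from the end by the length of the trailer.
        fh.seek(-len(FOOTER_MAGIC), os.SEEK_END)
        return fh.read(len(FOOTER_MAGIC)) == FOOTER_MAGIC
```

This only detects an obviously truncated or foreign file; it does not verify the footer checksum.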

Data Lifecycle

Writing Data

When you add documents:

1. put() - Adds frames (documents) to pending state
2. Indices updated - Lexical and vector indices are built
3. Time entries queued - Timestamps recorded for timeline
4. WAL appended - Transaction logged for crash safety
5. seal() - Commits everything to disk with checksums

from memvid_sdk import use

mem = use('basic', 'knowledge.mv2')

# 1. Add documents (pending)
mem.put(title="Doc 1", label="docs", metadata={}, text="Your content")
mem.put(title="Report", label="docs", metadata={}, file="report.pdf")

# 2. Commit to disk
mem.seal()

Reading Data

When you search or retrieve:

1. Open file - Locate latest valid footer
2. Load TOC - Map segments and indices
3. Replay WAL - Apply any uncommitted changes
4. Query indices - Search lexical/vector/time indices
5. Return results - Ranked documents with snippets

Single-File Guarantee

Memvid’s core promise is single-file portability:

What It Means

No sidecar files: No .wal, .lock, or .shm files
No external state: Everything is in the .mv2 file
Portable: Copy the file to transfer the entire memory

Why It Matters

# Your entire knowledge base
ls ~/project/
# → knowledge.mv2

# Share it anywhere
cp knowledge.mv2 /team/shared/
scp knowledge.mv2 user@server:/data/
git add knowledge.mv2

How It Works

Traditional databases use separate files for journals, locks, and indices. Memvid embeds all of these inside the .mv2 file:

Traditional DB	Memvid
data.db + data.db-wal + data.db-shm	knowledge.mv2
Requires careful copying	Just copy the file

Crash Safety

The embedded WAL ensures data survives unexpected shutdowns.

Write-Ahead Logging

Every mutation is logged before being applied:

1. Transaction written to WAL region
2. WAL synced to disk (fsync)
3. Changes applied to segments
4. Checksum updated
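
The ordering above is the general write-ahead pattern: log, sync, then apply. The following is a generic illustration of that pattern, not Memvid's implementation. It assumes a separate log file and a plain dict as the "segments" for simplicity, whereas Memvid keeps its WAL inside the .mv2 file, and it leaves out the checksum step. The function name `logged_update` is hypothetical.

```python
import json
import os


def logged_update(wal_path: str, state: dict, key: str, value: str) -> None:
    """Log a mutation durably, then apply it to in-memory state."""
    record = json.dumps({"key": key, "value": value}) + "\n"
    # 1. Write the transaction to the log.
    with open(wal_path, "a", encoding="utf-8") as wal:
        wal.write(record)
        # 2. Push it to stable storage before touching the data.
        wal.flush()
        os.fsync(wal.fileno())
    # 3. Only now apply the change.
    state[key] = value
```

If the process dies between steps 2 and 3, the record is already on disk and can be replayed on the next open.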

Recovery Process

On open, Memvid:

1. Locates the last valid footer
2. Loads the table of contents
3. Scans WAL for uncommitted entries
4. Replays any pending transactions

Because every transaction reaches the WAL before it touches the segments, anything that was synced to disk survives a crash or power failure.
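
Replay typically has to cope with a log whose last record was cut short by the crash. The sketch below shows one common way to handle that, under an assumed record layout (4-byte length, 4-byte CRC32, then the payload). The layout and the names `encode_record` and `replay` are hypothetical; Memvid's real WAL format is not documented on this page.

```python
import struct
import zlib

_HEADER = struct.Struct("<II")  # assumed layout: payload length, CRC32 of payload


def encode_record(payload: bytes) -> bytes:
    """Frame a payload with its length and checksum."""
    return _HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(wal: bytes) -> list[bytes]:
    """Return payloads of all intact records, stopping at the first torn one."""
    out, pos = [], 0
    while pos + _HEADER.size <= len(wal):
        length, crc = _HEADER.unpack_from(wal, pos)
        start = pos + _HEADER.size
        end = start + length
        if end > len(wal):
            break  # record was cut short by a crash
        payload = wal[start:end]
        if zlib.crc32(payload) != crc:
            break  # corrupt record: ignore it and everything after it
        out.append(payload)
        pos = end
    return out
```

Stopping at the first bad record means a half-written transaction is dropped rather than partially applied.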

WAL Sizing

WAL size scales with file capacity:

File Size	WAL Size
Under 100 MB	1 MB
Under 1 GB	4 MB
Under 10 GB	16 MB
10 GB or more	64 MB
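
The table can be read as a simple threshold lookup. This sketch assumes binary units (1 MB = 1024 × 1024 bytes), which the table does not specify, and `wal_size_for` is a hypothetical helper rather than an SDK function.

```python
MB = 1024 * 1024  # assumption: binary megabytes
GB = 1024 * MB


def wal_size_for(capacity: int) -> int:
    """Return the WAL size in bytes for a file capacity in bytes, per the table."""
    if capacity < 100 * MB:
        return 1 * MB
    if capacity < 1 * GB:
        return 4 * MB
    if capacity < 10 * GB:
        return 16 * MB
    return 64 * MB
```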

Locking and Concurrency

File Locking

Memvid uses OS-level file locks:

Shared locks: Multiple readers allowed
Exclusive locks: Single writer at a time
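
If shared and exclusive locks are new to you, this sketch shows the behaviour using Python's standard `fcntl.flock` on Unix-like systems. It illustrates advisory file locks in general; which locking primitive Memvid uses internally is not stated here, and `try_lock` is a hypothetical helper.

```python
import fcntl  # Unix only


def try_lock(fh, exclusive: bool) -> bool:
    """Try to take a non-blocking advisory lock on an open file.

    Returns False if a conflicting lock is held through another handle.
    """
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fh, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
```

Two handles can both hold a shared lock, but an exclusive request fails until every shared holder has released or closed its handle.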

Read-Only Mode

For concurrent read access:

from memvid_sdk import use

# Multiple processes can read simultaneously
mem = use('basic', 'knowledge.mv2', read_only=True)
results = mem.find('query')

Writer Conflicts

If a writer holds the lock:

from memvid_sdk import use, LockedError

try:
    mem = use('basic', 'knowledge.mv2')
except LockedError:
    print("File is locked by another process")
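
If the other writer is expected to finish soon, you may prefer to retry rather than fail immediately. The helper below is a generic sketch, not part of the SDK: `open_with_retry` and its parameters are hypothetical, and you would pass in the opener and exception type yourself, for example `lambda: use('basic', 'knowledge.mv2')` and `LockedError`.

```python
import time


def open_with_retry(opener, locked_exc, attempts=5, delay=0.5):
    """Call opener() until it succeeds or the attempts run out.

    opener:     zero-argument callable that opens the memory
    locked_exc: exception type that signals a held lock
    """
    for attempt in range(attempts):
        try:
            return opener()
        except locked_exc:
            if attempt == attempts - 1:
                raise  # give up and surface the original error
            time.sleep(delay * (2 ** attempt))  # exponential backoff
```

Keep the attempt count small so that a stuck writer shows up as an error instead of an indefinite hang.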

Determinism

Given the same inputs, Memvid produces the same outputs.

Why Determinism Matters

Reproducible builds: Same data → same file
Reliable testing: Predictable behavior
Easy debugging: Consistent results

How It’s Achieved

Segments written in deterministic order
Timestamps explicit, not system-derived
Checksums verify integrity
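
The effect of these three points can be shown with a small, self-contained example: serialize in a fixed order, take timestamps from the input instead of the clock, and hash the result. This is an illustration of the principle only. Sorting by `uri` and using canonical JSON are assumptions made for the sketch, and `build_digest` is a hypothetical function, not how Memvid lays out a file.

```python
import hashlib
import json


def build_digest(docs: list[dict]) -> str:
    """Serialize documents in a fixed order and return a SHA-256 digest.

    Each doc is a dict with 'uri', 'text', and an explicit 'timestamp'.
    """
    hasher = hashlib.sha256()
    # Sort by a stable key so input order does not affect the output.
    for doc in sorted(docs, key=lambda d: d["uri"]):
        # sort_keys and fixed separators give one canonical encoding per doc.
        line = json.dumps(doc, sort_keys=True, separators=(",", ":"))
        hasher.update(line.encode("utf-8") + b"\n")
    return hasher.hexdigest()
```

Calling it twice with the same documents yields the same digest, while changing a single timestamp changes it. That is why a timestamp read from the system clock at write time would break reproducibility.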

Performance Considerations

Memory Usage

Memvid keeps some data in memory:

Table of contents
WAL handle
Pending time entries

For large files, consider:

Closing handles when done
Using read-only mode for queries

Index Building

Building indices is CPU-intensive:

Lexical index: BM25 tokenization and indexing
Vector index: Graph construction for similarity search

Use parallel ingestion for large datasets:

memvid put knowledge.mv2 --input ./large-dataset/ \
  --vector-compression \
  --parallel-segments

Search Optimization

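Lexical search: Fast for exact keywords
Vector search: Slower, but matches on meaning rather than exact wording
Hybrid search: Combines lexical and vector results

Choose the mode that fits the query: lexical for known terms or identifiers, vector for conceptual or paraphrased questions, and hybrid when you need both.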
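
This page does not say how hybrid search merges the two result lists. One widely used technique is reciprocal rank fusion (RRF), sketched below purely to show what "combining" can mean; it is not claimed to be Memvid's method. The function name and the constant `k = 60` (a conventional default for RRF) are assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; a higher combined score ranks first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for items it ranks nearer the top.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending, then by ID so ties resolve the same way every time.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Documents that appear near the top of both the lexical and the vector list rise above those that score well in only one.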

Next Steps

Indices & Tracks

Learn about lexical, vector, and time indices

Storage Capacity

Understand storage tiers and capacity management
