Architecture Overview

Memvid is designed around a simple but powerful principle: everything in one file. This page explains the architecture that makes this possible.

Core Design Principles

1. Single-File Guarantee

Every .mv2 file is completely self-contained:

No sidecars - Never creates .wal, .shm, .lock, or journal files
Fully portable - Copy, move, or share the file freely
No database - No external services required

2. Crash Safety

The embedded Write-Ahead Log (WAL) ensures data integrity:

Writes go to WAL first, then to permanent storage
Automatic recovery on file open after crashes
Recovery completes in under 250ms even for large files
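
The write-then-recover flow above can be illustrated with a toy model. This is a minimal sketch, not Memvid's implementation: the WAL is an in-memory list, the record format is JSON, CRC32 stands in for whatever per-record checksum the real format uses, and the names `wal_append` and `wal_replay` are hypothetical.

```python
import json
import zlib


def wal_append(wal: list[bytes], record: dict) -> None:
    """Append a record, prefixed with a CRC32 so a torn write is detectable."""
    body = json.dumps(record, sort_keys=True).encode()
    wal.append(zlib.crc32(body).to_bytes(4, "big") + body)


def wal_replay(wal: list[bytes], store: dict) -> int:
    """Apply intact records to the store, stopping at the first corrupt one."""
    applied = 0
    for entry in wal:
        crc, body = entry[:4], entry[4:]
        if zlib.crc32(body).to_bytes(4, "big") != crc:
            break  # corrupt or torn tail: ignore it and everything after it
        record = json.loads(body)
        store[record["key"]] = record["value"]
        applied += 1
    return applied
```

Replaying on open means a crash between the WAL append and the write to permanent storage loses nothing that was fully logged, while a half-written final record is simply skipped.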

3. Determinism

Same inputs produce identical bytes on the same platform:

Reproducible builds for testing and QA
Verifiable file integrity with checksums
Predictable behavior across runs
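
One way to test the determinism claim is to build the same file twice from the same inputs and compare digests. The sketch below assumes nothing about Memvid itself: `file_digest` is a hypothetical helper, and SHA-256 from the standard library is used only because it is readily available.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Two builds are byte-identical exactly when `file_digest("run1.mv2") == file_digest("run2.mv2")`, where both file names are placeholders.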

4. Performance

Optimized for fast search and retrieval:

Search latency: ~5ms for 50K documents
Cold start: under 200ms
WAL append: under 0.1ms per write

File Layout

The .mv2 file format has a well-defined structure.

Header

The 4 KB header contains:

Field	Description
Magic	`MV2` identifier
Version	File format version
WAL Offset	Start of embedded WAL region
WAL Size	Size of WAL ring buffer
Checkpoint Position	Last committed WAL position
TOC Checksum	BLAKE3 hash for integrity

Embedded WAL

The WAL is sized based on total file capacity:

File Size	WAL Size
Under 100 MB	1 MB
Under 1 GB	4 MB
Under 10 GB	16 MB
10 GB or more	64 MB
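
The sizing table can be expressed as a small lookup. This is a sketch of the table only: the function name `wal_size_for` is hypothetical, and it assumes binary units (1 MB = 1024 * 1024 bytes), which the source does not specify.

```python
MB = 1024 * 1024  # assumption: binary megabytes
GB = 1024 * MB


def wal_size_for(file_size_bytes: int) -> int:
    """Return the WAL size in bytes for a given file size, per the table."""
    if file_size_bytes < 100 * MB:
        return 1 * MB
    if file_size_bytes < 1 * GB:
        return 4 * MB
    if file_size_bytes < 10 * GB:
        return 16 * MB
    return 64 * MB
```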

Checkpoint triggers:

WAL reaches 75% capacity
User calls seal()
Every 1,000 transactions
Clean shutdown
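
The four triggers combine as a simple "any of" condition. The sketch below is illustrative: `WalState` and `should_checkpoint` are hypothetical names, and treating the thresholds as inclusive (at or above 75%, at or above 1,000 transactions) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WalState:
    used_bytes: int
    capacity_bytes: int
    txns_since_checkpoint: int
    seal_requested: bool = False
    shutting_down: bool = False


def should_checkpoint(s: WalState) -> bool:
    """Return True if any of the documented checkpoint triggers has fired."""
    wal_nearly_full = s.used_bytes * 4 >= s.capacity_bytes * 3  # 75% capacity
    return (
        wal_nearly_full
        or s.seal_requested
        or s.txns_since_checkpoint >= 1000
        or s.shutting_down
    )
```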

Frames

Frames are the fundamental unit of storage. Each frame contains:

Payload - The actual content (text, binary, media)
Metadata - Title, URI, timestamps, tags, labels
Checksum - BLAKE3 hash for verification
Encoding - Plain or Zstd compressed
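
To make the four parts concrete, here is a toy frame that stores a payload, optionally compresses it, and verifies a checksum on read. It is a sketch under stated assumptions rather than the actual frame layout: `Frame`, `make_frame`, and `read_frame` are hypothetical names, BLAKE2b and zlib from the standard library stand in for BLAKE3 and Zstd, and the checksum is assumed to cover the uncompressed payload.

```python
import hashlib
import zlib
from dataclasses import dataclass, field


@dataclass
class Frame:
    stored: bytes    # payload bytes as written (possibly compressed)
    encoding: str    # "plain" or "compressed"
    checksum: str    # digest of the uncompressed payload (assumption)
    metadata: dict = field(default_factory=dict)


def make_frame(payload: bytes, metadata: dict, compress: bool = False) -> Frame:
    """Build a frame, compressing the payload if requested."""
    stored = zlib.compress(payload) if compress else payload
    encoding = "compressed" if compress else "plain"
    return Frame(stored, encoding, hashlib.blake2b(payload).hexdigest(), dict(metadata))


def read_frame(frame: Frame) -> bytes:
    """Decode the payload and raise ValueError if the checksum does not match."""
    payload = zlib.decompress(frame.stored) if frame.encoding == "compressed" else frame.stored
    if hashlib.blake2b(payload).hexdigest() != frame.checksum:
        raise ValueError("frame checksum mismatch")
    return payload
```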

Search Architecture

Memvid supports three search modes:

Lexical Search (BM25)

Fast keyword search using BM25 ranking:

Full-text search with term frequency scoring
Date range filters: date:[2024-01-01 TO 2024-12-31]
Tokenization and stemming
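
For readers unfamiliar with BM25, the following sketch scores a handful of documents against a query using the standard Okapi BM25 formula. It is not Memvid's indexer: the function name `bm25_scores` is hypothetical, tokenization is a plain lowercase whitespace split with no stemming, and the defaults `k1=1.5` and `b=0.75` are common textbook values rather than values taken from the source.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents that do not contain any query term score zero, and among documents with the same term counts the shorter one scores higher because of the length normalization controlled by `b`.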

Vector Search

Semantic similarity search using embeddings:

Fast approximate nearest neighbor search
Optional Product Quantization (PQ) for 16x compression
Configurable embedding models
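
The 16x figure is easy to sanity-check with arithmetic. The source does not state the PQ parameters, so the sketch below shows just one configuration that yields 16x: float32 vectors split into 4-dimensional sub-vectors, each replaced by an 8-bit code. The function name is hypothetical, and codebook storage is ignored.

```python
def pq_compression_ratio(
    dim: int,
    subvector_dim: int = 4,    # assumption: 4 dimensions per sub-vector
    bits_per_code: int = 8,    # assumption: 256 centroids per sub-quantizer
    bytes_per_float: int = 4,  # float32
) -> float:
    """Return raw vector size divided by PQ-encoded size (codebooks excluded)."""
    if dim % subvector_dim != 0:
        raise ValueError("dim must be divisible by subvector_dim")
    raw_bytes = dim * bytes_per_float
    pq_bytes = (dim // subvector_dim) * bits_per_code / 8
    return raw_bytes / pq_bytes
```

Under these assumptions a 384-dimensional float32 vector shrinks from 1,536 bytes to 96 bytes.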

Hybrid Search

Combines both approaches:

Run lexical search for keyword matches
Run vector search for semantic similarity
Merge and rerank results
Return top-k hits
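
The merge step can be sketched with Reciprocal Rank Fusion (RRF). The source does not say which merge or rerank method Memvid uses, so RRF is swapped in here purely as a common, simple choice; `rrf_merge` and the constant `c=60` are illustrative.

```python
def rrf_merge(lexical: list[str], vector: list[str], k: int = 5, c: int = 60) -> list[str]:
    """Fuse two ranked lists of document IDs and return the top-k IDs."""
    scores: dict[str, float] = {}
    for ranking in (lexical, vector):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:k]
```

A document that ranks well in both lists rises above one that appears in only one of them, which is the behavior a hybrid mode is after.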

Developer Walkthrough

Here’s how to work with Memvid in practice:

Using the CLI

# Create a new memory
memvid create notes.mv2

# Add documents
memvid put notes.mv2 --input ./docs/ --vector-compression

# Search
memvid find notes.mv2 --query "machine learning" --mode auto

# Ask questions
memvid ask notes.mv2 --question "What are the key points?"

# View timeline
memvid timeline notes.mv2 --limit 10

# Check health
memvid doctor notes.mv2 --plan-only

Using the Python SDK

from memvid_sdk import use

# Open or create
mem = use('basic', 'notes.mv2')

# Add content
mem.put(text="Introduction to neural networks...", title="NN Intro")

# Batch add (100-200x faster than individual put() calls)
mem.put_many([
    {'text': 'Chapter 1...', 'title': 'Ch 1'},
    {'text': 'Chapter 2...', 'title': 'Ch 2'},
])

# Search
results = mem.find('neural networks', k=5)

# Ask with LLM
answer = mem.ask('What is a neural network?', model='openai:gpt-4o')

# Close properly
mem.seal()

Using the Node.js SDK

import { use } from '@memvid/sdk';

// Open or create
const mem = await use('basic', 'notes.mv2');

// Add content
await mem.put({ text: 'Introduction to neural networks...', title: 'NN Intro', label: 'intro' });

// Search
const results = await mem.find('neural networks', { k: 5 });

// Ask with LLM
const answer = await mem.ask('What is a neural network?', {
  model: 'openai:gpt-4o',
  modelApiKey: process.env.OPENAI_API_KEY
});

// Close properly
await mem.seal();

Verification and Repair

Memvid includes built-in tools for file health:

Verify

Check file integrity without modification:

# Quick verification
memvid verify notes.mv2

# Deep verification (slower, more thorough)
memvid verify notes.mv2 --deep

Doctor

Diagnose and repair issues:

# Preview what would be fixed
memvid doctor notes.mv2 --plan-only

# Rebuild corrupted time index
memvid doctor notes.mv2 --rebuild-time-index

# Rebuild lexical index
memvid doctor notes.mv2 --rebuild-lex-index

# Compact deleted frames
memvid doctor notes.mv2 --vacuum

Single-File Check

Ensure no auxiliary files were created:

memvid verify-single-file notes.mv2

Checksums and Integrity

Defense in depth with cascading checksums:

Level	What’s Checked
Header	TOC checksum (BLAKE3)
WAL Records	Per-record checksum
Index Segments	Per-segment checksum
Frames	Per-frame payload checksum

Next Steps

File Format Details - Deep dive into the MV2 structure
CLI Commands - Complete CLI reference
Python SDK - Python bindings guide
Node.js SDK - Node.js bindings guide
