Memvid is designed around a simple but powerful principle: everything in one file. This page explains the architecture that makes this possible.

Core Design Principles

1. Single-File Guarantee

Every .mv2 file is completely self-contained:
  • No sidecars - Never creates .wal, .shm, .lock, or journal files
  • Fully portable - Copy, move, or share the file freely
  • No database - No external services required

2. Crash Safety

The embedded Write-Ahead Log (WAL) ensures data integrity:
  • Writes go to WAL first, then to permanent storage
  • Automatic recovery on file open after crashes
  • Recovery completes in under 250ms even for large files
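
The log-first ordering and recovery-on-open behavior can be illustrated with a toy key-value store. This is a minimal sketch of the general write-ahead technique, not Memvid's implementation: `TinyStore` is a hypothetical class, and it keeps its log in a standalone file purely for brevity, whereas Memvid embeds the WAL inside the .mv2 file.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy store (hypothetical): every write is logged before it is applied."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._replay()  # recovery runs automatically on open

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn record left by a crash mid-write: stop here
                self.data[rec["key"]] = rec["value"]

    def put(self, key, value):
        rec = json.dumps({"key": key, "value": value})
        # 1. Append to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(rec + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Only then apply the change to the live state.
        self.data[key] = value


path = os.path.join(tempfile.mkdtemp(), "toy.wal")
store = TinyStore(path)
store.put("a", 1)

# Simulate a crash that left a half-written record at the tail of the log.
with open(path, "a", encoding="utf-8") as f:
    f.write('{"key": "b", "val')

recovered = TinyStore(path)  # replays complete records, ignores the torn one
```

Because a write counts as durable only once its log record is complete, reopening the store restores every acknowledged write and discards the partial one.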

3. Determinism

Same inputs produce identical bytes on the same platform:
  • Reproducible builds for testing and QA
  • Verifiable file integrity with checksums
  • Predictable behavior across runs
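
One way to check byte-for-byte reproducibility is to compare digests of two outputs built from the same inputs. The sketch below is an assumption-laden illustration: `file_digest` is a hypothetical helper, SHA-256 from the standard library stands in for whatever hash you prefer, and the two files are placeholder bytes rather than real .mv2 files.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Placeholder outputs from two hypothetical runs with identical inputs.
workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "run1.bin")
second = os.path.join(workdir, "run2.bin")
for p in (first, second):
    with open(p, "wb") as f:
        f.write(b"same inputs, same bytes")

same = file_digest(first) == file_digest(second)
```

In a QA pipeline, the same comparison would run against two memory files produced from identical inputs on the same platform.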

4. Performance

Optimized for fast search and retrieval:
  • Search latency: ~5ms for 50K documents
  • Cold start: under 200ms
  • WAL append: under 0.1ms per write

File Layout

The .mv2 file format has a well-defined structure. The 4 KB header contains:

| Field | Description |
|-------|-------------|
| Magic | MV2 identifier |
| Version | File format version |
| WAL Offset | Start of embedded WAL region |
| WAL Size | Size of WAL ring buffer |
| Checkpoint Position | Last committed WAL position |
| TOC Checksum | BLAKE3 hash for integrity |

Embedded WAL

The WAL is sized based on total file capacity:
| File Size | WAL Size |
|-----------|----------|
| Under 100 MB | 1 MB |
| Under 1 GB | 4 MB |
| Under 10 GB | 16 MB |
| 10 GB or more | 64 MB |

Checkpoint triggers:
  • WAL reaches 75% capacity
  • User calls seal()
  • Every 1,000 transactions
  • Clean shutdown
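
The sizing tiers and checkpoint triggers above can be written down as two small functions. This is a sketch for illustration only: `wal_size_for` and `should_checkpoint` are hypothetical names, and the use of binary units (1 MB = 1024 * 1024 bytes) and inclusive thresholds are assumptions the source does not spell out.

```python
MB = 1024 * 1024  # assumption: binary megabytes
GB = 1024 * MB


def wal_size_for(file_size_bytes):
    """Pick the WAL size tier for a given total file size."""
    if file_size_bytes < 100 * MB:
        return 1 * MB
    if file_size_bytes < 1 * GB:
        return 4 * MB
    if file_size_bytes < 10 * GB:
        return 16 * MB
    return 64 * MB


def should_checkpoint(wal_used, wal_size, txns_since_checkpoint,
                      seal_called=False, shutting_down=False):
    """Return True if any of the four documented triggers fires."""
    return (
        wal_used >= 0.75 * wal_size        # WAL reaches 75% capacity
        or seal_called                     # user calls seal()
        or txns_since_checkpoint >= 1000   # every 1,000 transactions
        or shutting_down                   # clean shutdown
    )
```

For example, a 500 MB file falls in the 4 MB tier, and a checkpoint would fire once 3 MB of that WAL is in use.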

Frames

Frames are the fundamental unit of storage. Each frame contains:
  • Payload - The actual content (text, binary, media)
  • Metadata - Title, URI, timestamps, tags, labels
  • Checksum - BLAKE3 hash for verification
  • Encoding - Plain or Zstd compressed
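
The four parts of a frame can be modeled as a small record. The sketch below is not the on-disk format: `Frame`, `make_frame`, and `read_frame` are hypothetical names, BLAKE2b and zlib from the Python standard library stand in for BLAKE3 and Zstd, and computing the checksum over the stored (possibly compressed) bytes is an assumption.

```python
import hashlib
import zlib
from dataclasses import dataclass, field


@dataclass
class Frame:
    payload: bytes   # stored bytes, possibly compressed
    encoding: str    # "plain" or "zlib" (stand-in for Zstd)
    checksum: str    # digest of the stored payload (BLAKE2b stand-in)
    metadata: dict = field(default_factory=dict)  # title, URI, tags, ...


def make_frame(content, compress=False, **metadata):
    """Build a frame, optionally compressing the payload."""
    stored = zlib.compress(content) if compress else content
    digest = hashlib.blake2b(stored).hexdigest()
    return Frame(stored, "zlib" if compress else "plain", digest, metadata)


def read_frame(frame):
    """Verify the checksum, then return the decoded content."""
    if hashlib.blake2b(frame.payload).hexdigest() != frame.checksum:
        raise ValueError("frame checksum mismatch")
    if frame.encoding == "zlib":
        return zlib.decompress(frame.payload)
    return frame.payload


frame = make_frame(b"hello" * 100, compress=True, title="Greeting")
content = read_frame(frame)
```

Verifying before decoding means a corrupted payload is rejected instead of being handed to the decompressor.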

Search Architecture

Memvid supports three search modes:

Lexical Search (BM25)

Fast keyword search using BM25 ranking:
  • Full-text search with term frequency scoring
  • Date range filters: date:[2024-01-01 TO 2024-12-31]
  • Tokenization and stemming
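
To make the ranking concrete, here is a minimal sketch of standard Okapi BM25 scoring. It is not Memvid's indexer: `bm25_scores` is a hypothetical function, tokenization is a plain lowercase whitespace split with no stemming, and `k1=1.2`, `b=0.75` are common defaults rather than values taken from the source.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokenized for term in set(t))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "neural networks learn features",
    "cooking pasta at home",
    "deep neural networks",
]
scores = bm25_scores("neural networks", docs)
```

The second document shares no terms with the query and scores zero; of the two matching documents, the shorter one ranks higher because of length normalization.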

Vector Search

Semantic similarity search using embeddings:
  • Fast approximate nearest neighbor search
  • Optional Product Quantization (PQ) for 16x compression
  • Configurable embedding models
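
The idea behind embedding search can be shown with an exact, brute-force cosine-similarity scan. Note the substitution: this sketch compares the query against every vector, whereas an approximate nearest neighbor index avoids that full scan. `cosine` and `top_k` are hypothetical helpers, and the two-dimensional vectors are toy values, not real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, vectors, k=5):
    """Return (index, score) pairs for the k most similar vectors."""
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [(i, score) for score, i in scored[:k]]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k([1.0, 0.1], vectors, k=2)
```

Here the query points almost along the first axis, so vector 0 ranks first and the diagonal vector 2 second.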

Hybrid Search

Combines both approaches:
  1. Run lexical search for keyword matches
  2. Run vector search for semantic similarity
  3. Merge and rerank results
  4. Return top-k hits
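
The merge-and-rerank step can be sketched with reciprocal rank fusion (RRF). The source does not say which fusion method Memvid uses, so RRF is an assumption chosen because it needs only the two ranked lists; `rrf_merge` and the constant `c=60` are illustrative.

```python
def rrf_merge(lexical_ids, vector_ids, k=5, c=60):
    """Fuse two ranked ID lists with reciprocal rank fusion, return top-k."""
    scores = {}
    for ranking in (lexical_ids, vector_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Highest fused score first; ties broken by ID for stable output.
    ranked = sorted(scores, key=lambda d: (-scores[d], str(d)))
    return ranked[:k]


lexical = ["a", "b", "c"]  # steps 1: keyword hits, best first
vector = ["c", "a", "d"]   # step 2: semantic hits, best first
merged = rrf_merge(lexical, vector, k=3)  # steps 3 and 4
```

Documents found by both searches ("a" and "c") rise above those found by only one, which is the main effect a hybrid ranker is after.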

Developer Walkthrough

Here’s how to work with Memvid in practice:

Using the CLI

# Create a new memory
memvid create notes.mv2

# Add documents
memvid put notes.mv2 --input ./docs/ --vector-compression

# Search
memvid find notes.mv2 --query "machine learning" --mode auto

# Ask questions
memvid ask notes.mv2 --question "What are the key points?"

# View timeline
memvid timeline notes.mv2 --limit 10

# Check health
memvid doctor notes.mv2 --plan-only

Using the Python SDK

from memvid_sdk import use

# Open or create
mem = use('basic', 'notes.mv2')

# Add content
mem.put(text="Introduction to neural networks...", title="NN Intro")

# Batch add (100-200x faster)
mem.put_many([
    {'text': 'Chapter 1...', 'title': 'Ch 1'},
    {'text': 'Chapter 2...', 'title': 'Ch 2'},
])

# Search
results = mem.find('neural networks', k=5)

# Ask with LLM
answer = mem.ask('What is a neural network?', model='openai:gpt-4o')

# Close properly
mem.seal()

Using the Node.js SDK

import { use } from '@memvid/sdk';

// Open or create
const mem = await use('basic', 'notes.mv2');

// Add content
await mem.put({ text: 'Introduction to neural networks...', title: 'NN Intro', label: 'intro' });

// Search
const results = await mem.find('neural networks', { k: 5 });

// Ask with LLM
const answer = await mem.ask('What is a neural network?', {
  model: 'openai:gpt-4o',
  modelApiKey: process.env.OPENAI_API_KEY
});

// Close properly
await mem.seal();

Verification and Repair

Memvid includes built-in tools for file health:

Verify

Check file integrity without modification:
# Quick verification
memvid verify notes.mv2

# Deep verification (slower, more thorough)
memvid verify notes.mv2 --deep

Doctor

Diagnose and repair issues:
# Preview what would be fixed
memvid doctor notes.mv2 --plan-only

# Rebuild corrupted time index
memvid doctor notes.mv2 --rebuild-time-index

# Rebuild lexical index
memvid doctor notes.mv2 --rebuild-lex-index

# Compact deleted frames
memvid doctor notes.mv2 --vacuum

Single-File Check

Ensure no auxiliary files were created:
memvid verify-single-file notes.mv2

Checksums and Integrity

Defense in depth with cascading checksums:
| Level | What’s Checked |
|-------|----------------|
| Header | TOC checksum (BLAKE3) |
| WAL Records | Per-record checksum |
| Index Segments | Per-segment checksum |
| Frames | Per-frame payload checksum |
