Memvid is designed around a simple but powerful principle: everything in one file. This page explains the architecture that makes this possible.
Core Design Principles
1. Single-File Guarantee
Every .mv2 file is completely self-contained:
- No sidecars - Never creates .wal, .shm, .lock, or journal files
- Fully portable - Copy, move, or share the file freely
- No database - No external services required
2. Crash Safety
The embedded Write-Ahead Log (WAL) ensures data integrity:
- Writes go to WAL first, then to permanent storage
- Automatic recovery on file open after crashes
- Recovery completes in under 250 ms even for large files
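The log-first ordering described above can be sketched with a toy key-value store. This is an illustration of the general write-ahead pattern only, not Memvid's implementation: `TinyWalStore` is a hypothetical name, the record format (CRC32 plus JSON) is an assumption, and the log lives in its own file here purely to keep the example short, whereas Memvid embeds its WAL inside the .mv2 file.

```python
import json
import os
import zlib


class TinyWalStore:
    """Toy store (hypothetical): every write is logged before it is applied."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._recover()  # automatic recovery on open

    def _recover(self):
        # Replay intact records in order; stop at the first torn or corrupt one.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as wal:
            for line in wal:
                try:
                    crc, payload = line.rstrip("\n").split(" ", 1)
                    if int(crc) != zlib.crc32(payload.encode("utf-8")):
                        break
                    key, value = json.loads(payload)
                except ValueError:
                    break
                self.data[key] = value

    def put(self, key, value):
        payload = json.dumps([key, value])
        record = f"{zlib.crc32(payload.encode('utf-8'))} {payload}\n"
        # Step 1: append the record to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(record)
            wal.flush()
            os.fsync(wal.fileno())
        # Step 2: only now apply the change to the live state.
        self.data[key] = value
```

Reopening the store after a crash replays every record that passes its checksum and ignores a half-written tail, which is the property the recovery bullet describes.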
3. Determinism
Same inputs produce identical bytes on the same platform:
- Reproducible builds for testing and QA
- Verifiable file integrity with checksums
- Predictable behavior across runs
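One way to check the "identical bytes" property in a test suite is to hash two independently built files and compare the digests. The sketch below uses SHA-256 from the Python standard library as a stand-in; it does not call any Memvid API, and `file_digest` and `same_bytes` are hypothetical helper names.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True when both files hash to the same digest."""
    return file_digest(path_a) == file_digest(path_b)
```

In a QA run you might build the same memory twice from the same inputs on one machine and assert that `same_bytes` returns `True`.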
4. Performance
Optimized for fast search and retrieval:
- Search latency: ~5 ms for 50K documents
- Cold start: under 200 ms
- WAL append: under 0.1 ms per write
File Layout
The .mv2 file format has a well-defined structure.
The 4 KB header contains:
| Field | Description |
|---|---|
| Magic | MV2 identifier |
| Version | File format version |
| WAL Offset | Start of embedded WAL region |
| WAL Size | Size of WAL ring buffer |
| Checkpoint Position | Last committed WAL position |
| TOC Checksum | BLAKE3 hash for integrity |
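To make the header fields concrete, here is a sketch that packs and parses them with Python's `struct` module. Only the field names and the 4 KB size come from the table above. The byte order, field widths, field order, and the exact magic bytes are all assumptions made for illustration, so this will not read a real .mv2 file.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 4096  # from the text: the header occupies 4 KB

# ASSUMED layout (not the real on-disk format): little-endian,
# 4-byte magic, u16 version, three u64 fields, 32-byte checksum.
_LAYOUT = struct.Struct("<4sHQQQ32s")
_ASSUMED_MAGIC = b"MV2\x00"


class Header(NamedTuple):
    magic: bytes
    version: int
    wal_offset: int
    wal_size: int
    checkpoint_pos: int
    toc_checksum: bytes


def pack_header(header):
    """Serialize the fields and zero-pad to the full header size."""
    return _LAYOUT.pack(*header).ljust(HEADER_SIZE, b"\x00")


def parse_header(raw):
    """Parse the leading bytes of a header block under the assumed layout."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("header is truncated")
    header = Header(*_LAYOUT.unpack_from(raw))
    if header.magic != _ASSUMED_MAGIC:
        raise ValueError("magic bytes do not match the assumed MV2 identifier")
    return header
```

Keeping the WAL offset, WAL size, and checkpoint position in a fixed-size header means a reader can locate the log and the last committed position without scanning the file.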
Embedded WAL
The WAL is sized based on total file capacity:
| File Size | WAL Size |
|---|---|
| Under 100 MB | 1 MB |
| Under 1 GB | 4 MB |
| Under 10 GB | 16 MB |
| 10 GB or more | 64 MB |
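The sizing table translates directly into a lookup function. The thresholds below are taken from the table; treating MB and GB as binary units (1024-based) is an assumption, and `wal_size_for` is a hypothetical name rather than an SDK function.

```python
MB = 1024 * 1024  # assumption: binary megabytes
GB = 1024 * MB


def wal_size_for(file_size_bytes):
    """Return the WAL size in bytes for a given total file size."""
    if file_size_bytes < 100 * MB:
        return 1 * MB
    if file_size_bytes < 1 * GB:
        return 4 * MB
    if file_size_bytes < 10 * GB:
        return 16 * MB
    return 64 * MB
```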
Checkpoint triggers:
- WAL reaches 75% capacity
- User calls seal()
- Every 1,000 transactions
- Clean shutdown
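The four triggers above can be read as a single predicate. This is a sketch under assumptions: `WalState` and its field names are hypothetical, and "reaches 75%" and "every 1,000 transactions" are interpreted as greater-than-or-equal comparisons.

```python
from dataclasses import dataclass


@dataclass
class WalState:
    """Hypothetical snapshot of the WAL counters used by the triggers."""
    used_bytes: int
    capacity_bytes: int
    txns_since_checkpoint: int
    seal_requested: bool = False
    shutting_down: bool = False


def should_checkpoint(state):
    """True when any one of the four documented triggers fires."""
    return (
        # Integer form of used / capacity >= 0.75.
        state.used_bytes * 4 >= state.capacity_bytes * 3
        or state.seal_requested
        or state.txns_since_checkpoint >= 1000
        or state.shutting_down
    )
```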
Frames
Frames are the fundamental unit of storage. Each frame contains:
- Payload - The actual content (text, binary, media)
- Metadata - Title, URI, timestamps, tags, labels
- Checksum - BLAKE3 hash for verification
- Encoding - Plain or Zstd compressed
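As a rough model of a frame, the sketch below bundles the four parts listed above and verifies the checksum on read. It substitutes standard-library stand-ins so that it runs without extra packages: BLAKE2b in place of BLAKE3 and zlib in place of Zstd. Computing the checksum over the uncompressed payload is also an assumption; the real format may differ.

```python
import hashlib
import zlib
from dataclasses import dataclass, field


@dataclass
class Frame:
    stored: bytes   # payload bytes as stored (possibly compressed)
    encoding: str   # "plain" or "zlib" (stand-in for Zstd)
    checksum: str   # BLAKE2b digest (stand-in for BLAKE3) of the raw payload
    metadata: dict = field(default_factory=dict)


def make_frame(payload, compress=False, **metadata):
    """Build a frame, hashing the payload before optional compression."""
    checksum = hashlib.blake2b(payload).hexdigest()
    stored = zlib.compress(payload) if compress else payload
    return Frame(stored, "zlib" if compress else "plain", checksum, metadata)


def read_frame(frame):
    """Decode the payload and reject it if the checksum does not match."""
    if frame.encoding == "zlib":
        payload = zlib.decompress(frame.stored)
    else:
        payload = frame.stored
    if hashlib.blake2b(payload).hexdigest() != frame.checksum:
        raise ValueError("frame checksum mismatch")
    return payload
```

For example, `make_frame(b"hello", compress=True, title="Greeting")` yields a frame whose `read_frame` result is `b"hello"`, while any change to `stored` causes `read_frame` to raise.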
Search Architecture
Memvid supports three search modes:
Lexical Search (BM25)
Fast keyword search using BM25 ranking:
- Full-text search with term frequency scoring
- Date range filters: date:[2024-01-01 TO 2024-12-31]
- Tokenization and stemming
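For readers unfamiliar with BM25, the following is a minimal sketch of the standard Okapi BM25 formula. It is not Memvid's lexical engine: tokenization here is a plain lowercase split with no stemming, and the `k1` and `b` defaults are common textbook values, not values taken from Memvid.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (no stemming in this sketch)."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    docs = [tokenize(d) for d in documents]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents that contain none of the query terms score zero, and between two documents with the same term counts the shorter one scores higher.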
Vector Search
Semantic similarity search using embeddings:
- Fast approximate nearest neighbor search
- Optional Product Quantization (PQ) for 16x compression
- Configurable embedding models
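To see where a figure like 16x can come from, it helps to do the storage arithmetic. A float32 vector of dimension d takes 4d bytes, and a Product Quantization code with m subvectors at 8 bits each takes m bytes, so the ratio is 4d / m. The parameters below are one hypothetical configuration that yields 16x; Memvid's actual PQ settings are not stated here, and the calculation ignores the one-time cost of storing the codebooks.

```python
def raw_bytes(dim, bytes_per_float=4):
    """Size of one uncompressed float32 vector."""
    return dim * bytes_per_float


def pq_bytes(num_subvectors, bits_per_code=8):
    """Size of one PQ code: one codebook index per subvector."""
    return num_subvectors * bits_per_code // 8


def compression_ratio(dim, num_subvectors, bits_per_code=8):
    """Per-vector compression ratio, excluding codebook storage."""
    return raw_bytes(dim) / pq_bytes(num_subvectors, bits_per_code)


# Hypothetical example: 384 dimensions split into 96 subvectors of 4 dims.
ratio = compression_ratio(384, 96)
```

Under these assumptions each vector shrinks from 1,536 bytes to 96 bytes.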
Hybrid Search
Combines both approaches:
- Run lexical search for keyword matches
- Run vector search for semantic similarity
- Merge and rerank results
- Return top-k hits
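The merge step can be implemented in several ways, and this page does not say which one Memvid uses. As one common option, the sketch below applies Reciprocal Rank Fusion (RRF) to two ranked lists of document IDs. The function name, the constant `k=60`, and the tie-breaking rule are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60, top_k=10):
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for stable output.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_k]


lexical_hits = ["a", "b", "c"]  # hypothetical BM25 ranking
vector_hits = ["b", "d", "a"]   # hypothetical vector ranking
fused = reciprocal_rank_fusion([lexical_hits, vector_hits], top_k=3)
```

RRF needs only ranks, not raw scores, which sidesteps the problem that BM25 scores and vector similarities live on different scales.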
Developer Walkthrough
Here’s how to work with Memvid in practice:
Using the CLI
# Create a new memory
memvid create notes.mv2
# Add documents
memvid put notes.mv2 --input ./docs/ --vector-compression
# Search
memvid find notes.mv2 --query "machine learning" --mode auto
# Ask questions
memvid ask notes.mv2 --question "What are the key points?"
# View timeline
memvid timeline notes.mv2 --limit 10
# Check health
memvid doctor notes.mv2 --plan-only
Using the Python SDK
from memvid_sdk import use
# Open or create
mem = use('basic', 'notes.mv2')
# Add content
mem.put(text="Introduction to neural networks...", title="NN Intro")
# Batch add (100-200x faster)
mem.put_many([
    {'text': 'Chapter 1...', 'title': 'Ch 1'},
    {'text': 'Chapter 2...', 'title': 'Ch 2'},
])
# Search
results = mem.find('neural networks', k=5)
# Ask with LLM
answer = mem.ask('What is a neural network?', model='openai:gpt-4o')
# Close properly
mem.seal()
Using the Node.js SDK
import { use } from '@memvid/sdk';
// Open or create
const mem = await use('basic', 'notes.mv2');
// Add content
await mem.put({ text: 'Introduction to neural networks...', title: 'NN Intro', label: 'intro' });
// Search
const results = await mem.find('neural networks', { k: 5 });
// Ask with LLM
const answer = await mem.ask('What is a neural network?', {
  model: 'openai:gpt-4o',
  modelApiKey: process.env.OPENAI_API_KEY
});
// Close properly
await mem.seal();
Verification and Repair
Memvid includes built-in tools for file health:
Verify
Check file integrity without modification:
# Quick verification
memvid verify notes.mv2
# Deep verification (slower, more thorough)
memvid verify notes.mv2 --deep
Doctor
Diagnose and repair issues:
# Preview what would be fixed
memvid doctor notes.mv2 --plan-only
# Rebuild corrupted time index
memvid doctor notes.mv2 --rebuild-time-index
# Rebuild lexical index
memvid doctor notes.mv2 --rebuild-lex-index
# Compact deleted frames
memvid doctor notes.mv2 --vacuum
Single-File Check
Ensure no auxiliary files were created:
memvid verify-single-file notes.mv2
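If you want a similar guard inside a test suite, a few lines of Python can look for leftover companion files next to a memory. This sketch is not what `memvid verify-single-file` does internally; `find_sidecars` is a hypothetical helper, and the suffix patterns it checks are assumptions based on the file types named in the Single-File Guarantee.

```python
import os

# Assumed naming patterns for auxiliary files; adjust to your environment.
SIDECAR_SUFFIXES = (".wal", ".shm", ".lock", "-journal")


def find_sidecars(path):
    """Return any auxiliary files that exist alongside the given file."""
    return [path + suffix for suffix in SIDECAR_SUFFIXES
            if os.path.exists(path + suffix)]
```

An empty list means none of the assumed sidecar names exist next to the file.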
Checksums and Integrity
Defense in depth with cascading checksums:
| Level | What’s Checked |
|---|---|
| Header | TOC checksum (BLAKE3) |
| WAL Records | Per-record checksum |
| Index Segments | Per-segment checksum |
| Frames | Per-frame payload checksum |