Memvid uses three complementary index types to enable fast, intelligent search across your documents. Each index serves a different purpose and can be enabled or disabled based on your needs.
Index Overview
| Index | Engine | Purpose | Best For |
| --- | --- | --- | --- |
| Lexical | BM25 | Full-text keyword search | Exact terms, error codes, names |
| Vector | Vector search | Semantic similarity search | Natural language, concepts |
| Time | Sorted tuples | Chronological ordering | Timeline queries, auditing |
All three indices are embedded directly in the .mv2 file, with no external dependencies or sidecar files.
Lexical Index
The lexical index powers fast, precise keyword search using BM25, a proven ranking algorithm for full-text search.
How It Works
BM25 ranking : Scores documents by term frequency and inverse document frequency
Tokenization : Breaks text into searchable terms
Memory-mapped : Uses mmap for efficient disk access
Embedded : Stored as a snapshot inside the .mv2 file
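To make the BM25 ranking step concrete, here is a minimal sketch of textbook Okapi BM25 in plain Python. It is not Memvid's implementation: the whitespace tokenizer, the parameter values k1=1.2 and b=0.75, and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: a naive tokenizer that lowercases and splits on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "connection refused ERR_CONNECTION_REFUSED while calling upstream",
    "user login flow and session handling",
    "retry on connection errors",
]
scores = bm25_scores("ERR_CONNECTION_REFUSED", docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Only the first document contains the query term, so it is the only one with a non-zero score. This is why lexical search suits exact identifiers such as error codes.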
When to Use
Lexical search excels at finding exact matches:
# Find exact error codes
memvid find knowledge.mv2 --query "ERR_CONNECTION_REFUSED" --mode lex
# Find function names
memvid find knowledge.mv2 --query "handleAuthentication" --mode lex
# Date range queries
memvid find knowledge.mv2 --query "date:[2024-01-01 TO 2024-12-31]" --mode lex
Building the Index
The lexical index is built automatically when you add documents. You can also rebuild it:
# Rebuild lexical index
memvid doctor knowledge.mv2 --rebuild-lex-index
# Check index status
memvid stats knowledge.mv2 --json | grep has_lex_index
Disabling Lexical Index
For vector-only workloads, you can disable lexical indexing:
# Create without lexical index
memvid create knowledge.mv2 --no-lex
# Python SDK
mem = use('basic', 'knowledge.mv2', enable_lex=False)
Vector Index
The vector index enables semantic search, finding documents by meaning rather than exact keywords.
How It Works
Embeddings : Documents are converted to dense vectors (default: BGE-small, 384 dimensions)
External providers : Support for OpenAI, Cohere, Voyage, and HuggingFace models
Vector graph : Fast approximate nearest neighbor search for semantic similarity
Product Quantization (PQ) : Optional 16x compression for large collections
Embedded : Stored as segments inside the .mv2 file
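As a rough illustration of what "semantic similarity" means here, the sketch below ranks toy vectors by cosine similarity using an exact, brute-force scan. An approximate nearest neighbor index aims to return nearly the same ranking without comparing against every vector. The three-dimensional vectors, frame ids, and function names are hypothetical. The choice of cosine as the metric is also an assumption, since this page does not state which distance Memvid uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, vectors, k=2):
    """Exact nearest neighbors: score every stored vector, keep the best k."""
    scored = [(cosine(query, vec), frame_id) for frame_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [(frame_id, score) for score, frame_id in scored[:k]]


# Hypothetical frame id -> embedding (real embeddings have hundreds of dimensions).
vectors = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], vectors, k=2)
```

The brute-force scan costs one comparison per stored vector. Graph-based approximate search gives up a small amount of recall to avoid that linear cost.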
Embedding Model Options
| Model | Dimensions | Description |
| --- | --- | --- |
| BGE-small (default) | 384 | Built-in, offline, no API key |
| OpenAI text-embedding-3-small | 1536 | High quality, general purpose |
| OpenAI text-embedding-3-large | 3072 | Highest quality |
| Cohere embed-english-v3.0 | 1024 | English documents |
| Voyage voyage-3 | 1024 | Code and technical docs |
See Embedding Models for detailed configuration.
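The dimension counts above translate directly into storage cost. The following back-of-the-envelope sketch assumes uncompressed vectors are stored as 32-bit floats (an assumption, not something this page states) and applies the quoted 16x Product Quantization ratio. It ignores graph overhead and PQ codebooks, so treat the numbers as lower bounds.

```python
BYTES_PER_FLOAT32 = 4  # assumption: raw vectors are stored as float32
PQ_RATIO = 16          # the 16x compression figure quoted for PQ

MODEL_DIMS = {
    "BGE-small": 384,
    "OpenAI text-embedding-3-small": 1536,
    "OpenAI text-embedding-3-large": 3072,
    "Cohere embed-english-v3.0": 1024,
    "Voyage voyage-3": 1024,
}


def vector_bytes(dims, compressed=False):
    """Approximate bytes needed for one embedding vector."""
    raw = dims * BYTES_PER_FLOAT32
    return raw // PQ_RATIO if compressed else raw


def collection_mib(dims, n_vectors, compressed=False):
    """Approximate vector payload size of a collection, in MiB."""
    return vector_bytes(dims, compressed) * n_vectors / (1024 * 1024)


for model, dims in MODEL_DIMS.items():
    print(f"{model}: {vector_bytes(dims)} B raw, {vector_bytes(dims, True)} B compressed")
```

Under these assumptions, one million BGE-small vectors take roughly 1.4 GiB uncompressed and about 92 MiB with compression.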
When to Use
Vector search excels at understanding intent:
# Natural language questions
memvid find knowledge.mv2 --query "how do users log in" --mode sem
# Conceptual queries
memvid find knowledge.mv2 --query "best practices for security" --mode sem
# Find similar content
memvid find knowledge.mv2 --query "machine learning model training" --mode sem
Building the Index
Enable embeddings when adding documents:
# Add with embeddings and compressed vectors (16x smaller)
memvid put knowledge.mv2 --input document.pdf --vector-compression
# Python SDK
mem.put(text="Content", title="Doc", enable_embedding=True)
# With compression
mem.put(text="Content", title="Doc", enable_embedding=True, vector_compression=True)
Rebuilding the Index
If vector search isn’t working correctly:
# Rebuild vector index
memvid doctor knowledge.mv2 --rebuild-vec-index
# Check index status
memvid stats knowledge.mv2 --json | grep has_vec_index
Direct Vector Search
For custom embeddings from your own model:
# Search with pre-computed vector
memvid vec-search knowledge.mv2 --vector "0.1,0.2,0.3,..." --limit 10
# Search with embedding file
memvid vec-search knowledge.mv2 --embedding ./query-embedding.json --limit 5
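If the query vector comes from your own model, it has to be serialized into the comma-separated form shown for --vector. The helper below is a small sketch of that step. The function name is hypothetical, and it only builds the argument list; running it (for example with subprocess.run) is left to you.

```python
def format_vector(values):
    """Serialize floats as the comma-separated string passed to --vector."""
    return ",".join(repr(float(v)) for v in values)


query_vector = [0.1, 0.2, 0.3]  # stand-in for a real embedding
vector_arg = format_vector(query_vector)

# Argument list mirroring the vec-search command shown above.
command = [
    "memvid", "vec-search", "knowledge.mv2",
    "--vector", vector_arg,
    "--limit", "10",
]
```

The query vector presumably needs the same number of dimensions as the vectors stored in the file, since similarity is only defined between vectors of equal length.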
Time Index
The time index enables chronological queries and time-travel features.
How It Works
Sorted tuples : Stores (timestamp, frame_id) pairs in sorted order
MVTI magic : Identified by MVTI header bytes
O(log n) lookups : Binary search for efficient time range queries
Checksummed : Protected by integrity verification
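The sorted-tuple layout is what makes range queries logarithmic. The sketch below uses Python's bisect module on an in-memory list of (timestamp, frame_id) pairs. It illustrates the idea only and is not the on-disk MVTI format; the sample data and function name are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical time index: (timestamp, frame_id) pairs kept in sorted order.
time_index = [
    (1704067200, 0),
    (1704153600, 1),
    (1705000000, 2),
    (1706745600, 3),
    (1707000000, 4),
]


def frames_between(index, since, until):
    """Return frame ids whose timestamp satisfies since <= ts <= until."""
    # (since,) sorts before every (since, frame_id), so this finds the
    # first entry at or after `since`.
    lo = bisect_left(index, (since,))
    # (until, inf) sorts after every (until, frame_id), so this finds the
    # position just past the last entry at or before `until`.
    hi = bisect_right(index, (until, float("inf")))
    return [frame_id for _, frame_id in index[lo:hi]]


january = frames_between(time_index, 1704067200, 1706745600)
```

Each boundary takes one binary search, so locating a range costs O(log n) no matter how many frames the file holds. Reading out the matches is then linear in the number of results.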
When to Use
Time-based access patterns:
# Browse recent documents
memvid timeline knowledge.mv2 --limit 20
# Filter by time range
memvid timeline knowledge.mv2 --since 1704067200 --until 1706745600
# Reverse chronological order
memvid timeline knowledge.mv2 --reverse
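The --since and --until values above look like Unix timestamps in seconds. Assuming that is the case and that they are interpreted as UTC (this page does not say), a short helper can derive them from calendar dates. The function name is hypothetical.

```python
from datetime import datetime, timezone


def to_unix(year, month, day):
    """Unix timestamp (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


since = to_unix(2024, 1, 1)
until = to_unix(2024, 2, 1)
command = f"memvid timeline knowledge.mv2 --since {since} --until {until}"
print(command)
```

With these dates the helper reproduces the values used in the example: 1704067200 is 2024-01-01 and 1706745600 is 2024-02-01, both at midnight UTC.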
Time-Travel Queries
View your memory as it existed at a point in time:
# Search as of a specific frame
memvid find knowledge.mv2 --query "config" --as-of-frame 100
# Search as of a specific timestamp
memvid find knowledge.mv2 --query "config" --as-of-ts 1704067200
# Timeline at a specific frame
memvid timeline knowledge.mv2 --as-of-frame 50
# Python SDK time-travel
results = mem.find('config', as_of_frame=100)
results = mem.find('config', as_of_ts=1704067200)
Rebuilding the Time Index
If timeline queries return incorrect results:
# Rebuild time index
memvid doctor knowledge.mv2 --rebuild-time-index
# Verify time index
memvid verify knowledge.mv2 --deep
Hybrid Search
Hybrid search (--mode auto) combines lexical and semantic results, so a single query can match both exact terms and related meaning.
How It Works
Parallel query : Both lexical and vector indices are queried
Result fusion : Scores are combined using reciprocal rank fusion
Reranking : Top results are reranked for relevance
Deduplication : Duplicate frames are merged
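The fusion step can be sketched in a few lines. The code below implements standard reciprocal rank fusion, where each list contributes 1 / (k + rank) to a frame's score. The constant k=60 is the commonly used default, not a value confirmed for Memvid, and the frame ids and function name are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of frame ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, frame_id in enumerate(ranking, start=1):
            # A frame appearing in several lists accumulates score under a
            # single key, which also merges duplicates.
            scores[frame_id] = scores.get(frame_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by frame id for determinism.
    return sorted(scores, key=lambda frame_id: (-scores[frame_id], frame_id))


lexical_hits = [7, 3, 9]   # hypothetical frame ids from the lexical index
semantic_hits = [3, 5, 7]  # hypothetical frame ids from the vector index
fused = reciprocal_rank_fusion([lexical_hits, semantic_hits])
```

Frames 3 and 7 appear in both lists and therefore rise to the top. Because the method uses ranks only, BM25 scores and vector similarities never need to be put on a common scale.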
When to Use
Hybrid search is recommended for most use cases:
# Default mode is hybrid
memvid find knowledge.mv2 --query "authentication best practices"
# Explicit hybrid mode
memvid find knowledge.mv2 --query "OAuth2 patterns" --mode auto
| Mode | Speed | Recall | Best For |
| --- | --- | --- | --- |
| lex | Fastest | Exact matches | Technical terms, IDs |
| sem | Moderate | Semantic similarity | Natural language |
| auto | Balanced | Comprehensive | General queries |
Tracks
Tracks are logical groupings for organizing content within a memory.
What Tracks Are
Namespace : Group related documents together
Filterable : Search within specific tracks
Metadata : Organizational label stored with each frame
Using Tracks
# Add to a specific track
memvid put knowledge.mv2 --input api-docs.md --track "api"
memvid put knowledge.mv2 --input meeting-notes.md --track "meetings"
# Search within a track (via scope)
memvid find knowledge.mv2 --query "authentication" --scope "mv2://api/"
# Python SDK
mem.put(text="API documentation", title="Auth", track="api")
mem.put(text="Meeting notes", title="Standup", track="meetings")
# Search within scope
results = mem.find('authentication', scope='mv2://api/')
Common Track Patterns
| Track | Use Case |
| --- | --- |
| documentation | Technical docs and guides |
| code | Source code and snippets |
| meetings | Meeting notes and transcripts |
| research | Papers and references |
| archived | Old or deprecated content |
Index Statistics
Check the status of all indices:
memvid stats knowledge.mv2 --json
{
  "frame_count": 150,
  "has_lex_index": true,
  "has_vec_index": true,
  "has_time_index": true,
  "lex_index_bytes": 2202009,
  "vec_index_bytes": 1887436,
  "time_index_bytes": 310478
}
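For scripted health checks, JSON output is easier to work with when parsed than when grepped. The sketch below reads a stats payload shaped like the example above and reports which indices are absent. The function name is hypothetical, and the sample string stands in for the real command output, which you could capture with subprocess.

```python
import json

INDEX_FLAGS = ("has_lex_index", "has_vec_index", "has_time_index")


def missing_indices(stats_json):
    """Return the index flags that are false or absent in a stats payload."""
    stats = json.loads(stats_json)
    return [flag for flag in INDEX_FLAGS if not stats.get(flag, False)]


# Sample payload; in practice this would be the output of
# `memvid stats knowledge.mv2 --json`.
sample = """
{
  "frame_count": 150,
  "has_lex_index": true,
  "has_vec_index": false,
  "has_time_index": true
}
"""
print(missing_indices(sample))
```

An empty list means all three indices are present. Anything else points at the index to rebuild with the matching memvid doctor flag.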
Best Practices
Index Selection
| Scenario | Recommended Indices |
| --- | --- |
| Full-featured search | All three (default) |
| Keyword-only search | Lexical only |
| Semantic similarity | Vector only |
| Large collections | All with vector compression |
| Audit/compliance | Time index required |
Use put_many() for batch ingestion : 100-200x faster than individual put() calls
Enable vector compression for large collections to reduce storage
Rebuild indices if search quality degrades after crashes
Use hybrid mode for best recall on general queries
Maintenance
Regular index maintenance keeps search performing well:
# Weekly: Verify integrity
memvid verify knowledge.mv2 --deep
# After many deletions: Vacuum and rebuild
memvid doctor knowledge.mv2 --vacuum --rebuild-lex-index
# After crashes: Full repair
memvid doctor knowledge.mv2 \
--rebuild-time-index \
--rebuild-lex-index \
--rebuild-vec-index
Next Steps
Memory Architecture : Understand the internal structure of .mv2 files
Search & Ask : Learn advanced search techniques