Memvid is designed for high performance out of the box, but different use cases benefit from different configurations. This guide covers tuning options for ingestion speed, search latency, storage efficiency, and retrieval quality.
Quick Recommendations
| Use Case | Configuration |
| --- | --- |
| Code search | --no-vec, --mode lex |
| Fast prototyping | bge-small model, small memory size |
| Production RAG | bge-base or nomic, adaptive retrieval |
| Large documents | Parallel ingestion, higher size limit |
| Minimal storage | --no-vec or bge-small |
| Best quality | gte-large or OpenAI embeddings |
Parallel Ingestion
For large folders, enable parallel processing:
# Process multiple files concurrently
memvid put memory.mv2 --input ./large-folder/ --parallel-segments
# Combine with embedding skip for fastest ingestion
memvid put memory.mv2 --input ./logs/ --embedding-skip --parallel-segments
Performance comparison:
| Files | Sequential | Parallel |
| --- | --- | --- |
| 100 docs | 45s | 12s |
| 1,000 docs | 7m | 2m |
| 10,000 docs | 1h 10m | 20m |
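To put the comparison in perspective, the speedup factor can be computed directly from the timings in the table. The snippet below is a small worked calculation; the only inputs are the numbers quoted above, converted to seconds.

```python
# Timings copied from the comparison table, converted to seconds.
timings = {
    "100 docs": (45, 12),
    "1,000 docs": (7 * 60, 2 * 60),
    "10,000 docs": (70 * 60, 20 * 60),  # 1h 10m vs 20m
}


def speedup(sequential_s: float, parallel_s: float) -> float:
    """Return how many times faster the parallel run is."""
    return sequential_s / parallel_s


speedups = {
    label: round(speedup(seq, par), 2) for label, (seq, par) in timings.items()
}

for label, factor in speedups.items():
    print(f"{label}: {factor}x faster")
```

In these figures the gain settles at roughly 3.5x once the folder is large enough for the fixed startup cost to stop mattering.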
Skip Embeddings
For lexical-only search or when you’ll add embeddings later:
# No vector embeddings (lexical only)
memvid create memory.mv2 --no-vec
memvid put memory.mv2 --input docs/
# Or skip per-ingestion
memvid put memory.mv2 --input logs.txt --embedding-skip
Benefits:
10x faster ingestion
60% smaller file size
Full lexical search still available
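If ingestion is driven from a script, the flags covered so far can be assembled programmatically. The helper below is a minimal sketch: `build_put_command` is a hypothetical name, and it only uses the `--input`, `--embedding-skip`, and `--parallel-segments` flags shown above. It prints the command rather than running it.

```python
import shlex


def build_put_command(
    memory: str,
    input_path: str,
    *,
    parallel: bool = False,
    skip_embeddings: bool = False,
) -> list[str]:
    """Assemble a `memvid put` argument list (hypothetical helper)."""
    cmd = ["memvid", "put", memory, "--input", input_path]
    if skip_embeddings:
        cmd.append("--embedding-skip")
    if parallel:
        cmd.append("--parallel-segments")
    return cmd


cmd = build_put_command(
    "memory.mv2", "./logs/", parallel=True, skip_embeddings=True
)
print(shlex.join(cmd))
# To execute it for real: subprocess.run(cmd, check=True)
```

Building an argument list (rather than a single string) avoids shell-quoting problems when paths contain spaces.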
Embedding Model Selection
Choose based on speed/quality tradeoff:
| Model | Speed | Quality | Size | Best For |
| --- | --- | --- | --- | --- |
| bge-small | Fastest | Good | 33MB | Prototyping, large volumes |
| bge-base | Fast | Better | 110MB | Production (default) |
| nomic | Fast | Better | 137MB | Long documents |
| gte-large | Slower | Best | 335MB | Maximum quality |
| openai | API | Excellent | - | Best quality, requires API |
# Use smaller model for speed
memvid -m bge-small put memory.mv2 --input docs/
# Use larger model for quality
memvid -m gte-large put memory.mv2 --input docs/
Search Mode Selection
| Mode | Speed | Best For |
| --- | --- | --- |
| lex | Fastest | Exact matches, code, keywords |
| sem | Fast | Conceptual queries, similar meaning |
| auto | Balanced | General use (default) |
# Lexical only (fastest)
memvid find memory.mv2 --query "handleAuth" --mode lex
# Semantic only
memvid find memory.mv2 --query "authentication logic" --mode sem
# Hybrid (default)
memvid find memory.mv2 --query "auth" --mode auto
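When queries come from an application rather than a person, the mode can be picked per query. The function below is a rough sketch of one possible heuristic, not something Memvid provides: `choose_mode` and its rules (code-like single tokens go to `lex`, multi-word phrases go to `sem`, everything else falls back to `auto`) are assumptions for illustration.

```python
import re

# A camelCase transition or typical code punctuation suggests an identifier.
CODE_LIKE = re.compile(r"[a-z][A-Z]|[_.:/()]")


def choose_mode(query: str) -> str:
    """Hypothetical heuristic for choosing a --mode value."""
    words = query.split()
    if len(words) == 1 and CODE_LIKE.search(words[0]):
        return "lex"   # exact identifier: lexical match is fastest
    if len(words) >= 2:
        return "sem"   # natural-language phrase: semantic search
    return "auto"      # ambiguous single word: let hybrid decide


for q in ["handleAuth", "authentication logic", "auth"]:
    print(f"{q!r} -> --mode {choose_mode(q)}")
```

The three sample queries map to the same modes used in the commands above.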
Adaptive Retrieval
Adaptive retrieval automatically adjusts the number of results based on how relevant the matches are to the query. Disable it when you need a fixed result count and predictable latency:
# Fixed result count (faster, predictable)
memvid find memory.mv2 --query "term" --no-adaptive --top-k 10
# Adaptive (may return fewer, but higher quality)
memvid find memory.mv2 --query "term" # Default
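The difference between the two behaviours can be illustrated with a toy example. This is only a conceptual sketch: the cutoff rule used here (keep results scoring at least a fixed fraction of the top score) and the names `adaptive_cutoff`, `fixed_cutoff`, and `min_ratio` are assumptions, and Memvid's actual rule is not described in this guide. Scores are assumed to be positive.

```python
def fixed_cutoff(scores: list[float], top_k: int = 10) -> list[float]:
    """Always return the top_k best scores, regardless of quality."""
    return sorted(scores, reverse=True)[:top_k]


def adaptive_cutoff(
    scores: list[float], top_k: int = 10, min_ratio: float = 0.7
) -> list[float]:
    """Return up to top_k scores, dropping those far below the best one.

    Illustrative rule only: keep results scoring >= min_ratio * best score.
    """
    ranked = sorted(scores, reverse=True)[:top_k]
    if not ranked:
        return []
    floor = ranked[0] * min_ratio
    return [s for s in ranked if s >= floor]


scores = [0.92, 0.88, 0.61, 0.30, 0.12]
print(fixed_cutoff(scores, top_k=3))     # three results, always
print(adaptive_cutoff(scores, top_k=3))  # fewer results when quality drops off
```

With a fixed count the weaker third hit is returned anyway; the adaptive variant trades a variable result count for higher average relevance.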
Scope Filtering
Narrow search scope for faster results:
# Search only in specific directory
memvid find memory.mv2 --query "config" --scope "src/config/"
# Search specific document
memvid find memory.mv2 --query "api key" --uri "docs/security.md"
Sketch Index
For very large memories (100k+ frames), build a sketch index for faster approximate search:
# Build sketch index
memvid sketch build memory.mv2 --variant medium
# Check sketch status
memvid sketch info memory.mv2
Sketch variants:
| Variant | Build Time | Query Speed | Accuracy |
| --- | --- | --- | --- |
| small | Fast | ~2x faster | 90% |
| medium | Moderate | ~3x faster | 95% |
| large | Slower | ~5x faster | 98% |
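Because larger variants cost more build time, one reasonable approach is to pick the cheapest variant that still meets your accuracy target. The sketch below encodes the table's approximate figures; `pick_variant` is a hypothetical helper, and the accuracy values are the rough percentages quoted above rather than guarantees.

```python
from typing import Optional

# (variant, approximate query speedup, approximate accuracy),
# ordered from cheapest to most expensive build.
VARIANTS = [
    ("small", 2, 0.90),
    ("medium", 3, 0.95),
    ("large", 5, 0.98),
]


def pick_variant(min_accuracy: float) -> Optional[str]:
    """Return the cheapest-to-build variant meeting min_accuracy, if any."""
    for name, _speedup, accuracy in VARIANTS:
        if accuracy >= min_accuracy:
            return name
    return None  # no sketch variant is accurate enough


print(pick_variant(0.95))  # medium
print(pick_variant(0.99))  # None
```

A `None` result means the target is above what any variant offers in the table.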
Storage Optimization
Memory Size
Set appropriate size limits:
# Small memory for quick projects
memvid create notes.mv2 --size 10MB
# Large memory for document archives
memvid create archive.mv2 --size 50MB
Size recommendations:
| Content | Recommended Size |
| --- | --- |
| Personal notes | 10-15MB |
| Single project | 15-25MB |
| Documentation | 25-35MB |
| Large archive | 40-50MB |
Vacuum and Compact
After deletions or updates, reclaim space:
# Compact storage
memvid doctor memory.mv2 --vacuum
# Full optimization
memvid doctor memory.mv2 --vacuum --rebuild-lex-index --rebuild-vec-index
Index Selection
Disable indexes you don’t need:
# No vector index (lexical only)
memvid create code.mv2 --no-vec
# No lexical index (semantic only)
memvid create semantic.mv2 --no-lex
Storage impact:
| Configuration | Relative Size |
| --- | --- |
| Full (default) | 100% |
| No vectors | ~40% |
| No lexical | ~85% |
| Neither | ~25% |
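The relative sizes make it easy to estimate what a memory would weigh under a different index configuration. The snippet below is a back-of-the-envelope calculation using the approximate percentages from the table; the configuration keys and `estimate_size_mb` are names chosen for this example.

```python
# Approximate size of each configuration relative to a full memory.
RELATIVE_SIZE = {
    "full": 1.00,
    "no_vec": 0.40,
    "no_lex": 0.85,
    "neither": 0.25,
}


def estimate_size_mb(full_size_mb: float, config: str) -> float:
    """Estimate the on-disk size for a configuration, in MB."""
    return round(full_size_mb * RELATIVE_SIZE[config], 1)


full_mb = 50.0  # example: a memory that is 50 MB with both indexes
for config in RELATIVE_SIZE:
    print(f"{config}: ~{estimate_size_mb(full_mb, config)} MB")
```

Treat the results as rough guidance; the real ratio depends on the content and the embedding model.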
Model Selection
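Choose the synthesis model used by memvid ask based on your speed, quality, and cost requirements:
| Model | Speed | Quality | Cost |
| --- | --- | --- | --- |
| tinyllama | Fastest | Basic | Free |
| groq | Very fast | Good | Low |
| gemini | Fast | Good | Low |
| openai | Moderate | Excellent | Medium |
| claude | Moderate | Excellent | Medium |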
# Fast local synthesis
memvid ask memory.mv2 --question "..." --use-model tinyllama
# Fast API synthesis
memvid ask memory.mv2 --question "..." --use-model groq
Context-Only Mode
Skip synthesis for maximum speed:
# Get relevant context without LLM synthesis
memvid ask memory.mv2 --question "What are the config options?" --context-only
Use cases:
Feed context to your own LLM
Debugging retrieval quality
Batch processing
Index Maintenance
Rebuild Indexes
Rebuild indexes periodically to keep search fast and accurate:
# Rebuild all indexes
memvid doctor memory.mv2 --rebuild-lex-index --rebuild-vec-index --rebuild-time-index
# Rebuild specific index
memvid doctor memory.mv2 --rebuild-vec-index
When to rebuild:
After many deletions (>20% of content)
Search results seem slow or inaccurate
After model upgrade
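The first and third criteria above are easy to automate in a maintenance job. The function below is a minimal sketch: `should_rebuild` is a hypothetical helper, the 20% threshold comes from the list above, and how you count deleted frames is left to your own bookkeeping.

```python
def should_rebuild(
    deleted_frames: int,
    total_frames: int,
    model_changed: bool = False,
    deleted_threshold: float = 0.20,
) -> bool:
    """Decide whether an index rebuild is worthwhile (illustrative rule)."""
    if model_changed:
        return True  # embeddings from the old model no longer match
    if total_frames <= 0:
        return False  # nothing to rebuild
    # Rebuild once more than 20% of the content has been deleted.
    return deleted_frames / total_frames > deleted_threshold


print(should_rebuild(250, 1000))                     # True: 25% deleted
print(should_rebuild(100, 1000))                     # False: 10% deleted
print(should_rebuild(0, 1000, model_changed=True))   # True: model upgrade
```

The second criterion (results that seem slow or inaccurate) still needs a human judgment or a latency metric of your own.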
Verify Integrity
Check for corruption:
# Quick check
memvid verify memory.mv2
# Deep check
memvid verify memory.mv2 --deep
Benchmarks
Typical performance on an M1 Mac with an SSD:
Ingestion Speed
| Content Type | Speed (with embeddings) | Speed (no embeddings) |
| --- | --- | --- |
| Plain text | ~1,000 chunks/sec | ~10,000 chunks/sec |
| PDF (text) | ~200 pages/min | ~2,000 pages/min |
| Code files | ~500 files/min | ~5,000 files/min |
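These rates can be turned into a rough ingestion-time estimate before starting a large job. The calculation below uses only the approximate figures from the table, normalised to items per second; `estimate_seconds` and the dictionary keys are names chosen for this example, and real throughput will vary with hardware and content.

```python
# (with embeddings, without embeddings), in items per second.
THROUGHPUT = {
    "text_chunks": (1_000, 10_000),
    "pdf_pages": (200 / 60, 2_000 / 60),
    "code_files": (500 / 60, 5_000 / 60),
}


def estimate_seconds(kind: str, count: int, embeddings: bool = True) -> float:
    """Estimate ingestion time in seconds for `count` items of `kind`."""
    with_emb, without_emb = THROUGHPUT[kind]
    rate = with_emb if embeddings else without_emb
    return count / rate


chunks = 500_000
print(f"with embeddings: ~{estimate_seconds('text_chunks', chunks):.0f}s")
print(f"no embeddings:   ~{estimate_seconds('text_chunks', chunks, embeddings=False):.0f}s")
```

For half a million text chunks this works out to roughly 500 seconds with embeddings versus 50 seconds without.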
Search Latency
| Memory Size | Lexical | Semantic | Hybrid |
| --- | --- | --- | --- |
| 1,000 frames | ~5ms | ~10ms | ~15ms |
| 10,000 frames | ~10ms | ~25ms | ~35ms |
| 100,000 frames | ~20ms | ~50ms | ~70ms |
| 1M frames (sketch) | ~30ms | ~60ms | ~90ms |
Ask Latency
| Model | Retrieval + Synthesis |
| --- | --- |
| tinyllama | ~500ms |
| groq | ~800ms |
| openai | ~1.5s |
| claude | ~2s |
Python
from memvid import use

# Reuse one memory instance instead of reopening the file for every call
mem = use('basic', 'memory.mv2')

# Batch operations
texts = [...]
for text in texts:
    mem.put(text)  # Batched internally

# Async for better throughput
import asyncio
from memvid import use_async

async def main():
    mem = await use_async('basic', 'memory.mv2')
    queries = [...]  # your query strings
    # Run all searches concurrently
    results = await asyncio.gather(*[
        mem.find(q) for q in queries
    ])
    return results

asyncio.run(main())
Node.js
import { use } from '@anthropics/memvid'

// Reuse memory instance
const mem = await use('basic', 'memory.mv2')

// Parallel searches
const results = await Promise.all(
  queries.map(q => mem.find(q))
)

// Stream large results
for await (const chunk of mem.findStream(query)) {
  process.stdout.write(chunk)
}
Monitoring
Query Tracking
Monitor usage patterns:
# View usage statistics
memvid plan show
# JSON format for monitoring
memvid stats memory.mv2 --json
Memory Statistics
# Detailed stats
memvid stats memory.mv2
# Output example:
# Frames: 10,234
# Size: 45.2 MB
# Vector index: 23.1 MB
# Lexical index: 8.4 MB
# Avg query time: 12ms
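For quick checks, the human-readable output can be parsed into a dictionary. The sketch below assumes the output looks like the example above (one `Key: value` pair per line); the real format may differ, and for production monitoring the `--json` flag is likely the more robust choice. `parse_stats` is a hypothetical helper.

```python
# Sample text mirroring the example output shown above.
SAMPLE = """\
Frames: 10,234
Size: 45.2 MB
Vector index: 23.1 MB
Lexical index: 8.4 MB
Avg query time: 12ms
"""


def parse_stats(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines into a dictionary of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            stats[key.strip()] = value.strip()
    return stats


stats = parse_stats(SAMPLE)
frames = int(stats["Frames"].replace(",", ""))
vector_share = float(stats["Vector index"].split()[0]) / float(stats["Size"].split()[0])
print(f"{frames} frames, vector index is {vector_share:.0%} of the file")
```

A vector index that accounts for a large share of the file is a hint that `--no-vec` or a smaller embedding model would reduce storage.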
Slow Ingestion
Enable parallel ingestion: --parallel-segments
Use smaller embedding model: -m bge-small
Skip embeddings if not needed: --embedding-skip
Slow Search
Use lexical mode for exact matches: --mode lex
Build sketch index for large memories
Narrow scope: --scope "relevant/path/"
High Memory Usage
Use smaller embedding model
Create with --no-vec if lexical is sufficient
Vacuum after deletions: --vacuum
Large File Size
Enable no-vec mode
Vacuum to reclaim deleted space
Use smaller embedding model
Next Steps
Embedding Models: Model comparison