Memvid is designed for high performance out of the box, but different use cases benefit from different configurations. This guide covers tuning options for ingestion speed, search latency, storage efficiency, and retrieval quality.

Quick Recommendations

| Use Case | Configuration |
| --- | --- |
| Code search | --no-vec, --mode lex |
| Fast prototyping | bge-small model, small memory size |
| Production RAG | bge-base or nomic, adaptive retrieval |
| Large documents | Parallel ingestion, higher size limit |
| Minimal storage | --no-vec or bge-small |
| Best quality | gte-large or OpenAI embeddings |

Ingestion Performance

Parallel Ingestion

For large folders, enable parallel processing:
# Process multiple files concurrently
memvid put memory.mv2 --input ./large-folder/ --parallel-segments

# Combine with embedding skip for fastest ingestion
memvid put memory.mv2 --input ./logs/ --embedding-skip --parallel-segments
Performance comparison:
| Files | Sequential | Parallel |
| --- | --- | --- |
| 100 docs | 45s | 12s |
| 1,000 docs | 7m | 2m |
| 10,000 docs | 1h 10m | 20m |

Skip Embeddings

For lexical-only search or when you’ll add embeddings later:
# No vector embeddings (lexical only)
memvid create memory.mv2 --no-vec
memvid put memory.mv2 --input docs/

# Or skip per-ingestion
memvid put memory.mv2 --input logs.txt --embedding-skip
Benefits:
  • 10x faster ingestion
  • 60% smaller file size
  • Full lexical search still available

Embedding Model Selection

Choose based on speed/quality tradeoff:
| Model | Speed | Quality | Size | Best For |
| --- | --- | --- | --- | --- |
| bge-small | Fastest | Good | 33MB | Prototyping, large volumes |
| bge-base | Fast | Better | 110MB | Production (default) |
| nomic | Fast | Better | 137MB | Long documents |
| gte-large | Slower | Best | 335MB | Maximum quality |
| openai | API | Excellent | - | Best quality, requires API |
# Use smaller model for speed
memvid -m bge-small put memory.mv2 --input docs/

# Use larger model for quality
memvid -m gte-large put memory.mv2 --input docs/

Search Performance

Search Mode Selection

| Mode | Speed | Best For |
| --- | --- | --- |
| lex | Fastest | Exact matches, code, keywords |
| sem | Fast | Conceptual queries, similar meaning |
| auto | Balanced | General use (default) |
# Lexical only (fastest)
memvid find memory.mv2 --query "handleAuth" --mode lex

# Semantic only
memvid find memory.mv2 --query "authentication logic" --mode sem

# Hybrid (default)
memvid find memory.mv2 --query "auth" --mode auto

Adaptive Retrieval

Adaptive retrieval automatically adjusts the result count based on query relevance. Disable it when you need fixed, predictable performance:
# Fixed result count (faster, predictable)
memvid find memory.mv2 --query "term" --no-adaptive --top-k 10

# Adaptive (may return fewer, but higher quality)
memvid find memory.mv2 --query "term"  # Default
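
If you use the Python SDK, the same trade-off can be expressed per call. The sketch below uses hypothetical keyword arguments (top_k, adaptive) that mirror the CLI flags; the real SDK signature may differ, so check the SDK reference.

from memvid import use

mem = use('basic', 'memory.mv2')

# NOTE: top_k and adaptive are hypothetical parameter names used for illustration;
# they mirror --top-k and --no-adaptive on the CLI
fixed = mem.find("term", top_k=10, adaptive=False)  # fixed count, predictable latency

# Adaptive retrieval (default): may return fewer, higher-quality results
adaptive = mem.find("term")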

Scope Filtering

Narrow search scope for faster results:
# Search only in specific directory
memvid find memory.mv2 --query "config" --scope "src/config/"

# Search specific document
memvid find memory.mv2 --query "api key" --uri "docs/security.md"

Sketch Index

For very large memories (100k+ frames), build a sketch index for faster approximate search:
# Build sketch index
memvid sketch build memory.mv2 --variant medium

# Check sketch status
memvid sketch info memory.mv2
Sketch variants:
| Variant | Build Time | Query Speed | Accuracy |
| --- | --- | --- | --- |
| small | Fast | ~2x faster | 90% |
| medium | Moderate | ~3x faster | 95% |
| large | Slower | ~5x faster | 98% |

Storage Optimization

Memory Size

Set appropriate size limits:
# Small memory for quick projects
memvid create notes.mv2 --size 10MB

# Large memory for document archives
memvid create archive.mv2 --size 50MB
Size recommendations:
| Content | Recommended Size |
| --- | --- |
| Personal notes | 10-15MB |
| Single project | 15-25MB |
| Documentation | 25-35MB |
| Large archive | 40-50MB |

Vacuum and Compact

After deletions or updates, reclaim space:
# Compact storage
memvid doctor memory.mv2 --vacuum

# Full optimization
memvid doctor memory.mv2 --vacuum --rebuild-lex-index --rebuild-vec-index

Index Selection

Disable indexes you don’t need:
# No vector index (lexical only)
memvid create code.mv2 --no-vec

# No lexical index (semantic only)
memvid create semantic.mv2 --no-lex
Storage impact:
| Configuration | Relative Size |
| --- | --- |
| Full (default) | 100% |
| No vectors | ~40% |
| No lexical | ~85% |
| Neither | ~25% |

RAG Performance

Model Selection

Choose a synthesis model based on your needs:
| Model | Speed | Quality | Cost |
| --- | --- | --- | --- |
| tinyllama | Fastest | Basic | Free |
| groq | Very fast | Good | Low |
| gemini | Fast | Good | Low |
| openai | Moderate | Excellent | Medium |
| claude | Moderate | Excellent | Medium |
# Fast local synthesis
memvid ask memory.mv2 --question "..." --use-model tinyllama

# Fast API synthesis
memvid ask memory.mv2 --question "..." --use-model groq

Context-Only Mode

Skip synthesis for maximum speed:
# Get relevant context without LLM synthesis
memvid ask memory.mv2 --question "What are the config options?" --context-only
Use cases:
  • Feed context to your own LLM (see the sketch after this list)
  • Debugging retrieval quality
  • Batch processing
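
For the first use case, a minimal sketch is shown below. It shells out to the documented --context-only command and hands the retrieved context to whatever LLM client you already use; call_my_llm is a placeholder, not part of Memvid.

import subprocess

def get_context(question: str) -> str:
    # Retrieve relevant context only; no synthesis step
    result = subprocess.run(
        ["memvid", "ask", "memory.mv2", "--question", question, "--context-only"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

context = get_context("What are the config options?")

# Placeholder: send the context to your own LLM client
# answer = call_my_llm(f"Answer using this context:\n\n{context}")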

Index Maintenance

Rebuild Indexes

Periodically rebuild indexes for optimal performance:
# Rebuild all indexes
memvid doctor memory.mv2 --rebuild-lex-index --rebuild-vec-index --rebuild-time-index

# Rebuild specific index
memvid doctor memory.mv2 --rebuild-vec-index
When to rebuild:
  • After many deletions (>20% of content)
  • When search results seem slow or inaccurate
  • After an embedding model upgrade

Verify Integrity

Check for corruption:
# Quick check
memvid verify memory.mv2

# Deep check
memvid verify memory.mv2 --deep

Benchmarks

Typical performance on an M1 Mac with an SSD:

Ingestion Speed

| Content Type | Speed (with embeddings) | Speed (no embeddings) |
| --- | --- | --- |
| Plain text | ~1,000 chunks/sec | ~10,000 chunks/sec |
| PDF (text) | ~200 pages/min | ~2,000 pages/min |
| Code files | ~500 files/min | ~5,000 files/min |

Search Latency

| Memory Size | Lexical | Semantic | Hybrid |
| --- | --- | --- | --- |
| 1,000 frames | ~5ms | ~10ms | ~15ms |
| 10,000 frames | ~10ms | ~25ms | ~35ms |
| 100,000 frames | ~20ms | ~50ms | ~70ms |
| 1M frames (sketch) | ~30ms | ~60ms | ~90ms |

Ask Latency

| Model | Retrieval + Synthesis |
| --- | --- |
| tinyllama | ~500ms |
| groq | ~800ms |
| openai | ~1.5s |
| claude | ~2s |

SDK Performance Tips

Python

from memvid import use

# Reuse memory instance
mem = use('basic', 'memory.mv2')

# Batch operations
texts = [...]
for text in texts:
    mem.put(text)  # Batched internally

# Async for better throughput
import asyncio
from memvid import use_async

async def main():
    mem = await use_async('basic', 'memory.mv2')
    results = await asyncio.gather(*[
        mem.find(q) for q in queries
    ])
    return results

asyncio.run(main())

Node.js

import { use } from '@anthropics/memvid'

// Reuse memory instance
const mem = await use('basic', 'memory.mv2')

// Parallel searches
const results = await Promise.all(
  queries.map(q => mem.find(q))
)

// Stream large results
for await (const chunk of mem.findStream(query)) {
  process.stdout.write(chunk)
}

Monitoring

Query Tracking

Monitor usage patterns:
# View usage statistics
memvid plan show

# JSON format for monitoring
memvid stats memory.mv2 --json

Memory Statistics

# Detailed stats
memvid stats memory.mv2

# Output example:
# Frames: 10,234
# Size: 45.2 MB
# Vector index: 23.1 MB
# Lexical index: 8.4 MB
# Avg query time: 12ms
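
For automated monitoring, consume the JSON output shown above rather than scraping the text report. A minimal sketch, assuming memvid stats --json prints a single JSON object to stdout; the key names below (frames, size) are assumptions for illustration, so inspect the real output on your install.

import json
import subprocess

raw = subprocess.run(
    ["memvid", "stats", "memory.mv2", "--json"],
    capture_output=True, text=True, check=True,
).stdout

stats = json.loads(raw)

# Key names are assumptions for illustration; print stats to see the actual schema
print("frames:", stats.get("frames"))
print("size:", stats.get("size"))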

Troubleshooting Performance

Slow Ingestion

  1. Enable parallel ingestion: --parallel-segments
  2. Use smaller embedding model: -m bge-small
  3. Skip embeddings if not needed: --embedding-skip

Slow Search

  1. Use lexical mode for exact matches: --mode lex
  2. Build sketch index for large memories
  3. Narrow scope: --scope "relevant/path/"

High Memory Usage

  1. Use smaller embedding model
  2. Create with --no-vec if lexical is sufficient
  3. Vacuum after deletions: --vacuum

Large File Size

  1. Enable no-vec mode
  2. Vacuum to reclaim deleted space
  3. Use smaller embedding model

Next Steps