## Quick Recommendations
| Use Case | Configuration |
|---|---|
| Code search | `--no-vec`, `--mode lex` |
| Fast prototyping | `bge-small` model, small memory size |
| Production RAG | `bge-base` or `nomic`, adaptive retrieval |
| Large documents | Parallel ingestion, higher size limit |
| Minimal storage | `--no-vec` or `bge-small` |
| Best quality | `gte-large` or OpenAI embeddings |
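For instance, the code-search row maps to a lexical-only memory queried in lexical mode. The sketch below is illustrative only: `mem` stands in for the actual CLI binary and the `create`/`search` subcommands are assumptions; only the `--no-vec` and `--mode lex` flags come from this guide.

```bash
# Create a memory without a vector index (lexical search only).
mem create code.mem --no-vec

# Query it in lexical mode for exact identifier matches.
mem search code.mem "parse_config" --mode lex
```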
## Ingestion Performance

### Parallel Ingestion

For large folders, enable parallel processing (see the sketch after the table):

| Files | Sequential | Parallel |
|---|---|---|
| 100 docs | 45s | 12s |
| 1,000 docs | 7m | 2m |
| 10,000 docs | 1h 10m | 20m |
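A sketch of the invocation, reusing the placeholder `mem` binary; the `ingest` subcommand is an assumption, while `--parallel-segments` is the flag cited under Slow Ingestion below:

```bash
# Ingest a folder with segments processed in parallel.
mem ingest docs.mem ./docs/ --parallel-segments
```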
### Skip Embeddings

Skipping embeddings suits lexical-only search, or cases where you’ll add embeddings later (see the example after this list):

- 10x faster ingestion
- 60% smaller file size
- Full lexical search still available
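For example (same placeholder binary; `--embedding-skip` is the flag listed under Slow Ingestion):

```bash
# Ingest without computing embeddings; lexical search remains available.
mem ingest notes.mem ./notes/ --embedding-skip
```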
### Embedding Model Selection

Choose a model based on the speed/quality tradeoff (example after the table):

| Model | Speed | Quality | Size | Best For |
|---|---|---|---|---|
| bge-small | Fastest | Good | 33MB | Prototyping, large volumes |
| bge-base | Fast | Better | 110MB | Production (default) |
| nomic | Fast | Better | 137MB | Long documents |
| gte-large | Slower | Best | 335MB | Maximum quality |
| openai | API latency | Excellent | n/a (hosted) | Best quality, requires an API key |
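The model is chosen at ingest time via the `-m` shorthand that appears in the troubleshooting section; the rest of the invocation is a placeholder:

```bash
# Fastest model for prototyping and large volumes.
mem ingest corpus.mem ./corpus/ -m bge-small

# Higher-quality default for production use.
mem ingest corpus.mem ./corpus/ -m bge-base
```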
## Search Performance

### Search Mode Selection
| Mode | Speed | Best For |
|---|---|---|
| lex | Fastest | Exact matches, code, keywords |
| sem | Fast | Conceptual queries, similar meaning |
| auto | Balanced | General use (default) |
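Forcing a mode might look like this (the `--mode lex` flag appears in the Quick Recommendations table; the subcommand is a placeholder):

```bash
# Exact keyword matching through the lexical index only.
mem search code.mem "HashMap::insert" --mode lex

# Conceptual matching through the vector index.
mem search code.mem "retrying failed requests" --mode sem
```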
### Adaptive Retrieval

Adaptive retrieval automatically adjusts the result count based on query relevance. Disable it when you need a consistent result count and predictable latency.

### Scope Filtering
Narrow the search scope for faster results, as in the sketch below.
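Using the `--scope` flag cited under Slow Search, a scoped query might look like:

```bash
# Search only within one subtree of the memory.
mem search docs.mem "connection pooling" --scope "docs/networking/"
```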
### Sketch Index

For very large memories (100k+ frames), build a sketch index for faster approximate search (a hypothetical build command follows the table):

| Variant | Build Time | Query Speed | Accuracy |
|---|---|---|---|
| small | Fast | ~2x faster | 90% |
| medium | Moderate | ~3x faster | 95% |
| large | Slower | ~5x faster | 98% |
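This guide does not show the build command itself, so the sketch below is an assumption end to end, including the `index` subcommand and the `--sketch` flag:

```bash
# Hypothetical: build a medium sketch index for ~3x faster queries.
mem index big.mem --sketch medium
```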
## Storage Optimization

### Memory Size

Set size limits appropriate to the content:

| Content | Recommended Size |
|---|---|
| Personal notes | 10-15MB |
| Single project | 15-25MB |
| Documentation | 25-35MB |
| Large archive | 40-50MB |
### Vacuum and Compact

After deletions or updates, reclaim space with a vacuum (see the sketch below).
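The `--vacuum` flag appears under High Memory Usage below; the surrounding invocation is a placeholder:

```bash
# Compact the file, reclaiming space left by deleted or updated frames.
mem compact my.mem --vacuum
```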
### Index Selection

Disable indexes you don’t need (example after the table):

| Configuration | Relative Size |
|---|---|
| Full (default) | 100% |
| No vectors | ~40% |
| No lexical | ~85% |
| Neither | ~25% |
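Only `--no-vec` is documented in this guide (a flag for dropping the lexical index is not shown), so a vector-free memory would be created like this:

```bash
# ~40% of the full-index size, per the table above.
mem create lexical-only.mem --no-vec
```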
## RAG Performance

### Model Selection

Choose a synthesis model based on your needs:

| Model | Speed | Quality | Cost |
|---|---|---|---|
| tinyllama | Fastest | Basic | Free |
| groq | Very fast | Good | Low |
| gemini | Fast | Good | Low |
| openai | Moderate | Excellent | Medium |
| claude | Moderate | Excellent | Medium |
### Context-Only Mode

Skip synthesis for maximum speed (see the sketch after this list). Useful for:

- Feeding context to your own LLM
- Debugging retrieval quality
- Batch processing
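No context-only flag is documented in this guide, so the following is purely illustrative:

```bash
# Hypothetical: return the retrieved context without running synthesis.
mem ask my.mem "How does retry backoff work?" --context-only
```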
## Index Maintenance

### Rebuild Indexes

Periodically rebuild indexes for optimal performance (a sketch follows this list), especially:

- After many deletions (>20% of content)
- When search results seem slow or inaccurate
- After an embedding model upgrade
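The rebuild command is not named in this guide; the sketch below is an assumption throughout:

```bash
# Hypothetical: rebuild the lexical and vector indexes in place.
mem reindex my.mem
```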
### Verify Integrity

Check the memory for corruption (a hypothetical invocation is sketched below).
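As with the rebuild command, this invocation is an assumption:

```bash
# Hypothetical: check the memory file for internal corruption.
mem verify my.mem
```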
## Benchmarks

Typical performance on an M1 Mac with an SSD:

### Ingestion Speed
| Content Type | Speed (with embeddings) | Speed (no embeddings) |
|---|---|---|
| Plain text | ~1,000 chunks/sec | ~10,000 chunks/sec |
| PDF (text) | ~200 pages/min | ~2,000 pages/min |
| Code files | ~500 files/min | ~5,000 files/min |
### Search Latency
| Memory Size | Lexical | Semantic | Hybrid |
|---|---|---|---|
| 1,000 frames | ~5ms | ~10ms | ~15ms |
| 10,000 frames | ~10ms | ~25ms | ~35ms |
| 100,000 frames | ~20ms | ~50ms | ~70ms |
| 1M frames (sketch) | ~30ms | ~60ms | ~90ms |
### Ask Latency
| Model | Retrieval + Synthesis |
|---|---|
| tinyllama | ~500ms |
| groq | ~800ms |
| openai | ~1.5s |
| claude | ~2s |
## SDK Performance Tips

### Python

### Node.js
## Monitoring

### Query Tracking

Monitor usage patterns over time.

### Memory Statistics
## Troubleshooting Performance

### Slow Ingestion

- Enable parallel ingestion: `--parallel-segments`
- Use a smaller embedding model: `-m bge-small`
- Skip embeddings if not needed: `--embedding-skip`
### Slow Search

- Use lexical mode for exact matches: `--mode lex`
- Build a sketch index for large memories
- Narrow the scope: `--scope "relevant/path/"`
### High Memory Usage

- Use a smaller embedding model
- Create the memory with `--no-vec` if lexical search is sufficient
- Vacuum after deletions: `--vacuum`
### Large File Size

- Enable no-vec mode (`--no-vec`)
- Vacuum to reclaim space from deleted frames
- Use a smaller embedding model