## Quick Recommendations
| Use Case | Configuration |
|---|---|
| Code search | `--no-vec`, `--mode lex` |
| Fast prototyping | `bge-small` model, small memory size |
| Production RAG | `bge-base` or `nomic`, adaptive retrieval |
| Large documents | Parallel ingestion, higher size limit |
| Minimal storage | `--no-vec` or `bge-small` |
| Best quality | `gte-large` or OpenAI embeddings |
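For instance, the code-search row maps to a lexical-only memory queried in lexical mode. The sketch below is illustrative only: `mem` stands in for the actual CLI binary and the `create`/`search` subcommands are assumptions; only the `--no-vec` and `--mode lex` flags come from this guide.

```bash
# Create a memory without a vector index (lexical search only).
mem create code.mem --no-vec

# Query it in lexical mode for exact identifier matches.
mem search code.mem "parse_config" --mode lex
```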
## Ingestion Performance

### Parallel Ingestion

For large folders, enable parallel processing (see the sketch after the table):

| Files | Sequential | Parallel |
|---|---|---|
| 100 docs | 45s | 12s |
| 1,000 docs | 7m | 2m |
| 10,000 docs | 1h 10m | 20m |
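A sketch of the invocation, reusing the placeholder `mem` binary; the `ingest` subcommand is an assumption, while `--parallel-segments` is the flag cited under Slow Ingestion below:

```bash
# Ingest a folder with segments processed in parallel.
mem ingest docs.mem ./docs/ --parallel-segments
```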
### Skip Embeddings

Skipping embeddings suits lexical-only search, or cases where you’ll add embeddings later (see the example after this list):

- 10x faster ingestion
- 60% smaller file size
- Full lexical search still available
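For example (same placeholder binary; `--embedding-skip` is the flag listed under Slow Ingestion):

```bash
# Ingest without computing embeddings; lexical search remains available.
mem ingest notes.mem ./notes/ --embedding-skip
```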
### Embedding Model Selection

Choose a model based on the speed/quality tradeoff (example after the table):

| Model | Speed | Quality | Size | Best For |
|---|---|---|---|---|
| bge-small | Fastest | Good | 33MB | Prototyping, large volumes |
| bge-base | Fast | Better | 110MB | Production (default) |
| nomic | Fast | Better | 137MB | Long documents |
| gte-large | Slower | Best | 335MB | Maximum quality |
| openai | API latency | Excellent | n/a (hosted) | Best quality, requires an API key |
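The model is chosen at ingest time via the `-m` shorthand that appears in the troubleshooting section; the rest of the invocation is a placeholder:

```bash
# Fastest model for prototyping and large volumes.
mem ingest corpus.mem ./corpus/ -m bge-small

# Higher-quality default for production use.
mem ingest corpus.mem ./corpus/ -m bge-base
```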
## Search Performance

### Search Mode Selection
| Mode | Speed | Best For |
|---|---|---|
| lex | Fastest | Exact matches, code, keywords |
| sem | Fast | Conceptual queries, similar meaning |
| auto | Balanced | General use (default) |
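Forcing a mode might look like this (the `--mode lex` flag appears in the Quick Recommendations table; the subcommand is a placeholder):

```bash
# Exact keyword matching through the lexical index only.
mem search code.mem "HashMap::insert" --mode lex

# Conceptual matching through the vector index.
mem search code.mem "retrying failed requests" --mode sem
```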
### Adaptive Retrieval

Adaptive retrieval automatically adjusts the result count based on query relevance. Disable it when you need a consistent result count and predictable latency.

### Scope Filtering
Narrow the search scope for faster results, as in the sketch below.
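Using the `--scope` flag cited under Slow Search, a scoped query might look like:

```bash
# Search only within one subtree of the memory.
mem search docs.mem "connection pooling" --scope "docs/networking/"
```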
### Sketch Index

For very large memories (100k+ frames), build a sketch index for faster approximate search (a hypothetical build command follows the table):

| Variant | Build Time | Query Speed | Accuracy |
|---|---|---|---|
| small | Fast | ~2x faster | 90% |
| medium | Moderate | ~3x faster | 95% |
| large | Slower | ~5x faster | 98% |
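This guide does not show the build command itself, so the sketch below is an assumption end to end, including the `index` subcommand and the `--sketch` flag:

```bash
# Hypothetical: build a medium sketch index for ~3x faster queries.
mem index big.mem --sketch medium
```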
## Storage Optimization

### Memory Size

Set size limits appropriate to the content:

| Content | Recommended Size |
|---|---|
| Personal notes | 10-15MB |
| Single project | 15-25MB |
| Documentation | 25-35MB |
| Large archive | 40-50MB |
### Vacuum and Compact

After deletions or updates, reclaim space with a vacuum (see the sketch below).
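The `--vacuum` flag appears under High Memory Usage below; the surrounding invocation is a placeholder:

```bash
# Compact the file, reclaiming space left by deleted or updated frames.
mem compact my.mem --vacuum
```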
### Index Selection

Disable indexes you don’t need (example after the table):

| Configuration | Relative Size |
|---|---|
| Full (default) | 100% |
| No vectors | ~40% |
| No lexical | ~85% |
| Neither | ~25% |
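Only `--no-vec` is documented in this guide (a flag for dropping the lexical index is not shown), so a vector-free memory would be created like this:

```bash
# ~40% of the full-index size, per the table above.
mem create lexical-only.mem --no-vec
```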
## RAG Performance

### Model Selection

Choose a synthesis model based on your needs:

| Model | Speed | Quality | Cost |
|---|---|---|---|
| tinyllama | Fastest | Basic | Free |
| groq | Very fast | Good | Low |
| gemini | Fast | Good | Low |
| openai | Moderate | Excellent | Medium |
| claude | Moderate | Excellent | Medium |
### Context-Only Mode

Skip synthesis for maximum speed (see the sketch after this list). Useful for:

- Feeding context to your own LLM
- Debugging retrieval quality
- Batch processing
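No context-only flag is documented in this guide, so the following is purely illustrative:

```bash
# Hypothetical: return the retrieved context without running synthesis.
mem ask my.mem "How does retry backoff work?" --context-only
```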
## Index Maintenance

### Rebuild Indexes

Periodically rebuild indexes for optimal performance (a sketch follows this list), especially:

- After many deletions (>20% of content)
- When search results seem slow or inaccurate
- After an embedding model upgrade
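The rebuild command is not named in this guide; the sketch below is an assumption throughout:

```bash
# Hypothetical: rebuild the lexical and vector indexes in place.
mem reindex my.mem
```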
### Verify Integrity

Check the memory for corruption (a hypothetical invocation is sketched below).
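As with the rebuild command, this invocation is an assumption:

```bash
# Hypothetical: check the memory file for internal corruption.
mem verify my.mem
```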
## Benchmarks

Typical performance on an M1 Mac with an SSD:

### Ingestion Speed
| Content Type | Speed (with embeddings) | Speed (no embeddings) |
|---|---|---|
| Plain text | ~1,000 chunks/sec | ~10,000 chunks/sec |
| PDF (text) | ~200 pages/min | ~2,000 pages/min |
| Code files | ~500 files/min | ~5,000 files/min |
### Search Latency
| Memory Size | Lexical | Semantic | Hybrid |
|---|---|---|---|
| 1,000 frames | ~5ms | ~10ms | ~15ms |
| 10,000 frames | ~10ms | ~25ms | ~35ms |
| 100,000 frames | ~20ms | ~50ms | ~70ms |
| 1M frames (sketch) | ~30ms | ~60ms | ~90ms |
### Ask Latency
| Model | Retrieval + Synthesis |
|---|---|
| tinyllama | ~500ms |
| groq | ~800ms |
| openai | ~1.5s |
| claude | ~2s |
## SDK Performance Tips

### Python

### Node.js
## Monitoring

### Query Tracking

Monitor usage patterns over time.

### Memory Statistics
## Troubleshooting Performance

### Slow Ingestion

- Enable parallel ingestion: `--parallel-segments`
- Use a smaller embedding model: `-m bge-small`
- Skip embeddings if not needed: `--embedding-skip`
### Slow Search

- Use lexical mode for exact matches: `--mode lex`
- Build a sketch index for large memories
- Narrow the scope: `--scope "relevant/path/"`
### High Memory Usage

- Use a smaller embedding model
- Create the memory with `--no-vec` if lexical search is sufficient
- Vacuum after deletions: `--vacuum`
### Large File Size

- Enable no-vec mode (`--no-vec`)
- Vacuum to reclaim space from deleted frames
- Use a smaller embedding model