Memvid uses three complementary index types to enable fast, intelligent search across your documents. Each index serves a different purpose and can be enabled or disabled based on your needs.

Index Overview

| Index   | Engine        | Purpose                    | Best For                        |
|---------|---------------|----------------------------|---------------------------------|
| Lexical | BM25          | Full-text keyword search   | Exact terms, error codes, names |
| Vector  | Vector graph  | Semantic similarity search | Natural language, concepts      |
| Time    | Sorted tuples | Chronological ordering     | Timeline queries, auditing      |
All three indices are embedded directly in the .mv2 file. No external dependencies or sidecar files.

Lexical Index

The lexical index powers fast, precise keyword search using BM25, a proven ranking algorithm for full-text search.

How It Works

  • BM25 ranking: Scores documents by term frequency and inverse document frequency
  • Tokenization: Breaks text into searchable terms
  • Memory-mapped: Uses mmap for efficient disk access
  • Embedded: Stored as a snapshot inside the .mv2 file
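
The ranking the bullets describe can be sketched in a few lines of Python. This is an illustrative toy, not Memvid's actual implementation (which is memory-mapped and embedded in the .mv2 file); the documents and parameters here are made up:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each pre-tokenized document against the query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Inverse document frequency: rare terms count for more
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    idf = {t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Term frequency is saturated by k1 and normalized by document length via b
        s = sum(
            idf[t] * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
            for t in query_terms
        )
        scores.append(s)
    return scores

docs = [
    "connection refused by server".split(),   # contains both query terms
    "user login handler".split(),             # contains neither
]
print(bm25_scores(["connection", "refused"], docs))
```

The first document scores higher because it contains both query terms; the second scores zero.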

When to Use

Lexical search excels at finding exact matches:
# Find exact error codes
memvid find knowledge.mv2 --query "ERR_CONNECTION_REFUSED" --mode lex

# Find function names
memvid find knowledge.mv2 --query "handleAuthentication" --mode lex

# Date range queries
memvid find knowledge.mv2 --query "date:[2024-01-01 TO 2024-12-31]" --mode lex

Building the Index

The lexical index is built automatically when you add documents. You can also rebuild it:
# Rebuild lexical index
memvid doctor knowledge.mv2 --rebuild-lex-index

# Check index status
memvid stats knowledge.mv2 --json | grep has_lex_index

Disabling Lexical Index

For vector-only workloads, you can disable lexical indexing:
# Create without lexical index
memvid create knowledge.mv2 --no-lex
# Python SDK
mem = use('basic', 'knowledge.mv2', enable_lex=False)

Vector Index

The vector index enables semantic search, finding documents by meaning rather than exact keywords.

How It Works

  • Embeddings: Documents are converted to dense vectors (default: BGE-small, 384 dimensions)
  • External providers: Support for OpenAI, Cohere, Voyage, and HuggingFace models
  • Vector graph: Fast approximate nearest neighbor search for semantic similarity
  • Product Quantization (PQ): Optional 16x compression for large collections
  • Embedded: Stored as segments inside the .mv2 file
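
Conceptually, semantic search ranks stored vectors by similarity to the query vector. A minimal brute-force sketch using cosine similarity follows; the real index uses an approximate nearest-neighbor graph rather than an exhaustive scan, and the 3-dimensional vectors here are made up for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_search(query_vec, doc_vecs, top_k=2):
    """Return indices of the top_k stored vectors closest to the query."""
    ranked = sorted(enumerate(doc_vecs), key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [i for i, _ in ranked[:top_k]]

doc_vecs = [
    [0.9, 0.1, 0.0],   # e.g. "login flow"
    [0.0, 0.8, 0.6],   # e.g. "billing"
    [0.8, 0.2, 0.1],   # e.g. "authentication"
]
print(semantic_search([1.0, 0.1, 0.0], doc_vecs))  # → [0, 2]
```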

Embedding Model Options

| Model                          | Dimensions | Description                   |
|--------------------------------|------------|-------------------------------|
| BGE-small (default)            | 384        | Built-in, offline, no API key |
| OpenAI text-embedding-3-small  | 1536       | High quality, general purpose |
| OpenAI text-embedding-3-large  | 3072       | Highest quality               |
| Cohere embed-english-v3.0      | 1024       | English documents             |
| Voyage voyage-3                | 1024       | Code and technical docs       |
See Embedding Models for detailed configuration.

When to Use

Vector search excels at understanding intent:
# Natural language questions
memvid find knowledge.mv2 --query "how do users log in" --mode sem

# Conceptual queries
memvid find knowledge.mv2 --query "best practices for security" --mode sem

# Find similar content
memvid find knowledge.mv2 --query "machine learning model training" --mode sem

Building the Index

Enable embeddings when adding documents:
# Add with embeddings, using vector compression (16x smaller vectors)
memvid put knowledge.mv2 --input document.pdf --vector-compression
# Python SDK
mem.put(text="Content", title="Doc", enable_embedding=True)

# With compression
mem.put(text="Content", title="Doc", enable_embedding=True, vector_compression=True)

Rebuilding the Index

If vector search isn’t working correctly:
# Rebuild vector index
memvid doctor knowledge.mv2 --rebuild-vec-index

# Check index status
memvid stats knowledge.mv2 --json | grep has_vec_index
For custom embeddings from your own model:
# Search with pre-computed vector
memvid vec-search knowledge.mv2 --vector "0.1,0.2,0.3,..." --limit 10

# Search with embedding file
memvid vec-search knowledge.mv2 --embedding ./query-embedding.json --limit 5

Time Index

The time index enables chronological queries and time-travel features.

How It Works

  • Sorted tuples: Stores (timestamp, frame_id) pairs in sorted order
  • MVTI magic: Identified by MVTI header bytes
  • O(log n) lookups: Binary search for efficient time range queries
  • Checksummed: Protected by integrity verification
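
The O(log n) range lookup can be sketched with Python's bisect module over a hypothetical in-memory copy of the sorted (timestamp, frame_id) tuples; the timestamps and frame IDs below are made up:

```python
import bisect

# Hypothetical mirror of the time index: (timestamp, frame_id) tuples in sorted order
entries = [
    (1704067200, 1),
    (1704153600, 2),
    (1705000000, 3),
    (1706745600, 4),
]

def frames_between(entries, since, until):
    """Binary-search the sorted tuples for frames in [since, until]."""
    lo = bisect.bisect_left(entries, (since,))
    hi = bisect.bisect_right(entries, (until, float("inf")))
    return [frame_id for _, frame_id in entries[lo:hi]]

print(frames_between(entries, 1704067200, 1706745600))  # → [1, 2, 3, 4]
```

Two binary searches locate the range endpoints, so the cost stays logarithmic regardless of how many frames the memory holds.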

When to Use

Time-based access patterns:
# Browse recent documents
memvid timeline knowledge.mv2 --limit 20

# Filter by time range
memvid timeline knowledge.mv2 --since 1704067200 --until 1706745600

# Reverse chronological order
memvid timeline knowledge.mv2 --reverse

Time-Travel Queries

View your memory as it existed at a point in time:
# Search as of a specific frame
memvid find knowledge.mv2 --query "config" --as-of-frame 100

# Search as of a specific timestamp
memvid find knowledge.mv2 --query "config" --as-of-ts 1704067200

# Timeline at a specific frame
memvid timeline knowledge.mv2 --as-of-frame 50
# Python SDK time-travel
results = mem.find('config', as_of_frame=100)
results = mem.find('config', as_of_ts=1704067200)

Rebuilding the Time Index

If timeline queries return incorrect results:
# Rebuild time index
memvid doctor knowledge.mv2 --rebuild-time-index

# Verify time index
memvid verify knowledge.mv2 --deep

Hybrid Search

Hybrid search (mode auto) combines lexical and semantic results for the best of both worlds.

How It Works

  1. Parallel query: Both lexical and vector indices are queried
  2. Result fusion: Scores are combined using reciprocal rank fusion
  3. Reranking: Top results are reranked for relevance
  4. Deduplication: Duplicate frames are merged
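
Steps 2 and 4 can be sketched with reciprocal rank fusion over two toy rankings. The constant k=60 is the conventional RRF default, not necessarily what Memvid uses, and the frame IDs are made up:

```python
def rrf_fuse(lex_ranking, vec_ranking, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = {}
    for ranking in (lex_ranking, vec_ranking):
        for rank, frame_id in enumerate(ranking, start=1):
            scores[frame_id] = scores.get(frame_id, 0.0) + 1.0 / (k + rank)
    # Frames appearing in both rankings accumulate score once per list,
    # so deduplication falls out of the dict merge
    return sorted(scores, key=scores.get, reverse=True)

lex = ["f3", "f1", "f7"]   # lexical hits, best first
vec = ["f1", "f9", "f3"]   # semantic hits, best first
print(rrf_fuse(lex, vec))  # → ['f1', 'f3', 'f9', 'f7']
```

Frames ranked well by both indices ("f1", "f3") float to the top, while frames found by only one index still survive into the fused list.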

When to Use

Hybrid search is recommended for most use cases:
# Default mode is hybrid
memvid find knowledge.mv2 --query "authentication best practices"

# Explicit hybrid mode
memvid find knowledge.mv2 --query "OAuth2 patterns" --mode auto

Performance Comparison

| Mode | Speed    | Recall              | Best For             |
|------|----------|---------------------|----------------------|
| lex  | Fastest  | Exact matches       | Technical terms, IDs |
| sem  | Moderate | Semantic similarity | Natural language     |
| auto | Balanced | Comprehensive       | General queries      |

Tracks

Tracks are logical groupings for organizing content within a memory.

What Tracks Are

  • Namespace: Group related documents together
  • Filterable: Search within specific tracks
  • Metadata: Organizational label stored with each frame

Using Tracks

# Add to a specific track
memvid put knowledge.mv2 --input api-docs.md --track "api"
memvid put knowledge.mv2 --input meeting-notes.md --track "meetings"

# Search within a track (via scope)
memvid find knowledge.mv2 --query "authentication" --scope "mv2://api/"
# Python SDK
mem.put(text="API documentation", title="Auth", track="api")
mem.put(text="Meeting notes", title="Standup", track="meetings")

# Search within scope
results = mem.find('authentication', scope='mv2://api/')

Common Track Patterns

| Track         | Use Case                      |
|---------------|-------------------------------|
| documentation | Technical docs and guides     |
| code          | Source code and snippets      |
| meetings      | Meeting notes and transcripts |
| research      | Papers and references         |
| archived      | Old or deprecated content     |

Index Statistics

Check the status of all indices:
memvid stats knowledge.mv2 --json
{
  "frame_count": 150,
  "has_lex_index": true,
  "has_vec_index": true,
  "has_time_index": true,
  "lex_index_bytes": 2202009,
  "vec_index_bytes": 1887436,
  "time_index_bytes": 310478
}

Best Practices

Index Selection

| Scenario             | Recommended Indices          |
|----------------------|------------------------------|
| Full-featured search | All three (default)          |
| Keyword-only search  | Lexical only                 |
| Semantic similarity  | Vector only                  |
| Large collections    | All with vector compression  |
| Audit/compliance     | Time index required          |

Performance Tips

  1. Use put_many() for batch ingestion: 100-200x faster than individual put() calls
  2. Enable vector compression for large collections to reduce storage
  3. Rebuild indices if search quality degrades after crashes
  4. Use hybrid mode for best recall on general queries

Maintenance

Regular index maintenance keeps search performing well:
# Weekly: Verify integrity
memvid verify knowledge.mv2 --deep

# After many deletions: Vacuum and rebuild
memvid doctor knowledge.mv2 --vacuum --rebuild-lex-index

# After crashes: Full repair
memvid doctor knowledge.mv2 \
  --rebuild-time-index \
  --rebuild-lex-index \
  --rebuild-vec-index

Next Steps