## Index Overview
| Index | Engine | Purpose | Best For |
|---|---|---|---|
| Lexical | BM25 | Full-text keyword search | Exact terms, error codes, names |
| Vector | Vector search | Semantic similarity search | Natural language, concepts |
| Time | Sorted tuples | Chronological ordering | Timeline queries, auditing |
All three indices are embedded in a single `.mv2` file. No external dependencies or sidecar files.
## Lexical Index
The lexical index powers fast, precise keyword search using BM25, a proven ranking algorithm for full-text search.
### How It Works
- BM25 ranking: Scores documents by term frequency and inverse document frequency (standard formula after this list)
- Tokenization: Breaks text into searchable terms
- Memory-mapped: Uses mmap for efficient disk access
- Embedded: Stored as a snapshot inside the `.mv2` file
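For reference, BM25's scoring follows the standard formula (shown for orientation; the engine's exact tuning constants are not documented here):

$$
\text{score}(d, q) = \sum_{t \in q} \text{IDF}(t) \cdot \frac{f(t, d)\,(k_1 + 1)}{f(t, d) + k_1 \left(1 - b + b \cdot \frac{|d|}{\text{avgdl}}\right)}
$$

where $f(t, d)$ is the frequency of term $t$ in document $d$, $|d|$ is the document's length, $\text{avgdl}$ is the average document length, and $k_1$ and $b$ are tuning constants (commonly $k_1 \approx 1.2$ and $b \approx 0.75$).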
### When to Use
Lexical search excels at finding exact matches:
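For example, looking up an exact error code. This is a minimal sketch: the `Memory` class, constructor, `search()` signature, and result fields are illustrative assumptions rather than a confirmed API; the `lex` mode name comes from the comparison table below.

```python
from your_library import Memory  # placeholder import: substitute the actual package

mem = Memory("project.mv2")  # hypothetical handle over a .mv2 file

# Exact tokens such as error codes and identifiers suit lexical (BM25) mode.
hits = mem.search("ERR_CONN_RESET", mode="lex", limit=10)
for hit in hits:
    print(hit.score, hit.text)  # result fields are assumptions
```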
### Building the Index
The lexical index is built automatically when you add documents. You can also rebuild it:
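A rebuild might look like this (the method name is an assumption; check your API reference):

```python
# Force a rebuild of the lexical index, e.g. after a crash (assumed method name).
mem.rebuild_index("lexical")
```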
### Disabling Lexical Index
For vector-only workloads, you can disable lexical indexing:
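One plausible shape for this, assuming a constructor flag (the option name is hypothetical):

```python
# Skip BM25 indexing entirely for vector-only workloads (assumed flag name).
mem = Memory("project.mv2", enable_lexical=False)
```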
## Vector Index
The vector index enables semantic search, finding documents by meaning rather than exact keywords.
### How It Works
- Embeddings: Documents are converted to dense vectors (default: BGE-small, 384 dimensions)
- External providers: Support for OpenAI, Cohere, Voyage, and HuggingFace models
- Vector graph: Fast approximate nearest neighbor search for semantic similarity
- Product Quantization (PQ): Optional 16x compression for large collections
- Embedded: Stored as segments inside the `.mv2` file
### Embedding Model Options
| Model | Dimensions | Description |
|---|---|---|
| BGE-small (default) | 384 | Built-in, offline, no API key |
| OpenAI text-embedding-3-small | 1536 | High quality, general purpose |
| OpenAI text-embedding-3-large | 3072 | Highest quality |
| Cohere embed-english-v3.0 | 1024 | English documents |
| Voyage voyage-3 | 1024 | Code and technical docs |
### When to Use
Vector search excels at understanding intent:
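For instance, a paraphrased natural-language question (same hypothetical `Memory` API as above; `sem` is the mode name from the comparison table below):

```python
# Semantic mode matches on meaning, so paraphrases still reach relevant frames.
hits = mem.search("how do we recover from failed logins?", mode="sem", limit=5)
```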
### Building the Index
Enable embeddings when adding documents:
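A sketch using the `put_many()` batch call mentioned under Performance Tips (the `embed` flag is an assumption):

```python
docs = ["Auth service design notes", "Incident review for the May outage"]
mem.put_many(docs, embed=True)  # embed flag is assumed, not a confirmed parameter
```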
### Rebuilding the Index
If vector search isn’t working correctly:
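A rebuild presumably follows the same pattern assumed for the lexical index:

```python
mem.rebuild_index("vector")  # re-embed and rebuild the vector graph (assumed name)
```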
### Direct Vector Search
For custom embeddings from your own model:
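A sketch of querying with vectors you computed yourself (the method name and `my_model` are assumptions; the vector's dimensionality must match the index, e.g. 384 for the default BGE-small):

```python
query_vec = my_model.encode("incident postmortem")  # your own embedding model
assert len(query_vec) == 384  # must match the index's embedding dimension
hits = mem.search_vector(query_vec, limit=5)  # assumed method name
```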
## Time Index
The time index enables chronological queries and time-travel features.
### How It Works
- Sorted tuples: Stores `(timestamp, frame_id)` pairs in sorted order
- MVTI magic: Identified by `MVTI` header bytes
- O(log n) lookups: Binary search for efficient time range queries (sketched after this list)
- Checksummed: Protected by integrity verification
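The O(log n) lookup can be pictured with Python's `bisect` module over that sorted tuple list (a conceptual sketch of the data structure, not the on-disk format):

```python
import bisect

# Conceptually: the time index is (timestamp, frame_id) tuples in sorted order.
index = [(1700000000, 1), (1700000100, 2), (1700000200, 3)]

def frames_between(start_ts, end_ts):
    """Binary-search both endpoints, then slice: O(log n) plus the result size."""
    lo = bisect.bisect_left(index, (start_ts,))
    hi = bisect.bisect_right(index, (end_ts, float("inf")))
    return [frame_id for _, frame_id in index[lo:hi]]

print(frames_between(1700000050, 1700000250))  # -> [2, 3]
```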
### When to Use
The time index suits time-based access patterns such as timeline review and auditing:
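For example, restricting a search to a window (the `since`/`until` filters are hypothetical):

```python
hits = mem.search("deployment", since="2024-06-01", until="2024-06-30")  # assumed filters
```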
### Time-Travel Queries
View your memory as it existed at a point in time:
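One plausible shape, assuming a snapshot-style handle (both names are hypothetical):

```python
snapshot = mem.at("2024-06-01T00:00:00Z")  # assumed time-travel accessor
hits = snapshot.search("roadmap", mode="auto")
```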
### Rebuilding the Time Index
If timeline queries return incorrect results:
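Again following the rebuild pattern assumed above:

```python
mem.rebuild_index("time")  # assumed method name
```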
## Hybrid Search
Hybrid search (mode `auto`) combines lexical and semantic results for the best of both worlds.
### How It Works
- Parallel query: Both lexical and vector indices are queried
- Result fusion: Scores are combined using reciprocal rank fusion (see the sketch after this list)
- Reranking: Top results are reranked for relevance
- Deduplication: Duplicate frames are merged
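Reciprocal rank fusion itself is standard: each result list contributes $1/(k + \text{rank})$ per document, with $k$ commonly around 60. A self-contained sketch:

```python
def rrf(lex_ids, sem_ids, k=60):
    """Fuse two ranked ID lists; documents appearing in both float to the top."""
    scores = {}
    for ids in (lex_ids, sem_ids):
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

print(rrf(["a", "b", "c"], ["b", "d"]))  # 'b' ranks first: it appears in both lists
```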
### When to Use
Hybrid search is recommended for most use cases:
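In the hypothetical API used throughout this page:

```python
hits = mem.search("why did last week's deploy fail?", mode="auto", limit=10)
```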
### Performance Comparison
| Mode | Speed | Recall | Best For |
|---|---|---|---|
| `lex` | Fastest | Exact matches | Technical terms, IDs |
| `sem` | Moderate | Semantic similarity | Natural language |
| `auto` | Balanced | Comprehensive | General queries |
## Tracks
Tracks are logical groupings for organizing content within a memory.
### What Tracks Are
- Namespace: Group related documents together
- Filterable: Search within specific tracks
- Metadata: Organizational label stored with each frame
### Using Tracks
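A sketch of tagging at ingest and filtering at query time (the `track` parameter is an assumption; `put()` is the single-document call noted under Performance Tips):

```python
mem.put("Q3 planning notes ...", track="meetings")  # tag at ingest (assumed parameter)
hits = mem.search("budget", track="meetings")       # filter at query time (assumed)
```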
### Common Track Patterns
| Track | Use Case |
|---|---|
| `documentation` | Technical docs and guides |
| `code` | Source code and snippets |
| `meetings` | Meeting notes and transcripts |
| `research` | Papers and references |
| `archived` | Old or deprecated content |
## Index Statistics
Check the status of all indices:
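A plausible accessor (the method name is assumed):

```python
stats = mem.stats()  # assumed method returning per-index counts and sizes
print(stats)
```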
## Best Practices
### Index Selection
| Scenario | Recommended Indices |
|---|---|
| Full-featured search | All three (default) |
| Keyword-only search | Lexical only |
| Semantic similarity | Vector only |
| Large collections | All with vector compression |
| Audit/compliance | Time index required |
### Performance Tips
- Use `put_many()` for batch ingestion: 100-200x faster than individual `put()` calls (see the sketch after this list)
- Enable vector compression for large collections to reduce storage
- Rebuild indices if search quality degrades after crashes
- Use hybrid mode for best recall on general queries
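The batch-ingestion tip, sketched with the hypothetical handle used earlier:

```python
docs = [f"log line {i}" for i in range(1_000)]
mem.put_many(docs)  # one batched call
# Avoid: for d in docs: mem.put(d)  -- per the tip above, far slower
```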