## Overview
Embeddings convert text into dense numerical vectors that capture semantic meaning. Similar concepts produce similar vectors, enabling semantic search (finding documents by meaning rather than exact keywords).

| Provider | Model | Dimensions | Best For |
|---|---|---|---|
| Built-in | BGE-small-en-v1.5 | 384 | Offline, privacy-first |
| Ollama | mxbai-embed-large | 1024 | Local, high quality |
| Ollama | nomic-embed-text | 768 | Local, fast |
| OpenAI | text-embedding-3-small | 1536 | General purpose |
| OpenAI | text-embedding-3-large | 3072 | Highest quality |
| Cohere | embed-english-v3.0 | 1024 | English documents |
| Cohere | embed-multilingual-v3.0 | 1024 | Multi-language |
| Voyage | voyage-3 | 1024 | Code and technical docs |
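To make "similar concepts produce similar vectors" concrete, here is a minimal, self-contained sketch of cosine similarity over toy vectors (the numbers are illustrative, not real model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (real models use 384-3072 dimensions).
cat = [0.9, 0.1, 0.4, 0.0]
kitten = [0.85, 0.15, 0.45, 0.05]
invoice = [0.0, 0.8, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # high: related concepts
print(cosine_similarity(cat, invoice))  # low: unrelated concepts
```

Semantic search ranks stored chunks by exactly this kind of similarity between the query vector and each chunk vector.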
## Built-in Model (Default)
By default, Memvid uses BGE-small-en-v1.5, a lightweight embedding model that runs locally without any API keys.

### Characteristics
- Dimensions: 384
- Size: ~75 MB (downloaded on first use)
- Inference: CPU-based, no GPU required
- Privacy: All processing happens locally
- Offline: Works without internet after initial download
### Usage
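A pseudocode sketch of using the default model (the `open_memory` constructor here is hypothetical; only the `put_many` and `find` calls appear elsewhere on this page):

```
# hypothetical constructor; no embedder argument means the built-in
# BGE-small-en-v1.5 model is used
mem = open_memory("notes")

mem.put_many(["first note", "second note"])   # embeds locally on CPU
hits = mem.find("note about the first thing")
```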
## Ollama Embeddings (Local)
Ollama provides high-quality embeddings that run entirely locally on your machine. No API keys, no data leaving your infrastructure, and no usage costs.

### Setup
- Install Ollama: ollama.com/download
- Pull an embedding model, for example `ollama pull mxbai-embed-large`
### Python SDK
### Node.js SDK
### Supported Models
| Model | Dimensions | Speed | Quality | Use Case |
|---|---|---|---|---|
| mxbai-embed-large | 1024 | Medium | Best | Production, high accuracy |
| nomic-embed-text | 768 | Fast | Good | General purpose |
| bge-m3 | 1024 | Medium | Best | Multilingual |
| bge-large | 1024 | Medium | Great | English documents |
| snowflake-arctic-embed | 1024 | Medium | Great | Retrieval-focused |
| snowflake-arctic-embed:m | 768 | Fast | Good | Balanced |
| snowflake-arctic-embed:s | 384 | Fastest | OK | Low latency |
| all-minilm | 384 | Fastest | OK | Lightweight |
| e5-large | 1024 | Medium | Great | General purpose |
| jina-embeddings-v2-base-en | 768 | Fast | Good | Long documents (8K tokens) |
### Custom Server
### Environment Variables
| Variable | Description |
|---|---|
| OLLAMA_HOST | Ollama server URL (default: http://localhost:11434) |
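To target a custom server, set OLLAMA_HOST before the process starts. A small Python sketch of the resolution logic (the `ollama_host` helper and the remote URL are illustrative, not Memvid internals):

```python
import os

def ollama_host() -> str:
    """Resolve the Ollama server URL, falling back to the documented default."""
    return os.environ.get("OLLAMA_HOST", "http://localhost:11434")

# Point at a remote machine ("http://gpu-box:11434" is a placeholder).
os.environ["OLLAMA_HOST"] = "http://gpu-box:11434"
print(ollama_host())
```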
## OpenAI Embeddings
OpenAI’s embedding models offer excellent quality for general-purpose semantic search.

### Setup
### CLI Usage
### Python SDK
## NVIDIA Embeddings
NVIDIA Integrate provides a fast hosted embedding API with OpenAI-compatible request and response shapes.

### Setup
### Python SDK
### Node.js SDK
### OpenAI Model Comparison
| Model | Dimensions | Cost | Quality |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02/1M tokens | Good |
| text-embedding-3-large | 3072 | $0.13/1M tokens | Best |
| text-embedding-ada-002 | 1536 | $0.10/1M tokens | Legacy |
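The prices above translate directly into ingestion cost. A quick back-of-the-envelope calculation (the 50M-token corpus size is hypothetical):

```python
# USD per 1M tokens, from the comparison table.
PRICE_PER_1M = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def embedding_cost_usd(model: str, tokens: int) -> float:
    """Cost in USD to embed `tokens` tokens with `model`."""
    return round(PRICE_PER_1M[model] * tokens / 1_000_000, 2)

corpus_tokens = 50_000_000  # hypothetical corpus
print(embedding_cost_usd("text-embedding-3-small", corpus_tokens))  # 1.0
print(embedding_cost_usd("text-embedding-3-large", corpus_tokens))  # 6.5
```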
## Cohere Embeddings
Cohere offers specialized models for English and multilingual content.

### Setup
### Python SDK
### Node.js SDK
### Model Options
| Model | Dimensions | Best For |
|---|---|---|
| embed-english-v3.0 | 1024 | English documents |
| embed-multilingual-v3.0 | 1024 | 100+ languages |
| embed-english-light-v3.0 | 384 | Faster, lower cost |
| embed-multilingual-light-v3.0 | 384 | Multi-language, lighter |
## Voyage Embeddings
Voyage AI specializes in embeddings for code and technical documentation.

### Setup
### Python SDK
### Node.js SDK
### Model Options
| Model | Dimensions | Best For |
|---|---|---|
| voyage-3 | 1024 | General purpose |
| voyage-3-lite | 512 | Faster, smaller |
| voyage-code-3 | 1024 | Source code |
## HuggingFace Embeddings (Python)
Use any HuggingFace sentence-transformer model locally.

### Setup
### Usage
### Popular Models
| Model | Dimensions | Size |
|---|---|---|
| all-MiniLM-L6-v2 | 384 | 80 MB |
| all-mpnet-base-v2 | 768 | 420 MB |
| multi-qa-MiniLM-L6-cos-v1 | 384 | 80 MB |
## Using External Embeddings with Memvid
The key workflow for external embeddings:

- Pick an embedder (OpenAI/Cohere/Voyage/NVIDIA/etc.)
- Ingest with `put_many(..., embedder=...)` (stores embedding identity metadata)
- Query with `find`/`ask(..., embedder=...)` (keeps dimensions consistent)
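The last point matters because a query embedded with a different model than the corpus typically has a different dimension, and comparing the vectors is meaningless. An illustrative guard (`check_dims` is a hypothetical helper, not part of Memvid):

```python
def check_dims(stored_dim: int, query_vec: list[float]) -> None:
    """Illustrative guard: reject a query vector whose dimension does not
    match the dimension the index was built with."""
    if len(query_vec) != stored_dim:
        raise ValueError(
            f"query has {len(query_vec)} dims but index stores {stored_dim}; "
            "use the same embedder for ingestion and search"
        )

check_dims(1536, [0.0] * 1536)     # OK: same model family
try:
    check_dims(1536, [0.0] * 384)  # e.g. built-in BGE query vs OpenAI index
except ValueError as e:
    print(e)
```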
### Batch Ingestion Example
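A pseudocode sketch of the batch flow (the constructor, document loading, and embedder string are assumptions; only `put_many(..., embedder=...)` and `find(..., embedder=...)` come from the workflow above):

```
mem = open_memory("my-archive")           # hypothetical constructor
docs = load_documents("docs/")            # assumed helper returning text chunks

mem.put_many(docs, embedder="openai")     # one batched ingest call
hits = mem.find("rotate api keys", embedder="openai")
```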
## Vector Compression
For large collections, enable vector compression to reduce storage by ~16x.

## Environment Variables
| Variable | Description |
|---|---|
| OLLAMA_HOST | Ollama server URL (default: http://localhost:11434) |
| OPENAI_API_KEY | OpenAI API key |
| COHERE_API_KEY | Cohere API key |
| VOYAGE_API_KEY | Voyage AI API key |
| NVIDIA_API_KEY | NVIDIA Integrate API key |
| NVIDIA_BASE_URL | Optional NVIDIA Integrate base URL override |
| GOOGLE_API_KEY | Google/Gemini API key |
| MISTRAL_API_KEY | Mistral API key |
| MEMVID_MODELS_DIR | Local model cache directory |
| MEMVID_OFFLINE=1 | Skip model downloads |
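For air-gapped machines, the two offline-related variables combine naturally: point MEMVID_MODELS_DIR at a pre-populated cache, then set MEMVID_OFFLINE. A sketch (the cache path is a placeholder):

```python
import os

# Pre-populated model cache; the path is a placeholder for your environment.
os.environ["MEMVID_MODELS_DIR"] = "/opt/memvid-models"
# Skip model downloads entirely.
os.environ["MEMVID_OFFLINE"] = "1"

print(os.environ["MEMVID_MODELS_DIR"], os.environ["MEMVID_OFFLINE"])
```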
## Choosing an Embedding Model
### Decision Matrix
| Requirement | Recommended |
|---|---|
| Privacy/offline | Ollama mxbai-embed-large |
| Best quality (local) | Ollama mxbai-embed-large |
| Best quality (API) | OpenAI text-embedding-3-large |
| Cost-effective | Ollama (free) or OpenAI text-embedding-3-small |
| Multi-language | Ollama bge-m3 or Cohere embed-multilingual-v3.0 |
| Code/technical | Voyage voyage-code-3 |
| Fastest local | Ollama all-minilm |
| No setup | Built-in BGE-small |
### Performance Considerations
- Dimension count affects storage and search speed
- API latency applies to external providers (batch requests when possible)
- Rate limits vary by provider plan
- Consistency: use the same model for ingestion and search
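The first bullet can be quantified: raw float32 vector storage grows linearly with dimension count. A sketch (1M chunks is a hypothetical corpus size; real indexes add metadata overhead):

```python
def index_size_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for float32 embeddings, ignoring metadata and compression."""
    return num_vectors * dims * bytes_per_value

for dims in (384, 1536, 3072):  # built-in BGE-small, 3-small, 3-large
    gib = index_size_bytes(1_000_000, dims) / 1024**3
    print(f"{dims:>4} dims -> {gib:.2f} GiB")
```

Doubling the dimension count doubles both the index size and the work done per distance computation at query time.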
## Reranking
Memvid can rerank retrieved candidates using a cross-encoder model (auto-downloaded on first use). In the CLI, reranking is applied during `ask` and can be disabled. For `find`, reranking is handled internally; there is no `--rerank` flag.