Memvid supports multiple embedding models for semantic (vector) search. You can use the built-in BGE-small model for local, offline operation, or connect to external providers like OpenAI, Cohere, or Voyage for higher-quality embeddings.
Overview
Embeddings convert text into dense numerical vectors that capture semantic meaning. Similar concepts produce similar vectors, enabling semantic search (finding documents by meaning rather than exact keywords).
| Provider | Model | Dimensions | Best For |
|----------|-------|------------|----------|
| Built-in | BGE-small-en-v1.5 | 384 | Offline, privacy-first |
| Ollama | mxbai-embed-large | 1024 | Local, high quality |
| Ollama | nomic-embed-text | 768 | Local, fast |
| OpenAI | text-embedding-3-small | 1536 | General purpose |
| OpenAI | text-embedding-3-large | 3072 | Highest quality |
| Cohere | embed-english-v3.0 | 1024 | English documents |
| Cohere | embed-multilingual-v3.0 | 1024 | Multi-language |
| Voyage | voyage-3 | 1024 | Code and technical docs |
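To make "similar concepts produce similar vectors" concrete, here is a minimal sketch that compares toy vectors with cosine similarity. Cosine similarity is a common choice for comparing embeddings, but this page does not state which metric Memvid uses internally, so treat the metric, the function name `cosine_similarity`, and the three-dimensional vectors as illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings", made up for illustration.
# Real models produce 384 or more dimensions.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1: similar meaning
print(cosine_similarity(cat, invoice))  # close to 0: unrelated meaning
```

The dimension check in the sketch also hints at why vectors from different models cannot be mixed: a 384-dimension vector has no meaningful comparison with a 1024-dimension one.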
Built-in Model (Default)
By default, Memvid uses BGE-small-en-v1.5, a lightweight embedding model that runs locally without any API keys.
Characteristics
Dimensions: 384
Size: ~75 MB (downloaded on first use)
Inference: CPU-based, no GPU required
Privacy: All processing happens locally
Offline: Works without internet after initial download
Usage
# CLI: Enable embeddings with built-in model
memvid put knowledge.mv2 --input document.pdf --embedding
# Python SDK
from memvid_sdk import create
mem = create("knowledge.mv2", enable_vec=True, enable_lex=True)
mem.put(
    "Document",
    "docs",
    {},
    text="Your content here",
    enable_embedding=True,
    embedding_model="bge-small",
)
// Node.js SDK
import { create } from '@memvid/sdk';
const mem = await create('knowledge.mv2');
await mem.put({ text: 'Your content here', title: 'Document', enableEmbedding: true });
Ollama Embeddings (Local)
Ollama provides high-quality embeddings that run entirely locally on your machine. No API keys, no data leaving your infrastructure, and no usage costs.
Setup
Install Ollama: ollama.com/download
Pull an embedding model:
# Recommended: High quality (1024 dimensions)
ollama pull mxbai-embed-large
# Alternative: Faster, smaller (768 dimensions)
ollama pull nomic-embed-text
Python SDK
from memvid_sdk import create
from memvid_sdk.embeddings import OllamaEmbeddings
# Initialize embedder (uses localhost:11434 by default)
embedder = OllamaEmbeddings(model='mxbai-embed-large')
print(f"Model: {embedder.model_name} ({embedder.dimension} dimensions)")
# Create memory with vector index
mem = create('knowledge.mv2', enable_vec=True, enable_lex=True)
# Store with embeddings
documents = [
    {"title": "Doc 1", "label": "kb", "text": "Machine learning fundamentals..."},
    {"title": "Doc 2", "label": "kb", "text": "Deep neural networks..."},
]
frame_ids = mem.put_many(documents, embedder=embedder)
# Search with query embedding
query = "How do neural networks work?"
results = mem.find(query, k=5, mode="sem", embedder=embedder)
Node.js SDK
import { create, OllamaEmbeddings } from '@memvid/sdk';
// Initialize embedder
const embedder = new OllamaEmbeddings({ model: 'mxbai-embed-large' });
console.log(`Model: ${embedder.modelName} (${embedder.dimension} dimensions)`);
// Create memory
const mem = await create('knowledge.mv2', 'basic', { enableLex: true, enableVec: true });
// Documents to ingest (same sample data as the Python example)
const documents = [
  { title: 'Doc 1', label: 'kb', text: 'Machine learning fundamentals...' },
  { title: 'Doc 2', label: 'kb', text: 'Deep neural networks...' },
];
// Ingest with embeddings
for (const doc of documents) {
  const embedding = await embedder.embedQuery(doc.text);
  await mem.put({
    title: doc.title,
    text: doc.text,
    label: doc.label,
    embedding,
    embeddingIdentity: { provider: 'ollama', model: 'mxbai-embed-large', dimension: 1024 },
  });
}
await mem.seal();
// Search
const queryEmbedding = await embedder.embedQuery('How do neural networks work?');
const results = await mem.find('neural networks', { k: 5, mode: 'auto', queryEmbedding });
Supported Models
| Model | Dimensions | Speed | Quality | Use Case |
|-------|------------|-------|---------|----------|
| mxbai-embed-large | 1024 | Medium | Best | Production, high accuracy |
| nomic-embed-text | 768 | Fast | Good | General purpose |
| bge-m3 | 1024 | Medium | Best | Multilingual |
| bge-large | 1024 | Medium | Great | English documents |
| snowflake-arctic-embed | 1024 | Medium | Great | Retrieval-focused |
| snowflake-arctic-embed:m | 768 | Fast | Good | Balanced |
| snowflake-arctic-embed:s | 384 | Fastest | OK | Low latency |
| all-minilm | 384 | Fastest | OK | Lightweight |
| e5-large | 1024 | Medium | Great | General purpose |
| jina-embeddings-v2-base-en | 768 | Fast | Good | Long documents (8K tokens) |
Custom Server
# Python: connect to a remote Ollama server
embedder = OllamaEmbeddings(
    model='mxbai-embed-large',
    base_url='http://gpu-server:11434',
)
// Node.js
const embedder = new OllamaEmbeddings({
  model: 'mxbai-embed-large',
  baseUrl: 'http://gpu-server:11434',
});
Environment Variables
| Variable | Description |
|----------|-------------|
| OLLAMA_HOST | Ollama server URL (default: http://localhost:11434) |
OpenAI Embeddings
OpenAI’s embedding models offer excellent quality for general-purpose semantic search.
Setup
export OPENAI_API_KEY=sk-your-key-here
CLI Usage
# Use OpenAI for embeddings
memvid put knowledge.mv2 --input document.pdf --embedding -m openai-small
# Use the larger OpenAI model (here for a whole directory)
memvid put knowledge.mv2 --input docs/ --embedding -m openai-large
Python SDK
from memvid_sdk import create
from memvid_sdk.embeddings import OpenAIEmbeddings
# Initialize embedder
embedder = OpenAIEmbeddings(model='text-embedding-3-small')
print(f"Model: {embedder.model_name} ({embedder.dimension} dimensions)")
# Create memory with vector index
mem = create('knowledge.mv2', enable_vec=True, enable_lex=True)
# Store + embed in batch (vector index required for semantic search)
documents = [
    {"title": "Doc 1", "label": "kb", "text": "Machine learning fundamentals..."},
    {"title": "Doc 2", "label": "kb", "text": "Deep neural networks..."},
]
frame_ids = mem.put_many(documents, embedder=embedder)
# Search with query embedding
query = "How do neural networks work?"
results = mem.find(query, k=5, mode="sem", embedder=embedder)
NVIDIA Embeddings
NVIDIA Integrate provides a fast hosted embedding API with OpenAI-compatible shapes.
Setup
export NVIDIA_API_KEY=nvapi-your-key-here
Python SDK
from memvid_sdk import create
from memvid_sdk.embeddings import NvidiaEmbeddings
mem = create("knowledge.mv2", enable_vec=True, enable_lex=True)
embedder = NvidiaEmbeddings(model="nvidia/nv-embed-v1")  # uses NVIDIA_API_KEY
mem.put_many(
    [{"title": "Doc", "label": "kb", "text": "Vector search with NVIDIA embeddings."}],
    embedder=embedder,
)
res = mem.find("nvidia embeddings", mode="sem", embedder=embedder)
Node.js SDK
import { create, NvidiaEmbeddings } from '@memvid/sdk';
const mem = await create('knowledge.mv2');
const embedder = new NvidiaEmbeddings({ model: 'nvidia/nv-embed-v1' }); // uses NVIDIA_API_KEY
await mem.putMany([{ title: 'Doc', text: 'Vector search with NVIDIA embeddings.' }], { embedder });
const res = await mem.find('nvidia embeddings', { mode: 'sem', embedder });
Node.js SDK (OpenAI)
import { create, OpenAIEmbeddings } from '@memvid/sdk';
// Initialize embedder (uses OPENAI_API_KEY env var)
const embedder = new OpenAIEmbeddings({ model: 'text-embedding-3-small' });
console.log(`Model: ${embedder.modelName} (${embedder.dimension} dimensions)`);
// Create memory
const mem = await create('knowledge.mv2');
// Store + embed in batch (vector index required for semantic search)
await mem.putMany(
  [
    { title: 'Doc 1', text: 'Machine learning fundamentals...' },
    { title: 'Doc 2', text: 'Deep neural networks...' },
  ],
  { embedder }
);
await mem.seal();
// Query using the same embedder (keeps dimensions consistent)
const results = await mem.find('How do neural networks work?', { mode: 'sem', k: 5, embedder });
Model Comparison
| Model | Dimensions | Cost | Quality |
|-------|------------|------|---------|
| text-embedding-3-small | 1536 | $0.02/1M tokens | Good |
| text-embedding-3-large | 3072 | $0.13/1M tokens | Best |
| text-embedding-ada-002 | 1536 | $0.10/1M tokens | Legacy |
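The per-token prices in the comparison above translate into a one-off ingestion cost once you know roughly how many tokens your corpus contains. The sketch below does that arithmetic; the helper name `estimate_embedding_cost` and the 50M-token corpus size are hypothetical, and the prices are copied from this page, so check current provider pricing before relying on them.

```python
# Prices in USD per 1M tokens, copied from the comparison on this page.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}


def estimate_embedding_cost(total_tokens: int, model: str) -> float:
    """Estimate the cost in USD of embedding `total_tokens` tokens once."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]


corpus_tokens = 50_000_000  # hypothetical corpus size
for model in PRICE_PER_MILLION_TOKENS:
    cost = estimate_embedding_cost(corpus_tokens, model)
    print(f"{model}: ${cost:.2f}")
```

For this hypothetical corpus the three models come to $1.00, $6.50 and $5.00 respectively. Query-time embeddings add a small ongoing cost on top of the ingestion figure.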
Cohere Embeddings
Cohere offers specialized models for English and multilingual content.
Setup
export COHERE_API_KEY=your-key-here
Python SDK
from memvid_sdk.embeddings import CohereEmbeddings, get_embedder
# Direct initialization
embedder = CohereEmbeddings(model='embed-english-v3.0')
# Or use factory
embedder = get_embedder('cohere', model='embed-multilingual-v3.0')
# Generate embeddings
embeddings = embedder.embed_documents(['Text 1', 'Text 2'])
query_vec = embedder.embed_query('search query')
Node.js SDK
import { CohereEmbeddings, getEmbedder } from '@memvid/sdk';
// Direct initialization
const embedder = new CohereEmbeddings({ model: 'embed-english-v3.0' });
// Or use factory
const embedder2 = getEmbedder('cohere', { model: 'embed-multilingual-v3.0' });
const embeddings = await embedder.embedDocuments(['Text 1', 'Text 2']);
Model Options
| Model | Dimensions | Best For |
|-------|------------|----------|
| embed-english-v3.0 | 1024 | English documents |
| embed-multilingual-v3.0 | 1024 | 100+ languages |
| embed-english-light-v3.0 | 384 | Faster, lower cost |
| embed-multilingual-light-v3.0 | 384 | Multi-language, lighter |
Voyage Embeddings
Voyage AI specializes in embeddings for code and technical documentation.
Setup
export VOYAGE_API_KEY=your-key-here
Python SDK
from memvid_sdk.embeddings import VoyageEmbeddings
embedder = VoyageEmbeddings(model='voyage-3')
embeddings = embedder.embed_documents(['def hello(): pass', 'function hello() {}'])
Node.js SDK
import { VoyageEmbeddings } from '@memvid/sdk';
const embedder = new VoyageEmbeddings({ model: 'voyage-code-3' });
const embeddings = await embedder.embedDocuments(['def hello(): pass']);
Model Options
| Model | Dimensions | Best For |
|-------|------------|----------|
| voyage-3 | 1024 | General purpose |
| voyage-3-lite | 512 | Faster, smaller |
| voyage-code-3 | 1024 | Source code |
HuggingFace Embeddings (Python)
Use any HuggingFace sentence-transformer model locally.
Setup
pip install sentence-transformers
Usage
from memvid_sdk.embeddings import get_embedder
# Use any sentence-transformers model
embedder = get_embedder('huggingface', model='all-MiniLM-L6-v2')
print(f"Model: {embedder.model_name} ({embedder.dimension} dimensions)")
embeddings = embedder.embed_documents(['Text 1', 'Text 2'])
Popular Models
| Model | Dimensions | Size |
|-------|------------|------|
| all-MiniLM-L6-v2 | 384 | 80 MB |
| all-mpnet-base-v2 | 768 | 420 MB |
| multi-qa-MiniLM-L6-cos-v1 | 384 | 80 MB |
Using External Embeddings with Memvid
The key workflow for external embeddings:
1. Pick an embedder (OpenAI/Cohere/Voyage/NVIDIA/etc.)
2. Ingest with put_many(..., embedder=...) (stores embedding identity metadata)
3. Query with find/ask(..., embedder=...) (keeps dimensions consistent)
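The workflow above depends on the query embedder matching whatever produced the stored vectors. As a rough illustration of what such a check involves, the sketch below compares two identity records before a search. The `EmbeddingIdentity` class and `assert_compatible` function are hypothetical and are not part of the Memvid SDK; only the three field names mirror the `embeddingIdentity` object shown in the Ollama Node.js example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingIdentity:
    """Hypothetical record of which model produced a set of vectors."""
    provider: str
    model: str
    dimension: int


def assert_compatible(stored: EmbeddingIdentity, query: EmbeddingIdentity) -> None:
    """Raise ValueError if query vectors cannot be compared with stored vectors."""
    if stored.dimension != query.dimension:
        raise ValueError(
            f"dimension mismatch: index has {stored.dimension}, query has {query.dimension}"
        )
    if (stored.provider, stored.model) != (query.provider, query.model):
        # Same dimension does not imply the same vector space.
        raise ValueError(
            f"model mismatch: index uses {stored.provider}/{stored.model}, "
            f"query uses {query.provider}/{query.model}"
        )


stored = EmbeddingIdentity("ollama", "mxbai-embed-large", 1024)
assert_compatible(stored, EmbeddingIdentity("ollama", "mxbai-embed-large", 1024))  # passes

try:
    assert_compatible(stored, EmbeddingIdentity("cohere", "embed-english-v3.0", 1024))
except ValueError as err:
    print(f"rejected: {err}")
```

The second check matters because two models can share a dimension count (both are 1024 here) while producing vectors that are not comparable with each other.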
Batch Ingestion Example
from memvid_sdk import create
from memvid_sdk.embeddings import OpenAIEmbeddings
# Setup
embedder = OpenAIEmbeddings()
mem = create('knowledge.mv2', enable_vec=True, enable_lex=True)
documents = [
    {"title": "Doc 1", "label": "research", "text": "Content 1..."},
    {"title": "Doc 2", "label": "research", "text": "Content 2..."},
]
frame_ids = mem.put_many(documents, embedder=embedder)
query = "What is the main finding?"
results = mem.find(query, k=10, mode="sem", embedder=embedder)
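For larger corpora it can help to feed documents to the ingestion call in fixed-size chunks, so a single failed request does not invalidate the whole run. This page does not say whether put_many already batches internally, so the sketch below is only an assumption-laden illustration; the `batched` helper and the batch size of 3 are hypothetical, and the ingestion call itself is left as a comment.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Seven placeholder documents in the same shape as the example above.
documents = [
    {"title": f"Doc {i}", "label": "research", "text": f"Content {i}..."}
    for i in range(1, 8)
]

for batch in batched(documents, batch_size=3):
    # In real code, this is where a call such as
    # mem.put_many(batch, embedder=embedder) would go.
    print([doc["title"] for doc in batch])
```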
Vector Compression
For large collections, enable vector compression to reduce storage by ~16x:
# CLI
memvid put knowledge.mv2 --input docs/ --embedding --vector-compression
# Python
from memvid_sdk import create
mem = create("knowledge.mv2", enable_vec=True, enable_lex=True)
mem.put("Doc", "kb", {}, text="...", enable_embedding=True, vector_compression=True)
This uses Product Quantization (PQ) to compress vectors. PQ is a lossy technique, so results can differ slightly from uncompressed search, but search quality is largely maintained.
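To see where a figure like ~16x can come from, the sketch below works through the storage arithmetic for one plausible PQ configuration. Memvid's actual PQ parameters are not given on this page; the choice of 4-dimension subvectors with one 8-bit code each, and the function name `pq_compression_ratio`, are assumptions made purely for illustration. Codebook storage is ignored.

```python
def pq_compression_ratio(
    dimension: int,
    subvector_size: int = 4,   # assumed: dimensions grouped into each subvector
    bits_per_code: int = 8,    # assumed: one 256-entry codebook per subvector
    bytes_per_float: int = 4,  # float32 storage for the uncompressed vector
) -> float:
    """Return raw size divided by PQ-coded size for a single vector."""
    if dimension % subvector_size != 0:
        raise ValueError("dimension must be divisible by subvector_size")
    raw_bytes = dimension * bytes_per_float
    num_subvectors = dimension // subvector_size
    compressed_bytes = num_subvectors * bits_per_code / 8
    return raw_bytes / compressed_bytes


# A 384-dimension float32 vector takes 1536 bytes raw and 96 bytes as PQ codes.
print(pq_compression_ratio(384))   # 16.0
print(pq_compression_ratio(1536))  # 16.0
```

Under these assumptions the ratio does not depend on the dimension count: each group of four float32 values (16 bytes) is replaced by a single one-byte code.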
Environment Variables
| Variable | Description |
|----------|-------------|
| OLLAMA_HOST | Ollama server URL (default: http://localhost:11434) |
| OPENAI_API_KEY | OpenAI API key |
| COHERE_API_KEY | Cohere API key |
| VOYAGE_API_KEY | Voyage AI API key |
| NVIDIA_API_KEY | NVIDIA Integrate API key |
| NVIDIA_BASE_URL | Optional NVIDIA Integrate base URL override |
| GOOGLE_API_KEY | Google/Gemini API key |
| MISTRAL_API_KEY | Mistral API key |
| MEMVID_MODELS_DIR | Local model cache directory |
| MEMVID_OFFLINE=1 | Skip model downloads |
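A small preflight check can catch a missing key before an ingestion job starts. The sketch below is one way to do it, using only the variable names listed above. The functions `missing_api_keys` and `ollama_base_url` and the lowercase provider labels are hypothetical helpers, not SDK calls, and the fallback logic is an assumption about how the documented default is applied.

```python
import os
from typing import Iterable, Mapping

# Environment variable required by each hosted provider (names from this page).
REQUIRED_ENV = {
    "openai": "OPENAI_API_KEY",
    "cohere": "COHERE_API_KEY",
    "voyage": "VOYAGE_API_KEY",
    "nvidia": "NVIDIA_API_KEY",
}


def missing_api_keys(providers: Iterable[str], env: Mapping[str, str]) -> list[str]:
    """Return the variable names that are unset or empty for the given providers."""
    return [REQUIRED_ENV[p] for p in providers if not env.get(REQUIRED_ENV[p])]


def ollama_base_url(env: Mapping[str, str]) -> str:
    """Return OLLAMA_HOST if set, otherwise the documented default URL."""
    return env.get("OLLAMA_HOST") or "http://localhost:11434"


missing = missing_api_keys(["openai", "cohere"], os.environ)
if missing:
    print("Set these variables before ingesting:", ", ".join(missing))
print("Ollama server:", ollama_base_url(os.environ))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the functions, keeps the helpers easy to test with a plain dictionary.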
Choosing an Embedding Model
Decision Matrix
| Requirement | Recommended |
|-------------|-------------|
| Privacy/offline | Ollama mxbai-embed-large |
| Best quality (local) | Ollama mxbai-embed-large |
| Best quality (API) | OpenAI text-embedding-3-large |
| Cost-effective | Ollama (free) or OpenAI text-embedding-3-small |
| Multi-language | Ollama bge-m3 or Cohere embed-multilingual-v3.0 |
| Code/technical | Voyage voyage-code-3 |
| Fastest local | Ollama all-minilm |
| No setup | Built-in BGE-small |
Dimension count affects storage and search speed: higher-dimensional vectors take more space and are slower to compare.
API latency applies to external providers; batch requests when possible.
Rate limits vary by provider plan.
Consistency: use the same model for ingestion and search.
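The effect of dimension count on storage can be estimated with simple arithmetic. The sketch below assumes uncompressed float32 vectors (4 bytes per value) and ignores any index or metadata overhead, neither of which is specified on this page; the function name `raw_vector_storage_bytes` and the 100,000-vector corpus are hypothetical.

```python
def raw_vector_storage_bytes(num_vectors: int, dimension: int, bytes_per_value: int = 4) -> int:
    """Return the bytes needed to store the vectors uncompressed (float32 by default)."""
    return num_vectors * dimension * bytes_per_value


# Dimensions taken from the model tables on this page.
models = [
    ("BGE-small-en-v1.5", 384),
    ("mxbai-embed-large", 1024),
    ("text-embedding-3-large", 3072),
]

for name, dimension in models:
    mib = raw_vector_storage_bytes(100_000, dimension) / (1024 * 1024)
    print(f"{name}: {mib:.1f} MiB for 100,000 vectors")
```

Storage grows linearly with dimension, so under these assumptions a 3072-dimension model needs eight times the space of the 384-dimension built-in model for the same corpus.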
Reranking
Memvid can rerank retrieved candidates using a cross-encoder model (auto-downloaded on first use). In the CLI this is applied during ask and can be disabled:
memvid ask knowledge.mv2 --question "What is machine learning?" --mode hybrid --no-rerank
For find, reranking is handled internally; there is no --rerank flag.
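Conceptually, reranking is a second pass: a first-stage search returns a candidate list, and a more expensive scorer re-orders it against the query. The sketch below shows only that shape. The `rerank` function is hypothetical, and `token_overlap` is a deliberately crude stand-in for a cross-encoder, which would score each query and document pair with a neural model; nothing here reflects Memvid's internal implementation.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float],
    top_k: int,
) -> list[str]:
    """Re-score every candidate against the query and keep the best `top_k`."""
    ranked = sorted(candidates, key=lambda text: score(query, text), reverse=True)
    return ranked[:top_k]


def token_overlap(query: str, text: str) -> float:
    """Toy scorer: fraction of query tokens that also appear in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


# Candidates in the order a first-stage search might have returned them.
candidates = [
    "Neural networks are trained with backpropagation",
    "Machine learning is the study of algorithms that learn from data",
    "Invoices are due within thirty days",
]

top = rerank("what is machine learning", candidates, token_overlap, top_k=2)
print(top[0])  # the machine learning sentence moves to the top
```

Because the scorer is passed in as a function, the same structure works whether the second pass is a toy heuristic or a real cross-encoder.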
Next Steps
Indices and Tracks: Learn about lexical, vector, and time indices
Search & Ask: Master semantic search queries