Memvid supports multiple embedding models for semantic (vector) search. You can use the built-in BGE-small model for local, offline operation, run local models through Ollama or HuggingFace, or connect to hosted providers such as OpenAI, Cohere, Voyage, or NVIDIA for higher-quality embeddings.

Overview

Embeddings convert text into dense numerical vectors that capture semantic meaning. Similar concepts produce similar vectors, enabling semantic search (finding documents by meaning rather than exact keywords).
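To make "similar concepts produce similar vectors" concrete, here is a minimal cosine-similarity sketch. The toy 3-dimensional vectors are invented purely for illustration; real models produce 384 or more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, 3 dimensions instead of 384+)
dog = [0.9, 0.1, 0.0]
puppy = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(dog, puppy))    # high: related concepts
print(cosine_similarity(dog, invoice))  # low: unrelated concepts
```

Semantic search amounts to embedding the query the same way and returning the stored vectors with the highest similarity.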
| Provider | Model | Dimensions | Best For |
|---|---|---|---|
| Built-in | BGE-small-en-v1.5 | 384 | Offline, privacy-first |
| Ollama | mxbai-embed-large | 1024 | Local, high quality |
| Ollama | nomic-embed-text | 768 | Local, fast |
| OpenAI | text-embedding-3-small | 1536 | General purpose |
| OpenAI | text-embedding-3-large | 3072 | Highest quality |
| Cohere | embed-english-v3.0 | 1024 | English documents |
| Cohere | embed-multilingual-v3.0 | 1024 | Multi-language |
| Voyage | voyage-3 | 1024 | Code and technical docs |

Built-in Model (Default)

By default, Memvid uses BGE-small-en-v1.5, a lightweight embedding model that runs locally without any API keys.

Characteristics

  • Dimensions: 384
  • Size: ~75 MB (downloaded on first use)
  • Inference: CPU-based, no GPU required
  • Privacy: All processing happens locally
  • Offline: Works without internet after initial download

Usage

# CLI: Enable embeddings with built-in model
memvid put knowledge.mv2 --input document.pdf --embedding
# Python SDK
from memvid_sdk import create

mem = create("knowledge.mv2", enable_vec=True, enable_lex=True)
mem.put(
    "Document",
    "docs",
    {},
    text="Your content here",
    enable_embedding=True,
    embedding_model="bge-small",
)
// Node.js SDK
import { create } from '@memvid/sdk';

const mem = await create('knowledge.mv2');
await mem.put({ text: 'Your content here', title: 'Document', enableEmbedding: true });

Ollama Embeddings (Local)

Ollama provides high-quality embeddings that run entirely locally on your machine. No API keys, no data leaving your infrastructure, and no usage costs.

Setup

  1. Install Ollama: ollama.com/download
  2. Pull an embedding model:
# Recommended: High quality (1024 dimensions)
ollama pull mxbai-embed-large

# Alternative: Faster, smaller (768 dimensions)
ollama pull nomic-embed-text

Python SDK

from memvid_sdk import create
from memvid_sdk.embeddings import OllamaEmbeddings

# Initialize embedder (uses localhost:11434 by default)
embedder = OllamaEmbeddings(model='mxbai-embed-large')
print(f"Model: {embedder.model_name} ({embedder.dimension} dimensions)")

# Create memory with vector index
mem = create('knowledge.mv2', enable_vec=True, enable_lex=True)

# Store with embeddings
documents = [
    {"title": "Doc 1", "label": "kb", "text": "Machine learning fundamentals..."},
    {"title": "Doc 2", "label": "kb", "text": "Deep neural networks..."},
]
frame_ids = mem.put_many(documents, embedder=embedder)

# Search with query embedding
query = "How do neural networks work?"
results = mem.find(query, k=5, mode="sem", embedder=embedder)

Node.js SDK

import { create, OllamaEmbeddings } from '@memvid/sdk';

// Initialize embedder
const embedder = new OllamaEmbeddings({ model: 'mxbai-embed-large' });
console.log(`Model: ${embedder.modelName} (${embedder.dimension} dimensions)`);

// Create memory
const mem = await create('knowledge.mv2', 'basic', { enableLex: true, enableVec: true });

// Ingest with embeddings
const documents = [
  { title: 'Doc 1', label: 'kb', text: 'Machine learning fundamentals...' },
  { title: 'Doc 2', label: 'kb', text: 'Deep neural networks...' },
];
for (const doc of documents) {
  const embedding = await embedder.embedQuery(doc.text);
  await mem.put({
    title: doc.title,
    text: doc.text,
    label: doc.label,
    embedding,
    embeddingIdentity: { provider: 'ollama', model: 'mxbai-embed-large', dimension: 1024 },
  });
}
await mem.seal();

// Search
const queryEmbedding = await embedder.embedQuery('How do neural networks work?');
const results = await mem.find('neural networks', { k: 5, mode: 'auto', queryEmbedding });

Supported Models

| Model | Dimensions | Speed | Quality | Use Case |
|---|---|---|---|---|
| mxbai-embed-large | 1024 | Medium | Best | Production, high accuracy |
| nomic-embed-text | 768 | Fast | Good | General purpose |
| bge-m3 | 1024 | Medium | Best | Multilingual |
| bge-large | 1024 | Medium | Great | English documents |
| snowflake-arctic-embed | 1024 | Medium | Great | Retrieval-focused |
| snowflake-arctic-embed:m | 768 | Fast | Good | Balanced |
| snowflake-arctic-embed:s | 384 | Fastest | OK | Low latency |
| all-minilm | 384 | Fastest | OK | Lightweight |
| e5-large | 1024 | Medium | Great | General purpose |
| jina-embeddings-v2-base-en | 768 | Fast | Good | Long documents (8K tokens) |

Custom Server

# Connect to remote Ollama server
embedder = OllamaEmbeddings(
    model='mxbai-embed-large',
    base_url='http://gpu-server:11434'
)
// Node.js
const embedder = new OllamaEmbeddings({
  model: 'mxbai-embed-large',
  baseUrl: 'http://gpu-server:11434',
});

Environment Variables

| Variable | Description |
|---|---|
| OLLAMA_HOST | Ollama server URL (default: http://localhost:11434) |

OpenAI Embeddings

OpenAI’s embedding models offer excellent quality for general-purpose semantic search.

Setup

export OPENAI_API_KEY=sk-your-key-here

CLI Usage

# Use OpenAI for embeddings
memvid put knowledge.mv2 --input document.pdf --embedding -m openai-small

# Specify exact model
memvid put knowledge.mv2 --input docs/ --embedding -m openai-large

Python SDK

from memvid_sdk import create
from memvid_sdk.embeddings import OpenAIEmbeddings

# Initialize embedder
embedder = OpenAIEmbeddings(model='text-embedding-3-small')
print(f"Model: {embedder.model_name} ({embedder.dimension} dimensions)")

# Create memory with vector index
mem = create('knowledge.mv2', enable_vec=True, enable_lex=True)

# Store + embed in batch (vector index required for semantic search)
documents = [
    {"title": "Doc 1", "label": "kb", "text": "Machine learning fundamentals..."},
    {"title": "Doc 2", "label": "kb", "text": "Deep neural networks..."},
]
frame_ids = mem.put_many(documents, embedder=embedder)

# Search with query embedding
query = "How do neural networks work?"
results = mem.find(query, k=5, mode="sem", embedder=embedder)

Node.js SDK

import { create, OpenAIEmbeddings } from '@memvid/sdk';

// Initialize embedder (uses OPENAI_API_KEY env var)
const embedder = new OpenAIEmbeddings({ model: 'text-embedding-3-small' });
console.log(`Model: ${embedder.modelName} (${embedder.dimension} dimensions)`);

// Create memory
const mem = await create('knowledge.mv2');

// Store + embed in batch (vector index required for semantic search)
await mem.putMany(
  [
    { title: 'Doc 1', text: 'Machine learning fundamentals...' },
    { title: 'Doc 2', text: 'Deep neural networks...' },
  ],
  { embedder }
);
await mem.seal();

// Query using the same embedder (keeps dimensions consistent)
const results = await mem.find('How do neural networks work?', { mode: 'sem', k: 5, embedder });

Model Comparison

| Model | Dimensions | Cost | Quality |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 / 1M tokens | Good |
| text-embedding-3-large | 3072 | $0.13 / 1M tokens | Best |
| text-embedding-ada-002 | 1536 | $0.10 / 1M tokens | Legacy |

NVIDIA Embeddings

NVIDIA Integrate provides a fast hosted embedding API with OpenAI-compatible shapes.

Setup

export NVIDIA_API_KEY=nvapi-your-key-here

Python SDK

from memvid_sdk import create
from memvid_sdk.embeddings import NvidiaEmbeddings

mem = create("knowledge.mv2", enable_vec=True, enable_lex=True)
embedder = NvidiaEmbeddings(model="nvidia/nv-embed-v1")  # uses NVIDIA_API_KEY

mem.put_many(
    [{"title": "Doc", "label": "kb", "text": "Vector search with NVIDIA embeddings."}],
    embedder=embedder,
)
res = mem.find("nvidia embeddings", mode="sem", embedder=embedder)

Node.js SDK

import { create, NvidiaEmbeddings } from '@memvid/sdk';

const mem = await create('knowledge.mv2');
const embedder = new NvidiaEmbeddings({ model: 'nvidia/nv-embed-v1' }); // uses NVIDIA_API_KEY
await mem.putMany([{ title: 'Doc', text: 'Vector search with NVIDIA embeddings.' }], { embedder });
const res = await mem.find('nvidia embeddings', { mode: 'sem', embedder });

Cohere Embeddings

Cohere offers specialized models for English and multilingual content.

Setup

export COHERE_API_KEY=your-key-here

Python SDK

from memvid_sdk.embeddings import CohereEmbeddings, get_embedder

# Direct initialization
embedder = CohereEmbeddings(model='embed-english-v3.0')

# Or use factory
embedder = get_embedder('cohere', model='embed-multilingual-v3.0')

# Generate embeddings
embeddings = embedder.embed_documents(['Text 1', 'Text 2'])
query_vec = embedder.embed_query('search query')

Node.js SDK

import { CohereEmbeddings, getEmbedder } from '@memvid/sdk';

// Direct initialization
const embedder = new CohereEmbeddings({ model: 'embed-english-v3.0' });

// Or use factory
const embedder2 = getEmbedder('cohere', { model: 'embed-multilingual-v3.0' });

const embeddings = await embedder.embedDocuments(['Text 1', 'Text 2']);

Model Options

| Model | Dimensions | Best For |
|---|---|---|
| embed-english-v3.0 | 1024 | English documents |
| embed-multilingual-v3.0 | 1024 | 100+ languages |
| embed-english-light-v3.0 | 384 | Faster, lower cost |
| embed-multilingual-light-v3.0 | 384 | Multi-language, lighter |

Voyage Embeddings

Voyage AI specializes in embeddings for code and technical documentation.

Setup

export VOYAGE_API_KEY=your-key-here

Python SDK

from memvid_sdk.embeddings import VoyageEmbeddings

embedder = VoyageEmbeddings(model='voyage-3')
embeddings = embedder.embed_documents(['def hello(): pass', 'function hello() {}'])

Node.js SDK

import { VoyageEmbeddings } from '@memvid/sdk';

const embedder = new VoyageEmbeddings({ model: 'voyage-code-3' });
const embeddings = await embedder.embedDocuments(['def hello(): pass']);

Model Options

| Model | Dimensions | Best For |
|---|---|---|
| voyage-3 | 1024 | General purpose |
| voyage-3-lite | 512 | Faster, smaller |
| voyage-code-3 | 1024 | Source code |

HuggingFace Embeddings (Python)

Use any HuggingFace sentence-transformer model locally.

Setup

pip install sentence-transformers

Usage

from memvid_sdk.embeddings import get_embedder

# Use any sentence-transformers model
embedder = get_embedder('huggingface', model='all-MiniLM-L6-v2')
print(f"Model: {embedder.model_name} ({embedder.dimension} dimensions)")

embeddings = embedder.embed_documents(['Text 1', 'Text 2'])

| Model | Dimensions | Size |
|---|---|---|
| all-MiniLM-L6-v2 | 384 | 80 MB |
| all-mpnet-base-v2 | 768 | 420 MB |
| multi-qa-MiniLM-L6-cos-v1 | 384 | 80 MB |

Using External Embeddings with Memvid

The key workflow for external embeddings:
  1. Pick an embedder (OpenAI/Cohere/Voyage/NVIDIA/etc.)
  2. Ingest with put_many(..., embedder=...) (stores embedding identity metadata)
  3. Query with find/ask(..., embedder=...) (keeps dimensions consistent)

Batch Ingestion Example

from memvid_sdk import create
from memvid_sdk.embeddings import OpenAIEmbeddings

# Setup
embedder = OpenAIEmbeddings()
mem = create('knowledge.mv2', enable_vec=True, enable_lex=True)

documents = [
    {"title": "Doc 1", "label": "research", "text": "Content 1..."},
    {"title": "Doc 2", "label": "research", "text": "Content 2..."},
]
frame_ids = mem.put_many(documents, embedder=embedder)

query = "What is the main finding?"
results = mem.find(query, k=10, mode="sem", embedder=embedder)

Vector Compression

For large collections, enable vector compression to reduce storage by ~16x:
# CLI
memvid put knowledge.mv2 --input docs/ --embedding --vector-compression
# Python
from memvid_sdk import create

mem = create("knowledge.mv2", enable_vec=True, enable_lex=True)
mem.put("Doc", "kb", {}, text="...", enable_embedding=True, vector_compression=True)
This uses Product Quantization (PQ) to compress vectors while maintaining search quality.
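The core idea behind PQ can be sketched in a few lines: split each vector into sub-vectors and store only the index of the nearest codebook centroid for each. This toy version hard-codes a 4-entry codebook rather than learning one with k-means, so it illustrates the encoding step only, not Memvid's actual implementation:

```python
# Toy Product Quantization sketch. Real PQ learns one codebook per
# sub-space via k-means; this hypothetical codebook is hard-coded.
CODEBOOK = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

def nearest(subvec):
    """Index of the codebook centroid closest to subvec."""
    return min(range(len(CODEBOOK)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(subvec, CODEBOOK[i])))

def pq_encode(vec, sub_dim=2):
    """Compress a vector to one small code per sub-vector."""
    return [nearest(vec[i:i + sub_dim]) for i in range(0, len(vec), sub_dim)]

def pq_decode(codes):
    """Reconstruct an approximate vector from its codes."""
    out = []
    for c in codes:
        out.extend(CODEBOOK[c])
    return out

vec = [0.9, 0.1, 0.1, 0.8]   # 4 float32 values = 16 bytes
codes = pq_encode(vec)        # 2 one-byte codes = 2 bytes
print(codes, pq_decode(codes))
```

Search runs against the compressed codes, trading a small amount of precision for a large reduction in index size.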

Environment Variables

| Variable | Description |
|---|---|
| OLLAMA_HOST | Ollama server URL (default: http://localhost:11434) |
| OPENAI_API_KEY | OpenAI API key |
| COHERE_API_KEY | Cohere API key |
| VOYAGE_API_KEY | Voyage AI API key |
| NVIDIA_API_KEY | NVIDIA Integrate API key |
| NVIDIA_BASE_URL | Optional NVIDIA Integrate base URL override |
| GOOGLE_API_KEY | Google/Gemini API key |
| MISTRAL_API_KEY | Mistral API key |
| MEMVID_MODELS_DIR | Local model cache directory |
| MEMVID_OFFLINE | Set to 1 to skip model downloads |

Choosing an Embedding Model

Decision Matrix

| Requirement | Recommended |
|---|---|
| Privacy/offline | Ollama mxbai-embed-large |
| Best quality (local) | Ollama mxbai-embed-large |
| Best quality (API) | OpenAI text-embedding-3-large |
| Cost-effective | Ollama (free) or OpenAI text-embedding-3-small |
| Multi-language | Ollama bge-m3 or Cohere embed-multilingual-v3.0 |
| Code/technical | Voyage voyage-code-3 |
| Fastest local | Ollama all-minilm |
| No setup | Built-in BGE-small |

Performance Considerations

  • Dimension count affects storage and search speed
  • API latency for external providers (batch when possible)
  • Rate limits vary by provider plan
  • Consistency - use same model for ingestion and search
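On the API-latency point, sending texts in fixed-size batches amortizes network round trips. A small generic helper (plain Python, not a Memvid API):

```python
def batched(texts, batch_size=64):
    """Yield successive fixed-size batches from a list of texts."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

# Hypothetical usage with any embedder exposing embed_documents():
#   vectors = []
#   for chunk in batched(all_texts):
#       vectors.extend(embedder.embed_documents(chunk))
```

Memvid's put_many already embeds in batches when given an embedder; a helper like this is mainly useful when calling a provider's embedding API directly.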

Reranking

Memvid can rerank retrieved candidates using a cross-encoder model (auto-downloaded on first use). In the CLI this is applied during ask and can be disabled:
memvid ask knowledge.mv2 --question "What is machine learning?" --mode hybrid --no-rerank
For find, reranking is handled internally; there is no --rerank flag.
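To make the rerank step concrete: a cross-encoder scores each (query, candidate) pair jointly with a trained model, then candidates are re-sorted by that score. The sketch below substitutes a term-overlap score for the model purely for illustration; it is not Memvid's reranker.

```python
def toy_cross_encoder_score(query, candidate):
    """Stand-in for a cross-encoder: fraction of query terms in candidate."""
    q_terms = set(query.lower().split())
    c_terms = set(candidate.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)

def rerank(query, candidates, top_k=3):
    """Re-sort retrieved candidates by pairwise score, keep the top_k."""
    return sorted(candidates,
                  key=lambda c: toy_cross_encoder_score(query, c),
                  reverse=True)[:top_k]

hits = ["deep learning uses neural networks",
        "invoices are due monthly",
        "machine learning is a field of AI"]
print(rerank("what is machine learning", hits, top_k=2))
```

Because the reranker sees the query and document together, it can catch relevance signals that independent embeddings miss, at the cost of extra latency per candidate.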

Next Steps