Memvid supports local LLM inference through Ollama, allowing you to run AI-powered Q&A without sending your data to external APIs. This is ideal for:
  • Privacy-sensitive data - Keep everything on your machine
  • Offline usage - No internet connection required after setup
  • Cost savings - No API fees for inference
  • Low latency - No network round-trips

Quick Setup

1. Install Ollama

brew install ollama

2. Start Ollama Server

# Start in foreground (see logs)
ollama serve

# Or run as background service (macOS)
brew services start ollama

3. Pull a Model

# Recommended: Qwen2.5 1.5B (best quality/size ratio)
ollama pull qwen2.5:1.5b

4. Use with Memvid

memvid ask knowledge.mv2 \
  --question "What is the main topic?" \
  --use-model "ollama:qwen2.5:1.5b"

Recommended Models

| Model        | Size   | Speed  | Quality   | Best For                   |
|--------------|--------|--------|-----------|----------------------------|
| qwen2.5:0.5b | ~400MB | Fast   | Good      | Quick queries, limited RAM |
| qwen2.5:1.5b | ~1GB   | Fast   | Great     | Recommended default        |
| qwen2.5:3b   | ~2GB   | Medium | Excellent | Complex questions          |
| phi3:mini    | ~2GB   | Medium | Great     | Reasoning tasks            |
| gemma2:2b    | ~1.6GB | Medium | Great     | General purpose            |
| llama3.2:1b  | ~1.3GB | Fast   | Good      | Conversational             |
| llama3.2:3b  | ~2GB   | Medium | Great     | Balanced performance       |

Pull Commands

# Small & fast
ollama pull qwen2.5:0.5b

# Recommended (best balance)
ollama pull qwen2.5:1.5b

# Higher quality
ollama pull qwen2.5:3b
ollama pull phi3:mini
ollama pull gemma2:2b

# Meta's Llama
ollama pull llama3.2:1b
ollama pull llama3.2:3b

CLI Usage

Basic Q&A

# Ask with local model
memvid ask knowledge.mv2 \
  --question "What are the key findings?" \
  --use-model "ollama:qwen2.5:1.5b"

# With JSON output
memvid ask knowledge.mv2 \
  --question "Summarize the main points" \
  --use-model "ollama:qwen2.5:1.5b" \
  --json

Advanced Options

# More context for complex questions
memvid ask knowledge.mv2 \
  --question "Explain the architecture in detail" \
  --use-model "ollama:qwen2.5:3b" \
  --top-k 15 \
  --snippet-chars 800

# Filter by scope
memvid ask knowledge.mv2 \
  --question "What API endpoints exist?" \
  --use-model "ollama:qwen2.5:1.5b" \
  --scope "mv2://api/"

# Time-travel query
memvid ask knowledge.mv2 \
  --question "What was the status?" \
  --use-model "ollama:qwen2.5:1.5b" \
  --as-of-frame 100
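
If you would rather script the CLI than use an SDK, the --json flag shown above makes the output machine-readable. A minimal Python sketch (the exact JSON schema is not documented on this page, so it simply pretty-prints whatever the command returns):

import json
import subprocess

# Run `memvid ask` with --json so the output can be parsed programmatically.
result = subprocess.run(
    [
        "memvid", "ask", "knowledge.mv2",
        "--question", "What are the key findings?",
        "--use-model", "ollama:qwen2.5:1.5b",
        "--json",
    ],
    capture_output=True,
    text=True,
    check=True,
)

# Schema is not specified here, so just pretty-print the parsed response.
print(json.dumps(json.loads(result.stdout), indent=2))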

Python SDK Usage

from memvid_sdk import use

mem = use('basic', 'knowledge.mv2')

# Ask with local Ollama model
response = mem.ask(
    "What are the main conclusions?",
    model="ollama:qwen2.5:1.5b",
    k=10
)

print(response['answer'])

With Different Models

# Quick answer with small model
quick_response = mem.ask(
    "What is this document about?",
    model="ollama:qwen2.5:0.5b"
)

# Detailed analysis with larger model
detailed_response = mem.ask(
    "Provide a comprehensive analysis of the findings",
    model="ollama:qwen2.5:3b",
    k=15
)

Node.js SDK Usage

import { use } from '@memvid/sdk';

const mem = await use('basic', 'knowledge.mv2');

// Ask with local Ollama model
const response = await mem.ask(
  'What are the key takeaways?',
  { model: 'ollama:qwen2.5:1.5b', k: 10 }
);

console.log(response.answer);

Model Selection Guide

By Use Case

| Use Case        | Recommended Model | Why              |
|-----------------|-------------------|------------------|
| Quick lookups   | qwen2.5:0.5b      | Fastest response |
| General Q&A     | qwen2.5:1.5b      | Best balance     |
| Technical docs  | qwen2.5:3b        | Better reasoning |
| Code analysis   | phi3:mini         | Strong at code   |
| Research papers | qwen2.5:3b        | Complex content  |

By Hardware

| RAM Available | Recommended Model       |
|---------------|-------------------------|
| 4GB           | qwen2.5:0.5b            |
| 8GB           | qwen2.5:1.5b            |
| 16GB+         | qwen2.5:3b or phi3:mini |
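
As a rough illustration of the hardware table above, here is a minimal sketch that picks a model tag from available RAM. It assumes the third-party psutil package is installed; the thresholds simply mirror the table and are not part of the Memvid API.

import psutil  # third-party: pip install psutil

def pick_ollama_model() -> str:
    """Pick a model tag from the hardware table above based on total RAM."""
    total_gb = psutil.virtual_memory().total / (1024 ** 3)
    if total_gb >= 16:
        return "ollama:qwen2.5:3b"
    if total_gb >= 8:
        return "ollama:qwen2.5:1.5b"
    return "ollama:qwen2.5:0.5b"

print(pick_ollama_model())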

Ollama Management

List Downloaded Models

ollama list

Remove a Model

ollama rm qwen2.5:0.5b

Update a Model

ollama pull qwen2.5:1.5b

Check Ollama Status

# Check if server is running
curl http://localhost:11434/api/tags
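
The same check can be done from Python before issuing a query. A minimal sketch using only the standard library and the /api/tags endpoint shown above (the response is expected to contain a models list; treat that schema as an assumption):

import json
import urllib.request

# Ollama's HTTP API lists pulled models at /api/tags when the server is up.
try:
    with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=2) as resp:
        models = [m.get("name") for m in json.load(resp).get("models", [])]
        print("Ollama is running. Pulled models:", models)
except OSError:
    print("Ollama server not reachable; start it with `ollama serve`.")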

Run as Background Service

# Start service
brew services start ollama

# Stop service
brew services stop ollama

# Check status
brew services list | grep ollama

Troubleshooting

Ollama Not Running

Error: Failed to contact LLM provider
Solution:
# Start Ollama server
ollama serve

# Or as background service (macOS)
brew services start ollama
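
From the Python SDK you can surface this failure explicitly. The exact exception type the SDK raises is not documented on this page, so this sketch catches broadly and prints a hint:

from memvid_sdk import use

mem = use('basic', 'knowledge.mv2')

try:
    response = mem.ask("What is the main topic?", model="ollama:qwen2.5:1.5b")
    print(response['answer'])
except Exception as exc:  # exact SDK exception type not documented here
    print(f"LLM call failed ({exc}). Is Ollama running? Try `ollama serve`.")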

Model Not Found

Error: model 'qwen2.5:1.5b' not found
Solution:
# Pull the model first
ollama pull qwen2.5:1.5b
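
To automate this check, a minimal sketch that inspects `ollama list` output and pulls the model when it is missing (plain subprocess calls to the CLI commands shown above):

import subprocess

MODEL = "qwen2.5:1.5b"

# `ollama list` prints the pulled models; pull the model if it is absent.
installed = subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout
if MODEL not in installed:
    subprocess.run(["ollama", "pull", MODEL], check=True)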

Slow Response Times

Solutions:
  • Use a smaller model: ollama:qwen2.5:0.5b
  • Reduce context: --top-k 5 --snippet-chars 300 (or a smaller k in the SDK; see the sketch below)
  • Close other memory-intensive applications
  • Ensure you have enough RAM for the model
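
For the Python SDK, the equivalent of reducing context is a smaller k, optionally combined with a smaller model. A minimal sketch:

from memvid_sdk import use

mem = use('basic', 'knowledge.mv2')

# Smaller model plus less retrieved context keeps latency down.
response = mem.ask(
    "What is the main topic?",
    model="ollama:qwen2.5:0.5b",
    k=5,
)
print(response['answer'])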

Out of Memory

Solutions:
  • Use a smaller model
  • Close other applications
  • Increase swap space (not recommended for performance)

Comparison: Local vs Cloud Models

| Aspect  | Local (Ollama)        | Cloud (OpenAI, Claude, Gemini) |
|---------|-----------------------|--------------------------------|
| Privacy | Data stays local      | Data sent to API               |
| Cost    | Free after setup      | Per-token pricing              |
| Speed   | Depends on hardware   | Usually faster                 |
| Quality | Good to great         | Excellent                      |
| Offline | Yes                   | No                             |
| Setup   | Requires installation | Just API key                   |

When to Use Local Models

  • Sensitive/confidential data
  • Offline environments
  • Cost-sensitive applications
  • Privacy requirements

When to Use Cloud Models

  • Best possible answer quality
  • Limited local compute
  • Quick prototyping
  • Complex reasoning tasks

Next Steps