- Privacy-sensitive data - Keep everything on your machine
- Offline usage - No internet connection required after setup
- Cost savings - No API fees for inference
- Low latency - No network round-trips
## Quick Setup

### 1. Install Ollama
Ollama ships native installers for macOS, Linux, and Windows.
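The official install paths for each (on Windows, download the installer from https://ollama.com/download):

```bash
# macOS (Homebrew; a desktop app is also available from ollama.com)
brew install ollama

# Linux (official one-line install script)
curl -fsSL https://ollama.com/install.sh | sh
```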
### 2. Start Ollama Server
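If Ollama isn't already running as a background service, start the API server manually:

```bash
# Serves the Ollama HTTP API on http://localhost:11434
ollama serve
```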
### 3. Pull a Model
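For example, the recommended default from the table below:

```bash
ollama pull qwen2.5:1.5b
```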
### 4. Use with Memvid
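A minimal sketch: the `memvid ask` subcommand, the memory-file name, and the `--model` flag are assumptions for illustration (only the `ollama:<model>` spec comes from this page), so check `memvid --help` for the exact syntax:

```bash
# Hypothetical invocation: answer a question from a Memvid memory
# using a local Ollama model instead of a cloud API.
memvid ask my-memory.mp4 "What does the design doc say about caching?" \
  --model ollama:qwen2.5:1.5b
```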
## Recommended Models

| Model | Size | Speed | Quality | Best For |
|---|---|---|---|---|
| qwen2.5:0.5b | ~400MB | Fast | Good | Quick queries, limited RAM |
| qwen2.5:1.5b | ~1GB | Fast | Great | Recommended default |
| qwen2.5:3b | ~2GB | Medium | Excellent | Complex questions |
| phi3:mini | ~2GB | Medium | Great | Reasoning tasks |
| gemma2:2b | ~1.6GB | Medium | Great | General purpose |
| llama3.2:1b | ~1.3GB | Fast | Good | Conversational |
| llama3.2:3b | ~2GB | Medium | Great | Balanced performance |
### Pull Commands
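To fetch any of the models from the table above:

```bash
ollama pull qwen2.5:0.5b
ollama pull qwen2.5:1.5b
ollama pull qwen2.5:3b
ollama pull phi3:mini
ollama pull gemma2:2b
ollama pull llama3.2:1b
ollama pull llama3.2:3b
```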
## CLI Usage

### Basic Q&A
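Same hypothetical `memvid ask` form as in the Quick Setup sketch above; the real subcommand and flag names may differ:

```bash
# Ask a one-off question against a memory file with a local model
memvid ask my-memory.mp4 "Summarize the onboarding steps" \
  --model ollama:qwen2.5:1.5b
```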
### Advanced Options
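The retrieval flags here (`--top-k`, `--snippet-chars`) are the ones referenced later under Troubleshooting; the rest of the invocation is still the assumed form:

```bash
# Tighten retrieval for faster local inference: fewer retrieved
# chunks (--top-k) and shorter snippets (--snippet-chars).
memvid ask my-memory.mp4 "Explain the migration plan" \
  --model ollama:qwen2.5:3b \
  --top-k 5 \
  --snippet-chars 300
```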
## Python SDK Usage
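A minimal sketch only: the `Memvid` class and the `open`/`ask` methods below are illustrative assumptions, not the documented SDK surface, so consult the SDK reference for the real names:

```python
# Hypothetical API, for illustration only.
from memvid import Memvid

# Load an existing memory file
mem = Memvid.open("my-memory.mp4")

# Route inference to a local Ollama model via the ollama:<model> spec
answer = mem.ask(
    "What are the deployment steps?",
    model="ollama:qwen2.5:1.5b",
)
print(answer)
```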
### With Different Models
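Switching models is just a different model string (same hypothetical API as above):

```python
# Smallest/fastest model for quick lookups
quick = mem.ask("Which port does the server use?", model="ollama:qwen2.5:0.5b")

# Larger model for complex questions
deep = mem.ask("Compare the two caching strategies.", model="ollama:qwen2.5:3b")
```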
## Node.js SDK Usage
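The same idea from Node.js; as with the Python sketch, the package export and method names are assumptions:

```typescript
// Hypothetical API, for illustration only.
import { Memvid } from "memvid";

const mem = await Memvid.open("my-memory.mp4");

// Route inference to a local Ollama model
const answer = await mem.ask("What are the deployment steps?", {
  model: "ollama:qwen2.5:1.5b",
});
console.log(answer);
```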
## Model Selection Guide

### By Use Case
| Use Case | Recommended Model | Why |
|---|---|---|
| Quick lookups | qwen2.5:0.5b | Fastest response |
| General Q&A | qwen2.5:1.5b | Best balance |
| Technical docs | qwen2.5:3b | Better reasoning |
| Code analysis | phi3:mini | Strong at code |
| Research papers | qwen2.5:3b | Complex content |
### By Hardware
| RAM Available | Recommended Model |
|---|---|
| 4GB | qwen2.5:0.5b |
| 8GB | qwen2.5:1.5b |
| 16GB+ | qwen2.5:3b or phi3:mini |
## Ollama Management

### List Downloaded Models
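Show every model currently on disk, with sizes:

```bash
ollama list
```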
### Remove a Model
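Free disk space by deleting a model you no longer need:

```bash
ollama rm qwen2.5:0.5b
```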
### Update a Model
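Pulling an existing tag again fetches its latest version:

```bash
ollama pull qwen2.5:1.5b
```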
### Check Ollama Status
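The server answers on port 11434 when it is up:

```bash
# Lists installed models over the HTTP API; fails if the server is down
curl http://localhost:11434/api/tags

# Shows models currently loaded into memory
ollama ps
```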
### Run as Background Service
- macOS
- Linux (systemd)
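The usual commands for each:

```bash
# macOS (Homebrew service)
brew services start ollama

# Linux: the official installer registers a systemd unit
sudo systemctl enable --now ollama
sudo systemctl status ollama
```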
## Troubleshooting

### Ollama Not Running
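If Memvid can't reach Ollama on `localhost:11434`, confirm the server is up and start it if needed:

```bash
# Should return JSON if the server is running
curl http://localhost:11434/api/tags

# If it isn't, start the server (or the background service above)
ollama serve
```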
### Model Not Found
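This usually means the model was never pulled on this machine:

```bash
ollama list                # confirm what is installed
ollama pull qwen2.5:1.5b   # fetch the missing model
```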
### Slow Response Times

Solutions:
- Use a smaller model: `ollama:qwen2.5:0.5b`
- Reduce context: `--top-k 5 --snippet-chars 300`
- Close other memory-intensive applications
- Ensure you have enough RAM for the model
### Out of Memory

Solutions:
- Use a smaller model
- Close other applications
- Increase swap space (not recommended for performance)
## Comparison: Local vs Cloud Models
| Aspect | Local (Ollama) | Cloud (OpenAI, Claude, Gemini) |
|---|---|---|
| Privacy | Data stays local | Data sent to API |
| Cost | Free after setup | Per-token pricing |
| Speed | Depends on hardware | Usually faster |
| Quality | Good to great | Excellent |
| Offline | Yes | No |
| Setup | Requires installation | Just API key |
### When to Use Local Models
- Sensitive/confidential data
- Offline environments
- Cost-sensitive applications
- Privacy requirements
### When to Use Cloud Models
- Best possible answer quality
- Limited local compute
- Quick prototyping
- Complex reasoning tasks