Learn how to create new memory files and ingest documents using the Memvid CLI.
Creating a Memory File
Basic Usage
Create a new .mv2 memory file:
memvid create my-knowledge.mv2
Options
| Option | Description | Default |
|---|---|---|
| --tier | Capacity tier (free, dev, enterprise) | free |
| --size | Capacity override (e.g. 15MB, capped at 50MB) | 50MB |
| --no-lex | Disable lexical/full-text index | Enabled |
| --no-vector | Disable vector index | Enabled |
Examples
# Create a basic memory file
memvid create research.mv2
# Create without lexical index
memvid create notes.mv2 --no-lex
# Create a smaller memory (capacity override, must stay within the 50MB cap)
memvid create small.mv2 --size 15MB
memvid create caps capacity at 50MB. To go beyond 50MB, create the file first, then apply a signed capacity ticket (see memvid tickets sync/apply).
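To make the cap concrete, the sketch below converts a size string to bytes and checks it against the 50MB limit. It assumes binary megabytes (1MB = 1,048,576 bytes), which matches the 536870912-byte figure this page reports for a 512 MB limit. The helper names are illustrative and not part of the CLI.

```python
import re

# 50MB cap stated above; treating MB as 1024 * 1024 bytes is an assumption.
CREATE_CAP_BYTES = 50 * 1024 * 1024


def size_to_bytes(size: str) -> int:
    """Convert a string such as '15MB' to bytes (binary megabytes assumed)."""
    match = re.fullmatch(r"(\d+)MB", size.strip().upper())
    if match is None:
        raise ValueError(f"unsupported size string: {size!r}")
    return int(match.group(1)) * 1024 * 1024


def fits_create_cap(size: str) -> bool:
    """Return True if the requested size is within the 50MB create cap."""
    return size_to_bytes(size) <= CREATE_CAP_BYTES


print(fits_create_cap("15MB"))   # True
print(fits_create_cap("512MB"))  # False
```

A request that fails this check would need a capacity ticket rather than a larger --size value.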
JSON Output
memvid create my-memory.mv2 --json
{
"path": "my-memory.mv2",
"size_limit_bytes": 52428800,
"lex_enabled": true,
"vec_enabled": true,
"created_at": "2024-01-15T10:30:00Z"
}
Inspecting a Memory File
The open command shows metadata and manifests of an existing memory file.
Synopsis
memvid open <FILE> [OPTIONS]
Options
| Option | Description |
|---|---|
| --json | Emit JSON output |
Examples
# Inspect a memory file
memvid open my-memory.mv2
# Get JSON output for scripting
memvid open my-memory.mv2 --json
Response
Memory File: my-memory.mv2
Version: 2.1.0
Created: 2024-01-15T10:30:00Z
Frames: 1,234
Size: 45.2 MB / 512 MB (8.8%)
Indexes:
Lexical: enabled (12,456 terms)
Vector: enabled (1,234 vectors, 384d)
Time: enabled (1,234 entries)
Tracks:
default: 890 frames
meetings: 234 frames
emails: 110 frames
Memory Binding:
Memory ID: mem_abc123
Bound at: 2024-01-15T10:30:00Z
JSON Output
{
"path": "my-memory.mv2",
"version": "2.1.0",
"created_at": "2024-01-15T10:30:00Z",
"frame_count": 1234,
"size_bytes": 47395430,
"size_limit_bytes": 536870912,
"indexes": {
"lex": { "enabled": true, "term_count": 12456 },
"vec": { "enabled": true, "vector_count": 1234, "dimension": 384 },
"time": { "enabled": true, "entry_count": 1234 }
},
"tracks": {
"default": 890,
"meetings": 234,
"emails": 110
},
"binding": {
"memory_id": "mem_abc123",
"bound_at": "2024-01-15T10:30:00Z"
}
}
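For scripting, the JSON form is easier to consume than the text report. The following is a minimal Python sketch that uses a trimmed copy of the sample above in place of real command output; the summarize helper is hypothetical.

```python
import json

# Trimmed copy of the sample output above. In a real script this string would
# be the stdout of `memvid open my-memory.mv2 --json`.
SAMPLE = """
{
  "frame_count": 1234,
  "size_bytes": 47395430,
  "size_limit_bytes": 536870912,
  "tracks": {"default": 890, "meetings": 234, "emails": 110}
}
"""


def summarize(info: dict) -> dict:
    """Derive capacity usage and a track consistency check from the report."""
    used_pct = 100 * info["size_bytes"] / info["size_limit_bytes"]
    return {
        "used_pct": round(used_pct, 1),
        "largest_track": max(info["tracks"], key=info["tracks"].get),
        "tracks_cover_all_frames": sum(info["tracks"].values()) == info["frame_count"],
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

The computed usage of 8.8% agrees with the Size line in the text report.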
Ingesting Documents
The put command adds documents to your memory file as frames.
Basic Usage
# Ingest a single file (text-only)
memvid put my-knowledge.mv2 --input document.pdf
# Ingest a directory
memvid put my-knowledge.mv2 --input ./documents/
# Ingest with semantic embeddings (+16x PQ compression)
memvid put my-knowledge.mv2 --input document.pdf --embedding --vector-compression
# Ingest from stdin (text-only by default)
echo "Some text content" | memvid put my-knowledge.mv2
Core Options
| Option | Description |
|---|---|
| --input PATH | Path to file or directory |
| --uri URI | Custom URI for the frame |
| --title TITLE | Document title |
| --timestamp UNIX_TS | POSIX timestamp |
| --track TRACK | Track/collection name |
| --kind KIND | Content type metadata |
| --json | Output as JSON |
| Option | Description |
|---|---|
| --tag KEY=VALUE | Add tags (repeatable) |
| --label LABEL | Add labels (repeatable) |
| --metadata JSON | Additional metadata as JSON |
| --no-auto-tag | Disable automatic tag extraction |
| --no-extract-dates | Disable date extraction |
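The --timestamp option expects seconds since the Unix epoch. The small Python sketch below produces that value from a calendar date, assuming the date is meant as UTC; the helper name is illustrative.

```python
from datetime import datetime, timezone


def to_posix(year, month, day, hour=0, minute=0, second=0) -> int:
    """Return the POSIX timestamp (seconds since 1970-01-01 UTC) for a UTC date."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp())


print(to_posix(2023, 6, 15, 8, 50))  # 1686819000
```

The result can be passed directly as the --timestamp value of a put call.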
When the CLIP and NER models are installed, the CLI automatically enables visual embeddings for images/PDFs and entity extraction.
| Option | Description |
|---|---|
| --clip | Explicitly enable CLIP visual embeddings |
| --no-clip | Disable CLIP even when model is available |
| --logic-mesh | Explicitly enable entity extraction |
| --no-logic-mesh | Disable entity extraction even when model is available |
Install models manually:
memvid models install --clip mobileclip-s2
memvid models install --ner distilbert-ner
Embedding Options
| Option | Description |
|---|---|
| --embedding | Enable semantic embeddings |
| -m, --embedding-model MODEL | Choose default embedding model (global flag; see below) |
| --vector-compression | Generate semantic embeddings with 16x compression |
| --no-embedding | Explicitly disable embeddings |
Embedding Model Options:
| Model | Description |
|---|---|
| bge-small | Local fastembed default (384d) |
| bge-base | Local higher quality (768d) |
| nomic | Local high accuracy (768d) |
| gte-large | Local best semantic depth (1024d) |
| openai-small | OpenAI text-embedding-3-small (1536d) |
| openai-large | OpenAI text-embedding-3-large (3072d) |
| openai | Alias for openai-large |
| openai-ada | OpenAI text-embedding-ada-002 (1536d, legacy) |
# Use built-in BGE (default, no API key needed)
memvid put knowledge.mv2 --input docs/ --embedding
# Use OpenAI embeddings
export OPENAI_API_KEY=sk-...
memvid put knowledge.mv2 --input docs/ --embedding -m openai-small
# Use OpenAI large model for higher quality
memvid put knowledge.mv2 --input docs/ --embedding -m openai-large
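Model choice affects index size, since each vector stores one value per dimension. The back-of-the-envelope sketch below uses the dimensions from the model table and assumes uncompressed vectors are stored as 4-byte floats, which this page does not confirm. The 16x factor is the one quoted for --vector-compression. Real files also carry index overhead, so treat the numbers as rough estimates only.

```python
# Dimensions copied from the model table above.
MODEL_DIMS = {
    "bge-small": 384,
    "bge-base": 768,
    "nomic": 768,
    "gte-large": 1024,
    "openai-small": 1536,
    "openai-large": 3072,
    "openai-ada": 1536,
}

BYTES_PER_VALUE = 4      # assumption: float32 storage
COMPRESSION_FACTOR = 16  # 16x, as quoted for --vector-compression


def estimate_vector_bytes(model: str, n_vectors: int, compressed: bool = False) -> int:
    """Rough size of the raw vector payload, ignoring index overhead."""
    raw = MODEL_DIMS[model] * BYTES_PER_VALUE * n_vectors
    return raw // COMPRESSION_FACTOR if compressed else raw


print(estimate_vector_bytes("bge-small", 1234))                   # 1895424
print(estimate_vector_bytes("bge-small", 1234, compressed=True))  # 118464
print(estimate_vector_bytes("openai-large", 1234))                # 15163392
```

Under these assumptions, moving from bge-small to openai-large multiplies the vector payload by eight, which matters for files that must stay within a small capacity limit.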
| Option | Description |
|---|---|
| --tables | Extract tables from PDF files |
| --embed-rows | Embed individual table rows for semantic search (default: true) |
Duplicate Handling
| Option | Description |
|---|---|
| --update-existing | Replace existing frame with same URI |
| --allow-duplicate | Allow multiple frames with same URI |
Lock Control
| Option | Description | Default |
|---|---|---|
| --lock-timeout MS | Wait time for lock | 250ms |
| --force | Force takeover of stale lock | false |
Ingesting Different File Types
Memvid automatically detects and processes various file formats:
Text Files
# Plain text
memvid put knowledge.mv2 --input notes.txt --vector-compression
# Markdown
memvid put knowledge.mv2 --input README.md --vector-compression
# HTML
memvid put knowledge.mv2 --input page.html --vector-compression
Documents
# PDF files
memvid put knowledge.mv2 --input report.pdf --vector-compression
# PDF with table extraction
memvid put knowledge.mv2 --input invoice.pdf --tables --vector-compression
# Word documents
memvid put knowledge.mv2 --input document.docx --vector-compression
# Excel spreadsheets
memvid put knowledge.mv2 --input data.xlsx --vector-compression
# PowerPoint presentations
memvid put knowledge.mv2 --input slides.pptx --vector-compression
Media
# Images with EXIF extraction
memvid put knowledge.mv2 --input photo.jpg
# Audio files
memvid put knowledge.mv2 --input recording.mp3 --audio
# Video files (stored without transcoding)
memvid put knowledge.mv2 --input video.mp4 --video
Organize your documents with tracks, tags, and timestamps:
# Add to a specific track
memvid put knowledge.mv2 --input meeting-notes.md --vector-compression --track "meetings"
# Add metadata tags
memvid put knowledge.mv2 --input api-docs.md --vector-compression \
--tag "category=documentation" \
--tag "version=2.0" \
--tag "author=team"
# Add labels
memvid put knowledge.mv2 --input report.pdf --vector-compression \
--label "quarterly" \
--label "finance"
# Set custom timestamp
memvid put knowledge.mv2 --input old-report.pdf --vector-compression \
--timestamp 1686819000
# Combine options
memvid put knowledge.mv2 --input quarterly-report.pdf --vector-compression \
--track "reports" \
--title "Q3 2024 Report" \
--tag "quarter=Q3" \
--tag "year=2024"
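When put calls are generated from a script, building the argument list programmatically avoids quoting mistakes with titles and JSON metadata. The sketch below uses only flags documented on this page. The function name and the metadata payload are hypothetical.

```python
import json
import shlex


def build_put_command(memory, input_path, track=None, title=None,
                      tags=None, labels=None, metadata=None):
    """Assemble a `memvid put` argument list from the flags documented above."""
    cmd = ["memvid", "put", memory, "--input", input_path]
    if track:
        cmd += ["--track", track]
    if title:
        cmd += ["--title", title]
    for key, value in (tags or {}).items():
        cmd += ["--tag", f"{key}={value}"]  # --tag is repeatable
    for label in labels or []:
        cmd += ["--label", label]           # --label is repeatable
    if metadata is not None:
        cmd += ["--metadata", json.dumps(metadata)]
    return cmd


cmd = build_put_command(
    "knowledge.mv2", "quarterly-report.pdf",
    track="reports", title="Q3 2024 Report",
    tags={"quarter": "Q3", "year": "2024"},
    metadata={"reviewed": True},  # hypothetical metadata payload
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it without going through a shell, so no manual quoting is needed.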
Parallel Ingestion
For large datasets, enable multi-threaded processing:
# Enable parallel ingestion
memvid put knowledge.mv2 --input ./large-dataset/ --vector-compression \
--parallel-segments \
--parallel-threads 8
# Fine-tune parallel settings
memvid put knowledge.mv2 --input ./corpus/ --vector-compression \
--parallel-segments \
--parallel-seg-tokens 4000 \
--parallel-threads 4 \
--parallel-queue-depth 16
| Option | Description | Default |
|---|---|---|
| --parallel-segments | Enable multi-threaded processing | false |
| --parallel-threads | Number of worker threads | CPU count - 1 |
| --parallel-queue-depth | Queue size for workers | Auto |
| --parallel-seg-tokens | Target tokens per segment | Auto |
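The default for --parallel-threads is listed as CPU count - 1. If you set the value explicitly from a script, a sketch like the following reproduces that default. Flooring the value at 1 on single-core machines is an assumption, not documented behavior, and both function names are illustrative.

```python
import os


def default_parallel_threads() -> int:
    """CPU count minus one, floored at 1 (the floor is an assumption)."""
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cpus - 1)


def parallel_put_command(memory, input_dir, threads=None):
    """Build a parallel `memvid put` argument list using documented flags."""
    if threads is None:
        threads = default_parallel_threads()
    return [
        "memvid", "put", memory, "--input", input_dir,
        "--parallel-segments",
        "--parallel-threads", str(threads),
    ]


print(parallel_put_command("knowledge.mv2", "./corpus/", threads=4))
```

Leaving one core free keeps the machine responsive while a large ingestion runs, which is presumably why the default is not the full CPU count.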
Ingesting from Stdin
Useful for piping data from other commands:
# Pipe text content
echo "Important note to remember" | memvid put knowledge.mv2 --vector-compression
# Pipe from curl
curl -s https://api.example.com/data | memvid put knowledge.mv2 --vector-compression --title "API Response"
# Pipe from another command
grep "ERROR" log.txt | memvid put knowledge.mv2 --vector-compression --track "errors"
Extract structured tables from PDFs (invoices, financial reports, pay stubs):
Basic Usage
# Extract tables from a PDF
memvid put knowledge.mv2 --input invoice.pdf --tables --vector-compression
# Extract tables and embed individual rows for semantic search
memvid put knowledge.mv2 --input financial-report.pdf --tables --embed-rows --vector-compression
Detection Methods
The table extractor uses multiple detection methods:
| Method | Best For |
|---|---|
| Stream | Tables without visible borders, text-based layouts |
| Lattice | Tables with visible grid lines and borders |
| LineBased | Columnar data with clear alignment patterns |
The extractor automatically tries each method and picks the best results.
After extraction, use the tables command to view and export:
# List all tables in a memory
memvid tables list knowledge.mv2
# Output:
# Found 3 tables:
# - pdf_table_1_page1: 5 rows x 4 cols (LineBased)
# - pdf_table_2_page1: 12 rows x 3 cols (Stream)
# - pdf_table_3_page2: 8 rows x 5 cols (Lattice)
# View a specific table
memvid tables view knowledge.mv2 --table-id pdf_table_1_page1
# Export to CSV
memvid tables export knowledge.mv2 --table-id pdf_table_1_page1 --format csv > data.csv
# Export to JSON
memvid tables export knowledge.mv2 --table-id pdf_table_1_page1 --format json
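The CSV export can be loaded with any CSV reader. The sketch below parses a made-up export with Python's csv module. The column names and values are invented for illustration, and it assumes the first row of the export is a header, which this page does not state.

```python
import csv
import io

# Hypothetical contents of data.csv; real columns depend on the source PDF.
SAMPLE_CSV = """item,quantity,unit_price
Widget,2,9.50
Gadget,1,24.00
"""


def load_rows(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE_CSV)
total = sum(int(r["quantity"]) * float(r["unit_price"]) for r in rows)
print(len(rows), total)  # 2 43.0
```

For a real export, replace SAMPLE_CSV with the contents of the file written by the export command.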
Example: Invoice Processing
# Create memory for invoices
memvid create invoices.mv2
# Ingest invoice with table extraction
memvid put invoices.mv2 --input amazon-invoice.pdf --tables --vector-compression
# Search for specific items
memvid find invoices.mv2 --query "total" --json
# List extracted tables
memvid tables list invoices.mv2
# Export line items to CSV
memvid tables export invoices.mv2 --table-id pdf_table_1_page1 --format csv
Updating Documents
The update command modifies an existing frame.
Synopsis
memvid update <FILE> [OPTIONS]
Options
| Option | Description |
|---|---|
| --frame-id <ID> | Target frame by ID |
| --uri <URI> | Target frame by URI |
| --input <PATH> | New payload from file |
| --set-uri <URI> | Update frame URI |
| --title <TITLE> | Update title |
| --timestamp <TS> | Update timestamp |
| --track <TRACK> | Update track |
| --kind <KIND> | Update kind |
| --tag <KEY=VALUE> | Add/update tags |
| --label <LABEL> | Add/update labels |
| --metadata <JSON> | Add/update metadata |
| --embeddings | Recompute embeddings |
| --json | JSON output |
Examples
# Update title
memvid update project.mv2 --frame-id 1234 --title "Updated Title"
# Update content and recompute embeddings
memvid update project.mv2 --uri "file:///doc.txt" \
--input updated-doc.txt \
--embeddings
# Add new tags
memvid update project.mv2 --frame-id 1234 \
--tag "status=reviewed" \
--label approved
Response
Updated frame 1234 in project.mv2
Title: Updated Title
Tags added: status=reviewed
Labels added: approved
Embeddings: recomputed
Deleting Documents
The delete command removes a frame from the memory.
Synopsis
memvid delete <FILE> [OPTIONS]
Options
| Option | Description |
|---|---|
| --frame-id <ID> | Target by frame ID |
| --uri <URI> | Target by frame URI |
| --yes | Skip confirmation prompt |
| --json | JSON output |
Examples
# Delete by frame ID
memvid delete project.mv2 --frame-id 1234
# Delete by URI (skip confirmation)
memvid delete project.mv2 --uri "file:///old-doc.txt" --yes
Response
Deleted frame 1234 from project.mv2
URI: file:///old-doc.txt
Title: Old Document
Remote API Ingestion
The api-fetch command fetches remote content from an API and ingests it as frames.
Synopsis
memvid api-fetch <FILE> <CONFIG> [OPTIONS]
Options
| Option | Description |
|---|---|
| --dry-run | Preview without writing |
| --mode <MODE> | Override configured ingest mode |
| --uri <URI> | Override base URI |
| --json | JSON output |
Example CONFIG file (JSON):
{
"url": "https://api.example.com/documents",
"method": "GET",
"headers": {
"Authorization": "Bearer ${API_TOKEN}"
},
"pagination": {
"type": "cursor",
"cursor_param": "after",
"cursor_path": "$.meta.next_cursor"
},
"items_path": "$.data",
"mapping": {
"title": "$.name",
"text": "$.content",
"uri": "$.id"
}
}
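To illustrate what the pagination, items_path, and mapping fields describe, the sketch below walks a cursor-paginated API and maps each item to frame fields. It shows the general cursor-pagination pattern, not Memvid's actual implementation: the path resolver handles only simple dotted paths, and the two in-memory pages stand in for real HTTP responses.

```python
CONFIG = {
    "pagination": {"type": "cursor", "cursor_param": "after",
                   "cursor_path": "$.meta.next_cursor"},
    "items_path": "$.data",
    "mapping": {"title": "$.name", "text": "$.content", "uri": "$.id"},
}


def get_path(doc, path):
    """Resolve a simple '$.a.b' path; returns None when a key is missing."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    value = doc
    for key in path[2:].split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def fetch_all(fetch_page, config):
    """Yield mapped items, following the cursor until none is returned."""
    cursor = None
    while True:
        params = {} if cursor is None else {config["pagination"]["cursor_param"]: cursor}
        page = fetch_page(params)
        for item in get_path(page, config["items_path"]) or []:
            yield {field: get_path(item, p) for field, p in config["mapping"].items()}
        cursor = get_path(page, config["pagination"]["cursor_path"])
        if not cursor:
            break


# Invented pages standing in for API responses.
PAGES = {
    None: {"data": [{"id": "doc-1", "name": "First", "content": "alpha"}],
           "meta": {"next_cursor": "c2"}},
    "c2": {"data": [{"id": "doc-2", "name": "Second", "content": "beta"}],
           "meta": {"next_cursor": None}},
}


def fake_fetch(params):
    """Return the page for the requested cursor (no network access)."""
    return PAGES[params.get("after")]


frames = list(fetch_all(fake_fetch, CONFIG))
print(frames)
```

Each yielded dict carries the title, text, and uri fields named in the mapping. Running api-fetch with --dry-run is the way to check how the CLI itself interprets a given config.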
Examples
# Fetch from API
memvid api-fetch project.mv2 ./fetch-config.json
# Dry run to preview
memvid api-fetch project.mv2 ./fetch-config.json --dry-run
Real-World Examples
Documentation Knowledge Base
# Create the memory
memvid create docs.mv2
# Ingest documentation with embeddings
memvid put docs.mv2 --input ./docs/ --vector-compression --track "documentation"
# Add API reference
memvid put docs.mv2 --input ./api-reference/ --vector-compression \
--track "api" \
--tag "type=reference"
Research Paper Archive
# Create the memory
memvid create papers.mv2
# Ingest papers with metadata
for paper in ./papers/*.pdf; do
  memvid put papers.mv2 --input "$paper" --vector-compression \
    --track "research" \
    --tag "source=arxiv"
done
Code Repository
# Create memory for codebase
memvid create codebase.mv2
# Ingest with parallel processing
memvid put codebase.mv2 --input ./src/ --vector-compression \
--parallel-segments \
--track "source"
# Add tests and docs
memvid put codebase.mv2 --input ./tests/ --vector-compression --track "tests"
memvid put codebase.mv2 --input ./docs/ --vector-compression --track "docs"
Troubleshooting
File Locked
Error: File is locked by another process
Solutions:
# Check who holds the lock
memvid who knowledge.mv2
# Request release
memvid nudge knowledge.mv2
# Find process on macOS/Linux
lsof knowledge.mv2
# Wait longer for lock
memvid put knowledge.mv2 --input doc.pdf --lock-timeout 5000
# Force takeover (only if previous writer crashed)
memvid put knowledge.mv2 --input doc.pdf --force
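In automated pipelines, retrying with progressively longer lock timeouts is usually safer than reaching for --force. The sketch below wraps that idea in a helper. It assumes the CLI exits with status 0 on success, which this page does not state, and the helper name and timeout schedule are illustrative. A stand-in runner replaces the real command so the example runs without memvid installed.

```python
import subprocess


def put_with_lock_retry(memory, input_path, timeouts_ms=(250, 1000, 5000),
                        run=subprocess.call):
    """Try `memvid put` with progressively longer lock timeouts.

    Returns the timeout (in ms) that succeeded, or None if every attempt
    failed. Assumes the CLI exits with status 0 on success.
    """
    for timeout in timeouts_ms:
        cmd = ["memvid", "put", memory, "--input", input_path,
               "--lock-timeout", str(timeout)]
        if run(cmd) == 0:
            return timeout
    return None


# Stand-in runner for demonstration: fails until the timeout reaches 1000 ms.
def fake_run(cmd):
    return 0 if int(cmd[-1]) >= 1000 else 1


print(put_with_lock_retry("knowledge.mv2", "doc.pdf", run=fake_run))  # 1000
```

Omitting the run argument makes the helper call the real CLI through subprocess.call. Note that this sketch retries on any non-zero exit, not only on lock errors, so a production version should inspect the error output before retrying.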
Capacity Exceeded
Solutions:
# Check current usage
memvid stats knowledge.mv2
# Delete unused frames
memvid delete knowledge.mv2 --frame-id 42 --yes
# Compact the file
memvid doctor knowledge.mv2 --vacuum
Embedding Model Issues
Error: Failed to load embedding model
Solution:
# Set model directory
export MEMVID_MODELS_DIR=~/.memvid/models
# Or use offline mode with pre-cached models
export MEMVID_OFFLINE=1