Learn how to create new memory files and ingest documents using the Memvid CLI.

Creating a Memory File

Basic Usage

Create a new .mv2 memory file:
memvid create my-knowledge.mv2

Options

| Option | Description | Default |
| --- | --- | --- |
| --tier | Capacity tier (free, dev, enterprise) | free |
| --size | Capacity override (e.g. 15MB, capped at 50MB) | 50MB |
| --no-lex | Disable lexical/full-text index | Enabled |
| --no-vector | Disable vector index | Enabled |

Examples

# Create a basic memory file
memvid create research.mv2

# Create without lexical index
memvid create notes.mv2 --no-lex

# Create a smaller memory (capacity override)
memvid create small.mv2 --size 15MB

Note: memvid create is capped at 50MB. To go beyond 50MB, create the file and then apply a signed capacity ticket (see memvid tickets sync/apply).
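
Because the cap is fixed at 50MB, a script can check a requested size before it calls memvid create. The following is a minimal sketch under stated assumptions: within_create_cap is a hypothetical helper, not a Memvid command, and it only understands whole-number sizes with an MB suffix, the form used in the examples above. The source does not say whether the CLI rejects or clamps larger values, so the sketch only reports the result.

```shell
# Hypothetical helper, not part of the Memvid CLI.
# Succeeds when a size such as "15MB" is within the 50MB create cap.
# Returns 2 for anything that is not a whole number followed by MB.
within_create_cap() {
  case "$1" in
    *MB) ;;
    *) return 2 ;;
  esac
  n=${1%MB}                      # strip the MB suffix
  case "$n" in
    ''|*[!0-9]*) return 2 ;;     # not a plain whole number
  esac
  [ "$n" -le 50 ]
}

for size in 15MB 50MB 512MB; do
  if within_create_cap "$size"; then
    echo "$size: within the create cap"
  else
    echo "$size: exceeds the create cap, needs a capacity ticket"
  fi
done
```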

JSON Output

memvid create my-memory.mv2 --json
{
  "path": "my-memory.mv2",
  "size_limit_bytes": 52428800,
  "lex_enabled": true,
  "vec_enabled": true,
  "created_at": "2024-01-15T10:30:00Z"
}

Inspecting a Memory File

The open command shows metadata and manifests of an existing memory file.

Synopsis

memvid open <FILE> [OPTIONS]

Options

| Option | Description |
| --- | --- |
| --json | Emit JSON output |

Examples

# Inspect a memory file
memvid open my-memory.mv2

# Get JSON output for scripting
memvid open my-memory.mv2 --json

Response

Memory File: my-memory.mv2
  Version: 2.1.0
  Created: 2024-01-15T10:30:00Z
  Frames: 1,234
  Size: 45.2 MB / 512 MB (8.8%)

Indexes:
  Lexical: enabled (12,456 terms)
  Vector: enabled (1,234 vectors, 384d)
  Time: enabled (1,234 entries)

Tracks:
  default: 890 frames
  meetings: 234 frames
  emails: 110 frames

Memory Binding:
  Memory ID: mem_abc123
  Bound at: 2024-01-15T10:30:00Z

JSON Output

{
  "path": "my-memory.mv2",
  "version": "2.1.0",
  "created_at": "2024-01-15T10:30:00Z",
  "frame_count": 1234,
  "size_bytes": 47395430,
  "size_limit_bytes": 536870912,
  "indexes": {
    "lex": { "enabled": true, "term_count": 12456 },
    "vec": { "enabled": true, "vector_count": 1234, "dimension": 384 },
    "time": { "enabled": true, "entry_count": 1234 }
  },
  "tracks": {
    "default": 890,
    "meetings": 234,
    "emails": 110
  },
  "binding": {
    "memory_id": "mem_abc123",
    "bound_at": "2024-01-15T10:30:00Z"
  }
}
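
The Size line in the text response can be reproduced from the size_bytes and size_limit_bytes fields of the JSON output. The sketch below assumes binary megabytes (1 MB = 1024 * 1024 bytes), which is what makes the two sample outputs agree; usage_summary is a hypothetical helper name, and the two values are copied from the JSON example rather than read from a real file.

```shell
# Values copied from the JSON example above.
size_bytes=47395430
size_limit_bytes=536870912

# Hypothetical helper: prints "<used> MB / <limit> MB (<percent>%)".
usage_summary() {
  awk -v used="$1" -v limit="$2" 'BEGIN {
    mib = 1024 * 1024
    printf "%.1f MB / %.0f MB (%.1f%%)\n", used / mib, limit / mib, used / limit * 100
  }'
}

usage_summary "$size_bytes" "$size_limit_bytes"
# prints: 45.2 MB / 512 MB (8.8%)
```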

Ingesting Documents

The put command adds documents to your memory file as frames.

Basic Usage

# Ingest a single file (text-only)
memvid put my-knowledge.mv2 --input document.pdf

# Ingest a directory
memvid put my-knowledge.mv2 --input ./documents/

# Ingest with semantic embeddings and 16x PQ (product quantization) compression
memvid put my-knowledge.mv2 --input document.pdf --embedding --vector-compression

# Ingest from stdin (text-only by default)
echo "Some text content" | memvid put my-knowledge.mv2

Core Options

| Option | Description |
| --- | --- |
| --input PATH | Path to file or directory |
| --uri URI | Custom URI for the frame |
| --title TITLE | Document title |
| --timestamp UNIX_TS | POSIX timestamp |
| --track TRACK | Track/collection name |
| --kind KIND | Content type metadata |
| --json | Output as JSON |

Metadata Options

| Option | Description |
| --- | --- |
| --tag KEY=VALUE | Add tags (repeatable) |
| --label LABEL | Add labels (repeatable) |
| --metadata JSON | Additional metadata as JSON |
| --no-auto-tag | Disable automatic tag extraction |
| --no-extract-dates | Disable date extraction |

CLIP & Entity Extraction (Auto-Enabled)

When the CLIP and NER models are installed, the CLI automatically enables visual embeddings for images/PDFs and entity extraction.

| Option | Description |
| --- | --- |
| --clip | Explicitly enable CLIP visual embeddings |
| --no-clip | Disable CLIP even when model is available |
| --logic-mesh | Explicitly enable entity extraction |
| --no-logic-mesh | Disable entity extraction even when model is available |

Install models manually:
memvid models install --clip mobileclip-s2
memvid models install --ner distilbert-ner

Embedding Options

| Option | Description |
| --- | --- |
| --embedding | Enable semantic embeddings |
| -m, --embedding-model MODEL | Choose default embedding model (global flag; see below) |
| --vector-compression | Generate semantic embeddings with 16x compression |
| --no-embedding | Explicitly disable embeddings |

Embedding Model Options:

| Model | Description |
| --- | --- |
| bge-small | Local fastembed default (384d) |
| bge-base | Local higher quality (768d) |
| nomic | Local high accuracy (768d) |
| gte-large | Local best semantic depth (1024d) |
| openai-small | OpenAI text-embedding-3-small (1536d) |
| openai-large | OpenAI text-embedding-3-large (3072d) |
| openai | Alias for openai-large |
| openai-ada | OpenAI text-embedding-ada-002 (1536d, legacy) |

# Use built-in BGE (default, no API key needed)
memvid put knowledge.mv2 --input docs/ --embedding

# Use OpenAI embeddings
export OPENAI_API_KEY=sk-...
memvid put knowledge.mv2 --input docs/ --embedding -m openai-small

# Use OpenAI large model for higher quality
memvid put knowledge.mv2 --input docs/ --embedding -m openai-large
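
To get a rough feel for what the model dimension and the 16x compression factor mean for storage, the arithmetic can be sketched in shell. This is an estimate under assumptions that the source does not confirm: each vector component is taken to be a 4-byte float before compression, and codebook and index overhead are ignored. vector_bytes is a hypothetical helper name; the 1,234-vector, 384d figures match the sample open output earlier on this page.

```shell
# Hypothetical helper: raw size in bytes of N vectors with D dimensions,
# assuming 4-byte float components (an assumption, not a documented fact).
vector_bytes() {
  echo $(( $1 * $2 * 4 ))
}

raw=$(vector_bytes 1234 384)
compressed=$(( raw / 16 ))     # 16x factor from --vector-compression

echo "raw bytes:        $raw"
echo "compressed bytes: $compressed"
```

The same helper shows why model choice matters: at 3072d, each uncompressed vector is eight times larger than at 384d.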

Table Extraction Options

| Option | Description |
| --- | --- |
| --tables | Extract tables from PDF files |
| --embed-rows | Embed individual table rows for semantic search (default: true) |

Duplicate Handling

| Option | Description |
| --- | --- |
| --update-existing | Replace existing frame with same URI |
| --allow-duplicate | Allow multiple frames with same URI |

Lock Control

| Option | Description | Default |
| --- | --- | --- |
| --lock-timeout MS | Wait time for lock | 250ms |
| --force | Force takeover of stale lock | false |

Ingesting Different File Types

Memvid automatically detects and processes various file formats:
# Plain text
memvid put knowledge.mv2 --input notes.txt --vector-compression

# Markdown
memvid put knowledge.mv2 --input README.md --vector-compression

# HTML
memvid put knowledge.mv2 --input page.html --vector-compression

Adding Metadata

Organize your documents with tracks, tags, and timestamps:
# Add to a specific track
memvid put knowledge.mv2 --input meeting-notes.md --vector-compression --track "meetings"

# Add metadata tags
memvid put knowledge.mv2 --input api-docs.md --vector-compression \
  --tag "category=documentation" \
  --tag "version=2.0" \
  --tag "author=team"

# Add labels
memvid put knowledge.mv2 --input report.pdf --vector-compression \
  --label "quarterly" \
  --label "finance"

# Set custom timestamp
memvid put knowledge.mv2 --input old-report.pdf --vector-compression \
  --timestamp 1686819000

# Combine options
memvid put knowledge.mv2 --input quarterly-report.pdf --vector-compression \
  --track "reports" \
  --title "Q3 2024 Report" \
  --tag "quarter=Q3" \
  --tag "year=2024"
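
The --timestamp option takes a POSIX timestamp (seconds since 1970-01-01 00:00:00 UTC), so a calendar date has to be converted first. The sketch below uses GNU date syntax, which is an assumption about the environment; BSD and macOS date use different flags. to_unix_ts is a hypothetical helper name.

```shell
# Hypothetical helper: convert a UTC "YYYY-MM-DD hh:mm:ss" string to a
# POSIX timestamp. Uses GNU date (-d); BSD/macOS date needs -j -f instead.
to_unix_ts() {
  date -u -d "$1" +%s
}

ts=$(to_unix_ts '2023-06-15 08:50:00')
echo "$ts"
# prints: 1686819000

# The value can then be passed on, for example:
#   memvid put knowledge.mv2 --input old-report.pdf --timestamp "$ts"
```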

Parallel Ingestion

For large datasets, enable multi-threaded processing:
# Enable parallel ingestion
memvid put knowledge.mv2 --input ./large-dataset/ --vector-compression \
  --parallel-segments \
  --parallel-threads 8

# Fine-tune parallel settings
memvid put knowledge.mv2 --input ./corpus/ --vector-compression \
  --parallel-segments \
  --parallel-seg-tokens 4000 \
  --parallel-threads 4 \
  --parallel-queue-depth 16

| Option | Description | Default |
| --- | --- | --- |
| --parallel-segments | Enable multi-threaded processing | false |
| --parallel-threads | Number of worker threads | CPU count - 1 |
| --parallel-queue-depth | Queue size for workers | Auto |
| --parallel-seg-tokens | Target tokens per segment | Auto |
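
When a script sets --parallel-threads explicitly, it can mirror the documented default of CPU count - 1. The sketch below is illustrative only: default_threads is a hypothetical helper, and the floor of one thread on a single-CPU machine is an assumption, since the source states only "CPU count - 1".

```shell
# Hypothetical helper: worker count for a machine with $1 CPUs.
# Mirrors the documented default (CPU count - 1); the floor of 1 is assumed.
default_threads() {
  n=$(( $1 - 1 ))
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}

# getconf _NPROCESSORS_ONLN reports the number of online CPUs on Linux and macOS.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
threads=$(default_threads "$cpus")
echo "would pass: --parallel-threads $threads"
```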

Ingesting from Stdin

Useful for piping data from other commands:
# Pipe text content
echo "Important note to remember" | memvid put knowledge.mv2 --vector-compression

# Pipe from curl
curl -s https://api.example.com/data | memvid put knowledge.mv2 --vector-compression --title "API Response"

# Pipe from another command
grep "ERROR" log.txt | memvid put knowledge.mv2 --vector-compression --track "errors"

PDF Table Extraction

Extract structured tables from PDFs (invoices, financial reports, pay stubs):

Basic Usage

# Extract tables from a PDF
memvid put knowledge.mv2 --input invoice.pdf --tables --vector-compression

# Extract tables and embed individual rows for semantic search
memvid put knowledge.mv2 --input financial-report.pdf --tables --embed-rows --vector-compression

Detection Methods

The table extractor uses multiple detection methods:

| Method | Best For |
| --- | --- |
| Stream | Tables without visible borders, text-based layouts |
| Lattice | Tables with visible grid lines and borders |
| LineBased | Columnar data with clear alignment patterns |

The extractor automatically tries each method and picks the best results.

Viewing Extracted Tables

After extraction, use the tables command to view and export:
# List all tables in a memory
memvid tables list knowledge.mv2

# Output:
# Found 3 tables:
#   - pdf_table_1_page1: 5 rows x 4 cols (LineBased)
#   - pdf_table_2_page1: 12 rows x 3 cols (Stream)
#   - pdf_table_3_page2: 8 rows x 5 cols (Lattice)

# View a specific table
memvid tables view knowledge.mv2 --table-id pdf_table_1_page1

# Export to CSV
memvid tables export knowledge.mv2 --table-id pdf_table_1_page1 --format csv > data.csv

# Export to JSON
memvid tables export knowledge.mv2 --table-id pdf_table_1_page1 --format json
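
To export every table rather than one at a time, the table IDs can be pulled out of the list output. The sketch below parses the human-readable listing shown above, assuming the real output has the same shape without the leading comment markers; that format may change between versions, and the source does not document a machine-readable option for tables list. table_ids is a hypothetical helper name, and the sample text stands in for real command output.

```shell
# Hypothetical helper: read a "tables list" listing on stdin and print
# one table ID per line (the text between "- " and the first colon).
table_ids() {
  sed -n 's/^ *- \([^:]*\):.*/\1/p'
}

# Sample listing copied from the output shown above.
sample_listing='Found 3 tables:
  - pdf_table_1_page1: 5 rows x 4 cols (LineBased)
  - pdf_table_2_page1: 12 rows x 3 cols (Stream)
  - pdf_table_3_page2: 8 rows x 5 cols (Lattice)'

printf '%s\n' "$sample_listing" | table_ids

# With memvid on PATH, the same helper could drive a bulk export:
#   memvid tables list knowledge.mv2 | table_ids | while read -r id; do
#     memvid tables export knowledge.mv2 --table-id "$id" --format csv > "$id.csv"
#   done
```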

Example: Invoice Processing

# Create memory for invoices
memvid create invoices.mv2

# Ingest invoice with table extraction
memvid put invoices.mv2 --input amazon-invoice.pdf --tables --vector-compression

# Search for specific items
memvid find invoices.mv2 --query "total" --json

# List extracted tables
memvid tables list invoices.mv2

# Export line items to CSV
memvid tables export invoices.mv2 --table-id pdf_table_1_page1 --format csv

Updating Documents

The update command modifies an existing frame.

Synopsis

memvid update <FILE> [OPTIONS]

Options

| Option | Description |
| --- | --- |
| --frame-id <ID> | Target frame by ID |
| --uri <URI> | Target frame by URI |
| --input <PATH> | New payload from file |
| --set-uri <URI> | Update frame URI |
| --title <TITLE> | Update title |
| --timestamp <TS> | Update timestamp |
| --track <TRACK> | Update track |
| --kind <KIND> | Update kind |
| --tag <KEY=VALUE> | Add/update tags |
| --label <LABEL> | Add/update labels |
| --metadata <JSON> | Add/update metadata |
| --embeddings | Recompute embeddings |
| --json | JSON output |

Examples

# Update title
memvid update project.mv2 --frame-id 1234 --title "Updated Title"

# Update content and recompute embeddings
memvid update project.mv2 --uri "file:///doc.txt" \
  --input updated-doc.txt \
  --embeddings

# Add new tags
memvid update project.mv2 --frame-id 1234 \
  --tag "status=reviewed" \
  --label approved

Response

Updated frame 1234 in project.mv2
  Title: Updated Title
  Tags added: status=reviewed
  Labels added: approved
  Embeddings: recomputed

Deleting Documents

The delete command removes a frame from the memory.

Synopsis

memvid delete <FILE> [OPTIONS]

Options

| Option | Description |
| --- | --- |
| --frame-id <ID> | Target by frame ID |
| --uri <URI> | Target by frame URI |
| --yes | Skip confirmation prompt |
| --json | JSON output |

Examples

# Delete by frame ID
memvid delete project.mv2 --frame-id 1234

# Delete by URI (skip confirmation)
memvid delete project.mv2 --uri "file:///old-doc.txt" --yes

Response

Deleted frame 1234 from project.mv2
  URI: file:///old-doc.txt
  Title: Old Document

Remote API Ingestion

The api-fetch command fetches remote content from an API and ingests it as frames.

Synopsis

memvid api-fetch <FILE> <CONFIG> [OPTIONS]

Options

| Option | Description |
| --- | --- |
| --dry-run | Preview without writing |
| --mode <MODE> | Override configured ingest mode |
| --uri <URI> | Override base URI |
| --json | JSON output |

Config File Format

{
  "url": "https://api.example.com/documents",
  "method": "GET",
  "headers": {
    "Authorization": "Bearer ${API_TOKEN}"
  },
  "pagination": {
    "type": "cursor",
    "cursor_param": "after",
    "cursor_path": "$.meta.next_cursor"
  },
  "items_path": "$.data",
  "mapping": {
    "title": "$.name",
    "text": "$.content",
    "uri": "$.id"
  }
}

Examples

# Fetch from API
memvid api-fetch project.mv2 ./fetch-config.json

# Dry run to preview
memvid api-fetch project.mv2 ./fetch-config.json --dry-run

Real-World Examples

Documentation Knowledge Base

# Create the memory
memvid create docs.mv2

# Ingest documentation with embeddings
memvid put docs.mv2 --input ./docs/ --vector-compression --track "documentation"

# Add API reference
memvid put docs.mv2 --input ./api-reference/ --vector-compression \
  --track "api" \
  --tag "type=reference"

Research Paper Archive

# Create the memory
memvid create papers.mv2

# Ingest papers with metadata
for paper in ./papers/*.pdf; do
  memvid put papers.mv2 --input "$paper" --vector-compression \
    --track "research" \
    --tag "source=arxiv"
done

Code Repository

# Create memory for codebase
memvid create codebase.mv2

# Ingest with parallel processing
memvid put codebase.mv2 --input ./src/ --vector-compression \
  --parallel-segments \
  --track "source"

# Add tests and docs
memvid put codebase.mv2 --input ./tests/ --vector-compression --track "tests"
memvid put codebase.mv2 --input ./docs/ --vector-compression --track "docs"

Troubleshooting

File Locked

Error: File is locked by another process
Solutions:
# Check who holds the lock
memvid who knowledge.mv2

# Request release
memvid nudge knowledge.mv2

# Find process on macOS/Linux
lsof knowledge.mv2

# Wait longer for lock
memvid put knowledge.mv2 --input doc.pdf --lock-timeout 5000

# Force takeover (only if previous writer crashed)
memvid put knowledge.mv2 --input doc.pdf --force
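
For unattended jobs, waiting longer can be automated by retrying with a growing --lock-timeout before resorting to --force. The following is a sketch, not a documented feature: with_lock_backoff is a hypothetical wrapper, the starting value of 250 ms matches the documented default, and the doubling schedule and four-attempt limit are arbitrary choices. It retries on any non-zero exit status, because the source does not document a specific exit code for lock errors.

```shell
# Hypothetical wrapper, not part of the Memvid CLI.
# Runs the given command with "--lock-timeout <ms>" appended, doubling the
# timeout after each failed attempt (250, 500, 1000, 2000 ms).
with_lock_backoff() {
  timeout_ms=250
  attempt=1
  while [ "$attempt" -le 4 ]; do
    if "$@" --lock-timeout "$timeout_ms"; then
      return 0
    fi
    timeout_ms=$(( timeout_ms * 2 ))
    attempt=$(( attempt + 1 ))
  done
  return 1
}

# Usage (requires memvid on PATH):
#   with_lock_backoff memvid put knowledge.mv2 --input doc.pdf
```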

Capacity Exceeded

Error: CapacityExceeded
Solutions:
# Check current usage
memvid stats knowledge.mv2

# Delete unused frames
memvid delete knowledge.mv2 --frame-id 42 --yes

# Compact the file
memvid doctor knowledge.mv2 --vacuum

Embedding Model Issues

Error: Failed to load embedding model
Solution:
# Set model directory
export MEMVID_MODELS_DIR=~/.memvid/models

# Or use offline mode with pre-cached models
export MEMVID_OFFLINE=1
