Create & Ingest

Learn how to create new memory files and ingest documents using the Memvid CLI.

Creating a Memory File

Basic Usage

Create a new .mv2 memory file:

memvid create my-knowledge.mv2

Options

Option	Description	Default
`--tier`	Capacity tier (`free`, `dev`, `enterprise`)	`free`
`--size`	Capacity override (e.g. `15MB`, capped at `50MB`)	`50MB`
`--no-lex`	Disable lexical/full-text index	Enabled
`--no-vector`	Disable vector index	Enabled

Examples

# Create a basic memory file
memvid create research.mv2

# Create without lexical index
memvid create notes.mv2 --no-lex

# Create a smaller memory (capacity override)
memvid create small.mv2 --size 15MB

memvid create caps capacity at 50MB. To go beyond 50MB, create the file first, then apply a signed capacity ticket (see memvid tickets sync/apply).
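As a rough pre-flight check, the sketch below converts a requested size to bytes and compares it with the 50MB cap. It assumes binary units (1MB = 1,048,576 bytes), which is consistent with the 512 MB = 536870912 bytes pairing in the open sample output in this guide. The helper names size_to_bytes and fits_create_cap are hypothetical and not part of the CLI.

```shell
# Convert a size such as "15MB" to bytes.
# Assumption: binary units, so 1MB = 1024 * 1024 bytes.
size_to_bytes() {
  local value="${1%[Mm][Bb]}"      # strip a trailing "MB" suffix, if present
  case "$value" in
    ''|*[!0-9]*) echo "unsupported size: $1" >&2; return 1 ;;
  esac
  echo $(( value * 1024 * 1024 ))
}

CAP_BYTES=$(size_to_bytes 50MB)    # the documented cap for memvid create

# Succeeds only when the requested size fits under the cap.
fits_create_cap() {
  local bytes
  bytes=$(size_to_bytes "$1") || return 1
  [ "$bytes" -le "$CAP_BYTES" ]
}

fits_create_cap 15MB  && echo "15MB fits under the create cap"
fits_create_cap 512MB || echo "512MB needs a capacity ticket"
```

A size that fails the check is a candidate for the capacity-ticket route rather than a larger --size value.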

JSON Output

memvid create my-memory.mv2 --json

{
  "path": "my-memory.mv2",
  "size_limit_bytes": 52428800,
  "lex_enabled": true,
  "vec_enabled": true,
  "created_at": "2024-01-15T10:30:00Z"
}

Inspecting a Memory File

The open command shows metadata and manifests of an existing memory file.

Synopsis

memvid open <FILE> [OPTIONS]

Options

Option	Description
`--json`	Emit JSON output

Examples

# Inspect a memory file
memvid open my-memory.mv2

# Get JSON output for scripting
memvid open my-memory.mv2 --json

Response

Memory File: my-memory.mv2
  Version: 2.1.0
  Created: 2024-01-15T10:30:00Z
  Frames: 1,234
  Size: 45.2 MB / 512 MB (8.8%)

Indexes:
  Lexical: enabled (12,456 terms)
  Vector: enabled (1,234 vectors, 384d)
  Time: enabled (1,234 entries)

Tracks:
  default: 890 frames
  meetings: 234 frames
  emails: 110 frames

Memory Binding:
  Memory ID: mem_abc123
  Bound at: 2024-01-15T10:30:00Z

JSON Output

{
  "path": "my-memory.mv2",
  "version": "2.1.0",
  "created_at": "2024-01-15T10:30:00Z",
  "frame_count": 1234,
  "size_bytes": 47395430,
  "size_limit_bytes": 536870912,
  "indexes": {
    "lex": { "enabled": true, "term_count": 12456 },
    "vec": { "enabled": true, "vector_count": 1234, "dimension": 384 },
    "time": { "enabled": true, "entry_count": 1234 }
  },
  "tracks": {
    "default": 890,
    "meetings": 234,
    "emails": 110
  },
  "binding": {
    "memory_id": "mem_abc123",
    "bound_at": "2024-01-15T10:30:00Z"
  }
}
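For scripting, the usage percentage in the human-readable output can be recomputed from the size_bytes and size_limit_bytes fields. The sketch below uses the values from the sample JSON; usage_percent is a hypothetical helper name, and awk is assumed to be available.

```shell
# Print used capacity as a percentage with one decimal place.
usage_percent() {
  awk -v used="$1" -v limit="$2" 'BEGIN { printf "%.1f\n", used / limit * 100 }'
}

# Values copied from the sample JSON output (size_bytes, size_limit_bytes).
usage_percent 47395430 536870912   # prints 8.8
```

If jq happens to be installed (an assumption, it is not required by Memvid), the two fields could be pulled from live output with something like `memvid open my-memory.mv2 --json | jq -r '.size_bytes'` and passed to the helper.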

Ingesting Documents

The put command adds documents to your memory file as frames.

Basic Usage

# Ingest a single file (text-only)
memvid put my-knowledge.mv2 --input document.pdf

# Ingest a directory
memvid put my-knowledge.mv2 --input ./documents/

# Ingest with semantic embeddings (+16x PQ compression)
memvid put my-knowledge.mv2 --input document.pdf --embedding --vector-compression

# Ingest from stdin (text-only by default)
echo "Some text content" | memvid put my-knowledge.mv2

Core Options

Option	Description
`--input PATH`	Path to file or directory
`--uri URI`	Custom URI for the frame
`--title TITLE`	Document title
`--timestamp UNIX_TS`	POSIX timestamp
`--track TRACK`	Track/collection name
`--kind KIND`	Content type metadata
`--json`	Output as JSON

Metadata Options

Option	Description
`--tag KEY=VALUE`	Add tags (repeatable)
`--label LABEL`	Add labels (repeatable)
`--metadata JSON`	Additional metadata as JSON
`--no-auto-tag`	Disable automatic tag extraction
`--no-extract-dates`	Disable date extraction

CLIP & Entity Extraction (Auto-Enabled)

When the CLIP and NER models are installed, the CLI automatically enables visual embeddings for images/PDFs and entity extraction.

Option	Description
`--clip`	Explicitly enable CLIP visual embeddings
`--no-clip`	Disable CLIP even when model is available
`--logic-mesh`	Explicitly enable entity extraction
`--no-logic-mesh`	Disable entity extraction even when model is available

Install models manually:

memvid models install --clip mobileclip-s2
memvid models install --ner distilbert-ner

Embedding Options

Option	Description
`--embedding`	Enable semantic embeddings
`-m, --embedding-model MODEL`	Choose default embedding model (global flag; see below)
`--vector-compression`	Generate semantic embeddings with 16x compression
`--no-embedding`	Explicitly disable embeddings

Embedding Model Options:

Model	Description
`bge-small`	Local fastembed default (384d)
`bge-base`	Local higher quality (768d)
`nomic`	Local high accuracy (768d)
`gte-large`	Local best semantic depth (1024d)
`openai-small`	OpenAI text-embedding-3-small (1536d)
`openai-large`	OpenAI text-embedding-3-large (3072d)
`openai`	Alias for `openai-large`
`openai-ada`	OpenAI text-embedding-ada-002 (1536d, legacy)

# Use built-in BGE (default, no API key needed)
memvid put knowledge.mv2 --input docs/ --embedding

# Use OpenAI embeddings
export OPENAI_API_KEY=sk-...
memvid put knowledge.mv2 --input docs/ --embedding -m openai-small

# Use OpenAI large model for higher quality
memvid put knowledge.mv2 --input docs/ --embedding -m openai-large

Table Extraction Options

Option	Description
`--tables`	Extract tables from PDF files
`--embed-rows`	Embed individual table rows for semantic search (default: true)

Duplicate Handling

Option	Description
`--update-existing`	Replace existing frame with same URI
`--allow-duplicate`	Allow multiple frames with same URI

Lock Control

Option	Description	Default
`--lock-timeout MS`	Wait time for lock	250ms
`--force`	Force takeover of stale lock	false

Ingesting Different File Types

Memvid automatically detects and processes various file formats:

Text Files

# Plain text
memvid put knowledge.mv2 --input notes.txt --vector-compression

# Markdown
memvid put knowledge.mv2 --input README.md --vector-compression

# HTML
memvid put knowledge.mv2 --input page.html --vector-compression

Documents

# PDF files
memvid put knowledge.mv2 --input report.pdf --vector-compression

# PDF with table extraction
memvid put knowledge.mv2 --input invoice.pdf --tables --vector-compression

# Word documents
memvid put knowledge.mv2 --input document.docx --vector-compression

# Excel spreadsheets
memvid put knowledge.mv2 --input data.xlsx --vector-compression

# PowerPoint presentations
memvid put knowledge.mv2 --input slides.pptx --vector-compression

Media

# Images with EXIF extraction
memvid put knowledge.mv2 --input photo.jpg

# Audio files
memvid put knowledge.mv2 --input recording.mp3 --audio

# Video files (stored without transcoding)
memvid put knowledge.mv2 --input video.mp4 --video

Adding Metadata

Organize your documents with tracks, tags, and timestamps:

# Add to a specific track
memvid put knowledge.mv2 --input meeting-notes.md --vector-compression --track "meetings"

# Add metadata tags
memvid put knowledge.mv2 --input api-docs.md --vector-compression \
  --tag "category=documentation" \
  --tag "version=2.0" \
  --tag "author=team"

# Add labels
memvid put knowledge.mv2 --input report.pdf --vector-compression \
  --label "quarterly" \
  --label "finance"

# Set custom timestamp
memvid put knowledge.mv2 --input old-report.pdf --vector-compression \
  --timestamp 1686819000

# Combine options
memvid put knowledge.mv2 --input quarterly-report.pdf --vector-compression \
  --track "reports" \
  --title "Q3 2024 Report" \
  --tag "quarter=Q3" \
  --tag "year=2024"
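The --timestamp option takes a POSIX timestamp, so a calendar date has to be converted first. The sketch below shows one way to do that for a UTC date-time; to_epoch is a hypothetical helper that tries the GNU date syntax and falls back to the BSD/macOS form.

```shell
# Convert a UTC date-time ("YYYY-MM-DD HH:MM:SS") to a POSIX timestamp.
to_epoch() {
  # GNU date accepts -d; BSD/macOS date needs -j -f with an explicit format.
  date -u -d "$1" +%s 2>/dev/null ||
    date -u -j -f '%Y-%m-%d %H:%M:%S' "$1" +%s
}

to_epoch '2023-06-15 08:50:00'   # prints 1686819000, the value used above

# Example call (requires memvid on PATH):
# memvid put knowledge.mv2 --input old-report.pdf --timestamp "$(to_epoch '2023-06-15 08:50:00')"
```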

Parallel Ingestion

For large datasets, enable multi-threaded processing:

# Enable parallel ingestion
memvid put knowledge.mv2 --input ./large-dataset/ --vector-compression \
  --parallel-segments \
  --parallel-threads 8

# Fine-tune parallel settings
memvid put knowledge.mv2 --input ./corpus/ --vector-compression \
  --parallel-segments \
  --parallel-seg-tokens 4000 \
  --parallel-threads 4 \
  --parallel-queue-depth 16

Option	Description	Default
`--parallel-segments`	Enable multi-threaded processing	false
`--parallel-threads`	Number of worker threads	CPU count - 1
`--parallel-queue-depth`	Queue size for workers	Auto
`--parallel-seg-tokens`	Target tokens per segment	Auto

Ingesting from Stdin

Useful for piping data from other commands:

# Pipe text content
echo "Important note to remember" | memvid put knowledge.mv2 --vector-compression

# Pipe from curl
curl -s https://api.example.com/data | memvid put knowledge.mv2 --vector-compression --title "API Response"

# Pipe from another command
grep "ERROR" log.txt | memvid put knowledge.mv2 --vector-compression --track "errors"

PDF Table Extraction

Extract structured tables from PDFs (invoices, financial reports, pay stubs):

Basic Usage

# Extract tables from a PDF
memvid put knowledge.mv2 --input invoice.pdf --tables --vector-compression

# Extract tables and embed individual rows for semantic search
memvid put knowledge.mv2 --input financial-report.pdf --tables --embed-rows --vector-compression

Detection Methods

The table extractor uses multiple detection methods:

Method	Best For
Stream	Tables without visible borders, text-based layouts
Lattice	Tables with visible grid lines and borders
LineBased	Columnar data with clear alignment patterns

The extractor automatically tries each method and picks the best results.

Viewing Extracted Tables

After extraction, use the tables command to list, view, and export the extracted tables:

# List all tables in a memory
memvid tables list knowledge.mv2

# Output:
# Found 3 tables:
#   - pdf_table_1_page1: 5 rows x 4 cols (LineBased)
#   - pdf_table_2_page1: 12 rows x 3 cols (Stream)
#   - pdf_table_3_page2: 8 rows x 5 cols (Lattice)

# View a specific table
memvid tables view knowledge.mv2 --table-id pdf_table_1_page1

# Export to CSV
memvid tables export knowledge.mv2 --table-id pdf_table_1_page1 --format csv > data.csv

# Export to JSON
memvid tables export knowledge.mv2 --table-id pdf_table_1_page1 --format json

Example: Invoice Processing

# Create memory for invoices
memvid create invoices.mv2

# Ingest invoice with table extraction
memvid put invoices.mv2 --input amazon-invoice.pdf --tables --vector-compression

# Search for specific items
memvid find invoices.mv2 --query "total" --json

# List extracted tables
memvid tables list invoices.mv2

# Export line items to CSV
memvid tables export invoices.mv2 --table-id pdf_table_1_page1 --format csv

Updating Documents

The update command modifies an existing frame.

Synopsis

memvid update <FILE> [OPTIONS]

Options

Option	Description
`--frame-id <ID>`	Target frame by ID
`--uri <URI>`	Target frame by URI
`--input <PATH>`	New payload from file
`--set-uri <URI>`	Update frame URI
`--title <TITLE>`	Update title
`--timestamp <TS>`	Update timestamp
`--track <TRACK>`	Update track
`--kind <KIND>`	Update kind
`--tag <KEY=VALUE>`	Add/update tags
`--label <LABEL>`	Add/update labels
`--metadata <JSON>`	Add/update metadata
`--embeddings`	Recompute embeddings
`--json`	JSON output

Examples

# Update title
memvid update project.mv2 --frame-id 1234 --title "Updated Title"

# Update content and recompute embeddings
memvid update project.mv2 --uri "file:///doc.txt" \
  --input updated-doc.txt \
  --embeddings

# Add new tags
memvid update project.mv2 --frame-id 1234 \
  --tag "status=reviewed" \
  --label approved

Response

Updated frame 1234 in project.mv2
  Title: Updated Title
  Tags added: status=reviewed
  Labels added: approved
  Embeddings: recomputed

Deleting Documents

The delete command removes a frame from the memory.

Synopsis

memvid delete <FILE> [OPTIONS]

Options

Option	Description
`--frame-id <ID>`	Target by frame ID
`--uri <URI>`	Target by frame URI
`--yes`	Skip confirmation prompt
`--json`	JSON output

Examples

# Delete by frame ID
memvid delete project.mv2 --frame-id 1234

# Delete by URI (skip confirmation)
memvid delete project.mv2 --uri "file:///old-doc.txt" --yes

Response

Deleted frame 1234 from project.mv2
  URI: file:///old-doc.txt
  Title: Old Document

Remote API Ingestion

The api-fetch command fetches remote content from an API and ingests it as frames.

Synopsis

memvid api-fetch <FILE> <CONFIG> [OPTIONS]

Options

Option	Description
`--dry-run`	Preview without writing
`--mode <MODE>`	Override configured ingest mode
`--uri <URI>`	Override base URI
`--json`	JSON output

Config File Format

{
  "url": "https://api.example.com/documents",
  "method": "GET",
  "headers": {
    "Authorization": "Bearer ${API_TOKEN}"
  },
  "pagination": {
    "type": "cursor",
    "cursor_param": "after",
    "cursor_path": "$.meta.next_cursor"
  },
  "items_path": "$.data",
  "mapping": {
    "title": "$.name",
    "text": "$.content",
    "uri": "$.id"
  }
}

Examples

# Fetch from API
memvid api-fetch project.mv2 ./fetch-config.json

# Dry run to preview
memvid api-fetch project.mv2 ./fetch-config.json --dry-run

Real-World Examples

Documentation Knowledge Base

# Create the memory
memvid create docs.mv2

# Ingest documentation with embeddings
memvid put docs.mv2 --input ./docs/ --vector-compression --track "documentation"

# Add API reference
memvid put docs.mv2 --input ./api-reference/ --vector-compression \
  --track "api" \
  --tag "type=reference"

Research Paper Archive

# Create the memory
memvid create papers.mv2

# Ingest papers with metadata
for paper in ./papers/*.pdf; do
  memvid put papers.mv2 --input "$paper" --vector-compression \
    --track "research" \
    --tag "source=arxiv"
done

Code Repository

# Create memory for codebase
memvid create codebase.mv2

# Ingest with parallel processing
memvid put codebase.mv2 --input ./src/ --vector-compression \
  --parallel-segments \
  --track "source"

# Add tests and docs
memvid put codebase.mv2 --input ./tests/ --vector-compression --track "tests"
memvid put codebase.mv2 --input ./docs/ --vector-compression --track "docs"

Troubleshooting

File Locked

Error: File is locked by another process

Solutions:

# Check who holds the lock
memvid who knowledge.mv2

# Request release
memvid nudge knowledge.mv2

# Find process on macOS/Linux
lsof knowledge.mv2

# Wait longer for lock
memvid put knowledge.mv2 --input doc.pdf --lock-timeout 5000

# Force takeover (only if previous writer crashed)
memvid put knowledge.mv2 --input doc.pdf --force

Capacity Exceeded

Error: CapacityExceeded

Solutions:

# Check current usage
memvid stats knowledge.mv2

# Delete unused frames
memvid delete knowledge.mv2 --frame-id 42 --yes

# Compact the file
memvid doctor knowledge.mv2 --vacuum

Embedding Model Issues

Error: Failed to load embedding model

Solution:

# Set model directory
export MEMVID_MODELS_DIR=~/.memvid/models

# Or use offline mode with pre-cached models
export MEMVID_OFFLINE=1

Next Steps

Search & Ask

Timeline & View
