File Security

How are .mv2 files protected?

Integrity relies on cascading checksums:
  • Header checksum: Validates file header
  • TOC checksum: Validates table of contents
  • Per-segment checksums: Validate each data segment
  • Time index checksum: Validates timeline data
Confidentiality depends on OS file permissions. Memvid intentionally avoids bundling key management to keep the core simple.
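
Because confidentiality rests on OS file permissions, it can help to tighten them explicitly after creating a memory file. The following is a minimal sketch for POSIX systems using only the Python standard library; the helper name restrict_to_owner is hypothetical and not part of the Memvid SDK.

```python
import os
import stat


def restrict_to_owner(path: str) -> int:
    """Limit a file to owner read/write (mode 0o600) and return the resulting mode bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


# Example (assumes the file already exists):
#   restrict_to_owner("knowledge.mv2")
```

On Windows, os.chmod only toggles the read-only flag, so access control lists would need to be managed with platform tools instead.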

Are checksums validated automatically?

Yes. When opening a file, Memvid validates:
  1. Header checksum
  2. TOC integrity
  3. WAL consistency
Deep verification (via memvid verify --deep) additionally checks all segment checksums.
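
The difference between the checks run on open and deep verification can be illustrated with a toy model. This sketch is not Memvid's on-disk format or checksum algorithm: the dictionary layout, the function names, and the use of SHA-256 are all assumptions chosen for illustration, and the WAL consistency check is omitted.

```python
import hashlib


def digest(data: bytes) -> str:
    """Checksum stand-in; the real algorithm is not specified in this document."""
    return hashlib.sha256(data).hexdigest()


def build_file(header: bytes, toc: bytes, segments: list[bytes]) -> dict:
    """Toy in-memory stand-in for a file: each part is stored next to its checksum."""
    return {
        "header": (header, digest(header)),
        "toc": (toc, digest(toc)),
        "segments": [(seg, digest(seg)) for seg in segments],
    }


def verify(f: dict, deep: bool = False) -> list[str]:
    """Return the names of failed checks; an empty list means the file passed."""
    failures = []
    # Cheap checks that would run on every open.
    for name in ("header", "toc"):
        data, expected = f[name]
        if digest(data) != expected:
            failures.append(name)
    # Deep mode additionally walks every segment.
    if deep:
        for i, (data, expected) in enumerate(f["segments"]):
            if digest(data) != expected:
                failures.append(f"segment[{i}]")
    return failures
```

In this model a corrupted segment passes the quick check and is only caught with deep=True, which is why a deep pass is worth running after transfers or on a schedule.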

What happens if a file is corrupted?

Memvid provides tools to detect and repair corruption:
# Detect issues
memvid verify knowledge.mv2 --deep

# Repair issues
memvid doctor knowledge.mv2 --rebuild-time-index --rebuild-lex-index
The embedded WAL protects against data loss from crashes or power failures.

Crash Safety

What ensures data survives crashes?

The embedded Write-Ahead Log (WAL):
  1. All mutations are written to WAL first
  2. WAL is synced to disk (fsync)
  3. Changes are then applied to main data
  4. On recovery, WAL entries that were synced but not yet applied to the main data are replayed
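
The general write-ahead pattern described in these steps can be sketched with a toy key-value store. This is an illustration of the technique only, not Memvid's WAL format: the class name TinyStore, the JSON-lines log, and the in-memory "main data" are assumptions made to keep the example short.

```python
import json
import os


class TinyStore:
    """Toy key-value store backed by a write-ahead log. Illustrative only."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.data: dict[str, str] = {}
        self.recover()

    def recover(self) -> int:
        """Replay every complete WAL record into memory; return how many were replayed."""
        replayed = 0
        if os.path.exists(self.wal_path):
            with open(self.wal_path, "r", encoding="utf-8") as wal:
                for line in wal:
                    if not line.endswith("\n"):
                        # Torn final record from a crash mid-write: ignore it.
                        # A real implementation would also truncate the log here.
                        break
                    record = json.loads(line)
                    self.data[record["key"]] = record["value"]
                    replayed += 1
        return replayed

    def put(self, key: str, value: str) -> None:
        # 1. Write the mutation to the WAL first.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"key": key, "value": value}) + "\n")
            # 2. Force the record to disk before continuing.
            wal.flush()
            os.fsync(wal.fileno())
        # 3. Only then apply the change to the main data.
        self.data[key] = value
```

Because the main data lives only in memory here, recovery replays the whole log; a real store would replay only the entries not yet applied to its persistent data.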

How long does recovery take?

Recovery is fast:
  • Typical recovery: < 100ms
  • Large WAL replay (4MB): < 250ms

Are there any single points of failure?

No. The .mv2 file is self-contained:
  • No external databases
  • No network dependencies
  • No sidecar files that could be lost

Access Control

How does file locking work?

Memvid uses OS-level file locks:
  • Writers: Exclusive lock (one at a time)
  • Readers: Shared lock (multiple concurrent)
# Check who holds the lock
memvid who knowledge.mv2

# Request release
memvid nudge knowledge.mv2
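
The shared/exclusive behaviour of OS-level locks can be demonstrated with advisory locks on POSIX systems. This sketch uses fcntl.flock from the Python standard library; whether Memvid uses flock or another locking primitive is not stated here, and the helper names try_lock and unlock are hypothetical.

```python
import fcntl


def try_lock(fileobj, exclusive: bool) -> bool:
    """Try to take a shared or exclusive advisory lock without blocking."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fileobj.fileno(), mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        # Another open file description holds a conflicting lock.
        return False


def unlock(fileobj) -> None:
    """Release whatever lock this file object currently holds."""
    fcntl.flock(fileobj.fileno(), fcntl.LOCK_UN)
```

Several readers can hold the shared lock at once, while an exclusive request fails until every shared lock has been released, which mirrors the one-writer, many-readers rule above.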

Can multiple users access the same file?

Yes, but only one can write at a time:
from memvid_sdk import use

# Reader (concurrent access OK)
mem = use('basic', 'knowledge.mv2', read_only=True)

# Writer (exclusive access)
mem = use('basic', 'knowledge.mv2')

Data Privacy

Is my data sent anywhere?

Local operations (search, timeline, stats) never send data anywhere. Ask operations that use an external LLM (openai, claude, gemini) send context to that provider. To prevent or limit this:
  1. Use the local model (tinyllama):
memvid ask knowledge.mv2 --question "What is X?"
  2. Use context-only mode:
memvid ask knowledge.mv2 --question "What is X?" --context-only
  3. Enable PII masking:
memvid ask knowledge.mv2 --question "Contact info?" --mask-pii --use-model openai

What does PII masking protect?

The --mask-pii flag replaces sensitive values with placeholders before the context is sent to external LLMs:
PII Type                     Example               Masked As
Email addresses              user@example.com      [EMAIL]
Phone numbers                555-123-4567          [PHONE]
US Social Security Numbers   123-45-6789           [SSN]
Credit card numbers          4111-1111-1111-1111   [CREDIT_CARD]
IPv4 addresses               192.168.1.1           [IP_ADDRESS]
API keys/tokens              sk-abc123...          [API_KEY]
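
To make the mapping above concrete, here is a minimal regex-based sketch of placeholder substitution. It is not the SDK's mask_pii implementation: the patterns, the mask_text name, and in particular the assumption that API keys start with "sk-" are illustrative and will miss many real-world formats.

```python
import re

# Order matters: more specific patterns run before more general ones,
# so a card number or SSN is not partially consumed by the phone pattern.
PATTERNS = [
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("[CREDIT_CARD]", re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")),
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[PHONE]", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("[IP_ADDRESS]", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    # Assumption: keys look like "sk-" followed by alphanumerics.
    ("[API_KEY]", re.compile(r"\bsk-[A-Za-z0-9]{6,}\b")),
]


def mask_text(text: str) -> str:
    """Replace each recognised PII pattern with its placeholder."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Pattern matching of this kind is best treated as a safety net: names, postal addresses, and unusual identifier formats pass through unchanged.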

Using PII Masking

CLI:
memvid ask knowledge.mv2 --question "Contact info?" --mask-pii --use-model openai
Python SDK:
from memvid_sdk import use

mem = use('basic', 'knowledge.mv2')

# Enable PII masking for ask queries
answer = mem.ask(
    "What are the customer contact details?",
    model="openai:gpt-4o",
    mask_pii=True
)
print(answer['answer'])

# Standalone PII masking function
from memvid_sdk import mask_pii

text = "Contact user@example.com or call 555-123-4567"
masked = mask_pii(text)
# Output: "Contact [EMAIL] or call [PHONE]"
Node.js SDK:
import { use, maskPii } from '@memvid/sdk';

const mem = await use('basic', 'knowledge.mv2');

// Enable PII masking for ask queries
const answer = await mem.ask('What are the customer contact details?', {
  model: 'openai:gpt-4o',
  modelApiKey: process.env.OPENAI_API_KEY,
  maskPii: true
});
console.log(answer.answer);

// Standalone PII masking function
const text = 'Contact user@example.com or call 555-123-4567';
const masked = maskPii(text);
// Output: "Contact [EMAIL] or call [PHONE]"
PII masking is applied to the context sent to external LLMs, not to data stored in the memory file. The original data remains intact.

Verification

How do I verify file integrity?

# Basic verification
memvid verify knowledge.mv2

# Deep verification (all checksums)
memvid verify knowledge.mv2 --deep

# Single-file compliance (no sidecars)
memvid verify-single-file knowledge.mv2

What does deep verification check?

Check                   Description
HeaderChecksum          Header integrity
TocIntegrity            Table of contents valid
WalConsistency          WAL state consistent
TimeIndexSortOrder      Time index properly sorted
LexIndexDecode          Lexical index readable
VecIndexDecode          Vector index readable
FrameCountConsistency   Frame counts match

Best Practices

File Storage

  1. Use appropriate permissions: Restrict file access to authorized users
  2. Regular backups: Copy .mv2 files to backup storage
  3. Verify after transfer: Run memvid verify --deep after copying files
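
As a complement to memvid verify --deep, a byte-level comparison can confirm that a backup copy matches its source. The sketch below uses only the Python standard library; the helper names sha256_of and copy_and_check are hypothetical.

```python
import hashlib
import shutil


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large memory files need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_check(src: str, dst: str) -> bool:
    """Copy src to dst and report whether the two files hash identically."""
    shutil.copy2(src, dst)
    return sha256_of(src) == sha256_of(dst)


# Example (paths are placeholders):
#   copy_and_check("knowledge.mv2", "/backups/knowledge.mv2")
```

A matching hash shows the copy is faithful; it does not show the source was healthy to begin with, so the deep verification step still applies.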

Production Use

  1. Read-only mode: Use for query-only workloads
  2. Monitor capacity: Check utilization before large ingestions
  3. Periodic verification: Run memvid verify --deep weekly
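
Periodic verification is easy to script and run from cron or another scheduler. The sketch below shells out to the memvid verify command shown earlier for every .mv2 file in a directory. It assumes that memvid is on the PATH and that it exits with a non-zero status when verification fails; the exit-code behaviour is an assumption, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def verify_command(path: Path) -> list[str]:
    """Build the deep-verification command for one memory file."""
    return ["memvid", "verify", str(path), "--deep"]


def verify_all(directory: str) -> dict[str, bool]:
    """Run deep verification on every .mv2 file in a directory.

    ASSUMPTION: a zero exit status means the file passed.
    """
    results = {}
    for path in sorted(Path(directory).glob("*.mv2")):
        completed = subprocess.run(
            verify_command(path), capture_output=True, text=True
        )
        results[path.name] = completed.returncode == 0
    return results


# Example (directory is a placeholder):
#   failed = [name for name, ok in verify_all("/data/memories").items() if not ok]
```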

Sensitive Data

  1. PII masking: Always enable for external LLM calls
  2. Local models: Use tinyllama for sensitive queries
  3. Context-only mode: Get relevant docs without LLM synthesis