File Security
How are .mv2 files protected?
Integrity relies on cascading checksums:
- Header checksum: Validates the file header
- TOC checksum: Validates the table of contents
- Per-segment checksums: Validate each data segment
- Time index checksum: Validates the timeline data
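Here is one way a cascade like this could be structured. It is a Python sketch that assumes SHA-256 and made-up field names, because the real .mv2 layout and hash function are not described here. Each level's checksum covers the checksums beneath it. If someone rewrites one segment without updating every level above it, a mismatch remains for verification to catch.

```python
import hashlib

# Hypothetical layout for illustration: the real .mv2 structures and hash
# function are not specified here, so SHA-256 and these fields are assumptions.
def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_checksums(header: bytes, segments: list[bytes], time_index: bytes) -> dict:
    # Bottom level: one checksum per data segment.
    segment_sums = [digest(seg) for seg in segments]
    # The TOC records those checksums, so the TOC checksum covers all of them.
    toc_sum = digest(b"".join(segment_sums))
    time_index_sum = digest(time_index)
    # Top level: the header checksum covers the header bytes plus the checksums
    # below it, so a change anywhere underneath changes this value as well.
    header_sum = digest(header + toc_sum + time_index_sum)
    return {"segments": segment_sums, "toc": toc_sum,
            "time_index": time_index_sum, "header": header_sum}

if __name__ == "__main__":
    original = build_checksums(b"MV2", [b"frame-0", b"frame-1"], b"t0,t1")
    edited = build_checksums(b"MV2", [b"frame-0", b"frame-X"], b"t0,t1")
    print(original["segments"][0] == edited["segments"][0])  # True: untouched segment
    print(original["segments"][1] == edited["segments"][1])  # False
    print(original["toc"] == edited["toc"])                  # False
    print(original["header"] == edited["header"])            # False
```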
Confidentiality depends on OS file permissions. Memvid intentionally avoids bundling key management to keep the core simple.
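Confidentiality rests on OS permissions. On POSIX systems, the usual step is therefore to make the file readable and writable by its owner only, which is the same as `chmod 600 knowledge.mv2`. Below is a minimal Python sketch. `restrict_to_owner` is a hypothetical helper and not part of Memvid.

```python
import os
import stat
import tempfile

def restrict_to_owner(path: str) -> int:
    """Set `path` to owner read/write only (0o600) and return the resulting mode.

    POSIX semantics; on Windows, os.chmod only toggles the read-only flag.
    """
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "knowledge.mv2")
        open(path, "wb").close()  # empty stand-in for a real memory file
        print(oct(restrict_to_owner(path)))  # 0o600
```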
Are checksums validated automatically?
Yes. When opening a file, Memvid validates:
- Header checksum
- TOC integrity
- WAL consistency
Deep verification (via memvid verify --deep) additionally checks all segment checksums.
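A hypothetical in-memory model can show why checks run on open differ from `--deep`. `MemoryFile`, `quick_verify` and `deep_verify` are illustrative names, SHA-256 is assumed, and WAL consistency is not modeled. The open-time checks read only the header and TOC, so they stay cheap. The trade-off is that they miss a segment whose bytes changed after its checksum was recorded. Re-hashing every segment catches that case, but the cost grows with file size.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical model, not the real .mv2 layout; SHA-256 is an assumption.
def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

@dataclass
class MemoryFile:
    header: bytes
    stored_header_sum: str
    toc_sums: list[str]    # checksum recorded for each segment
    stored_toc_sum: str
    segments: list[bytes]

def quick_verify(f: MemoryFile) -> list[str]:
    """Checks done on every open: reads no segment data."""
    problems = []
    if digest(f.header) != f.stored_header_sum:
        problems.append("HeaderChecksum")
    if digest("".join(f.toc_sums).encode()) != f.stored_toc_sum:
        problems.append("TocIntegrity")
    return problems

def deep_verify(f: MemoryFile) -> list[str]:
    """Quick checks plus re-hashing every segment."""
    problems = quick_verify(f)
    for i, (seg, expected) in enumerate(zip(f.segments, f.toc_sums)):
        if digest(seg) != expected:
            problems.append(f"SegmentChecksum[{i}]")
    return problems

def make_file(segments: list[bytes]) -> MemoryFile:
    header = b"MV2\x01"
    toc_sums = [digest(s) for s in segments]
    return MemoryFile(header, digest(header), toc_sums,
                      digest("".join(toc_sums).encode()), list(segments))

if __name__ == "__main__":
    f = make_file([b"frame-0", b"frame-1"])
    f.segments[1] = b"frame-?"  # silent corruption inside one segment
    print(quick_verify(f))      # []  (not detected on open)
    print(deep_verify(f))       # ['SegmentChecksum[1]']
```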
What happens if a file is corrupted?
Memvid provides tools to detect and repair corruption:
# Detect issues
memvid verify knowledge.mv2 --deep
# Repair issues
memvid doctor knowledge.mv2 --rebuild-time-index --rebuild-lex-index
The embedded WAL protects against data loss from crashes or power failures.
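The `doctor` flags above suggest that the time and lexical indexes are derived structures that can be regenerated from the stored frames. That is an inference and is not documented here. If it holds, rebuilding a time index means rescanning each frame's timestamp and sorting. The sketch below uses hypothetical names (`Frame`, `rebuild_time_index`, `is_sorted`) and does not reflect how `memvid doctor` is implemented.

```python
from typing import NamedTuple

class Frame(NamedTuple):
    frame_id: int
    timestamp: float  # hypothetical: seconds since the epoch

def rebuild_time_index(frames: list[Frame]) -> list[tuple[float, int]]:
    """Regenerate a (timestamp, frame_id) index by rescanning the frames.

    Sorting on the pair keeps the order deterministic when timestamps tie.
    """
    return sorted((f.timestamp, f.frame_id) for f in frames)

def is_sorted(index: list[tuple[float, int]]) -> bool:
    return all(a <= b for a, b in zip(index, index[1:]))

if __name__ == "__main__":
    frames = [Frame(0, 30.0), Frame(1, 10.0), Frame(2, 20.0)]
    damaged = [(30.0, 0), (10.0, 1), (20.0, 2)]  # out of order: needs repair
    rebuilt = rebuild_time_index(frames)
    print(is_sorted(damaged))  # False
    print(rebuilt)             # [(10.0, 1), (20.0, 2), (30.0, 0)]
    print(is_sorted(rebuilt))  # True
```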
Crash Safety
What ensures data survives crashes?
The embedded Write-Ahead Log (WAL):
- All mutations are written to WAL first
- WAL is synced to disk (fsync)
- Changes are then applied to main data
- On recovery, WAL entries that were logged but not yet applied to the main data are replayed
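The four steps above follow the classic write-ahead ordering. Below is a minimal Python sketch of that ordering. `TinyStore`, the JSON-lines record format and the sequence numbers are hypothetical, not Memvid's embedded WAL format. Recovery replays only records with a sequence number above what the main data already reflects. It also stops at a torn final record, which is what a crash during a write leaves behind.

```python
import json
import os
import tempfile

# Hypothetical write-ahead log sketch; NOT Memvid's on-disk WAL format.
# Each WAL record is one JSON line: {"seq": n, "key": ..., "value": ...}.
class TinyStore:
    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.data: dict[str, str] = {}  # the "main data"
        self.applied_seq = 0            # highest seq reflected in self.data
        self.next_seq = 1

    def put(self, key: str, value: str) -> None:
        record = {"seq": self.next_seq, "key": key, "value": value}
        # Step 1: append the mutation to the WAL before anything else.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps(record) + "\n")
            wal.flush()
            # Step 2: fsync so the record survives a crash or power loss.
            os.fsync(wal.fileno())
        self.next_seq += 1
        # Step 3: only now apply the change to the main data.
        self._apply(record)

    def _apply(self, record: dict) -> None:
        self.data[record["key"]] = record["value"]
        self.applied_seq = record["seq"]

    def recover(self) -> int:
        """Step 4: replay logged records the main data does not yet reflect."""
        replayed = 0
        if not os.path.exists(self.wal_path):
            return replayed
        with open(self.wal_path, encoding="utf-8") as wal:
            for line in wal:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-write; ignore it
                if record["seq"] > self.applied_seq:
                    self._apply(record)
                    replayed += 1
                self.next_seq = max(self.next_seq, record["seq"] + 1)
        return replayed

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        wal_path = os.path.join(d, "store.wal")
        store = TinyStore(wal_path)
        store.put("a", "1")
        store.put("b", "2")
        # Simulate a crash: the in-memory main data is lost, the WAL is not.
        restarted = TinyStore(wal_path)
        print(restarted.recover())  # 2
        print(restarted.data)       # {'a': '1', 'b': '2'}
```

A production WAL would also checkpoint, meaning it truncates the log once its records are durably applied. It would also cut off a torn tail before appending again. Both steps are left out here.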
How long does recovery take?
Recovery is fast:
- Typical recovery: < 100ms
- Large WAL replay (4MB): < 250ms
Are there any single points of failure?
No. The .mv2 file is self-contained:
- No external databases
- No network dependencies
- No sidecar files that could be lost
Access Control
How does file locking work?
Memvid uses OS-level file locks:
- Writers: Exclusive lock (one at a time)
- Readers: Shared lock (multiple concurrent)
# Check who holds the lock
memvid who knowledge.mv2
# Request release
memvid nudge knowledge.mv2
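The POSIX sketch below illustrates the reader/writer semantics with Python's `fcntl.flock`. This page does not say which locking primitive Memvid itself uses. `try_lock` is a hypothetical helper, and the sketch has no equivalent of `memvid who` or `memvid nudge`.

```python
import fcntl
import os
import tempfile
from typing import Optional

def try_lock(path: str, exclusive: bool) -> Optional[int]:
    """Try to take a shared (reader) or exclusive (writer) lock without blocking.

    Returns an open file descriptor that holds the lock (close it to release),
    or None if a conflicting lock is already held. POSIX only.
    """
    fd = os.open(path, os.O_RDWR if exclusive else os.O_RDONLY)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return fd
    except BlockingIOError:
        os.close(fd)
        return None

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "knowledge.mv2")
        open(path, "wb").close()
        r1 = try_lock(path, exclusive=False)
        r2 = try_lock(path, exclusive=False)
        print(r1 is not None and r2 is not None)  # True: readers share
        print(try_lock(path, exclusive=True))     # None: writer must wait
        for fd in (r1, r2):
            os.close(fd)  # closing the descriptor releases its lock
        w = try_lock(path, exclusive=True)
        print(w is not None)                      # True: sole writer
        print(try_lock(path, exclusive=False))    # None: readers wait for writer
        os.close(w)
```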
Can multiple users access the same file?
Yes, but only one can write at a time:
from memvid_sdk import use
# Reader (concurrent access OK)
mem = use('basic', 'knowledge.mv2', read_only=True)
# Writer (exclusive access)
mem = use('basic', 'knowledge.mv2')
Data Privacy
Is my data sent anywhere?
Local operations (search, timeline, stats) never send data anywhere.
Ask operations with external LLMs (openai, claude, gemini) send context to those providers. To prevent this:
- Use the local model (tinyllama):
memvid ask knowledge.mv2 --question "What is X?"
- Use context-only mode:
memvid ask knowledge.mv2 --question "What is X?" --context-only
- Enable PII masking:
memvid ask knowledge.mv2 --question "Contact info?" --mask-pii --use-model openai
What does PII masking protect?
The --mask-pii flag masks sensitive information before sending to external LLMs:
| PII Type | Example | Masked As |
|---|---|---|
| Email addresses | alice@example.com | [EMAIL] |
| Phone numbers | 555-123-4567 | [PHONE] |
| US Social Security Numbers | 123-45-6789 | [SSN] |
| Credit card numbers | 4111-1111-1111-1111 | [CREDIT_CARD] |
| IPv4 addresses | 192.168.1.1 | [IP_ADDRESS] |
| API keys/tokens | sk-abc123... | [API_KEY] |
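The table above maps each PII type to a placeholder. The regex sketch below shows that mapping in code, using one illustrative pattern per row. These are not Memvid's detection rules, which will differ. `mask_pii_sketch` is a hypothetical name, chosen so it does not clash with the SDK's `mask_pii`. The patterns run in list order, and the IPv4 pattern does not check that each octet is at most 255.

```python
import re

# Illustrative patterns only; Memvid's actual detection rules may differ.
# Patterns are applied in list order.
PII_PATTERNS = [
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("[CREDIT_CARD]", re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")),
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[PHONE]", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("[IP_ADDRESS]", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("[API_KEY]", re.compile(r"\bsk-[A-Za-z0-9_-]{8,}")),
]

def mask_pii_sketch(text: str) -> str:
    """Replace every match with its placeholder, one pattern at a time."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

if __name__ == "__main__":
    print(mask_pii_sketch("Contact alice@example.com or call 555-123-4567"))
    # Contact [EMAIL] or call [PHONE]
```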
Using PII Masking
CLI:
memvid ask knowledge.mv2 --question "Contact info?" --mask-pii --use-model openai
Python SDK:
from memvid_sdk import use, mask_pii
mem = use('basic', 'knowledge.mv2')
# Enable PII masking for ask queries
answer = mem.ask(
    "What are the customer contact details?",
    model="openai:gpt-4o",
    mask_pii=True
)
print(answer['answer'])
# Standalone PII masking function
text = "Contact alice@example.com or call 555-123-4567"
masked = mask_pii(text)
print(masked)
# Output: "Contact [EMAIL] or call [PHONE]"
Node.js SDK:
import { use, maskPii } from '@memvid/sdk';
const mem = await use('basic', 'knowledge.mv2');
// Enable PII masking for ask queries
const answer = await mem.ask('What are the customer contact details?', {
  model: 'openai:gpt-4o',
  modelApiKey: process.env.OPENAI_API_KEY,
  maskPii: true
});
console.log(answer.answer);
// Standalone PII masking function
const text = 'Contact alice@example.com or call 555-123-4567';
const masked = maskPii(text);
console.log(masked);
// Output: "Contact [EMAIL] or call [PHONE]"
PII masking is applied to the context sent to external LLMs, not to data stored in the memory file. The original data remains intact.
Verification
How do I verify file integrity?
# Basic verification
memvid verify knowledge.mv2
# Deep verification (all checksums)
memvid verify knowledge.mv2 --deep
# Single-file compliance (no sidecars)
memvid verify-single-file knowledge.mv2
What does deep verification check?
| Check | Description |
|---|---|
| HeaderChecksum | Header integrity |
| TocIntegrity | Table of contents valid |
| WalConsistency | WAL state consistent |
| TimeIndexSortOrder | Time index properly sorted |
| LexIndexDecode | Lexical index readable |
| VecIndexDecode | Vector index readable |
| FrameCountConsistency | Frame counts match |
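Below is a sketch of how named checks like these can be run and reported one by one. It covers only two of them, `TimeIndexSortOrder` and `FrameCountConsistency`, against a hypothetical `FileSummary` structure. The field names and check logic are assumptions made for illustration, not Memvid's verifier.

```python
from dataclasses import dataclass

# Hypothetical summary of a file's bookkeeping; field names are illustrative.
@dataclass
class FileSummary:
    header_frame_count: int              # frame count recorded in the header
    toc_frame_ids: list[int]             # frames listed in the TOC
    time_index: list[tuple[float, int]]  # (timestamp, frame_id) pairs

def time_index_sort_order(s: FileSummary) -> bool:
    stamps = [t for t, _ in s.time_index]
    return all(a <= b for a, b in zip(stamps, stamps[1:]))

def frame_count_consistency(s: FileSummary) -> bool:
    # Header, TOC and time index must agree on how many frames exist and which.
    indexed_ids = sorted(fid for _, fid in s.time_index)
    return (s.header_frame_count == len(s.toc_frame_ids) == len(indexed_ids)
            and sorted(s.toc_frame_ids) == indexed_ids)

CHECKS = {
    "TimeIndexSortOrder": time_index_sort_order,
    "FrameCountConsistency": frame_count_consistency,
}

def run_checks(s: FileSummary) -> dict[str, bool]:
    """Run every named check and report pass/fail for each."""
    return {name: check(s) for name, check in CHECKS.items()}

if __name__ == "__main__":
    healthy = FileSummary(2, [0, 1], [(1.0, 0), (2.0, 1)])
    broken = FileSummary(3, [0, 1], [(2.0, 1), (1.0, 0)])
    print(run_checks(healthy))  # {'TimeIndexSortOrder': True, 'FrameCountConsistency': True}
    print(run_checks(broken))   # {'TimeIndexSortOrder': False, 'FrameCountConsistency': False}
```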
Best Practices
File Storage
- Use appropriate permissions: Restrict file access to authorized users
- Regular backups: Copy .mv2 files to backup storage
- Verify after transfer: Run memvid verify --deep after copying files
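Here is a minimal sketch of the backup-then-verify step: copy the file, then compare whole-file SHA-256 digests. A matching digest only shows that the copy is byte-identical, so running `memvid verify --deep` on the copy is still needed to check the file's internal checksums. `backup_and_check` and `sha256_of` are hypothetical helpers. If possible, copy while no writer holds the lock (`memvid who` shows the holder), because a copy taken during a write may not be consistent.

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large .mv2 files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def backup_and_check(src: str, dst: str) -> bool:
    """Copy src to dst, then confirm the bytes arrived unchanged."""
    shutil.copy2(src, dst)  # also copies timestamps and permission bits
    return sha256_of(src) == sha256_of(dst)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "knowledge.mv2")
        with open(src, "wb") as f:
            f.write(b"example bytes" * 1000)  # stand-in content
        print(backup_and_check(src, os.path.join(d, "backup.mv2")))  # True
```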
Production Use
- Read-only mode: Use for query-only workloads
- Monitor capacity: Check utilization before large ingestions
- Periodic verification: Run memvid verify --deep weekly
Sensitive Data
- PII masking: Always enable for external LLM calls
- Local models: Use tinyllama for sensitive queries
- Context-only mode: Get relevant docs without LLM synthesis