Memvid can detect and mask Personally Identifiable Information (PII) in search results and ask responses. The original data remains searchable, but sensitive information is redacted in output.
How It Works
Example:
Original: "Contact john@example.com or call 555-123-4567"
Masked: "Contact [EMAIL] or call [PHONE]"
Key points:
Original data preserved: Content stored without modification
Searchable: You can search for emails, phones, etc.
Masked on output: PII hidden in results and responses
Query-time detection: No preprocessing required
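The store-raw, mask-on-output flow in the key points above can be sketched in a few lines. This is a minimal illustration and not Memvid's implementation: `PATTERNS`, `mask_on_output`, and the in-memory `store` list are hypothetical names, and the two regexes are deliberately simplified.

```python
import re

# Hypothetical, simplified patterns; a real detector covers more types and formats.
PATTERNS = {
    "EMAIL": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "PHONE": re.compile(r"\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}"),
}

def mask_on_output(text: str) -> str:
    """Replace each pattern match with its placeholder at read time."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

# Content is stored unmodified, so raw values remain searchable.
store = ["Contact john@example.com or call 555-123-4567"]

# The search runs against the raw text...
hits = [doc for doc in store if "john@example.com" in doc]
# ...and masking is applied only to what is returned to the caller.
masked_hits = [mask_on_output(doc) for doc in hits]
print(masked_hits[0])
```

Because masking happens on the way out, the stored content never changes and nothing has to be preprocessed at ingestion time.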
Detected PII Types
| Type | Pattern | Masked As |
|---|---|---|
| Email addresses | user@domain.com | [EMAIL] |
| Phone numbers | 555-123-4567, (555) 123-4567 | [PHONE] |
| SSN (US) | 123-45-6789 | [SSN] |
| Credit cards | 4111-1111-1111-1111 | [CREDIT_CARD] |
| IPv4 addresses | 192.168.1.1 | [IP_ADDRESS] |
| API keys | sk-xxx, api_xxx | [API_KEY] |
| Bearer tokens | Bearer eyJ... | [TOKEN] |
CLI Usage
Mask PII in Ask Responses
# Enable PII masking in ask
memvid ask memory.mv2 --question "What's John's contact info?" --mask-pii
Without masking:
John can be reached at john.smith@acme.com or by phone at (555) 867-5309.
His SSN for payroll is 123-45-6789.
With --mask-pii:
John can be reached at [EMAIL] or by phone at [PHONE].
His SSN for payroll is [SSN].
PII in Search Results
# Search results with masking
memvid find memory.mv2 --query "contact information" --mask-pii
SDK Usage
Python
from memvid import use

mem = use('basic', 'memory.mv2')

# Ask with PII masking
response = mem.ask(
    "What is the customer's contact information?",
    mask_pii=True
)
print(response.answer)
# "Customer can be reached at [EMAIL] or [PHONE]"

# Check if content contains PII
from memvid import contains_pii, mask_pii

text = "Email me at test@example.com"
if contains_pii(text):
    safe_text = mask_pii(text)
    print(safe_text)  # "Email me at [EMAIL]"
Node.js
import { use } from '@anthropics/memvid'

const mem = await use('basic', 'memory.mv2')

// Ask with PII masking
const response = await mem.ask(
  "What is the customer's contact information?",
  { maskPii: true }
)
console.log(response.answer)
// "Customer can be reached at [EMAIL] or [PHONE]"
Utility Functions
Check for PII
from memvid import contains_pii
# Returns True if any PII detected
contains_pii("Call 555-123-4567")  # True
contains_pii("Hello world")  # False
Mask PII in Text
from memvid import mask_pii
original = """
Contact: john@example.com
Phone: (555) 123-4567
SSN: 123-45-6789
API Key: sk-abc123xyz
"""
masked = mask_pii(original)
print(masked)
Output:
Contact: [EMAIL]
Phone: [PHONE]
SSN: [SSN]
API Key: [API_KEY]
Get PII Locations
from memvid import detect_pii
text = "Email john@test.com or call 555-1234"
pii_items = detect_pii(text)
for item in pii_items:
    print(f"Type: {item.type}, Value: {item.value}, Position: {item.start}-{item.end}")
# Type: email, Value: john@test.com, Position: 6-19
# Type: phone, Value: 555-1234, Position: 28-36
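Start and end offsets like those above are useful when you want to apply your own redaction instead of the default placeholders. The following is a standalone sketch of that idea built on Python's `re.finditer`; `PiiMatch`, `find_pii_spans`, `redact_spans`, and both patterns are hypothetical stand-ins and do not reflect how `detect_pii` is implemented.

```python
import re
from dataclasses import dataclass

@dataclass
class PiiMatch:
    type: str
    value: str
    start: int  # inclusive offset
    end: int    # exclusive offset

# Hypothetical, simplified patterns for illustration only.
_PATTERNS = {
    "email": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "phone": re.compile(r"\b\d{3}-\d{4}\b"),
}

def find_pii_spans(text: str) -> list[PiiMatch]:
    """Collect every pattern match with its offsets, ordered by position."""
    matches = []
    for pii_type, pattern in _PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append(PiiMatch(pii_type, m.group(), m.start(), m.end()))
    return sorted(matches, key=lambda item: item.start)

def redact_spans(text: str, spans: list[PiiMatch]) -> str:
    """Replace spans right to left so earlier offsets stay valid."""
    for item in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:item.start] + f"[{item.type.upper()}]" + text[item.end:]
    return text

text = "Email john@test.com or call 555-1234"
spans = find_pii_spans(text)
print(redact_spans(text, spans))
```

Replacing from the end of the string backwards avoids recomputing offsets after each substitution changes the text length.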
Use Cases
Customer Support
Mask customer data in AI responses:
# Support bot with PII protection
response = mem.ask(
    ticket_content,
    mask_pii=True  # Don't expose customer PII
)
# Log safely
logger.info(f"Response: {response.answer}")  # No PII in logs
Compliance (GDPR, HIPAA)
Redact PII before displaying or logging:
# Search medical records
results = mem.find("patient symptoms", mask_pii=True)
# Safe to display - no PHI exposed
for result in results:
    print(result.snippet)  # "[EMAIL]", "[PHONE]", "[SSN]" redacted
Development & Testing
Mask real data in development environments:
# Export masked data for dev/test
# dev_mem is assumed to be a second memory opened for the dev environment
for frame in mem.timeline():
    masked_content = mask_pii(frame.text)
    dev_mem.put(masked_content)
Audit Logging
Log queries without exposing PII:
def search_with_audit(query):
    results = mem.find(query)
    # Log masked version
    audit_log.info(f"Query: {mask_pii(query)}")
    audit_log.info(f"Results: {len(results)}")
    return results
Configuration
Default Behavior
PII masking is disabled by default. Enable it explicitly:
# CLI: use --mask-pii flag
memvid ask memory.mv2 -q "..." --mask-pii
# Python: mask_pii=True parameter
mem.ask("...", mask_pii=True)
Why Not Default?
Performance overhead for detection
Some use cases need raw data
Explicit opt-in for compliance clarity
Detection Patterns
Email Addresses
user@domain.com
user.name@subdomain.domain.co.uk
user+tag@domain.com
Regex: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
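To see how the regex above behaves, here is a small check using Python's `re` module against the listed examples. The pattern is copied verbatim from this page; the `EMAIL_RE` and `samples` names and the final non-matching sample are added for illustration.

```python
import re

# Pattern copied from the documentation above.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

samples = [
    "user@domain.com",
    "user.name@subdomain.domain.co.uk",
    "user+tag@domain.com",
    "not-an-email@localhost",  # no dot-separated TLD, so the pattern rejects it
]

for sample in samples:
    matched = EMAIL_RE.fullmatch(sample) is not None
    print(f"{sample!r}: {matched}")

# In running text, a trailing sentence period is not part of the match.
found = EMAIL_RE.findall("Write to a@b.com. Thanks!")
print(found)
```

Note that the pattern requires at least one dot followed by two or more letters after the `@`, so bare hostnames such as `localhost` are not treated as email domains.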
Phone Numbers
555-123-4567
(555) 123-4567
555.123.4567
+1 555 123 4567
15551234567
Supports US, UK, and international formats.
Social Security Numbers
123-45-6789
123 45 6789
123456789
US SSN format with common separators.
Credit Card Numbers
4111-1111-1111-1111
4111 1111 1111 1111
4111111111111111
Luhn-validated card number patterns.
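Luhn validation is what separates plausible card numbers from arbitrary 16-digit strings. A minimal sketch of the standard Luhn checksum follows; the function name `luhn_valid` and the 13 to 19 digit length bound are assumptions for this example, and Memvid's own check may differ in detail.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters (spaces, dashes) are ignored, so the input is
    assumed to be a candidate already matched by a card-shaped pattern.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:  # assumed bound on card number length
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0

for candidate in ["4111-1111-1111-1111", "4111 1111 1111 1111", "4111111111111112"]:
    print(candidate, luhn_valid(candidate))
```

The checksum step reduces false positives: a random 16-digit order or tracking number has only about a one in ten chance of passing.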
IP Addresses
192.168.1.1
10.0.0.1
172.16.0.1
IPv4 addresses (IPv6 coming soon).
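A dotted-quad shape alone also matches strings such as `999.1.1.1`, so a detector typically validates each octet. The sketch below pairs a candidate regex with Python's standard `ipaddress` module; `IPV4_CANDIDATE` and `find_ipv4` are hypothetical names, and this is not necessarily how Memvid validates addresses.

```python
import ipaddress
import re

# Candidate finder: four dot-separated groups of 1-3 digits (assumed pattern).
IPV4_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def find_ipv4(text: str) -> list[str]:
    """Return dotted-quad candidates that are valid IPv4 addresses."""
    found = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ipaddress.AddressValueError:
            continue
        found.append(candidate)
    return found

print(find_ipv4("Hosts 192.168.1.1 and 10.0.0.1, not 999.1.1.1"))
```

Four-part version strings such as `1.2.3.4` still pass this check, which is one reason pattern-based detection can over-mask.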
API Keys & Tokens
sk-abc123...
api_key_xxx...
Bearer eyJhbGciOiJ...
ghp_xxxxxxxxxxxx
Common API key and token prefixes.
Limitations
Not Detected
Some PII types are not currently detected:
| Type | Status |
|---|---|
| Names | ❌ Too many false positives |
| Addresses | ❌ Complex, locale-specific |
| Dates of birth | ❌ Ambiguous with other dates |
| Medical record numbers | ❌ Varies by institution |
| Custom IDs | ❌ Unknown format |
False Positives
Some patterns may be incorrectly flagged:
# Might be flagged as phone
text = "Order #555-123-4567" # Could be order number
# Might be flagged as SSN
text = "Version 123-45-6789" # Could be version string
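The reason these strings get flagged is that a pattern only sees the shape of the digits, not the words around them. The following standalone sketch reproduces the effect with two simplified regexes; `PHONE_RE`, `SSN_RE`, and `mask_by_shape` are hypothetical and stand in for whatever rules Memvid actually applies.

```python
import re

# Hypothetical, simplified patterns: they match digit shapes only.
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_by_shape(text: str) -> str:
    """Mask anything shaped like a phone number or SSN, regardless of context."""
    text = PHONE_RE.sub("[PHONE]", text)
    return SSN_RE.sub("[SSN]", text)

samples = [
    "Call me at 555-123-4567",   # a real phone number
    "Order #555-123-4567",       # an order number with the same shape
    "Version 123-45-6789",       # a version string shaped like an SSN
]

for sample in samples:
    print(f"{sample!r} -> {mask_by_shape(sample)!r}")
```

If identifiers in your data share these shapes, check masked output on a sample before relying on it.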
Context Insensitive
Detection is pattern-based, not context-aware:
# Both masked the same way
"Call me at 555-123-4567" # Real phone
"The code is 555-123-4567" # Not a phone, but still masked
Best Practices
1. Enable for User-Facing Output
# Always mask when displaying to users
response = mem.ask(question, mask_pii=True)
display_to_user(response.answer)
2. Keep Original for Internal Use
# Raw data for internal processing
results = mem.find(query) # No masking
# Masked for display
masked_results = [mask_pii(r.text) for r in results]
3. Mask Before Logging
import logging
from memvid import mask_pii

def safe_log(message):
    logging.info(mask_pii(message))
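A wrapper like the one above only protects calls that go through it. One alternative is to attach a `logging.Filter` to a handler so every record is masked before it is written. This is a sketch under assumptions: `mask_text` is a regex stand-in used so the example runs on its own (in practice you would call `mask_pii` there), and `PiiMaskingFilter` is a hypothetical name.

```python
import logging
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def mask_text(text: str) -> str:
    """Stand-in masker for this sketch; swap in memvid's mask_pii in real code."""
    return EMAIL_RE.sub("[EMAIL]", text)

class PiiMaskingFilter(logging.Filter):
    """Mask the fully formatted message of every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so PII passed via %-style args is masked too.
        record.msg = mask_text(record.getMessage())
        record.args = None
        return True  # keep the record, now masked

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.addFilter(PiiMaskingFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Reset link sent to %s", "john@example.com")
```

Putting the filter on the handler rather than on individual call sites means new log statements are covered without any extra effort from the author.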
4. Combine with Encryption
For maximum protection:
# Encrypt at rest + mask on output
memvid lock sensitive.mv2 --out sensitive.mv2e
# When using:
memvid unlock sensitive.mv2e --out temp.mv2
memvid ask temp.mv2 -q "..." --mask-pii
memvid lock temp.mv2 --out sensitive.mv2e
rm temp.mv2
5. Test Your Patterns
Verify detection works for your data:
from memvid import detect_pii
# Test with your actual data patterns
test_data = [
"Contact: support@yourcompany.com" ,
"Phone: 1-800-YOUR-NUM" ,
"Account: CUST-12345" ,
]
for text in test_data:
    pii = detect_pii(text)
    print(f"Found: {len(pii)} PII items in: {text}")
Compliance Considerations
GDPR
PII masking helps with data minimization
Original data still stored (not anonymized)
Consider encryption + masking for full compliance
Document your data handling in privacy policy
HIPAA
Masks PHI in output (emails, phones)
Not a complete de-identification solution
Combine with permission-aware retrieval (ACL) and encryption
Consult compliance officer for healthcare data
SOC 2
Demonstrates data protection controls
Enable masking in production environments
Log that masking is applied
Include in security documentation
Performance
PII detection adds minimal overhead:
| Operation | Without Masking | With Masking |
|---|---|---|
| Ask (short) | 150ms | 155ms |
| Ask (long) | 500ms | 520ms |
| Find (10 results) | 50ms | 55ms |
In these measurements the overhead ranges from about 3% to 10%, depending on the operation and text length.
Future Features
Coming soon:
Custom PII patterns
Named entity recognition (names, addresses)
Per-type masking control
Reversible masking with keys
IPv6 address detection
Broader international phone format coverage
Next Steps
Encryption: Encrypt data at rest
Security FAQ: Security and compliance questions