Memvid can detect and mask Personally Identifiable Information (PII) in search results and ask responses. The original data remains searchable, but sensitive information is redacted in output.

How It Works

Example:
  • Original: "Contact jane@example.com or call 555-123-4567"
  • Masked: "Contact [EMAIL] or call [PHONE]"
Key points:
  • Original data preserved: Content stored without modification
  • Searchable: You can search for emails, phones, etc.
  • Masked on output: PII hidden in results and responses
  • Query-time detection: No preprocessing required
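The flow in these key points can be pictured with a small stand-alone sketch. It does not use Memvid: the in-memory `documents` list, the `search` function, and the single phone regex are assumptions made for illustration, and a real detector covers many more types.

```python
import re

# Illustrative stand-in for a detector: one simple US phone pattern only.
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

# Stored content is kept exactly as ingested.
documents = ["Contact support or call 555-123-4567"]


def search(query: str, mask: bool = False) -> list[str]:
    """Match against the raw text, then mask only what is returned."""
    hits = [doc for doc in documents if query.lower() in doc.lower()]
    if mask:
        hits = [PHONE_RE.sub("[PHONE]", doc) for doc in hits]
    return hits


# The raw phone number is still searchable, but the output hides it.
masked_hits = search("555-123-4567", mask=True)
raw_hits = search("555-123-4567")
print(masked_hits)
print(raw_hits)
```

Because masking happens only on the way out, turning it on or off never requires re-ingesting the data.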

Detected PII Types

| Type | Example | Masked As |
| --- | --- | --- |
| Email addresses | user@example.com | [EMAIL] |
| Phone numbers | 555-123-4567, (555) 123-4567 | [PHONE] |
| SSN (US) | 123-45-6789 | [SSN] |
| Credit cards | 4111-1111-1111-1111 | [CREDIT_CARD] |
| IPv4 addresses | 192.168.1.1 | [IP_ADDRESS] |
| API keys | sk-xxx, api_xxx | [API_KEY] |
| Bearer tokens | Bearer eyJ... | [TOKEN] |

CLI Usage

Mask PII in Ask Responses

# Enable PII masking in ask
memvid ask memory.mv2 --question "What's John's contact info?" --mask-pii
Without masking:
John can be reached at john@example.com or by phone at (555) 867-5309.
His SSN for payroll is 123-45-6789.
With --mask-pii:
John can be reached at [EMAIL] or by phone at [PHONE].
His SSN for payroll is [SSN].

PII in Search Results

# Search results with masking
memvid find memory.mv2 --query "contact information" --mask-pii

SDK Usage

Python

from memvid import use

mem = use('basic', 'memory.mv2')

# Ask with PII masking
response = mem.ask(
    "What is the customer's contact information?",
    mask_pii=True
)
print(response.answer)
# "Customer can be reached at [EMAIL] or [PHONE]"

# Check if content contains PII
from memvid import contains_pii, mask_pii

text = "Email me at user@example.com"
if contains_pii(text):
    safe_text = mask_pii(text)
    print(safe_text)  # "Email me at [EMAIL]"

Node.js

import { use } from '@memvid/sdk'

const mem = await use('basic', 'memory.mv2')

// Ask with PII masking
const response = await mem.ask(
  "What is the customer's contact information?",
  { maskPii: true }
)
console.log(response.answer)
// "Customer can be reached at [EMAIL] or [PHONE]"

Utility Functions

Check for PII

from memvid import contains_pii

# Returns True if any PII detected
contains_pii("Call 555-123-4567")  # True
contains_pii("Hello world")        # False

Mask PII in Text

from memvid import mask_pii

original = """
Contact: john@example.com
Phone: (555) 123-4567
SSN: 123-45-6789
API Key: sk-abc123xyz
"""

masked = mask_pii(original)
print(masked)
Output:
Contact: [EMAIL]
Phone: [PHONE]
SSN: [SSN]
API Key: [API_KEY]

Get PII Locations

from memvid import detect_pii

text = "Email a@example.com or call 555-1234"
pii_items = detect_pii(text)

for item in pii_items:
    print(f"Type: {item.type}, Value: {item.value}, Position: {item.start}-{item.end}")

# Type: email, Value: a@example.com, Position: 6-19
# Type: phone, Value: 555-1234, Position: 28-36
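Start and end offsets are what make selective masking possible. The sketch below shows one way span-based detection can drive replacement. `PiiMatch`, `find_spans`, and `mask_spans` are hypothetical stand-ins written for this example, not Memvid's `detect_pii`, and the two patterns are deliberately simplistic.

```python
import re
from dataclasses import dataclass


@dataclass
class PiiMatch:
    type: str
    value: str
    start: int
    end: int  # exclusive, like a Python slice


# Illustrative patterns only; real detection covers more formats.
PATTERNS = {
    "email": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "phone": re.compile(r"\b(?:\d{3}-)?\d{3}-\d{4}\b"),
}


def find_spans(text: str) -> list[PiiMatch]:
    """Return every pattern match with its position, ordered by start offset."""
    matches = [
        PiiMatch(kind, m.group(), m.start(), m.end())
        for kind, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(matches, key=lambda item: item.start)


def mask_spans(text: str, matches: list[PiiMatch]) -> str:
    """Replace spans right to left so earlier offsets stay valid."""
    for item in sorted(matches, key=lambda i: i.start, reverse=True):
        text = text[:item.start] + f"[{item.type.upper()}]" + text[item.end:]
    return text


sample = "Email a@example.com or call 555-1234"
spans = find_spans(sample)
print(mask_spans(sample, spans))
```

Working from offsets rather than re-running a substitution lets a caller mask only some types, or highlight matches instead of replacing them.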

Use Cases

Customer Support

Mask customer data in AI responses:
# Support bot with PII protection
response = mem.ask(
    ticket_content,
    mask_pii=True  # Don't expose customer PII
)

# Log the masked answer
logger.info(f"Response: {response.answer}")  # Detected PII is already replaced by placeholders

Compliance (GDPR, HIPAA)

Redact PII before displaying or logging:
# Search medical records
results = mem.find("patient symptoms", mask_pii=True)

# Detected identifiers are masked before display; names and addresses are not detected
for result in results:
    print(result.snippet)  # Emails, phones, and SSNs appear as [EMAIL], [PHONE], [SSN]

Development & Testing

Mask real data in development environments:
# Export masked data for dev/test
# dev_mem is assumed to be a separate memory opened for the dev environment
for frame in mem.timeline():
    masked_content = mask_pii(frame.text)
    dev_mem.put(masked_content)

Audit Logging

Log queries without exposing PII:
def search_with_audit(query):
    results = mem.find(query)

    # Log masked version
    audit_log.info(f"Query: {mask_pii(query)}")
    audit_log.info(f"Results: {len(results)}")

    return results

Configuration

Default Behavior

PII masking is disabled by default. Enable it explicitly:
# CLI: use --mask-pii flag
memvid ask memory.mv2 -q "..." --mask-pii
# Python: mask_pii=True parameter
mem.ask("...", mask_pii=True)

Why Not Default?

  • Performance overhead for detection
  • Some use cases need raw data
  • Explicit opt-in for compliance clarity

Detection Patterns

Email Addresses

Regex: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
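To see what this pattern does and does not catch, it can be exercised directly with Python's `re` module. The sample strings are made up for illustration; only the regex itself comes from this page.

```python
import re

# The email pattern quoted above.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

samples = [
    "Reach me at first.last+tag@mail.example.org today",
    "Internal host: admin@localhost",  # no dot-separated TLD, so not matched
    "No address here",
]

matches = [EMAIL_RE.findall(sample) for sample in samples]
masked = [EMAIL_RE.sub("[EMAIL]", sample) for sample in samples]

for line in masked:
    print(line)
```

The requirement for a dot followed by at least two letters means dotless addresses such as `admin@localhost` pass through unmasked.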

Phone Numbers

555-123-4567
(555) 123-4567
555.123.4567
+1 555 123 4567
15551234567
Supports US, UK, and international formats.
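As a rough illustration, a single regex can cover the five formats listed above. This pattern is an assumption written for this example, not the expression Memvid uses, and it only targets the North American layouts shown.

```python
import re

# Hypothetical pattern covering the listed formats.
PHONE_RE = re.compile(
    r"(?<!\d)"                        # do not start inside a longer digit run
    r"(?:\+?1[\s.-]?)?"               # optional country code: +1 or 1
    r"(?:\(\d{3}\)\s?|\d{3}[\s.-]?)"  # area code, with or without parentheses
    r"\d{3}[\s.-]?\d{4}"              # exchange and subscriber number
    r"(?!\d)"                         # do not stop inside a longer digit run
)

listed_formats = [
    "555-123-4567",
    "(555) 123-4567",
    "555.123.4567",
    "+1 555 123 4567",
    "15551234567",
]

all_match = all(PHONE_RE.fullmatch(s) is not None for s in listed_formats)
masked = PHONE_RE.sub("[PHONE]", "Call (555) 123-4567 or +1 555 123 4567")
print(all_match, masked)
```

The digit lookarounds keep the pattern from firing on a slice of a longer number, which is one common source of false positives.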

Social Security Numbers

123-45-6789
123 45 6789
123456789
US SSN format with common separators.
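One way to express "common separators" is a backreference that forces both separators to be the same character. The pattern below is a sketch under that assumption, not Memvid's own rule.

```python
import re

# Hypothetical SSN pattern: group 1 captures the separator ("-", " ", or none)
# and \1 requires the second separator to match the first.
SSN_RE = re.compile(r"(?<!\d)\d{3}([- ]?)\d{2}\1\d{4}(?!\d)")

listed_formats = ["123-45-6789", "123 45 6789", "123456789"]
all_match = all(SSN_RE.fullmatch(s) is not None for s in listed_formats)

# Mixed separators are rejected by the backreference.
mixed = SSN_RE.fullmatch("123-45 6789")

masked = SSN_RE.sub("[SSN]", "SSN on file: 123-45-6789")
print(all_match, mixed, masked)
```

Note that the unseparated form matches any standalone nine-digit number, so it is the variant most likely to produce false positives.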

Credit Card Numbers

4111-1111-1111-1111
4111 1111 1111 1111
4111111111111111
Luhn-validated card number patterns.
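Luhn validation is a checksum that filters out most random 16-digit strings. The sketch below pairs a candidate regex with a Luhn check; `CARD_RE`, `luhn_valid`, and `mask_cards` are names invented for this example, and the regex only covers the 4-4-4-4 groupings shown above.

```python
import re

# Hypothetical candidate pattern: four groups of four digits, same separator throughout.
CARD_RE = re.compile(r"(?<!\d)\d{4}([- ]?)\d{4}\1\d{4}\1\d{4}(?!\d)")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) > 0 and total % 10 == 0


def mask_cards(text: str) -> str:
    """Mask only candidates that also pass the checksum."""
    return CARD_RE.sub(
        lambda m: "[CREDIT_CARD]" if luhn_valid(m.group()) else m.group(),
        text,
    )


result = mask_cards("Paid with 4111 1111 1111 1111, typo 4111 1111 1111 1112")
print(result)
```

Checking the checksum after the regex match keeps order numbers and other 16-digit identifiers from being masked unless they happen to be Luhn-valid.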

IP Addresses

192.168.1.1
10.0.0.1
172.16.0.1
IPv4 addresses (IPv6 coming soon).
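A dotted-quad regex alone accepts impossible values such as `999.1.1.1`. A minimal sketch of one remedy, assuming a two-step approach (regex candidate, then validation with the standard-library `ipaddress` module); `mask_ipv4` is a hypothetical helper, not a Memvid function.

```python
import ipaddress
import re

# Candidate dotted quads that are not part of a longer dotted number.
CANDIDATE_RE = re.compile(
    r"(?<!\d)(?<!\d\.)\d{1,3}(?:\.\d{1,3}){3}(?!\d)(?!\.\d)"
)


def mask_ipv4(text: str) -> str:
    """Mask candidates that parse as real IPv4 addresses."""
    def replace(match: re.Match) -> str:
        try:
            ipaddress.IPv4Address(match.group())
        except ValueError:
            # Out-of-range octets and similar: leave the text untouched.
            return match.group()
        return "[IP_ADDRESS]"

    return CANDIDATE_RE.sub(replace, text)


print(mask_ipv4("Server 192.168.1.1 replied; gateway is 10.0.0.1."))
print(mask_ipv4("Build 999.1.1.1 and version 1.2.3.4.5"))
```

The lookarounds skip five-part version strings, while the `ipaddress` check rejects octets above 255.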

API Keys & Tokens

sk-abc123...
api_key_xxx...
Bearer eyJhbGciOiJ...
ghp_xxxxxxxxxxxx
Common API key and token prefixes.
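Prefix-based detection can be sketched with two regexes, one for bearer tokens and one for key prefixes. The prefix list, the minimum length of six characters, and the `mask_secrets` helper are assumptions chosen for this example rather than Memvid's actual rules.

```python
import re

# Hypothetical patterns based on the prefixes listed above.
TOKEN_RE = re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*")
API_KEY_RE = re.compile(r"\b(?:sk-|api_|ghp_)[A-Za-z0-9_-]{6,}")


def mask_secrets(text: str) -> str:
    """Mask bearer tokens first, then prefixed API keys."""
    text = TOKEN_RE.sub("[TOKEN]", text)
    return API_KEY_RE.sub("[API_KEY]", text)


print(mask_secrets("Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.payload.sig"))
print(mask_secrets("key=sk-abc123xyz token=ghp_xxxxxxxxxxxx"))
print(mask_secrets("Finish task-abcdefgh first"))
```

Prefix matching is coarse: the word boundary avoids hits inside words like `task-...`, but an ordinary identifier that starts with `api_` would still be masked.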

Limitations

Not Detected

Some PII types are not currently detected:
| Type | Status |
| --- | --- |
| Names | ❌ Too many false positives |
| Addresses | ❌ Complex, locale-specific |
| Dates of birth | ❌ Ambiguous with other dates |
| Medical record numbers | ❌ Varies by institution |
| Custom IDs | ❌ Unknown format |
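For identifiers with a known in-house format, one workaround is to apply your own patterns to text after it comes back from Memvid. The sketch below assumes two made-up formats (`CUST-12345` and an `MRN` followed by 6 to 10 digits); the labels and the `mask_custom` helper are hypothetical.

```python
import re

# Hypothetical in-house formats; adjust to your own identifiers.
CUSTOM_PATTERNS = {
    "CUSTOMER_ID": re.compile(r"\bCUST-\d{5}\b"),
    "MRN": re.compile(r"\bMRN[: ]?\d{6,10}\b"),
}


def mask_custom(text: str) -> str:
    """Apply each custom pattern in turn, replacing matches with a label."""
    for label, pattern in CUSTOM_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(mask_custom("Account: CUST-12345"))
print(mask_custom("MRN 00482913 admitted"))
```

Running this as a second pass keeps the custom rules in your own code, where they can be versioned and tested alongside your data formats.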

False Positives

Some patterns may be incorrectly flagged:
# Might be flagged as phone
text = "Order #555-123-4567"  # Could be order number

# Might be flagged as SSN
text = "Version 123-45-6789"  # Could be version string

Context Insensitive

Detection is pattern-based, not context-aware:
# Both masked the same way
"Call me at 555-123-4567"  # Real phone
"The code is 555-123-4567"  # Not a phone, but still masked
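If false positives like these matter for your data, a caller-side heuristic can inspect the words just before a match. This is a minimal sketch, assuming a hand-picked list of cue phrases and a simple phone regex; it is not something Memvid does, and it trades false positives for a risk of under-masking.

```python
import re

PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

# Hypothetical cue phrases suggesting the number is not a phone number.
NON_PII_CUES = re.compile(r"(?:order\s*#?|code\s+is|ticket)\s*$", re.IGNORECASE)


def mask_phones_with_context(text: str, window: int = 12) -> str:
    """Mask phone-like numbers unless a cue phrase directly precedes them."""
    def replace(match: re.Match) -> str:
        preceding = text[max(0, match.start() - window):match.start()]
        if NON_PII_CUES.search(preceding):
            return match.group()
        return "[PHONE]"

    return PHONE_RE.sub(replace, text)


print(mask_phones_with_context("Call me at 555-123-4567"))
print(mask_phones_with_context("Order #555-123-4567"))
print(mask_phones_with_context("The code is 555-123-4567"))
```

For user-facing output it is usually safer to accept the occasional over-masked order number than to let a real phone number through.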

Best Practices

1. Enable for User-Facing Output

# Always mask when displaying to users
response = mem.ask(question, mask_pii=True)
display_to_user(response.answer)

2. Keep Original for Internal Use

# Raw data for internal processing
results = mem.find(query)  # No masking

# Masked for display
masked_results = [mask_pii(r.text) for r in results]

3. Mask Before Logging

import logging
from memvid import mask_pii

def safe_log(message):
    logging.info(mask_pii(message))
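A wrapper function only helps when every caller remembers to use it. An alternative sketch, using the standard library's `logging.Filter`, masks every record that passes through a handler. The `redact` function here is a one-pattern stand-in so the example runs on its own; in practice you would call `mask_pii` in its place.

```python
import io
import logging
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Stand-in for mask_pii: masks SSN-shaped values only."""
    return SSN_RE.sub("[SSN]", text)


class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments first, then redact the result.
        record.msg = redact(record.getMessage())
        record.args = None
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())

logger = logging.getLogger("pii_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("Payroll updated for SSN %s", "123-45-6789")
logged = stream.getvalue()
print(logged, end="")
```

Attaching the filter to the handler means values passed as format arguments are covered too, since the message is rendered before redaction.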

4. Combine with Encryption

For maximum protection:
# Encrypt at rest + mask on output
memvid lock sensitive.mv2 --out sensitive.mv2e

# When using:
memvid unlock sensitive.mv2e --out temp.mv2
memvid ask temp.mv2 -q "..." --mask-pii
memvid lock temp.mv2 --out sensitive.mv2e
rm temp.mv2

5. Test Your Patterns

Verify detection works for your data:
from memvid import detect_pii

# Test with your actual data patterns
test_data = [
    "Contact: user@example.com",
    "Phone: 1-800-YOUR-NUM",
    "Account: CUST-12345",
]

for text in test_data:
    pii = detect_pii(text)
    print(f"Found: {len(pii)} PII items in: {text}")

Compliance Considerations

GDPR

  • PII masking helps with data minimization
  • Original data still stored (not anonymized)
  • Consider combining encryption with masking; neither is sufficient for compliance on its own
  • Document your data handling in privacy policy

HIPAA

  • Masks some PHI identifiers in output (emails, phone numbers, SSNs)
  • Not a complete de-identification solution
  • Combine with access controls and encryption
  • Consult compliance officer for healthcare data

SOC 2

  • Demonstrates data protection controls
  • Enable masking in production environments
  • Log that masking is applied
  • Include in security documentation

Performance

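PII detection adds a small overhead:
| Operation | Without Masking | With Masking |
| --- | --- | --- |
| Ask (short) | 150 ms | 155 ms |
| Ask (long) | 500 ms | 520 ms |
| Find (10 results) | 50 ms | 55 ms |
Overhead in these measurements is roughly 3-10%, depending on the operation and text length.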
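The relative overhead implied by the table can be computed directly from its figures. The snippet below uses only the numbers shown above; the variable names are illustrative.

```python
# (without masking, with masking) in milliseconds, taken from the table above.
timings_ms = {
    "Ask (short)": (150, 155),
    "Ask (long)": (500, 520),
    "Find (10 results)": (50, 55),
}

# Relative overhead = extra time divided by the unmasked baseline.
overhead_pct = {
    name: round((masked - raw) / raw * 100, 1)
    for name, (raw, masked) in timings_ms.items()
}

for name, pct in overhead_pct.items():
    print(f"{name}: +{pct}%")
```

The absolute cost is 5 to 20 ms in every row, so the percentage is highest for the fastest operation.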

Future Features

Coming soon:
  • Custom PII patterns
  • Named entity recognition (names, addresses)
  • Per-type masking control
  • Reversible masking with keys
  • IPv6 address detection
  • Additional international phone formats
