PII Detection & Masking

Memvid can detect and mask Personally Identifiable Information (PII) in search results and ask responses. The original data remains searchable, but sensitive information is redacted in output.

How It Works

Example:

Original: "Contact john@example.com or call 555-123-4567"
Masked: "Contact [EMAIL] or call [PHONE]"

Key points:

Original data preserved: Content stored without modification
Searchable: You can search for emails, phones, etc.
Masked on output: PII hidden in results and responses
Query-time detection: No preprocessing required
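The key points above can be illustrated with a small, self-contained sketch. It is not Memvid's implementation: the `RawStore` class, its `find` method, and the two regexes are hypothetical stand-ins. They show that matching runs against the unmodified stored text and that masking is applied only to what gets returned.

```python
import re

# Hypothetical patterns for illustration only; not Memvid's actual rules.
PATTERNS = [
    ("[EMAIL]", re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")),
    ("[PHONE]", re.compile(r"(?<!\d)\d{3}-\d{3}-\d{4}(?!\d)")),
]


def mask(text: str) -> str:
    """Replace every pattern match with its placeholder."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


class RawStore:
    """Hypothetical store: keeps documents verbatim, masks only on output."""

    def __init__(self) -> None:
        self.docs: list[str] = []

    def put(self, text: str) -> None:
        self.docs.append(text)  # stored without modification

    def find(self, query: str, mask_pii: bool = False) -> list[str]:
        # Matching runs against the raw text, so PII itself is searchable.
        hits = [d for d in self.docs if query.lower() in d.lower()]
        return [mask(d) for d in hits] if mask_pii else hits


store = RawStore()
store.put("Contact john@example.com or call 555-123-4567")
print(store.find("john@example.com", mask_pii=True))
# ['Contact [EMAIL] or call [PHONE]']
```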

Detected PII Types

Type	Pattern	Masked As
Email addresses	`user@domain.com`	`[EMAIL]`
Phone numbers	`555-123-4567`, `(555) 123-4567`	`[PHONE]`
SSN (US)	`123-45-6789`	`[SSN]`
Credit cards	`4111-1111-1111-1111`	`[CREDIT_CARD]`
IPv4 addresses	`192.168.1.1`	`[IP_ADDRESS]`
API keys	`sk-xxx`, `api_xxx`	`[API_KEY]`
Bearer tokens	`Bearer eyJ...`	`[TOKEN]`

CLI Usage

Mask PII in Ask Responses

# Enable PII masking in ask
memvid ask memory.mv2 --question "What's John's contact info?" --mask-pii

Without masking:

John can be reached at john.smith@acme.com or by phone at (555) 867-5309.
His SSN for payroll is 123-45-6789.

With --mask-pii:

John can be reached at [EMAIL] or by phone at [PHONE].
His SSN for payroll is [SSN].

PII in Search Results

# Search results with masking
memvid find memory.mv2 --query "contact information" --mask-pii

SDK Usage

Python

from memvid import use

mem = use('basic', 'memory.mv2')

# Ask with PII masking
response = mem.ask(
    "What is the customer's contact information?",
    mask_pii=True
)
print(response.answer)
# "Customer can be reached at [EMAIL] or [PHONE]"

# Check if content contains PII
from memvid import contains_pii, mask_pii

text = "Email me at test@example.com"
if contains_pii(text):
    safe_text = mask_pii(text)
    print(safe_text)  # "Email me at [EMAIL]"

Node.js

import { use } from '@anthropics/memvid'

const mem = await use('basic', 'memory.mv2')

// Ask with PII masking
const response = await mem.ask(
  "What is the customer's contact information?",
  { maskPii: true }
)
console.log(response.answer)
// "Customer can be reached at [EMAIL] or [PHONE]"

Utility Functions

Check for PII

from memvid import contains_pii

# Returns True if any PII detected
contains_pii("Call 555-123-4567")  # True
contains_pii("Hello world")        # False

Mask PII in Text

from memvid import mask_pii

original = """
Contact: john@example.com
Phone: (555) 123-4567
SSN: 123-45-6789
API Key: sk-abc123xyz
"""

masked = mask_pii(original)
print(masked)

Output:

Contact: [EMAIL]
Phone: [PHONE]
SSN: [SSN]
API Key: [API_KEY]

Get PII Locations

from memvid import detect_pii

text = "Email john@test.com or call 555-1234"
pii_items = detect_pii(text)

for item in pii_items:
    print(f"Type: {item.type}, Value: {item.value}, Position: {item.start}-{item.end}")

# Type: email, Value: john@test.com, Position: 6-19
# Type: phone, Value: 555-1234, Position: 28-36
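Position data like this is enough to build a masked string yourself, for example when you want custom placeholders. The sketch below assumes only that each detected item exposes `type`, `start`, and `end`, as in the output above. The `Match` dataclass and the `mask_spans` helper are hypothetical, written so the example runs without Memvid installed.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical stand-in for an item returned by detect_pii()."""
    type: str
    value: str
    start: int
    end: int


def mask_spans(text: str, items: list[Match]) -> str:
    """Replace each [start, end) span with a placeholder such as [EMAIL]."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for item in sorted(items, key=lambda m: m.start, reverse=True):
        text = text[:item.start] + f"[{item.type.upper()}]" + text[item.end:]
    return text


text = "Email john@test.com or call 555-1234"
items = [
    Match("email", "john@test.com", 6, 19),
    Match("phone", "555-1234", 28, 36),
]
print(mask_spans(text, items))
# Email [EMAIL] or call [PHONE]
```

Replacing from the end of the string backwards avoids recomputing offsets when a placeholder is shorter or longer than the value it replaces.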

Use Cases

Customer Support

Mask customer data in AI responses:

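import logging

logger = logging.getLogger(__name__)

# Support bot with PII protection
# (ticket_content holds the text of the incoming support ticket)
response = mem.ask(
    ticket_content,
    mask_pii=True  # Don't expose customer PII
)

# Log safely
logger.info(f"Response: {response.answer}")  # Detected PII is already masked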

Compliance (GDPR, HIPAA)

Redact PII before displaying or logging:

# Search medical records
results = mem.find("patient symptoms", mask_pii=True)

# Detected PII is redacted; names and addresses are not detected
for result in results:
    print(result.snippet)  # Emails, phones, SSNs appear as [EMAIL], [PHONE], [SSN]

Development & Testing

Mask real data in development environments:

from memvid import mask_pii

# Export masked data for dev/test
# (dev_mem is a separate memory file opened for development)
for frame in mem.timeline():
    masked_content = mask_pii(frame.text)
    dev_mem.put(masked_content)

Audit Logging

Log queries without exposing PII:

import logging

from memvid import mask_pii

audit_log = logging.getLogger("audit")

def search_with_audit(query):
    results = mem.find(query)

    # Log masked version of the query
    audit_log.info(f"Query: {mask_pii(query)}")
    audit_log.info(f"Results: {len(results)}")

    return results

Configuration

Default Behavior

PII masking is disabled by default. Enable it explicitly:

# CLI: use --mask-pii flag
memvid ask memory.mv2 -q "..." --mask-pii

# Python: mask_pii=True parameter
mem.ask("...", mask_pii=True)

Why Not Default?

Performance overhead for detection
Some use cases need raw data
Explicit opt-in for compliance clarity

Detection Patterns

Email Addresses

user@domain.com
user.name@subdomain.domain.co.uk
user+tag@domain.com

Regex: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
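The regex can be tried directly with Python's standard `re` module. This minimal sketch only exercises the pattern shown above on the three example addresses; Memvid's internal matching code may differ.

```python
import re

# The email pattern documented above.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

samples = [
    "user@domain.com",
    "user.name@subdomain.domain.co.uk",
    "user+tag@domain.com",
]

for s in samples:
    # fullmatch confirms the whole string is recognized as one address.
    assert EMAIL_RE.fullmatch(s), s

print(EMAIL_RE.sub("[EMAIL]", "Write to user+tag@domain.com today"))
# Write to [EMAIL] today
```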

Phone Numbers

555-123-4567
(555) 123-4567
555.123.4567
+1 555 123 4567
15551234567

Matches the US formats shown above, with or without a +1 country code. Broader international format support is planned (see Future Features).
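As a rough sketch of how the formats listed above can be covered by one expression (an assumption for illustration, not Memvid's published phone pattern), the regex below accepts an optional `+1` or `1` prefix, an area code with or without parentheses, and space, dot, or dash separators. The digit lookarounds keep it from matching inside longer digit runs such as card numbers.

```python
import re

# Hypothetical pattern covering the North American formats listed above.
PHONE_RE = re.compile(
    r"(?<!\d)"               # not preceded by a digit
    r"(?:\+?1[\s.-]?)?"      # optional country code: +1 or 1
    r"(?:\(\d{3}\)|\d{3})"   # area code, with or without parentheses
    r"[\s.-]?\d{3}[\s.-]?\d{4}"
    r"(?!\d)"                # not followed by a digit
)

formats = [
    "555-123-4567",
    "(555) 123-4567",
    "555.123.4567",
    "+1 555 123 4567",
    "15551234567",
]

for f in formats:
    print(PHONE_RE.sub("[PHONE]", f"Call {f} now"))
# Each line prints: Call [PHONE] now
```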

Social Security Numbers

123-45-6789
123 45 6789
123456789

US SSN format, separated by dashes, spaces, or nothing.
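One way to express this format, sketched here as an assumption rather than Memvid's actual rule, is a regex with a backreference so that both separators must be the same (a dash, a space, or nothing):

```python
import re

# Hypothetical SSN pattern: 3-2-4 digits; group 1 captures the separator
# and \1 requires the same separator in the second position.
SSN_RE = re.compile(r"(?<!\d)\d{3}([- ]?)\d{2}\1\d{4}(?!\d)")

for candidate in ["123-45-6789", "123 45 6789", "123456789", "123-45 6789"]:
    print(candidate, bool(SSN_RE.fullmatch(candidate)))
# 123-45-6789 True
# 123 45 6789 True
# 123456789 True
# 123-45 6789 False
```

Accepting a bare 9-digit run is convenient but also catches unrelated numbers, which is one source of the false positives described under Limitations.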

Credit Card Numbers

4111-1111-1111-1111
4111 1111 1111 1111
4111111111111111

Luhn-validated card number patterns.
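Luhn validation is what separates a plausible card number from an arbitrary 16-digit run. The checksum itself is standard; the function below is a minimal sketch of it, not Memvid's code. Starting from the rightmost digit, every second digit is doubled (subtracting 9 when the result exceeds 9), and the number is valid when the total is divisible by 10.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore the dash and space separators used in the formats above.
    digits = [int(c) for c in number if c.isdecimal()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_valid("4111-1111-1111-1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```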

IP Addresses

192.168.1.1
10.0.0.1
172.16.0.1

IPv4 addresses (IPv6 coming soon).
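A dotted-quad regex alone also matches strings like `999.1.1.1`. A common approach, sketched here with Python's standard `ipaddress` module (an illustration, not necessarily how Memvid validates), is to find candidates with a loose regex and keep only those that parse as real IPv4 addresses:

```python
import ipaddress
import re

# Loose candidate pattern: four dot-separated groups of 1-3 digits,
# not embedded in a longer dotted or digit run.
CANDIDATE_RE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\.?\d)")


def mask_ipv4(text: str) -> str:
    def replace(m: re.Match) -> str:
        try:
            ipaddress.IPv4Address(m.group())  # rejects octets above 255
        except ValueError:
            return m.group()  # not a real address; leave unchanged
        return "[IP_ADDRESS]"

    return CANDIDATE_RE.sub(replace, text)


print(mask_ipv4("Gateway 192.168.1.1, bogus 999.1.1.1"))
# Gateway [IP_ADDRESS], bogus 999.1.1.1
```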

API Keys & Tokens

sk-abc123...
api_key_xxx...
Bearer eyJhbGciOiJ...
ghp_xxxxxxxxxxxx

Common API key and token prefixes.
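Prefix-based detection can be sketched as a short list of rules. The prefixes below come from the examples above; the character sets and the minimum length after each key prefix are assumptions chosen for illustration, and Memvid's real prefix list and thresholds may differ. The `Bearer` rule runs first so the whole header value is replaced as one `[TOKEN]`.

```python
import re

# Hypothetical rules based on the example prefixes above; lengths are assumptions.
TOKEN_RULES = [
    ("[TOKEN]", re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]+")),
    ("[API_KEY]", re.compile(r"\b(?:sk-|api_|ghp_)[A-Za-z0-9_-]{6,}")),
]


def mask_tokens(text: str) -> str:
    """Apply each rule in order, replacing matches with its placeholder."""
    for placeholder, pattern in TOKEN_RULES:
        text = pattern.sub(placeholder, text)
    return text


print(mask_tokens("Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.e30.sig"))
# Authorization: [TOKEN]
print(mask_tokens("API Key: sk-abc123xyz"))
# API Key: [API_KEY]
```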

Limitations

Not Detected

Some PII types are not currently detected:

Type	Status
Names	❌ Too many false positives
Addresses	❌ Complex, locale-specific
Dates of birth	❌ Ambiguous with other dates
Medical record numbers	❌ Varies by institution
Custom IDs	❌ Unknown format

False Positives

Some patterns may be incorrectly flagged:

# Might be flagged as phone
text = "Order #555-123-4567"  # Could be order number

# Might be flagged as SSN
text = "Version 123-45-6789"  # Could be version string

Context Insensitive

Detection is pattern-based, not context-aware:

# Both masked the same way
"Call me at 555-123-4567"  # Real phone
"The code is 555-123-4567"  # Not a phone, but still masked

Best Practices

1. Enable for User-Facing Output

# Always mask when displaying to users
response = mem.ask(question, mask_pii=True)
display_to_user(response.answer)

2. Keep Original for Internal Use

from memvid import mask_pii

# Raw data for internal processing
results = mem.find(query)  # No masking

# Masked for display
masked_results = [mask_pii(r.text) for r in results]

3. Mask Before Logging

import logging

from memvid import mask_pii

def safe_log(message):
    logging.info(mask_pii(message))

4. Combine with Encryption

For maximum protection:

# Encrypt at rest + mask on output
memvid lock sensitive.mv2 --out sensitive.mv2e

# When using:
memvid unlock sensitive.mv2e --out temp.mv2
memvid ask temp.mv2 -q "..." --mask-pii
memvid lock temp.mv2 --out sensitive.mv2e
rm temp.mv2

5. Test Your Patterns

Verify detection works for your data:

from memvid import detect_pii

# Test with your actual data patterns
test_data = [
    "Contact: support@yourcompany.com",
    "Phone: 1-800-YOUR-NUM",
    "Account: CUST-12345",
]

for text in test_data:
    pii = detect_pii(text)
    print(f"Found: {len(pii)} PII items in: {text}")

Compliance Considerations

GDPR

PII masking helps with data minimization
Original data still stored (not anonymized)
Consider encryption + masking for full compliance
Document your data handling in privacy policy

HIPAA

Masks PHI in output (emails, phones)
Not a complete de-identification solution
Combine with permission-aware retrieval (ACL) and encryption
Consult compliance officer for healthcare data

SOC 2

Demonstrates data protection controls
Enable masking in production environments
Log that masking is applied
Include in security documentation

Performance

PII detection adds minimal overhead:

Operation	Without Masking	With Masking
Ask (short)	150ms	155ms
Ask (long)	500ms	520ms
Find (10 results)	50ms	55ms

In these measurements, overhead ranges from about 3% (short ask) to 10% (find with 10 results).

Future Features

Coming soon:

Custom PII patterns
Named entity recognition (names, addresses)
Per-type masking control
Reversible masking with keys
IPv6 address detection
International phone formats


Next Steps

Encryption

Security FAQ
