How It Works
Example:
Original: "Contact [email protected] or call 555-123-4567"
Masked: "Contact [EMAIL] or call [PHONE]"
- Original data preserved: Content stored without modification
- Searchable: You can search for emails, phones, etc.
- Masked on output: PII hidden in results and responses
- Query-time detection: No preprocessing required
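The mask-on-output behavior described above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the `PATTERNS` table and the `mask_pii` name are hypothetical, the phone expression is an assumption that covers only the two US formats listed in this document, and `alice@example.com` is a made-up address.

```python
import re

# Hypothetical pattern table (assumption): label -> compiled regex.
PATTERNS = [
    ("[EMAIL]", re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")),
    # Assumed phone pattern: matches 555-123-4567 and (555) 123-4567 only.
    ("[PHONE]", re.compile(r"(?:\([0-9]{3}\)\s?|\b[0-9]{3}-)[0-9]{3}-[0-9]{4}\b")),
]


def mask_pii(text: str) -> str:
    """Return a masked copy of text; the input string is never modified."""
    for label, pattern in PATTERNS:
        text = pattern.sub(label, text)
    return text


# The stored content stays as-is; masking happens only when producing output.
stored = "Contact alice@example.com or call 555-123-4567"
shown = mask_pii(stored)
print(shown)
```

Because masking runs on a copy at output time, the stored text remains searchable by its original values.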
Detected PII Types
| Type | Pattern | Masked As |
|---|---|---|
| Email addresses | [email protected] | [EMAIL] |
| Phone numbers | 555-123-4567, (555) 123-4567 | [PHONE] |
| SSN (US) | 123-45-6789 | [SSN] |
| Credit cards | 4111-1111-1111-1111 | [CREDIT_CARD] |
| IPv4 addresses | 192.168.1.1 | [IP_ADDRESS] |
| API keys | sk-xxx, api_xxx | [API_KEY] |
| Bearer tokens | Bearer eyJ... | [TOKEN] |
CLI Usage
Mask PII in Ask Responses
Pass the --mask-pii flag to mask PII in the response.
PII in Search Results
SDK Usage
Python
Node.js
Utility Functions
Check for PII
Mask PII in Text
Get PII Locations
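To illustrate what checking for PII and reporting its locations might involve, here is a rough Python sketch. It does not show the SDK's functions: `PiiMatch`, `find_pii`, and `has_pii` are hypothetical names, the SSN expression is an assumption based on the `123-45-6789` format, and only two PII types are covered.

```python
import re
from typing import List, NamedTuple


class PiiMatch(NamedTuple):
    kind: str   # PII type, e.g. "EMAIL"
    start: int  # index of the first matched character
    end: int    # index one past the last matched character
    text: str   # the matched substring


# Hypothetical patterns (assumption); the real detector may differ.
PATTERNS = {
    "EMAIL": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "SSN": re.compile(r"\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b"),
}


def find_pii(text: str) -> List[PiiMatch]:
    """Collect every pattern match, ordered by position in the text."""
    matches = [
        PiiMatch(kind, m.start(), m.end(), m.group())
        for kind, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(matches, key=lambda match: match.start)


def has_pii(text: str) -> bool:
    """True if at least one pattern matches."""
    return bool(find_pii(text))


sample = "SSN 123-45-6789, mail eve@example.com"
for match in find_pii(sample):
    print(match.kind, match.start, match.end)
```

Returning offsets rather than a masked string lets the caller decide how to redact or highlight each match.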
Use Cases
Customer Support
Mask customer data in AI responses.
Compliance (GDPR, HIPAA)
Redact PII before displaying or logging.
Development & Testing
Mask real data in development environments.
Audit Logging
Log queries without exposing PII.
Configuration
Default Behavior
PII masking is disabled by default and must be enabled explicitly.
Why Not Default?
- Performance overhead for detection
- Some use cases need raw data
- Explicit opt-in for compliance clarity
Detection Patterns
Email Addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
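As a quick check of how this expression behaves, the sketch below runs it through Python's `re` module. The sample sentence and its addresses are made up for illustration, and the choice of Python's regex engine is an assumption about how the pattern is applied.

```python
import re

# The email pattern exactly as listed above.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

sample = "Mail bob@example.org and carol.smith+tag@mail.example.co.uk now"
emails = EMAIL_RE.findall(sample)
masked = EMAIL_RE.sub("[EMAIL]", sample)

print(emails)
print(masked)
```

The pattern accepts plus-tags and multi-level domains, and it requires a top-level domain of at least two letters, so a string like `user@localhost` is not matched.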
Phone Numbers
Social Security Numbers
Credit Card Numbers
IP Addresses
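The exact expression used for IPv4 detection is not shown here. A minimal sketch, assuming a conventional dotted-quad pattern that limits each octet to 0-255 (the `IPV4_RE` and `find_ipv4` names are hypothetical):

```python
import re

# Assumed pattern: four dot-separated octets, each in the range 0-255.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])"
IPV4_RE = re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b")


def find_ipv4(text: str) -> list:
    """Return every substring that looks like an IPv4 address."""
    return IPV4_RE.findall(text)


print(find_ipv4("server 192.168.1.1 is up"))
print(find_ipv4("upgrade to release 1.2.3.4"))
print(find_ipv4("999.1.1.1"))
```

The second call shows why purely pattern-based detection can produce false positives: a four-part version string is indistinguishable from an IP address without context.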
API Keys & Tokens
Limitations
Not Detected
Some PII types are not currently detected:
| Type | Status |
|---|---|
| Names | ❌ Too many false positives |
| Addresses | ❌ Complex, locale-specific |
| Dates of birth | ❌ Ambiguous with other dates |
| Medical record numbers | ❌ Varies by institution |
| Custom IDs | ❌ Unknown format |
False Positives
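Text that merely resembles a PII pattern may be flagged incorrectly.
Context Insensitive
Detection is pattern-based, not context-aware: any text that matches a pattern is masked, whether or not it is actually PII.
Best Practices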
1. Enable for User-Facing Output
2. Keep Original for Internal Use
3. Mask Before Logging
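One way to mask before logging in Python is a `logging.Filter` that rewrites each record before it is emitted. This is a sketch under assumptions, not part of the product: `MaskEmailFilter` is a hypothetical class, it masks only email addresses, and the logged address is made up.

```python
import io
import logging
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


class MaskEmailFilter(logging.Filter):
    """Hypothetical filter that masks email addresses in log messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments first, then mask the result.
        record.msg = EMAIL_RE.sub("[EMAIL]", record.getMessage())
        record.args = None
        return True  # keep the record; only its text was changed


# Write to an in-memory stream so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(MaskEmailFilter())

logger = logging.getLogger("pii_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("Password reset requested by %s", "dana@example.com")
logged = stream.getvalue()
print(logged, end="")
```

Attaching the filter to the handler rather than to individual call sites means every message routed through that handler is masked, including ones written by code that does not know about masking.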
4. Combine with Encryption
For maximum protection, combine masking with encryption.
5. Test Your Patterns
Verify that detection works for your data.
Compliance Considerations
GDPR
- PII masking helps with data minimization
- Original data still stored (not anonymized)
- Consider encryption + masking for full compliance
- Document your data handling in privacy policy
HIPAA
- Masks PHI in output (emails, phones)
- Not a complete de-identification solution
- Combine with access controls and encryption
- Consult compliance officer for healthcare data
SOC 2
- Demonstrates data protection controls
- Enable masking in production environments
- Log that masking is applied
- Include in security documentation
Performance
PII detection adds minimal overhead, about 5 to 20 ms per operation in the measurements below:
| Operation | Without Masking | With Masking |
|---|---|---|
| Ask (short) | 150ms | 155ms |
| Ask (long) | 500ms | 520ms |
| Find (10 results) | 50ms | 55ms |
Future Features
Coming soon:
- Custom PII patterns
- Named entity recognition (names, addresses)
- Per-type masking control
- Reversible masking with keys
- IPv6 address detection
- International phone formats