Comprehensive PII detection & anonymization
320+ entity types, 48 languages, 7 anonymization methods, and compliance-ready presets — built for enterprise requirements. No training required — start anonymizing in minutes.
Two engines for comprehensive detection
Choose the right detection engine for your data type, or combine both for maximum accuracy.
NLP Engine
Context-aware detection powered by Microsoft Presidio. Understands language semantics and entity relationships.
- 320+ entity types
- 48 languages supported
- Context-aware detection
- Best for: documents, emails, chat logs
Pattern Engine
Ultra-fast regex-based detection with checksum validation. Format-based recognition for structured data.
- 40+ pattern recognizers
- Checksum validation (Luhn, IBAN)
- Sub-millisecond processing
- Best for: transactions, logs, structured data
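As a rough illustration of what regex detection plus checksum validation means in practice, the sketch below finds card-number candidates with a pattern and keeps only those that pass the Luhn check. The regex and the function names are assumptions made for this example, not the Pattern Engine's actual recognizers.

```python
import re

# Assumed candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return regex candidates that also pass the Luhn checksum."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

The checksum step is what filters out digit runs that merely look like card numbers, such as order references or tracking IDs.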
Use Hybrid Mode for maximum coverage
Combine both engines to detect structured patterns AND contextual entities in a single pass.
When to use which engine
| Use Case | Recommended Engine | Why |
|---|---|---|
| AI Chat Protection | NLP | Understands conversational context |
| Payment Processing | Pattern | Luhn checksum validates card numbers |
| Legal Document Review | NLP | Names and entities in prose |
| Transaction Logs | Pattern | Structured data, fast processing |
| Healthcare Records | NLP | Medical terminology understanding |
| Mixed Data Pipelines | Hybrid | Maximum coverage, both engines |
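One way to picture Hybrid Mode is as a merge of two result lists. The sketch below is a hypothetical merge strategy: the `Detection` type, its fields, and the "highest score wins on overlap" rule are assumptions for illustration, since the page does not describe how the product resolves overlapping results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    start: int        # inclusive character offset
    end: int          # exclusive character offset
    entity_type: str
    score: float      # confidence between 0.0 and 1.0
    engine: str       # "pattern" or "nlp" (labels assumed for this sketch)


def overlaps(a: Detection, b: Detection) -> bool:
    """True if the two character spans share at least one position."""
    return a.start < b.end and b.start < a.end


def merge_detections(pattern_hits: list[Detection], nlp_hits: list[Detection]) -> list[Detection]:
    """Combine both result lists, keeping the best-scoring hit for each overlapping region."""
    candidates = sorted(
        [*pattern_hits, *nlp_hits],
        key=lambda d: (-d.score, -(d.end - d.start)),  # best score first, then longest span
    )
    kept: list[Detection] = []
    for candidate in candidates:
        if not any(overlaps(candidate, existing) for existing in kept):
            kept.append(candidate)
    return sorted(kept, key=lambda d: d.start)
```

With this rule, a checksum-validated card number from the pattern side would displace a low-confidence NLP guess over the same characters, while non-overlapping names found only by NLP are kept.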
Intelligent PII detection
Pattern-based and NLP-powered detection identifies personal data across text and documents with confidence scoring.
320+ Entity Types
Detection Features
- Confidence Scoring — Each detection includes a confidence score for review workflows
- Custom Entity Support — Define domain-specific patterns for your data reality
- Entity Grouping — Organize entities into logical groups for policy application
- Context-Aware — NLP-powered detection understands context, not just patterns
- Format Preservation — Maintains document structure during detection
- Checksum Validation — Luhn, MOD-97, and format rules reduce false positives for structured data
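To show how a MOD-97 check reduces false positives, here is a minimal sketch of the standard IBAN check-digit test. The function name is an assumption, and the sketch checks only the overall length range and the checksum, not country-specific lengths or formats.

```python
def iban_valid(iban: str) -> bool:
    """Return True if `iban` passes the MOD-97 check used for IBAN check digits."""
    compact = iban.replace(" ", "").upper()
    if not (15 <= len(compact) <= 34):
        return False
    if not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end of the string.
    rearranged = compact[4:] + compact[:4]
    # Convert letters to numbers: A -> 10, B -> 11, ..., Z -> 35; digits stay as they are.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

A string that matches the IBAN shape but fails this test can be discarded or given a lower confidence score before it reaches a review workflow.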
Seven anonymization methods
Choose the method that matches your risk model and data utility requirements. All seven methods are available across API, Desktop App, and Office Add-in.
Replace
Substitute with synthetic data that maintains format and realism
Redact
Remove completely with placeholder indicating entity type
Mask
Partially obscure while preserving format recognition
Hash
One-way SHA-256 hash for consistent pseudonymization
Encrypt
AES-256-GCM symmetric encryption with controlled decryption
Asymmetric Encrypt
RSA-4096 public-key encryption for zero-trust data sharing
Keep
Preserve original value — detect only, no transformation applied
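The sketch below illustrates four of these methods (Redact, Mask, Hash, Keep) using only the Python standard library. It is a simplified model under stated assumptions: the placeholder format, the masking rule, the optional salt, and all function names are invented for this example and do not describe the product's actual output. Replace and the two Encrypt methods are left out because they need a synthetic-data generator and a cryptography library.

```python
import hashlib


def redact(value: str, entity_type: str) -> str:
    """Remove the value and leave a placeholder naming the entity type (format assumed)."""
    return f"<{entity_type}>"


def mask(value: str, entity_type: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide all but the last `visible` letters or digits, keeping separators in place."""
    to_hide = max(sum(ch.isalnum() for ch in value) - visible, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_hide > 0:
            out.append(mask_char)
            to_hide -= 1
        else:
            out.append(ch)
    return "".join(out)


def hash_value(value: str, entity_type: str, salt: bytes = b"") -> str:
    """One-way SHA-256 digest; equal inputs give equal outputs. The salt is an assumption."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def keep(value: str, entity_type: str) -> str:
    """Detect only: return the value unchanged."""
    return value


def anonymize(text: str, spans: list[tuple[int, int, str]], transform) -> str:
    """Apply `transform` to each (start, end, entity_type) span, working right to left."""
    for start, end, entity_type in sorted(spans, reverse=True):
        text = text[:start] + transform(text[start:end], entity_type) + text[end:]
    return text
```

Processing spans from right to left keeps earlier offsets valid even when a replacement is longer or shorter than the original value.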
Image Anonymization
OCR-powered text detection in images. Extract, detect, and redact PII directly from image files.
Supported Formats
How It Works
- Tesseract OCR — Industry-standard optical character recognition
- 48 Languages — Same language support as text processing
- Presidio NLP — Microsoft Presidio analyzes extracted text
- Pixel Mapping — PII locations mapped to exact coordinates
- Solid Redaction — Covered with configurable color rectangles
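The pixel-mapping step above can be sketched without any imaging library. The example assumes the OCR stage yields one bounding box per recognized word (as OCR engines such as Tesseract can report) and that PII is detected as character offsets in the joined text. The `OcrWord` type, the padding value, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    text: str
    left: int    # x of the top-left corner, in pixels
    top: int     # y of the top-left corner, in pixels
    width: int
    height: int


def index_words(words: list[OcrWord]) -> tuple[str, list[tuple[int, int]]]:
    """Join words with single spaces and record each word's (start, end) character offsets."""
    offsets = []
    cursor = 0
    for word in words:
        offsets.append((cursor, cursor + len(word.text)))
        cursor += len(word.text) + 1  # +1 for the joining space
    return " ".join(w.text for w in words), offsets


def boxes_for_span(words, offsets, start: int, end: int, padding: int = 2):
    """Return (left, top, right, bottom) rectangles covering every word that overlaps the span."""
    boxes = []
    for word, (w_start, w_end) in zip(words, offsets):
        if w_start < end and start < w_end:
            boxes.append((
                word.left - padding,
                word.top - padding,
                word.left + word.width + padding,
                word.top + word.height + padding,
            ))
    return boxes
```

Each returned rectangle could then be filled with a solid color by whatever imaging library is in use.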
Detection Capabilities
25+ entity types detected via OCR-extracted text:
Important Limitation
Image anonymization detects text that OCR can read. It does not detect faces, license plates, QR codes, or handwriting. For best results, use high-resolution images with clear, printed text.
Processing time: 3–20 seconds depending on image size and complexity.
Decryption & Re-identification
Need the original data back? The Encrypt method enables controlled, key-based decryption for authorized re-identification.
How Reversible Encryption Works
Comparison: Irreversible vs Reversible
| Method | Reversible | Use Case |
|---|---|---|
| Redact | ✗ | Permanent removal |
| Replace | ✗ | Synthetic substitution |
| Mask | ✗ | Partial visibility |
| Hash (SHA-256) | ✗ | Consistent pseudonymization |
| Encrypt (AES-256-GCM) | ✓ | Re-identification required |
When to Use Encryption
- Data Processing Agreements — When processors need to return original data
- Research & Analytics — Anonymize for analysis, re-identify for follow-up
- Temporary Redaction — Share documents safely, restore when authorized
- Legal Discovery — Protect PII during review, reveal when legally required
- Cross-Border Transfer — Encrypt for transit, decrypt at destination
ENCRYPTION DETAILS
- Algorithm: AES-256-GCM (authenticated encryption)
- Key Derivation: PBKDF2 with personal master key
- Key Storage: Zero-knowledge — only you hold the key
- Token Format: Base64-encoded ciphertext with auth tag
- Decryption: API endpoint or Desktop App with key
Note: Without the encryption key, data cannot be recovered. We recommend secure key backup using the 24-word recovery phrase.
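For readers who want to see what the key-derivation and token-format bullets could look like in code, here is a minimal standard-library sketch. The iteration count, salt handling, SHA-256 as the PBKDF2 hash, and the `nonce + ciphertext + tag` token layout are all assumptions; the page states only that PBKDF2 is used and that tokens are Base64-encoded ciphertext with an auth tag. The AES-256-GCM step itself is omitted because it requires a third-party cryptography library.

```python
import base64
import hashlib

NONCE_LEN = 12  # common AES-GCM nonce size, in bytes (assumed here)
TAG_LEN = 16    # common AES-GCM authentication tag size, in bytes (assumed here)


def derive_key(master_key: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte (256-bit) key from a master key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", master_key.encode("utf-8"), salt, iterations, dklen=32
    )


def pack_token(nonce: bytes, ciphertext: bytes, tag: bytes) -> str:
    """Encode nonce, ciphertext, and tag as one Base64 string (layout is hypothetical)."""
    return base64.b64encode(nonce + ciphertext + tag).decode("ascii")


def unpack_token(token: str) -> tuple[bytes, bytes, bytes]:
    """Split a token produced by pack_token back into (nonce, ciphertext, tag)."""
    raw = base64.b64decode(token)
    return raw[:NONCE_LEN], raw[NONCE_LEN:-TAG_LEN], raw[-TAG_LEN:]
```

Because the key is derived from a secret only the user holds, losing that secret makes the tokens unrecoverable, which is why the recovery phrase matters.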
48 languages with NLP processing
Comprehensive language support including right-to-left scripts and mixed-language document processing.
Language Features
- Automatic Detection — Language is detected automatically, no configuration needed
- RTL Support — Full support for Arabic, Hebrew, Persian, and Urdu
- Mixed Documents — Handle documents containing multiple languages
- Lazy-Loaded Models — Language models load on-demand for efficiency
- NLP-Powered — Natural language processing for context-aware detection
Supported Languages
Connect to your workflows
Multiple integration points to embed anonymization into your existing processes.
REST API
RESTful endpoints with JWT authentication and rate limiting for automation and pipelines.
MCP Server
Native Claude Desktop integration plus HTTP for Cursor & VS Code. 6 operators, entity groups, personal encryption keys.
Office Add-in
Microsoft Word integration with real-time detection, one-click anonymization, and format preservation.
Desktop App
Local file processing for Windows, macOS, Linux. Military-grade vault encryption and offline history.
Chrome Extension
Protect privacy on ChatGPT, Claude, and Gemini with real-time PII interception and response de-anonymization.
Batch Processing
Multi-document upload with parallel processing, progress tracking, and bulk downloads (up to 100 docs).
Ready-to-use compliance presets
Pre-configured presets for common regulatory requirements. Customize or create your own.
GDPR Preset
Configured for EU General Data Protection Regulation compliance with focus on personal data categories defined in Art. 4.
- Personal identifiers
- Contact information
- Location data
- Online identifiers
HIPAA Preset
Configured for US Health Insurance Portability and Accountability Act with 18 PHI identifiers.
- Patient identifiers
- Medical record numbers
- Health plan IDs
- Biometric identifiers
PCI-DSS Preset
Configured for Payment Card Industry Data Security Standard with focus on cardholder data.
- Primary account numbers
- Cardholder names
- Expiration dates
- Service codes
Zero-Knowledge: We never see your data
Your text is processed and returned, never stored, logged, or accessible to us. Even our own team cannot see what you're processing.
Built with security-first principles and modern cryptographic standards:
- No text storage — Your content passes through, never persists
- No logging of content — We log events, never data
- Argon2id + XChaCha20-Poly1305 — State-of-the-art encryption
- 24-Word Recovery Phrase — BIP39-style recovery you control
- Personal Encryption Keys — Your keys, your control
From signup to production in hours, not months
Unlike solutions that require weeks of engineering and steep learning curves, our managed service gets you anonymizing data the same day. Upload files, click Anonymize, download results — no technical expertise needed.
SaaS (Online Hosted)
Sign up, get API key, start processing. No infrastructure setup required.
Managed Private
Dedicated instance provisioned by our team. Your data, your environment, our management.
Self-Managed
Deploy on your infrastructure. Full control, full isolation, our software and support.
Self-Hosted Option Available
For organizations requiring complete data sovereignty, our self-managed package lets you run the entire anonymization stack on your own infrastructure — on-premises or in your private cloud. Same features, same updates, zero external data transfer.
Process any document type
Our detection and anonymization works across all common document formats. Upload via API, Desktop App, or batch processing.
Complete audit trail
Every anonymization operation is logged for compliance and governance requirements — without ever storing your actual data.
- Processing logs — What was processed, when, by whom
- Entity counts — Number and types of entities detected
- Method applied — Which anonymization method was used per entity
- Zero content logging — We log events, never your data
- Export capability — Download logs for external audit systems
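The fields listed above suggest an audit record that carries metadata only. The sketch below shows one hypothetical shape for such a record; the `AuditRecord` type, its field names, and the JSON export are assumptions for illustration, not the product's log schema.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    user_id: str                   # who ran the operation
    document_id: str               # what was processed (an identifier, not the content)
    processed_at: str              # when, as an ISO 8601 UTC timestamp
    entity_counts: dict[str, int]  # number of detections per entity type
    methods: dict[str, str]        # anonymization method applied per entity type


def build_audit_record(user_id: str, document_id: str,
                       detections: list[tuple[str, str]]) -> AuditRecord:
    """Build a record from (entity_type, method) pairs; detected text is never passed in."""
    counts = Counter(entity_type for entity_type, _ in detections)
    methods = {entity_type: method for entity_type, method in detections}
    return AuditRecord(
        user_id=user_id,
        document_id=document_id,
        processed_at=datetime.now(timezone.utc).isoformat(),
        entity_counts=dict(counts),
        methods=methods,
    )


def export_json(record: AuditRecord) -> str:
    """Serialize the record for an external audit system."""
    return json.dumps(asdict(record), sort_keys=True)
```

Since the builder accepts only entity types and method names, the original values cannot end up in the exported log by accident.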
Try These Features on Our Platforms
Each platform showcases different aspects of our technology. Choose the one that matches your use case.
Enterprise API Access
Full REST API with 320+ entity types, batch processing, and comprehensive documentation.
Try anonymize.today ↗
Image & OCR Anonymization
Tesseract OCR + Presidio NLP for processing scanned documents and photos with embedded PII.
Try blurgate.legal ↗
AI Chat Protection
Chrome Extension for real-time PII masking before sharing with ChatGPT, Claude, or Gemini.
Try anonymize.live ↗
48 Languages + RTL Support
Multilingual detection including Arabic, Hebrew, Persian, and Urdu with right-to-left rendering.
Try anonymize.world ↗
Education Sector (FERPA + GDPR)
Student ID detection, LibreOffice Add-in, and education-specific compliance features.
Try anonymize.education ↗
Legal Documentation Portal
GDPR compliance documentation, API reference, and law firm-focused workflows.
Try anonym.legal ↗
Ready to explore the full platform?
See how anonymize.solutions can protect your data with comprehensive detection and anonymization capabilities.