320+
Entity Types
Names, emails, phones, IBANs, SSNs, and more
48
Languages
Including RTL and mixed-language support
7
Methods
Replace, Redact, Mask, Hash, Encrypt +2
6
Integrations
API, MCP, Office, Desktop, Browser, Batch

Two engines for comprehensive detection

Choose the right detection engine for your data type, or combine both for maximum accuracy.

DEFAULT

NLP Engine

Context-aware detection powered by Microsoft Presidio. Understands language semantics and entity relationships.

  • 320+ entity types
  • 48 languages supported
  • Context-aware detection
  • Best for: documents, emails, chat logs
NEW

Pattern Engine

Ultra-fast regex-based detection with checksum validation. Format-based recognition for structured data.

  • 40+ pattern recognizers
  • Checksum validation (Luhn, IBAN)
  • Sub-millisecond processing
  • Best for: transactions, logs, structured data
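
As an illustration of how regex matching and checksum validation can work together, the sketch below pairs a card-number pattern with a Luhn check. The pattern and the function names are assumptions made for this example, not the product's actual recognizers.

```python
import re

# Hypothetical recognizer: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return regex candidates that also pass the checksum."""
    return [m.group(0) for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group(0))]
```

A digit sequence that merely looks like a card number is dropped by the checksum step, which is how this kind of validation reduces false positives.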
PRO TIP

Use Hybrid Mode for maximum coverage

Combine both engines to detect structured patterns AND contextual entities in a single pass.

mode=hybrid
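
One way to picture combining the two engines is a merge step that resolves overlapping hits. The sketch below is only an assumption about how such a merge could work: the `Detection` class, the field names, and the "higher confidence wins" rule are hypothetical and not confirmed by the product documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    type: str
    start: int
    end: int  # exclusive offset
    confidence: float
    engine: str


def merge_detections(pattern_hits, nlp_hits):
    """Combine both result lists; on overlapping spans keep the higher-confidence hit."""
    # Sort by confidence (descending) so stronger hits claim their span first.
    candidates = sorted(
        list(pattern_hits) + list(nlp_hits),
        key=lambda d: (-d.confidence, d.start),
    )
    kept = []
    for cand in candidates:
        overlaps = any(cand.start < k.end and k.start < cand.end for k in kept)
        if not overlaps:
            kept.append(cand)
    # Return the surviving detections in document order.
    return sorted(kept, key=lambda d: d.start)
```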

When to use which engine

Use Case | Recommended Engine | Why
AI Chat Protection | NLP | Understands conversational context
Payment Processing | Pattern | Luhn checksum validates card numbers
Legal Document Review | NLP | Names and entities in prose
Transaction Logs | Pattern | Structured data, fast processing
Healthcare Records | NLP | Medical terminology understanding
Mixed Data Pipelines | Hybrid | Maximum coverage, both engines

Intelligent PII detection

Pattern-based and NLP-powered detection identifies personal data across text and documents with confidence scoring.

Deterministic
NLP + Pattern engines, not LLMs
EU Infrastructure
100% German/EU hosted
High Accuracy
317 Regex Recognizers
Zero-Knowledge
We never see your data

320+ Entity Types

Names
Email Addresses
Phone Numbers
Credit Cards
SSN / Tax IDs
IBAN / Bank
IP Addresses
Dates of Birth
Addresses
Passport IDs
Driver License
+ Custom Entities

Detection Features

  • Confidence Scoring — Each detection includes a confidence score for review workflows
  • Custom Entity Support — Define domain-specific patterns for your data reality
  • Entity Grouping — Organize entities into logical groups for policy application
  • Context-Aware — NLP-powered detection understands context, not just patterns
  • Format Preservation — Maintains document structure during detection
  • Checksum Validation — Luhn, MOD-97, and format rules reduce false positives for structured data
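
The MOD-97 rule mentioned for structured data can be sketched in a few lines. This is a generic IBAN check written for illustration; the function name is hypothetical and country-specific length and format rules are deliberately left out.

```python
def iban_valid(iban: str) -> bool:
    """Check an IBAN with the MOD-97 rule (per-country structure is not checked)."""
    compact = "".join(iban.split()).upper()
    if not (15 <= len(compact) <= 34) or not compact.isascii() or not compact.isalnum():
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Replace letters with numbers: A=10, B=11, ..., Z=35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```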
Detection Output Example

// Sample detection output (JSON)
{
  "entities": [
    {
      "type": "PERSON",
      "text": "John Smith",
      "start": 0,
      "end": 10,
      "confidence": 0.95,
      "engine": "NLP"
    },
    {
      "type": "EMAIL_ADDRESS",
      "text": "john.smith@example.com",
      "start": 23,
      "end": 45,
      "confidence": 1.0,
      "engine": "Pattern"
    }
  ]
}
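
For review workflows, output shaped like the sample above could be split by confidence. The following is a minimal sketch: the 0.8 threshold, the function name, and the sample entities are assumptions chosen for illustration.

```python
# Hypothetical response using the same field names as the sample output.
SAMPLE_RESPONSE = {
    "entities": [
        {"type": "PERSON", "text": "John Smith", "start": 0, "end": 10,
         "confidence": 0.95, "engine": "NLP"},
        {"type": "LOCATION", "text": "Springfield", "start": 20, "end": 31,
         "confidence": 0.55, "engine": "NLP"},
    ]
}


def split_for_review(response: dict, threshold: float = 0.8):
    """Return (auto, review): entities at or above the threshold, and the rest."""
    auto, review = [], []
    for entity in response.get("entities", []):
        if entity["confidence"] >= threshold:
            auto.append(entity)
        else:
            review.append(entity)
    return auto, review
```

High-confidence entities can then be anonymized automatically, while the remainder is queued for a human reviewer.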

Seven anonymization methods

Choose the method that matches your risk model and data utility requirements. All seven methods are available across API, Desktop App, and Office Add-in.

Replace

Substitute with synthetic data that maintains format and realism

John Doe → Max Schmidt

Redact

Remove completely with placeholder indicating entity type

John Doe → [NAME]
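
A redaction step of this kind can be sketched as span replacement over detected entities. The label mapping and function name below are hypothetical; the `start`/`end` offsets are assumed to be character positions with an exclusive end.

```python
# Hypothetical mapping from entity type to placeholder label.
DEFAULT_LABELS = {"PERSON": "NAME", "EMAIL_ADDRESS": "EMAIL"}


def redact(text, entities, labels=DEFAULT_LABELS):
    """Replace each detected span with a bracketed placeholder."""
    result = text
    # Apply replacements right to left so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = labels.get(entity["type"], entity["type"])
        result = result[: entity["start"]] + f"[{label}]" + result[entity["end"]:]
    return result
```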

Mask

Partially obscure while preserving format recognition

john@mail.com → j***@***.com
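
The masking shown above can be reproduced with a small helper. This is an illustrative sketch only; the function name and the fallback for malformed input are assumptions.

```python
def mask_email(address: str) -> str:
    """Keep the first character and the top-level domain, mask the rest."""
    local, sep, domain = address.partition("@")
    if not sep or not local or "." not in domain:
        return "***"  # not recognizable as an email address, mask entirely
    tld = domain.rsplit(".", 1)[1]
    return f"{local[0]}***@***.{tld}"
```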

Hash

One-way SHA-256 hash for consistent pseudonymization

John → a8f5f1...
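
Consistent pseudonymization means the same input always yields the same token. A minimal sketch with Python's standard `hashlib` follows; the optional `salt` parameter and the function name are assumptions and not part of the documented method.

```python
import hashlib


def pseudonymize(value: str, salt: str = "") -> str:
    """Return the SHA-256 hex digest of the (optionally salted) value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
```

Because the digest is deterministic, records that mention the same person can still be joined after hashing. Short or common values hashed without a secret salt can be guessed by hashing candidate inputs, which is why the sketch exposes a salt.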

Encrypt

AES-256-GCM symmetric encryption with controlled decryption

Reversible with key

Asymmetric Encrypt

RSA-4096 public-key encryption for zero-trust data sharing

RSA-4096 / public key

Keep

Preserve original value — detect only, no transformation applied

Detect & flag only

Image Anonymization

OCR-powered text detection in images. Extract, detect, and redact PII directly from image files.

Supported Formats

JPEG/JPG
PNG
TIFF
BMP
WebP
GIF

How It Works

  • Tesseract OCR — Industry-standard optical character recognition
  • 48 Languages — Same language support as text processing
  • Presidio NLP — Microsoft Presidio analyzes extracted text
  • Pixel Mapping — PII locations mapped to exact coordinates
  • Solid Redaction — Covered with configurable color rectangles
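
The pixel-mapping step can be pictured as translating a character span in the OCR text back to word bounding boxes. The sketch below assumes OCR words arrive with left/top/width/height values (loosely modeled on the columns in Tesseract's TSV output) and that the analyzed text is the words joined by single spaces. The `Word` class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left: int
    top: int
    width: int
    height: int


def boxes_for_span(words, start, end):
    """Return (left, top, right, bottom) boxes for words overlapping [start, end)."""
    boxes = []
    offset = 0
    for word in words:
        word_start, word_end = offset, offset + len(word.text)
        if word_start < end and start < word_end:
            boxes.append((word.left, word.top,
                          word.left + word.width, word.top + word.height))
        offset = word_end + 1  # account for the joining space
    return boxes
```

Each returned box could then be filled with a solid rectangle in the configured color.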

Detection Capabilities

25+ entity types detected via OCR-extracted text:

Names, Emails, Phones, Credit Cards, IBANs, SSNs, Addresses, Dates, IPs, Passports

Important Limitation

Image anonymization detects text that OCR can read. It does not detect faces, license plates, QR codes, or handwriting. For best results, use high-resolution images with clear, printed text.

Processing time: 3–20 seconds depending on image size and complexity.

OPTIONAL ADD-ON FOR ALL PACKAGES

Decryption & Re-identification

Need the original data back? The Encrypt method enables controlled, key-based decryption for authorized re-identification.

How Reversible Encryption Works

1. ENCRYPT
PII encrypted with AES-256-GCM using your personal key
2. STORE
Encrypted tokens replace original values in documents
3. DECRYPT
Authorized users with the key can restore original values

Comparison: Irreversible vs Reversible

Method | Reversible | Use Case
Redact | No | Permanent removal
Replace | No | Synthetic substitution
Mask | No | Partial visibility
Hash (SHA-256) | No | Consistent pseudonymization
Encrypt (AES-256-GCM) | Yes | Re-identification required

When to Use Encryption

  • Data Processing Agreements — When processors need to return original data
  • Research & Analytics — Anonymize for analysis, re-identify for follow-up
  • Temporary Redaction — Share documents safely, restore when authorized
  • Legal Discovery — Protect PII during review, reveal when legally required
  • Cross-Border Transfer — Encrypt for transit, decrypt at destination

ENCRYPTION DETAILS

  • Algorithm: AES-256-GCM (authenticated encryption)
  • Key Derivation: PBKDF2 with personal master key
  • Key Storage: Zero-knowledge — only you hold the key
  • Token Format: Base64-encoded ciphertext with auth tag
  • Decryption: API endpoint or Desktop App with key
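
The key-derivation detail above can be illustrated with the standard library's PBKDF2 implementation. The iteration count, salt length, and underlying hash (SHA-256) are assumptions made for this sketch, since the section does not specify them. The derived 32-byte key has the length AES-256 requires.

```python
import hashlib
import os


def derive_key(master_key: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a master key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", master_key.encode("utf-8"), salt, iterations, dklen=32
    )


def new_salt(length: int = 16) -> bytes:
    """Generate a random salt; it must be stored to re-derive the same key."""
    return os.urandom(length)
```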

Note: Without the encryption key, data cannot be recovered. We recommend secure key backup using the 24-word recovery phrase.

48 languages with NLP processing

Comprehensive language support including right-to-left scripts and mixed-language document processing.

Language Features

  • Automatic Detection — Language is detected automatically, no configuration needed
  • RTL Support — Full support for Arabic, Hebrew, Persian, and Urdu
  • Mixed Documents — Handle documents containing multiple languages
  • Lazy-Loaded Models — Language models load on-demand for efficiency
  • NLP-Powered — Natural language processing for context-aware detection
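
On-demand model loading is commonly implemented with a cache around the loader. The following sketch shows the idea; `_load_model_from_disk` is a hypothetical stand-in for an expensive load, and the returned dictionary is a placeholder for a real model object.

```python
from functools import lru_cache

LOAD_LOG = []  # records which languages actually triggered a load


def _load_model_from_disk(language: str) -> dict:
    """Hypothetical stand-in for an expensive model load."""
    LOAD_LOG.append(language)
    return {"language": language}


@lru_cache(maxsize=None)
def get_model(language: str) -> dict:
    """Load a language model on first use and reuse it afterwards."""
    return _load_model_from_disk(language)
```

Only languages that actually appear in processed documents incur a load, and each one is loaded at most once per process.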

Supported Languages

English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Chinese, Japanese, Korean, Arabic (RTL), Hebrew (RTL), + 34 more


Connect to your workflows

Multiple integration points to embed anonymization into your existing processes.

REST API

RESTful endpoints with JWT authentication and rate limiting for automation and pipelines.
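
A call to a JWT-protected endpoint might be prepared as follows. The URL, the payload field names, and the function name are placeholders invented for this sketch; only the `hybrid` mode value and the use of JWT authentication come from this page. Consult the API reference for the real contract.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the URL from the API reference.
API_URL = "https://api.example.invalid/v1/anonymize"


def build_request(text: str, jwt_token: str, mode: str = "hybrid") -> urllib.request.Request:
    """Build (but do not send) a POST request with a JWT bearer token."""
    body = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {jwt_token}",
            "Content-Type": "application/json",
        },
    )
```

Since the endpoints are rate limited, a client would typically also back off and retry when the server answers with HTTP 429.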

MCP Server

Native Claude Desktop integration plus HTTP for Cursor & VS Code. 6 operators, entity groups, personal encryption keys.

Office Add-in

Microsoft Word integration with real-time detection, one-click anonymization, and format preservation.

Desktop App

Local file processing for Windows, macOS, and Linux, with an encrypted vault and offline history.

Chrome Extension

Protect privacy on ChatGPT, Claude, and Gemini with real-time PII interception and response de-anonymization.

Batch Processing

Multi-document upload with parallel processing, progress tracking, and bulk downloads (up to 100 docs).
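
Parallel batch handling with the stated 100-document limit can be sketched with a thread pool. `anonymize_document` is a hypothetical placeholder for a single upload-and-anonymize call, and the worker count is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_BATCH_SIZE = 100  # limit stated above


def anonymize_document(name: str) -> str:
    """Hypothetical placeholder for one upload-and-anonymize call."""
    return f"{name}.anonymized"


def process_batch(names, worker=anonymize_document, max_workers=8):
    """Process documents in parallel and return {input name: result}."""
    names = list(names)
    if len(names) > MAX_BATCH_SIZE:
        raise ValueError(f"batch of {len(names)} exceeds limit of {MAX_BATCH_SIZE}")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with names.
        return dict(zip(names, pool.map(worker, names)))
```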


Ready-to-use compliance presets

Pre-configured presets for common regulatory requirements. Customize or create your own.

GDPR Preset

Configured for EU General Data Protection Regulation compliance with focus on personal data categories defined in Art. 4.

  • Personal identifiers
  • Contact information
  • Location data
  • Online identifiers

HIPAA Preset

Configured for US Health Insurance Portability and Accountability Act with 18 PHI identifiers.

  • Patient identifiers
  • Medical record numbers
  • Health plan IDs
  • Biometric identifiers

PCI-DSS Preset

Configured for Payment Card Industry Data Security Standard with focus on cardholder data.

  • Primary account numbers
  • Cardholder names
  • Expiration dates
  • Service codes
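
Conceptually, a preset is a named set of entity types that can be extended or trimmed. The structure and the entity type names below are illustrative assumptions and do not reflect the product's actual preset definitions.

```python
# Hypothetical preset contents; entity names are placeholders.
PRESETS = {
    "GDPR": {"PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "LOCATION", "IP_ADDRESS"},
    "HIPAA": {"PERSON", "MEDICAL_RECORD_NUMBER", "HEALTH_PLAN_ID", "BIOMETRIC_ID"},
    "PCI-DSS": {"CREDIT_CARD", "PERSON", "CARD_EXPIRY", "SERVICE_CODE"},
}


def custom_preset(base: str, add=(), remove=()) -> set:
    """Derive a new entity set from a base preset without modifying it."""
    return (PRESETS[base] | set(add)) - set(remove)
```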

Zero-Knowledge: We never see your data

Your text is processed and returned; it is never stored, logged, or accessible to us. Even our own team cannot see what you're processing.

Built with security-first principles and modern cryptographic standards:

  • No text storage — Your content passes through, never persists
  • No logging of content — We log events, never data
  • Argon2id + XChaCha20-Poly1305 — State-of-the-art encryption
  • 24-Word Recovery Phrase — BIP39-style recovery you control
  • Personal Encryption Keys — Your keys, your control
ZERO-KNOWLEDGE LAYER: Argon2id + XChaCha20-Poly1305
DATA ENCRYPTION: AES-256-GCM + SHA-256
RECOVERY: 24-Word Phrase
2FA: TOTP / Enterprise SSO
INFRASTRUCTURE: EU Hosting Options

From signup to production in hours, not months

Unlike solutions that require weeks of engineering and steep learning curves, our managed service gets you anonymizing data the same day. Upload files, click Anonymize, download results — no technical expertise needed.

MANAGED

Managed Private

Days

Dedicated instance provisioned by our team. Your data, your environment, our management.

SELF-HOSTED

Self-Managed

1–2 Weeks

Deploy on your infrastructure. Full control, full isolation, our software and support.

Self-Hosted Option Available

For organizations requiring complete data sovereignty, our self-managed package lets you run the entire anonymization stack on your own infrastructure — on-premises or in your private cloud. Same features, same updates, zero external data transfer.


Process any document type

Our detection and anonymization works across all common document formats. Upload via API, Desktop App, or batch processing.

PDF, DOCX, XLSX, TXT, CSV, JSON, PPTX, HTML, XML, Images (OCR)

Complete audit trail

Every anonymization operation is logged for compliance and governance requirements — without ever storing your actual data.

  • Processing logs — What was processed, when, by whom
  • Entity counts — Number and types of entities detected
  • Method applied — Which anonymization method was used per entity
  • Zero content logging — We log events, never your data
  • Export capability — Download logs for external audit systems
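
An audit entry that records events but no content could look like the sketch below. The field names and the JSON Lines export format are assumptions for illustration. Note that only entity types, counts, and methods are kept, never the detected text.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def audit_record(user: str, document_id: str, entities: list) -> dict:
    """Build a log entry from detections without copying any detected text."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "document_id": document_id,
        "entity_counts": dict(Counter(e["type"] for e in entities)),
        "methods": {e["type"]: e["method"] for e in entities},
    }


def export_jsonl(records) -> str:
    """Serialize records as JSON Lines for an external audit system."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)
```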

Try These Features on Our Platforms

Each platform showcases different aspects of our technology. Choose the one that matches your use case.

Enterprise API Access

Full REST API with 320+ entity types, batch processing, and comprehensive documentation.

Try anonymize.today ↗

Image & OCR Anonymization

Tesseract OCR + Presidio NLP for processing scanned documents and photos with embedded PII.

Try blurgate.legal ↗

AI Chat Protection

Chrome Extension for real-time PII masking before sharing with ChatGPT, Claude, or Gemini.

Try anonymize.live ↗

48 Languages + RTL Support

Multilingual detection including Arabic, Hebrew, Persian, and Urdu with right-to-left rendering.

Try anonymize.world ↗

Education Sector (FERPA + GDPR)

Student ID detection, LibreOffice Add-in, and education-specific compliance features.

Try anonymize.education ↗

Legal Documentation Portal

GDPR compliance documentation, API reference, and law firm-focused workflows.

Try anonym.legal ↗
See All 11 Platforms →

Ready to explore the full platform?

See how anonymize.solutions can protect your data with comprehensive detection and anonymization capabilities.