PII Detection Features — 320+ Entity Types, 48 Languages

320+

Entity Types

Names, emails, phones, IBANs, SSNs, and more

48

Languages

Including RTL and mixed-language support

7

Methods

Replace, Redact, Mask, Hash, Encrypt +2

6

Integrations

API, MCP, Office, Desktop, Browser, Batch

Detection Engines

Two engines for comprehensive detection

Choose the right detection engine for your data type, or combine both for maximum accuracy.

DEFAULT

NLP Engine

Context-aware detection powered by Microsoft Presidio. Understands language semantics and entity relationships.

320+ entity types
48 languages supported
Context-aware detection
Best for: documents, emails, chat logs

NEW

Pattern Engine

Ultra-fast regex-based detection with checksum validation. Format-based recognition for structured data.

40+ pattern recognizers
Checksum validation (Luhn, IBAN)
Sub-millisecond processing
Best for: transactions, logs, structured data
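
As an illustration of regex detection combined with checksum validation, here is a minimal Python sketch. The candidate pattern `CARD_CANDIDATE` and the helper names are assumptions made for this example, not the product's actual recognizers; only the Luhn algorithm itself is standard.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens. Real recognizers are likely stricter.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return regex candidates that also pass the Luhn check."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

The checksum step is what filters out digit runs that merely look like card numbers, such as order or reference numbers of the same length.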

PRO TIP

Use Hybrid Mode for maximum coverage

Combine both engines to detect structured patterns AND contextual entities in a single pass.

                mode=hybrid
            

When to use which engine

Use Case	Recommended Engine	Why
AI Chat Protection	NLP	Understands conversational context
Payment Processing	Pattern	Luhn checksum validates card numbers
Legal Document Review	NLP	Names and entities in prose
Transaction Logs	Pattern	Structured data, fast processing
Healthcare Records	NLP	Medical terminology understanding
Mixed Data Pipelines	Hybrid	Maximum coverage, both engines
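
How hybrid results are combined is not described here. One plausible approach, shown purely as an assumption, is to pool the detections from both engines and keep the highest-confidence detection wherever spans overlap. The `Detection` class and `merge_detections` function are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    type: str
    start: int         # inclusive character offset
    end: int           # exclusive character offset
    confidence: float
    engine: str


def overlaps(a: Detection, b: Detection) -> bool:
    """True if the two character spans share at least one character."""
    return a.start < b.end and b.start < a.end


def merge_detections(nlp: list[Detection], pattern: list[Detection]) -> list[Detection]:
    """Keep the highest-confidence detection among overlapping spans."""
    kept: list[Detection] = []
    # Visit the strongest candidates first so weaker overlapping ones are dropped.
    for candidate in sorted(nlp + pattern, key=lambda d: -d.confidence):
        if not any(overlaps(candidate, k) for k in kept):
            kept.append(candidate)
    return sorted(kept, key=lambda d: d.start)
```

With this rule, a checksum-validated pattern match would win over a lower-confidence NLP match on the same span, while NLP-only entities such as names pass through unchanged.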

Detection

Intelligent PII detection

Pattern-based and NLP-powered detection identifies personal data across text and documents with confidence scoring.

Deterministic

NLP + Pattern engines, not LLMs

EU Infrastructure

100% German/EU hosted

High Accuracy

317 Regex Recognizers

Zero-Knowledge

We never see your data

320+ Entity Types

Names

Email Addresses

Phone Numbers

Credit Cards

SSN / Tax IDs

IBAN / Bank

IP Addresses

Dates of Birth

Addresses

Passport IDs

Driver License

+ Custom Entities

Detection Features

Confidence Scoring — Each detection includes a confidence score for review workflows
Custom Entity Support — Define domain-specific patterns for your data reality
Entity Grouping — Organize entities into logical groups for policy application
Context-Aware — NLP-powered detection understands context, not just patterns
Format Preservation — Maintains document structure during detection
Checksum Validation — Luhn, MOD-97, and format rules reduce false positives for structured data
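
For reference, the MOD-97 check used for IBANs can be sketched in a few lines of Python. The function name `iban_valid` is an assumption for this example, and the sketch checks only overall length and the checksum, not country-specific formats.

```python
def iban_valid(iban: str) -> bool:
    """Validate an IBAN with the standard MOD-97 check."""
    compact = iban.replace(" ", "").upper()
    if not (15 <= len(compact) <= 34):
        return False
    if not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Replace letters with numbers: A=10, B=11, ..., Z=35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

A string that matches the IBAN format but fails this check can be discarded, which is how checksum rules reduce false positives on structured data.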

Detection Output Example

// Sample detection output (JSON)
{
  "entities": [
    {
      "type": "PERSON",
      "text": "John Smith",
      "start": 0,
      "end": 10,
      "confidence": 0.95,
      "engine": "NLP"
    },
    {
      "type": "EMAIL_ADDRESS",
      "text": "john.smith@example.com",
      "start": 23,
      "end": 45,
      "confidence": 1.0,
      "engine": "Pattern"
    }
  ]
}
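
The confidence field in this output can drive a review workflow. The Python sketch below splits entities into auto-accepted and needs-review groups. The threshold value and the function name are assumptions for illustration, and the embedded payload keeps only the fields the sketch needs.

```python
import json

# Reduced copy of the sample output (offsets omitted for brevity).
SAMPLE_OUTPUT = """
{
  "entities": [
    {"type": "PERSON", "text": "John Smith", "confidence": 0.95, "engine": "NLP"},
    {"type": "EMAIL_ADDRESS", "text": "john.smith@example.com",
     "confidence": 1.0, "engine": "Pattern"}
  ]
}
"""

REVIEW_THRESHOLD = 0.99  # hypothetical cut-off; tune to your own risk model


def split_for_review(payload: str, threshold: float = REVIEW_THRESHOLD):
    """Split entities into (accepted, needs_review) by confidence."""
    entities = json.loads(payload)["entities"]
    accepted = [e for e in entities if e["confidence"] >= threshold]
    needs_review = [e for e in entities if e["confidence"] < threshold]
    return accepted, needs_review
```

With the assumed threshold, the checksum-backed email match is accepted automatically and the NLP name match is queued for a human reviewer.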
                    

Anonymization

Seven anonymization methods

Choose the method that matches your risk model and data utility requirements. All seven methods are available across API, Desktop App, and Office Add-in.

Replace

Substitute with synthetic data that maintains format and realism

                    John Doe → Max Schmidt
                

Redact

Remove completely with placeholder indicating entity type

                    John Doe → [NAME]
                

Mask

Partially obscure while preserving format recognition

                    john@mail.com → j***@***.com
                

Hash

One-way SHA-256 hash for consistent pseudonymization

                    John → a8f5f1...
                

Encrypt

AES-256-GCM symmetric encryption with controlled decryption

                    Reversible with key
                

Asymmetric Encrypt

RSA-4096 public-key encryption for zero-trust data sharing

                    RSA-4096 / public key
                

Keep

Preserve original value — detect only, no transformation applied

                    Detect & flag only
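
To make the differences between the irreversible methods concrete, here is a minimal Python sketch of Redact, Mask, and Hash. The function names and the exact masking rule are assumptions chosen to reproduce the examples above; they are not the product's operators.

```python
import hashlib


def redact(entity_type: str) -> str:
    """Replace the value with a placeholder naming the entity type."""
    return f"[{entity_type}]"


def mask_email(email: str) -> str:
    """Keep the first character and the top-level domain, mask the rest."""
    local, _, domain = email.partition("@")
    tld = domain.rsplit(".", 1)[-1]
    return f"{local[:1]}***@***.{tld}"


def hash_value(value: str) -> str:
    """One-way SHA-256 digest; equal inputs always give equal outputs."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()
```

Here `redact("NAME")` gives `[NAME]` and `mask_email("john@mail.com")` gives `j***@***.com`. Because `hash_value` is deterministic, the same name maps to the same 64-character digest across documents, which is what makes hashing usable for consistent pseudonymization.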
                

Add-on

Image Anonymization

OCR-powered text detection in images. Extract, detect, and redact PII directly from image files.

Supported Formats

JPEG/JPG

PNG

TIFF

BMP

WebP

GIF

How It Works

Tesseract OCR — Industry-standard optical character recognition
48 Languages — Same language support as text processing
Presidio NLP — Microsoft Presidio analyzes extracted text
Pixel Mapping — PII locations mapped to exact coordinates
Solid Redaction — Covered with configurable color rectangles
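
The pixel-mapping step can be sketched as follows, assuming the OCR stage yields word-level bounding boxes (left, top, width, height), which Tesseract can report. The `OcrWord` class, the single-space joining rule, and the function names are assumptions for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    text: str
    left: int
    top: int
    width: int
    height: int


def words_with_offsets(words):
    """Join words with single spaces and record each word's character span."""
    spans, cursor = [], 0
    for word in words:
        spans.append((cursor, cursor + len(word.text), word))
        cursor += len(word.text) + 1  # +1 for the joining space
    text = " ".join(w.text for w in words)
    return text, spans


def redaction_boxes(words, pii_spans):
    """Return (left, top, right, bottom) boxes for words overlapping PII spans."""
    _, spans = words_with_offsets(words)
    boxes = []
    for start, end, word in spans:
        if any(start < p_end and p_start < end for p_start, p_end in pii_spans):
            boxes.append((word.left, word.top,
                          word.left + word.width, word.top + word.height))
    return boxes
```

The text returned by `words_with_offsets` is what would be passed to the detection engine; the character spans it reports are then mapped back to rectangles that an imaging library can fill with a solid color.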

Detection Capabilities

25+ entity types detected via OCR-extracted text:

Names, Emails, Phones, Credit Cards, IBANs, SSNs, Addresses, Dates, IPs, Passports

Important Limitation

Image anonymization detects text that OCR can read. It does not detect faces, license plates, QR codes, or handwriting. For best results, use high-resolution images with clear, printed text.

Processing time: 3–20 seconds depending on image size and complexity.

OPTIONAL ADD-ON FOR ALL PACKAGES

Reversibility

Decryption & Re-identification

Need the original data back? The Encrypt method enables controlled, key-based decryption for authorized re-identification.

How Reversible Encryption Works

1. ENCRYPT: PII is encrypted with AES-256-GCM using your personal key

2. STORE: Encrypted tokens replace the original values in documents

3. DECRYPT: Authorized users with the key can restore the original values

Comparison: Irreversible vs Reversible

Method	Reversible	Use Case
Redact	✗	Permanent removal
Replace	✗	Synthetic substitution
Mask	✗	Partial visibility
Hash (SHA-256)	✗	Consistent pseudonymization
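Encrypt (AES-256-GCM)	✓	Re-identification required
Asymmetric Encrypt (RSA-4096)	✓ (with private key)	Zero-trust data sharing
Keep	n/a (value unchanged)	Detect and flag only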

When to Use Encryption

Data Processing Agreements — When processors need to return original data
Research & Analytics — Anonymize for analysis, re-identify for follow-up
Temporary Redaction — Share documents safely, restore when authorized
Legal Discovery — Protect PII during review, reveal when legally required
Cross-Border Transfer — Encrypt for transit, decrypt at destination

ENCRYPTION DETAILS

Algorithm: AES-256-GCM (authenticated encryption)
Key Derivation: PBKDF2 with personal master key
Key Storage: Zero-knowledge — only you hold the key
Token Format: Base64-encoded ciphertext with auth tag
Decryption: API endpoint or Desktop App with key
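
As a rough illustration of the key-derivation step, the Python standard library can derive a 256-bit key with PBKDF2. The hash function (SHA-256), iteration count, and salt size below are assumptions; the parameters actually used are not stated here. The AES-256-GCM encryption step itself is not shown.

```python
import hashlib
import os

ITERATIONS = 600_000  # assumed work factor, not taken from the source


def derive_key(master_key: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte (256-bit) key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", master_key.encode("utf-8"), salt, iterations, dklen=32
    )


def new_salt() -> bytes:
    """Generate a random 16-byte salt to store alongside the ciphertext."""
    return os.urandom(16)
```

The same master key and salt always yield the same derived key, so only the salt needs to be stored with the encrypted token; the master key never has to leave its owner.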

Note: Without the encryption key, data cannot be recovered. We recommend secure key backup using the 24-word recovery phrase.

Languages

48 languages with NLP processing

Comprehensive language support including right-to-left scripts and mixed-language document processing.

Language Features

Automatic Detection — Language is detected automatically, no configuration needed
RTL Support — Full support for Arabic, Hebrew, Persian, and Urdu
Mixed Documents — Handle documents containing multiple languages
Lazy-Loaded Models — Language models load on-demand for efficiency
NLP-Powered — Natural language processing for context-aware detection
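
Lazy loading of language models can be pictured with a small Python sketch. The loader below is a stand-in, since the real model format is not described; `get_model`, `analyze`, and `LOAD_LOG` are hypothetical names.

```python
from functools import lru_cache

LOAD_LOG: list[str] = []  # records which languages triggered a load


@lru_cache(maxsize=None)
def get_model(language: str) -> dict:
    """Load a language model on first use and cache it afterwards."""
    LOAD_LOG.append(language)
    # Placeholder for an expensive load, such as reading model files from disk.
    return {"language": language}


def analyze(text: str, language: str) -> dict:
    """Fetch the model for `language`, loading it only if it is new."""
    model = get_model(language)
    return {"language": model["language"], "length": len(text)}
```

Only languages that actually appear in incoming documents pay the loading cost, and each one pays it once.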

Supported Languages

English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Chinese, Japanese, Korean, Arabic (RTL), Hebrew (RTL), and 34 more

Integrations

Connect to your workflows

Multiple integration points to embed anonymization into your existing processes.

REST API

RESTful endpoints with JWT authentication and rate limiting for automation and pipelines.
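
A request to such an API might be built as in the Python sketch below. The endpoint URL and the JSON field names (`text`, `mode`) are hypothetical placeholders, so consult the API reference for the real ones. The function only constructs the request and does not send it.

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/anonymize"  # hypothetical endpoint


def build_request(text: str, token: str, mode: str = "hybrid") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request with a JWT bearer token."""
    body = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. Because the API is rate limited, a pipeline should be prepared to back off and retry when the server signals that the limit has been reached (conventionally HTTP 429).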

MCP Server

Native Claude Desktop integration plus HTTP for Cursor & VS Code. 6 operators, entity groups, personal encryption keys.

Office Add-in

Microsoft Word integration with real-time detection, one-click anonymization, and format preservation.

Desktop App

Local file processing for Windows, macOS, and Linux, with an encrypted vault and offline history.

Chrome Extension

Protect privacy on ChatGPT, Claude, and Gemini with real-time PII interception and response de-anonymization.

Batch Processing

Multi-document upload with parallel processing, progress tracking, and bulk downloads (up to 100 docs).

Compliance

Ready-to-use compliance presets

Pre-configured presets for common regulatory requirements. Customize or create your own.

GDPR Preset

Configured for EU General Data Protection Regulation (GDPR) compliance, with a focus on the identifier types named in the Art. 4(1) definition of personal data.

Personal identifiers
Contact information
Location data
Online identifiers

HIPAA Preset

Configured for the US Health Insurance Portability and Accountability Act (HIPAA), covering the 18 PHI identifiers of the Privacy Rule's Safe Harbor de-identification method.

Patient identifiers
Medical record numbers
Health plan IDs
Biometric identifiers

PCI-DSS Preset

Configured for Payment Card Industry Data Security Standard with focus on cardholder data.

Primary account numbers
Cardholder names
Expiration dates
Service codes
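
Conceptually, a preset is a named set of entity types that can be extended or trimmed. The Python sketch below models that idea; the preset keys and entity type names are illustrative assumptions and do not reflect the actual preset contents.

```python
# Hypothetical preset contents, for illustration only.
PRESETS: dict[str, frozenset[str]] = {
    "gdpr": frozenset({"PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "LOCATION", "IP_ADDRESS"}),
    "hipaa": frozenset({"PERSON", "MEDICAL_RECORD_NUMBER", "HEALTH_PLAN_ID", "BIOMETRIC_ID"}),
    "pci_dss": frozenset({"CREDIT_CARD", "PERSON", "CARD_EXPIRY", "SERVICE_CODE"}),
}


def customize(preset: str, add=(), remove=()) -> frozenset[str]:
    """Start from a named preset, then add or remove entity types."""
    return (PRESETS[preset] | frozenset(add)) - frozenset(remove)
```

For example, `customize("pci_dss", add=["IBAN_CODE"], remove=["PERSON"])` would yield a payment-focused policy that also covers bank account numbers but leaves names untouched.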

Core Security Principle

Zero-Knowledge: We never see your data

Your text is processed and returned, and is never stored, logged, or accessible to us. Even our own team cannot see what you're processing.

Built with security-first principles and modern cryptographic standards:

No text storage — Your content passes through, never persists
No logging of content — We log events, never data
Argon2id + XChaCha20-Poly1305 — State-of-the-art encryption
24-Word Recovery Phrase — BIP39-style recovery you control
Personal Encryption Keys — Your keys, your control

Time to Value

From signup to production in hours, not months

Unlike solutions that require weeks of engineering and steep learning curves, our managed service gets you anonymizing data the same day. Upload files, click Anonymize, download results — no technical expertise needed.

FASTEST

SaaS (Online Hosted)

Hours

MANAGED

Managed Private

Days

Dedicated instance provisioned by our team. Your data, your environment, our management.

SELF-HOSTED

Self-Managed

1–2 Weeks

Deploy on your infrastructure. Full control, full isolation, our software and support.

Self-Hosted Option Available

For organizations requiring complete data sovereignty, our self-managed package lets you run the entire anonymization stack on your own infrastructure — on-premises or in your private cloud. Same features, same updates, zero external data transfer.

File Formats

Process any document type

Our detection and anonymization works across all common document formats. Upload via API, Desktop App, or batch processing.

PDF, DOCX, XLSX, TXT, CSV, JSON, PPTX, HTML, XML, Images (OCR)

Audit & Logging

Complete audit trail

Every anonymization operation is logged for compliance and governance requirements — without ever storing your actual data.

Processing logs — What was processed, when, by whom
Entity counts — Number and types of entities detected
Method applied — Which anonymization method was used per entity
Zero content logging — We log events, never your data
Export capability — Download logs for external audit systems
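
An audit entry that records events without content might look like the Python sketch below. The field names and the `audit_record` function are assumptions for illustration; the key point is that only entity types, counts, and methods are copied, never the detected text.

```python
from collections import Counter
from datetime import datetime, timezone


def audit_record(user: str, document_id: str,
                 detections: list[dict], methods: dict[str, str]) -> dict:
    """Summarize an operation without copying any detected text."""
    counts = Counter(d["type"] for d in detections)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "document_id": document_id,
        "entity_counts": dict(counts),
        # Method applied per entity type, limited to types actually detected.
        "methods": {t: methods[t] for t in counts if t in methods},
    }
```

Serializing such records as JSON lines would be one straightforward way to export them to an external audit system.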

Live Demos

Try These Features on Our Platforms

Each platform showcases different aspects of our technology. Choose the one that matches your use case.

Enterprise API Access

Full REST API with 320+ entity types, batch processing, and comprehensive documentation.

Try anonymize.today ↗

Image & OCR Anonymization

Tesseract OCR + Presidio NLP for processing scanned documents and photos with embedded PII.

Try blurgate.legal ↗

AI Chat Protection

Chrome Extension for real-time PII masking before sharing with ChatGPT, Claude, or Gemini.

Try anonymize.live ↗

48 Languages + RTL Support

Multilingual detection including Arabic, Hebrew, Persian, and Urdu with right-to-left rendering.

Try anonymize.world ↗

Education Sector (FERPA + GDPR)

Student ID detection, LibreOffice Add-in, and education-specific compliance features.

Try anonymize.education ↗

Legal Documentation Portal

GDPR compliance documentation, API reference, and law firm-focused workflows.

Try anonym.legal ↗

See All 11 Platforms →

Ready to explore the full platform?

See how anonymize.solutions can protect your data with comprehensive detection and anonymization capabilities.
