Three Approaches to Data Protection

Each technique serves a different purpose. The right choice depends on whether you need to share data, process it internally, or protect it in storage.

Anonymization

Irreversible removal of identifying information. Once anonymized, data cannot be linked back to any individual. The original values are permanently destroyed.

  • GDPR Status: Outside scope — not personal data
  • Reversible: No — permanent transformation
  • Best for: Analytics, research, data sharing

Pseudonymization

Reversible substitution of identifiers with tokens or pseudonyms. A separate key or mapping table allows re-identification when authorized.

  • GDPR Status: Personal data (Art. 4(5))
  • Reversible: Yes — with key or mapping
  • Best for: Internal processing, backups, testing

Encryption

Mathematical transformation of data into ciphertext using a cryptographic key. Decryption restores the original data exactly as it was.

  • GDPR Status: Personal data — reversible with key
  • Reversible: Yes — with decryption key
  • Best for: Data at rest, data in transit, access control

Side-by-Side Analysis

Ten dimensions that matter when choosing a data protection technique. Each method excels in different areas.

Comparison of anonymization, pseudonymization, and encryption across ten key dimensions

Dimension | Anonymization | Pseudonymization | Encryption
Definition | Irreversible removal or transformation of personal identifiers | Reversible substitution of identifiers with tokens or pseudonyms | Mathematical transformation of data into ciphertext using a key
Reversibility | No | Yes, with key | Yes, with key
GDPR Status | Outside scope | Personal data (Art. 4(5)) | Personal data
Data Utility | Reduced | Preserved | Preserved
Processing Speed | Fast | Fast | Medium
Key Required | No | Yes | Yes
Re-identification Risk | Eliminated | Managed | Managed
Typical Methods | Replace, redact, generalize | Tokenize, hash, mask | AES-256 (symmetric encryption), RSA (asymmetric encryption), TLS (transport security)
Use Cases | Sharing, analytics, ML training | Testing, development, backup | Storage, transmission, access
Implementation Complexity | Medium | Medium | High

Note: GDPR Recital 26 clarifies that data rendered truly anonymous (not reasonably linkable to a natural person) falls outside the regulation’s scope. Pseudonymization is explicitly defined in Article 4(5) as a processing technique, not an exemption.

Anonymization — Irreversible by Design

What It Is

Anonymization permanently removes or transforms personal identifiers so that the data subject can no longer be identified, directly or indirectly, by any means reasonably likely to be used. The original data is destroyed — there is no key, no mapping table, no way back.

This irreversibility is precisely what gives anonymization its legal advantage: truly anonymized data falls outside the scope of GDPR entirely, meaning it can be shared, published, and processed without the constraints that apply to personal data.

How It Works

The detection engine identifies all PII entities in the text, then applies a destructive transformation. The method depends on the entity type and the desired output format:

  • Replace — Substitute with a generic label: “John Smith” becomes “PERSON_1”
  • Redact — Remove entirely: “4532-8821-1234-5678” becomes “[REDACTED]”
  • Generalize — Reduce precision: “15 March 1987” becomes “1987”
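
The three transformations above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the anonymize.solutions engine: it assumes detection has already happened, and the Entity class, the span helper, and the anonymize function are hypothetical names invented for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Entity:
    """A detected PII span (hypothetical structure, for illustration only)."""
    start: int
    end: int
    kind: str    # e.g. "PERSON", "DATE", "CREDIT_CARD"
    method: str  # "replace", "redact", or "generalize"


def anonymize(text: str, entities: list[Entity]) -> str:
    """Apply a destructive transformation to each non-overlapping span."""
    counters: dict[str, int] = {}
    labels: dict[tuple[str, str], str] = {}
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        original = text[ent.start:ent.end]
        if ent.method == "replace":
            # Same value, same label: "John Smith" is always PERSON_1
            key = (ent.kind, original)
            if key not in labels:
                counters[ent.kind] = counters.get(ent.kind, 0) + 1
                labels[key] = f"{ent.kind}_{counters[ent.kind]}"
            out = labels[key]
        elif ent.method == "redact":
            out = "[REDACTED]"
        elif ent.method == "generalize":
            # Reduce a date to its four-digit year; redact if none is found
            match = re.search(r"\b(\d{4})\b", original)
            out = match.group(1) if match else "[REDACTED]"
        else:
            raise ValueError(f"unknown method: {ent.method}")
        pieces.append(text[cursor:ent.start])
        pieces.append(out)
        cursor = ent.end
    pieces.append(text[cursor:])
    return "".join(pieces)


def span(text: str, value: str, kind: str, method: str) -> Entity:
    """Stand-in for a detector: locate a known value in the text."""
    start = text.index(value)
    return Entity(start, start + len(value), kind, method)


text = "John Smith, born 15 March 1987, paid with 4532-8821-1234-5678."
entities = [
    span(text, "John Smith", "PERSON", "replace"),
    span(text, "15 March 1987", "DATE", "generalize"),
    span(text, "4532-8821-1234-5678", "CREDIT_CARD", "redact"),
]
result = anonymize(text, entities)
print(result)  # PERSON_1, born 1987, paid with [REDACTED].
```

Nothing in this sketch stores the original values, which is the point: once the input text is discarded, no key or table exists that could restore it.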

Real-World Examples

BEFORE

The patient John Smith (DOB: 15/03/1987) was treated at Berlin Charité. Contact: john.smith@email.de

AFTER (REPLACE, WITH DATE GENERALIZED AND EMAIL REDACTED)

The patient PERSON_1 (DOB: 1987) was treated at ORGANIZATION_1. Contact: [REDACTED]

AFTER (REDACT)

The patient [REDACTED] (DOB: [REDACTED]) was treated at [REDACTED]. Contact: [REDACTED]

Our 5 Methods

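anonymize.solutions supports Replace, Redact, Mask, Hash, and Encrypt. Replace and Redact are irreversible (anonymization). Mask, Hash, and Encrypt leave a route back to the original value, whether through a mapping table, a hash that can be recomputed from a candidate input, or a decryption key, so their output is treated as pseudonymized or encrypted data.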

Pseudonymization — Reversible with Authorization

What It Is

Pseudonymization replaces identifying information with artificial identifiers (pseudonyms or tokens). A separate, securely stored key or mapping table allows authorized parties to reverse the transformation and recover the original data.

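GDPR Article 4(5): “Pseudonymisation means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”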

How Tokenization Works

Each detected entity is replaced with a unique token. A secure mapping table records the relationship between the token and the original value. When re-identification is needed, the mapping table is consulted to restore the original data.

John Smith → TOKEN_A3F9
john@email.de → TOKEN_B7E2
DE89 3704 0044 → TOKEN_C1D5
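
As a rough sketch of the idea, the mapping table can be modelled as a pair of dictionaries. The TokenVault class below is a hypothetical name for illustration and holds everything in memory; it assumes a real deployment would keep the table in separate, access-controlled storage, as Article 4(5) requires for the additional information.

```python
import secrets


class TokenVault:
    """In-memory mapping table (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._token_by_value: dict[str, str] = {}
        self._value_by_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so relationships between records survive
        if value in self._token_by_value:
            return self._token_by_value[value]
        # Four hex characters mirror the TOKEN_A3F9 format shown above;
        # a production system would likely use longer random tokens
        token = "TOKEN_" + secrets.token_hex(2).upper()
        while token in self._value_by_token:  # retry on a rare collision
            token = "TOKEN_" + secrets.token_hex(2).upper()
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Re-identification: only possible with access to the mapping table
        return self._value_by_token[token]


vault = TokenVault()
record = {"name": "John Smith", "email": "john@email.de"}

pseudonymized = {field: vault.tokenize(value) for field, value in record.items()}
restored = {field: vault.detokenize(token) for field, token in pseudonymized.items()}

print(pseudonymized)
print(restored == record)
```

The tokens are random rather than derived from the value, so someone holding only the pseudonymized record learns nothing about the original without the vault.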

When Organizations Need It

Pseudonymization is the right choice when you need to process personal data but want to limit exposure and comply with GDPR’s data minimization principle.

  • Testing with Realistic Data — Development teams need data that behaves like production data but doesn’t expose real identities. Pseudonymized test datasets preserve format, length, and relationships.
  • Internal Analytics — Business intelligence teams can analyze patterns across pseudonymized datasets without accessing real personal data, reducing risk while maintaining analytical value.
  • Cross-Department Sharing — Share datasets between departments (e.g., marketing receives customer behavior data from support) without exposing personal identifiers to unauthorized teams.
  • Backup & Disaster Recovery — Pseudonymize backups so that a breach of backup storage doesn’t expose real personal data. Restore with the key when needed.
  • Research & Clinical Trials — Medical researchers can analyze patient data without knowing patient identities, with re-identification possible for follow-up when clinically necessary.

GDPR Benefit

While pseudonymized data is still personal data, GDPR explicitly recognizes pseudonymization as a security measure (Article 32) and provides specific provisions that make it easier to process pseudonymized data for research and statistical purposes (Article 89).

Encryption — Mathematical Access Control

What It Is

Encryption transforms data into unintelligible ciphertext using a mathematical algorithm and a cryptographic key. Only someone with the correct decryption key can restore the original data. Without the key, the encrypted data is computationally infeasible to read.

Symmetric vs Asymmetric

Symmetric (AES-256)

Same key encrypts and decrypts. Fast, efficient, ideal for data at rest and bulk processing. This is what anonymize.solutions uses for its Zero-Knowledge vault.

Asymmetric (RSA)

Public key encrypts, private key decrypts. Slower, but enables secure key exchange. Used for TLS/HTTPS and digital signatures.

When to Use Encryption

  • Data at Rest — Protect stored data (databases, file systems, backups) against unauthorized access
  • Data in Transit — Secure data moving between systems (TLS/HTTPS, VPN, encrypted API calls)
  • Access Control — Ensure only key holders can read specific data fields or documents

AES-256-GCM — What We Use

anonymize.solutions uses AES-256-GCM (Galois/Counter Mode) for its Zero-Knowledge vault and the Encrypt protection method. This provides both confidentiality and integrity verification in a single operation.

AES-256-GCM PROPERTIES

  • Key Size: 256-bit (2^256 possible keys)
  • Mode: Authenticated encryption (GCM)
  • Integrity: Built-in authentication tag
  • Standard: NIST SP 800-38D
  • Key Derivation: Argon2id (memory-hard password hashing)
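
To put the key size in perspective, the arithmetic can be worked through directly. The attacker speed below is a hypothetical figure chosen only for illustration, not a measured capability.

```python
KEY_BITS = 256
keyspace = 2 ** KEY_BITS

# Hypothetical attacker speed, chosen only for illustration
guesses_per_second = 10 ** 18
seconds_per_year = 365 * 24 * 60 * 60

# On average, half the key space is searched before the key is found
expected_years = keyspace // 2 // guesses_per_second // seconds_per_year

print(f"Possible keys: about 10^{len(str(keyspace)) - 1}")
print(f"Expected brute-force time: about 10^{len(str(expected_years)) - 1} years")
# Possible keys: about 10^77
# Expected brute-force time: about 10^51 years
```

This covers only exhaustive key search. In a password-derived scheme the practical limit is usually the strength of the password, which is why a memory-hard key derivation function is paired with the cipher.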

Zero-Knowledge Architecture

The encryption key is derived from your password using Argon2id on your device. The password never leaves your machine. We store only the encrypted output — even our team cannot decrypt your data. This is true Zero-Knowledge encryption.
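
The client-side derivation step can be sketched as follows. This example makes several assumptions: scrypt stands in for Argon2id only because it ships with Python's standard library (both are memory-hard functions), the cost parameters are illustrative rather than the product's actual settings, and the derive_key name and the salt handling are hypothetical.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from a password, entirely on the client.

    scrypt is used here as a stand-in for Argon2id.
    """
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2 ** 14,  # CPU/memory cost (illustrative value)
        r=8,        # block size
        p=1,        # parallelism
        dklen=32,   # 32 bytes = 256 bits, the AES-256 key size
    )


# The salt is random but not secret; this sketch assumes it is stored
# alongside the ciphertext so the key can be re-derived later
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)

# The same password and salt reproduce the key; a wrong password does not
same_key = derive_key("correct horse battery staple", salt)
wrong_key = derive_key("wrong password", salt)

print(len(key) * 8, "bit key")
print(key == same_key, key == wrong_key)
```

Because only the salt and ciphertext would ever be uploaded in this model, a server holding them still lacks the one input, the password, needed to rebuild the key.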

Which Method Should I Use?

Start with your use case. The right method depends on what you need to do with the data after protection.

“I need to share data externally”

When data leaves your control — shared with partners, published as open data, or used for third-party analytics — anonymization is the only safe option.

Use Anonymization
Irreversible. Data exits your control safely. No GDPR obligations on the recipient.

“I need to process data internally”

When data stays within your organization — testing, development, internal analytics, cross-department sharing — pseudonymization maintains utility while reducing risk.

Use Pseudonymization
Reversible. Maintain data utility. Re-identify when authorized.

“I need to store or transmit data securely”

When data needs to be protected against unauthorized access — at rest in databases, in transit over networks, or in backup storage — encryption is the standard.

Use Encryption
Key-based access control. Full data fidelity. Industry standard.

How anonymize.solutions Supports All Three

One platform, five protection methods, all three techniques. Choose the right method for each entity type, or let compliance presets decide automatically.

  • Replace & Redact (Anonymization) — Irreversible transformation. Names become PERSON_1, credit cards become [REDACTED]. No key management required. Ideal for external sharing, analytics, and AI training data.
  • Tokenize & Mask (Pseudonymization) — Reversible substitution with format-preserving tokens. Maintains data structure and relationships. Secure mapping table enables authorized re-identification.
  • Encrypt with AES-256-GCM (Encryption) — AES-256 authenticated encryption with built-in integrity checking. Password-derived keys via Argon2id. Used in our Zero-Knowledge vault for local storage protection.
  • Compliance Presets Auto-Select the Right Method — Compliance presets (e.g., GDPR) automatically assign the appropriate method based on entity type and regulatory requirements. No manual configuration needed.
  • Zero-Knowledge Vault Uses Encryption for Local Storage — Your processed documents are encrypted locally with AES-256-GCM before any cloud sync. The encryption key never leaves your device. Even our servers cannot read your stored data.

Per-Entity Method Selection

Different entity types within the same document can use different methods. Anonymize names (Replace), pseudonymize email addresses (Mask), and encrypt medical record numbers (Encrypt) — all in a single processing pass.
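
One way to picture per-entity selection is a policy table that maps each entity type to a method. The sketch below is an assumption about how such a dispatch could look, not the platform's implementation: every function name, the POLICY table, and the demo key are hypothetical. The mask shown is a simple lossy stand-in, a keyed hash (HMAC-SHA256) represents the Hash method, and Encrypt is left out because it would need a cryptography library outside the standard library.

```python
import hashlib
import hmac
from typing import Callable

DEMO_KEY = b"demo-key-not-for-production"  # hypothetical secret for the keyed hash


def replace(value: str, kind: str) -> str:
    return f"{kind}_1"  # simplified: a real engine would number distinct entities


def redact(value: str, kind: str) -> str:
    return "[REDACTED]"


def mask(value: str, kind: str) -> str:
    # Keep the first character and any email domain, star out the rest
    local, sep, domain = value.partition("@")
    return local[:1] + "*" * (len(local) - 1) + sep + domain


def keyed_hash(value: str, kind: str) -> str:
    digest = hmac.new(DEMO_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Policy table: entity type -> protection method
POLICY: dict[str, Callable[[str, str], str]] = {
    "PERSON": replace,
    "EMAIL": mask,
    "CREDIT_CARD": redact,
    "MEDICAL_RECORD": keyed_hash,
}


def protect(text: str, entities: list[tuple[int, int, str]]) -> str:
    """Apply the policy to non-overlapping (start, end, kind) spans in one pass."""
    pieces, cursor = [], 0
    for start, end, kind in sorted(entities):
        pieces.append(text[cursor:start])
        pieces.append(POLICY[kind](text[start:end], kind))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "John Smith <john@email.de>, MRN 12345"


def find(value: str, kind: str) -> tuple[int, int, str]:
    """Stand-in for a detector: locate a known value in the text."""
    start = text.index(value)
    return (start, start + len(value), kind)


entities = [
    find("John Smith", "PERSON"),
    find("john@email.de", "EMAIL"),
    find("12345", "MEDICAL_RECORD"),
]
protected = protect(text, entities)
print(protected)
```

Keeping the policy in a single table means a compliance preset could, in principle, be expressed as nothing more than a different table.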

260+ Entity Types, 48 Languages

Our hybrid detection engine (NLP plus pattern matching) identifies 260+ entity types across 48 languages with high detection accuracy. Every entity detected can be assigned any of the five protection methods independently.

Frequently Asked Questions

Is pseudonymized data still personal data under GDPR?

Yes. GDPR Article 4(5) explicitly defines pseudonymization as processing personal data such that it can no longer be attributed to a specific data subject without additional information. Because re-identification is possible with the key or mapping table, pseudonymized data remains personal data and is fully subject to GDPR obligations — including lawful basis, data subject rights, and breach notification. However, GDPR does recognize pseudonymization as a valuable security measure and provides specific accommodations for pseudonymized data in research contexts (Article 89).

Can anonymization be reversed?

No. By definition, anonymization is irreversible. Once data is truly anonymized, there is no key, mapping table, or algorithm that can recover the original values. This is what distinguishes anonymization from pseudonymization and encryption, and why anonymized data falls outside the scope of GDPR entirely. If a method claims to be “anonymization” but can be reversed, it is pseudonymization, not anonymization.

Which method is better for AI training data?

Anonymization is the recommended method for AI training data. Training datasets are shared broadly, ingested into model weights, and cannot be reliably deleted after training. Irreversible anonymization eliminates the risk of personal data leaking through model outputs. Pseudonymization is insufficient because anyone who obtains the key or mapping table can link the tokens back to individuals. Encryption is impractical because encrypted data cannot be used for training without first decrypting it.

Does encryption count as anonymization under GDPR?

No. Encryption is reversible with the correct key, which means the encrypted data can be linked back to an identifiable person. Under GDPR, encrypted data is still considered personal data. However, encryption is explicitly recognized as an appropriate technical measure under Article 32 (security of processing) and can significantly reduce the impact of data breaches. Under Article 34(3)(a), a controller may not need to notify affected data subjects of a breach when the exposed data was rendered unintelligible to unauthorized parties, for example by encryption. Notification to the supervisory authority under Article 33 is assessed separately.

What is the fastest protection method?

Replace and Redact are the fastest methods because they perform simple string substitution or removal without any cryptographic operations or key management overhead. They are ideal for high-throughput pipelines where speed matters and reversibility is not required. Hash adds minimal overhead (a single hash computation per entity). Encrypt is the slowest due to key derivation and authenticated encryption operations, though still sub-second for typical documents.

Can I combine multiple protection methods on the same document?

Yes. anonymize.solutions supports per-entity method selection, meaning you can anonymize names (Replace), redact credit card numbers (Redact), pseudonymize email addresses (Mask), and encrypt medical record numbers (Encrypt) — all within the same document and in a single processing pass. Compliance presets (e.g., GDPR) automatically assign the appropriate method based on entity type and regulatory requirements.

Choose the right protection for every use case

Anonymize, pseudonymize, or encrypt — one platform supports all three. See it in action with your own data.