Official Definition

“PII (Personally Identifiable Information) refers to any information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other information that is linked or linkable to a specific individual.”

— NIST SP 800-122

Types of PII

Not all PII carries equal risk. Understanding the three categories helps determine the right level of protection for each data type.

Direct Identifiers

Data that can identify a person on its own, without needing any additional context or cross-referencing.

  • Full name
  • Social Security Number (SSN)
  • Passport number
  • Driver’s license number
  • Biometric data (fingerprints, face scans)

Quasi-Identifiers

Data that cannot identify a person alone but can do so when combined with other quasi-identifiers or external datasets.

  • Date of birth
  • ZIP code / postal code
  • Gender
  • Job title
  • Nationality
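
The re-identification risk of quasi-identifiers is commonly measured with k-anonymity: group records by their quasi-identifier tuple and check the smallest group size. Latanya Sweeney famously estimated that ZIP code, birth date, and gender alone uniquely identify roughly 87% of the US population. A minimal sketch with toy data (field names and values are illustrative):

```python
from collections import Counter

# Toy records containing only quasi-identifiers, no direct identifiers.
records = [
    {"dob": "1984-03-12", "zip": "10115", "gender": "F"},
    {"dob": "1984-03-12", "zip": "10115", "gender": "F"},
    {"dob": "1990-07-01", "zip": "80331", "gender": "M"},
]

def k_anonymity(rows, keys):
    """Smallest equivalence-class size over the quasi-identifier tuple.
    k == 1 means at least one record is uniquely re-identifiable."""
    groups = Counter(tuple(r[k] for k in keys) for r in rows)
    return min(groups.values())

print(k_anonymity(records, ["dob", "zip", "gender"]))  # -> 1
```

Here the third record forms a group of one, so the dataset is only 1-anonymous: anyone who knows that person's birth date, ZIP, and gender can single them out.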

Sensitive PII

Data that requires heightened protection due to the potential for discrimination, harm, or significant privacy impact if exposed.

  • Medical records
  • Financial data (credit cards, bank accounts)
  • Racial or ethnic origin
  • Political opinions
  • Sexual orientation

PII Categories at a Glance

A comprehensive breakdown of PII categories, common examples, associated risk levels, and the primary regulations that govern each type.

Category   | Examples                                 | Risk Level | Key Regulation
-----------|------------------------------------------|------------|-----------------
Identity   | Name, SSN, passport number               | High       | All
Contact    | Email, phone, physical address           | Medium     | GDPR, CCPA
Financial  | Credit card, IBAN, tax ID                | Critical   | PCI-DSS, GDPR
Health     | Medical records, prescriptions           | Critical   | HIPAA, GDPR
Digital    | IP address, device ID, cookies           | Medium     | GDPR, ePrivacy
Biometric  | Fingerprints, face scans, voice prints   | Critical   | GDPR Art. 9
Employment | Employee ID, salary, performance reviews | Medium     | GDPR
Education  | Student records, grades, transcripts     | Medium     | FERPA

How Laws Define PII

There is no single global definition of PII. The two major frameworks — GDPR and US law — take fundamentally different approaches.

GDPR — “Personal Data” (Art. 4(1))

The EU’s General Data Protection Regulation uses the broadest definition in global privacy law.

  • Any data relating to an identified or identifiable natural person
  • Includes online identifiers (IP addresses, cookie IDs, device fingerprints)
  • Pseudonymized data is still personal data
  • Only fully anonymized data falls outside GDPR scope
  • Special categories (Art. 9): health, biometric, racial origin, political opinions

US Law — Sector-Specific Definitions

The United States has no single federal privacy law. PII definitions vary by sector, state, and regulation.

  • No unified federal definition of PII
  • CCPA (California): broad — includes IP addresses, browsing history
  • HIPAA (Health): Protected Health Information (PHI), 18 identifier types
  • FERPA (Education): student education records
  • Generally narrower than GDPR — often requires data to directly identify a person

Where PII Hides

PII is rarely confined to a single database column. It spreads across systems, documents, and workflows — often in places organizations never think to check.

Emails & Messages

Names, phone numbers, addresses, and account details embedded in email threads, Slack messages, and support tickets. Often forwarded and copied without redaction.

Documents & Spreadsheets

Contracts, invoices, HR files, and Excel exports containing SSNs, salaries, medical information, and customer records shared across departments.

AI Prompts & Chat Logs

Users paste customer data, code snippets with credentials, and personal details into ChatGPT, Claude, and Copilot — all logged by the AI provider.

Database Fields

Free-text columns, notes fields, and unstructured data in CRM, ERP, and ticketing systems where PII is entered without validation or tagging.

Log Files & Analytics

IP addresses, user agents, session IDs, and sometimes full request bodies with PII captured in application logs, web server logs, and analytics platforms.
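
A first-pass log scrubber for identifiers like these can be sketched with regular expressions. The two patterns below are deliberately simplified for illustration; a production engine uses many more recognizers plus checksum validation:

```python
import re

# Simplified patterns for two common PII types found in logs.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def scrub(line: str) -> str:
    """Replace IP addresses and email addresses with placeholder tokens."""
    line = IPV4.sub("[IP]", line)
    return EMAIL.sub("[EMAIL]", line)

log = "203.0.113.42 - GET /reset?user=jane.doe@example.com 200"
print(scrub(log))  # -> '[IP] - GET /reset?user=[EMAIL] 200'
```

Scrubbing at write time (in the logging layer) is safer than scrubbing stored logs later, since PII never lands on disk in the first place.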

Images & Scanned Documents

Scanned IDs, passport photos, medical forms, and screenshots containing visible PII that survives text-based redaction because it exists as pixels, not characters.

How to Protect PII

Effective PII protection requires automated detection at scale, multiple anonymization methods, and architecture that eliminates single points of failure.

  • Automated PII Detection — 260+ entity types across 48 languages, powered by NLP and Pattern engines with checksum validation. No manual tagging required.
  • Multiple Anonymization Methods — Replace with realistic fakes, redact entirely, mask partially, hash for consistency, or encrypt for reversible protection. Choose per entity type.
  • AI Workflow Protection — MCP Server for IDE integration, Chrome Extension for AI chat platforms, REST API for pipelines. Anonymize before data reaches any LLM.
  • Compliance Presets — Pre-configured detection profiles for regulatory requirements (e.g., GDPR). Select a preset and the system automatically targets the relevant entity types.
  • Zero-Knowledge Architecture — Your data passes through, gets anonymized, and returns. We never store, log, or access your original content. Even our team cannot see your data.
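
Three of the anonymization strategies listed above (redact, mask, hash) can be sketched in a few lines. The entity names and the per-entity strategy mapping here are illustrative assumptions, not the product's actual configuration:

```python
import hashlib

def redact(value: str) -> str:
    """Remove the value entirely."""
    return "[REDACTED]"

def mask(value: str, visible: int = 4) -> str:
    """Keep only the last `visible` characters."""
    return "*" * (len(value) - visible) + value[-visible:]

def hash_consistent(value: str) -> str:
    """Same input always yields the same token (preserves joins/grouping)."""
    return "tok_" + hashlib.sha256(value.encode()).hexdigest()[:12]

# Hypothetical per-entity-type strategy mapping.
STRATEGY = {"SSN": redact, "CREDIT_CARD": mask, "EMAIL": hash_consistent}

def anonymize(entity_type: str, value: str) -> str:
    return STRATEGY[entity_type](value)

print(anonymize("CREDIT_CARD", "4111111111111111"))  # -> '************1111'
```

Choosing the method per entity type matters: masking keeps enough of a card number for support workflows, while consistent hashing lets analysts count distinct users without ever seeing an email address.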

Deterministic, Not AI-Based

We use NLP + Pattern engines — not LLMs — to detect PII. Same input always produces the same output. No hallucinations, no variation between runs, full audit trail for every detection.

100% EU Infrastructure

All processing happens on Hetzner infrastructure in Germany. No data leaves the EU. No US Cloud Act exposure. No third-party sub-processors with access to your content.

Frequently Asked Questions

Common questions about PII, privacy regulations, and how automated anonymization works.

Is “PII” the same as the GDPR’s “personal data”?

The GDPR’s definition of “personal data” is broader than the US concept of “PII.” Under GDPR, personal data includes any information relating to an identified or identifiable natural person — including online identifiers like IP addresses and cookie IDs. US “PII” definitions vary by sector and state law, and are generally narrower, often requiring the data to directly identify an individual. In practice, complying with GDPR’s broader definition will generally cover US PII definitions as well.

Is an email address PII?

Yes, in virtually all cases. A personal email like firstname.lastname@provider.com is a direct identifier: it can uniquely identify a specific individual without needing additional context, under both GDPR (personal data) and US privacy frameworks (PII). Even a generic work address like info@company.com may qualify if it can be linked to a specific person.

Are IP addresses PII?

Under the GDPR, yes — IP addresses are explicitly classified as personal data because they can be used to identify an individual, especially when combined with data held by an ISP. The CJEU confirmed this in Breyer v. Bundesrepublik Deutschland (2016). In the US, it depends on context: the CCPA considers IP addresses personal information, while other federal frameworks may not classify them as PII unless they can be directly linked to a specific person. Best practice: treat IP addresses as PII regardless of jurisdiction.

What happens if PII is exposed in a breach?

The consequences are severe and multi-layered:

  • Regulatory fines — GDPR allows penalties up to 4% of annual global revenue or €20 million, whichever is higher.
  • Legal liability — class-action lawsuits, individual compensation claims, and mandatory breach notifications within 72 hours (GDPR) or varying state deadlines (US).
  • Reputational damage — loss of customer trust, negative press coverage, and long-term brand impact.
  • Operational costs — forensic investigation, remediation, credit monitoring for affected individuals, and increased insurance premiums.

How many entity types and languages are supported?

Over 260 entity types across 48 languages. The hybrid detection engine combines NLP-based named entity recognition (25 spaCy languages + 7 Stanza + 16 XLM-RoBERTa) with 317 regex recognizers that include checksum validation (Luhn for credit cards, MOD-97 for IBANs, format rules for SSNs). Entity types span identity, contact, financial, health, digital, biometric, employment, and education categories — covering the full spectrum of PII as defined by major regulations including GDPR.
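
Both checksum algorithms mentioned above are public standards and easy to sketch. Checksum validation is what separates a real card number or IBAN from a random 16-digit string, cutting false positives dramatically:

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum (credit cards): double every second digit from the
    right, subtract 9 from results over 9, and require a sum divisible by 10."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def iban_valid(iban: str) -> bool:
    """MOD-97 check (ISO 13616): move the first 4 chars to the end, convert
    letters to numbers (A=10 ... Z=35), and require remainder 1 mod 97."""
    s = iban.replace(" ", "").upper()
    s = s[4:] + s[:4]
    num = "".join(str(int(c, 36)) for c in s)
    return int(num) % 97 == 1

print(luhn_valid("4111111111111111"))        # True  (well-known test card number)
print(iban_valid("DE89370400440532013000"))  # True  (standard example IBAN)
```

A regex alone would flag any plausible-looking digit run; combining it with the checksum means only values that could actually be issued numbers are reported.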

How do I protect PII in AI workflows like ChatGPT or Copilot?

Pre-processing anonymization: detect and remove PII before data reaches the LLM. This eliminates the risk at the source rather than relying on the AI provider’s data handling policies. Use a deterministic detection engine (not another LLM) to avoid circular dependencies where you send sensitive data to one AI to protect it from another. Ensure the anonymization layer supports reversible tokens so authorized users can re-identify data in AI responses when needed. For implementation, use an MCP Server for IDE integration, a Chrome Extension for browser-based AI chats, or a REST API for automated pipelines.
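
The reversible-token flow described above might look like the following sketch. The regex, token format, and function names are illustrative assumptions, not a real API:

```python
import re

# Simplified recognizer; a real engine covers 260+ entity types.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def anonymize(prompt: str):
    """Swap each email for a stable token; return safe text plus a reverse map."""
    mapping = {}
    def repl(m):
        return mapping.setdefault(m.group(), f"<EMAIL_{len(mapping) + 1}>")
    safe = EMAIL.sub(repl, prompt)
    return safe, {token: original for original, token in mapping.items()}

def deanonymize(text: str, reverse: dict) -> str:
    """Restore the original values in the LLM's response."""
    for token, original in reverse.items():
        text = text.replace(token, original)
    return text

safe, rev = anonymize("Draft a reply to anna.lee@example.com about her refund.")
print(safe)  # -> 'Draft a reply to <EMAIL_1> about her refund.'
# `safe` goes to the LLM; `rev` never leaves your machine, so the model
# response can be re-identified locally with deanonymize().
```

Because the token map stays local, the LLM provider only ever sees placeholders, yet the final response shown to an authorized user contains the real values.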

Start protecting PII automatically

260+ entity types, 48 languages, Zero-Knowledge architecture. Detect and anonymize PII before it becomes a compliance risk.