HIPAA Safe Harbor: Guide to 18 PHI Identifiers

What is HIPAA Safe Harbor De-identification?

Under 45 CFR § 164.514(b), covered entities and their business associates may de-identify protected health information using one of two methods:

Safe Harbor Method (§ 164.514(b)(2)): Remove all 18 specified identifiers and verify that the covered entity has no actual knowledge that the remaining information could be used to identify an individual.
Expert Determination Method (§ 164.514(b)(1)): A qualified statistical expert certifies that the risk of identification is "very small."

Safe Harbor is the more commonly used method because it provides a clear, bright-line rule. If all 18 identifiers are removed and the covered entity has no actual knowledge that the remaining information could identify an individual, the data is de-identified, with no statistical expertise required. This guide focuses on Safe Harbor.

Key benefit: De-identified health information is not PHI. It falls outside the scope of HIPAA's Privacy Rule entirely. You can share it freely for research, publish it, use it to train AI models, and process it with third-party tools — no BAA required.

The 18 PHI Identifiers Under 45 CFR § 164.514(b)(2)

Every one of the following identifiers of the individual, and of the individual's relatives, employers, and household members, must be removed or transformed to achieve Safe Harbor de-identification:

#	Identifier	Examples	How anonymize.solutions Handles It
1	Names	Patient name, next of kin	NLP entity detection → Replace with [PERSON_N]
2	Geographic data smaller than state	Street, city, county, ZIP	Address detection → Replace; ZIP: first 3 digits (see below)
3	Dates (except year)	Birth date, admission date, discharge date, date of death	Date detection → Replace with [DATE_N] or year-only
4	Phone numbers	Home, mobile, work, fax	Regex + NLP → Replace with [PHONE_N]
5	Fax numbers	Provider fax, facility fax	Pattern detection → Replace with [FAX_N]
6	Email addresses	Patient email, provider email	Regex → Replace with [EMAIL_N]
7	Social Security numbers	Full or partial SSN	Regex (9-digit, hyphenated) → Replace with [SSN_N]
8	Medical record numbers	MRN, patient ID	Context-aware NLP → Replace with [MRN_N]
9	Health plan beneficiary numbers	Insurance ID, member ID	Pattern + context → Replace with [INSURANCE_ID_N]
10	Account numbers	Bank account, billing account	Pattern detection → Replace with [ACCOUNT_N]
11	Certificate/license numbers	NPI, DEA, medical license	Pattern + NLP → Replace with [LICENSE_N]
12	Vehicle identifiers	VIN, license plate	Pattern detection → Replace with [VEHICLE_ID_N]
13	Device identifiers	Serial numbers, MAC addresses	Pattern + NLP → Replace with [DEVICE_ID_N]
14	Web URLs	Personal profile URLs, patient portals	URL detection → Replace with [URL_N]
15	IP addresses	IPv4, IPv6	Regex → Replace with [IP_N]
16	Biometric identifiers	Fingerprints, voiceprints, retina scans	Metadata detection → Flag for manual review
17	Full-face photos	Patient photos, intake photos	Metadata detection → Flag; image processing via separate API
18	Any other unique identifying number, characteristic, or code	Account codes, study IDs unique to individual	Custom entity patterns configurable per deployment
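
To make the "Regex → Replace with [TYPE_N]" pattern in the table concrete, here is a minimal sketch covering three of the pattern-based identifiers. The patterns and the `replace_identifiers` helper are illustrative assumptions, not the anonymize.solutions implementation; production detectors need far broader coverage and context checks.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a complete detector).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def replace_identifiers(text: str) -> str:
    """Replace each distinct match with a numbered placeholder such as [SSN_1]."""
    for label, pattern in PATTERNS.items():
        seen: dict[str, str] = {}  # original value -> placeholder, per identifier type

        def substitute(match: re.Match) -> str:
            value = match.group(0)
            if value not in seen:
                seen[value] = f"[{label}_{len(seen) + 1}]"
            return seen[value]

        text = pattern.sub(substitute, text)
    return text
```

Reusing the same placeholder for a repeated value keeps references within a document consistent without revealing the value itself.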

Safe Harbor vs Expert Determination: When to Use Which

Factor	Safe Harbor	Expert Determination
Expertise required	None — rule-based	Qualified statistician required
Data utility	Lower — all 18 identifiers removed	Higher — only re-identification risk reduced
Cost	Low — automated implementation	High — expert fees + documentation
Audit defensibility	Very high — bright-line rule	High — but depends on expert methodology
Best for	Routine data sharing, AI training, research	Complex datasets where full removal destroys utility

The "No Actual Knowledge" Requirement

Even after removing all 18 identifiers, § 164.514(b)(2)(ii) requires that the covered entity "does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual."

This means that small-cell data poses a risk. If your de-identified dataset contains only one patient with a particular rare condition in a particular age range in a particular geographic area, the combination of remaining data points may still be identifying — even with all 18 identifiers removed.

Practical guidance:

Apply cell suppression for demographic combinations with fewer than 5 individuals
Generalize age to 5-year bands rather than exact year when the dataset is small
Document your analysis showing no actual knowledge of residual identification risk
For sensitive research, consider Expert Determination despite the higher cost
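
The first two points above can be sketched in a few lines. This is a simplified illustration under stated assumptions: records are plain dictionaries, the quasi-identifier columns are chosen by the caller, and the function names are hypothetical. It drops small cells outright, whereas a real pipeline might generalize them instead.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # Threshold taken from the guidance above


def age_band(age: int, width: int = 5) -> str:
    """Generalize an exact age to a band such as '40-44'; 90 and over is one category."""
    if age >= 90:
        return "90 or older"
    low = (age // width) * width
    high = min(low + width - 1, 89)  # Never let a band extend into the 90+ category
    return f"{low}-{high}"


def suppress_small_cells(records: list[dict], keys: tuple[str, ...]) -> list[dict]:
    """Drop records whose quasi-identifier combination occurs fewer than MIN_CELL_SIZE times."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if counts[tuple(r[k] for k in keys)] >= MIN_CELL_SIZE]
```

The set of surviving and suppressed combinations is also useful input for the documented "no actual knowledge" analysis.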

Geographic Data: The ZIP Code Rule

The geographic data rule under § 164.514(b)(2)(i)(B) is nuanced and frequently misunderstood. The regulation requires removal of all geographic subdivisions smaller than a state — except that the first three digits of a ZIP code may be retained if the geographic unit formed by all ZIP codes with the same three initial digits contains more than 20,000 people.

In practice:

Any three-digit ZIP prefix whose combined area contains 20,000 or fewer people must be replaced with "000"
The population figures must come from current publicly available Census Bureau data, so the list of qualifying prefixes must be maintained and updated
Street address, city, county, precinct, and other geographic units smaller than state must always be removed
State-level data may be retained

The anonymize.solutions HIPAA preset handles this automatically using a maintained list of qualifying 3-digit ZIP prefixes updated with each Census Bureau release.
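
As a rough illustration of the rule, the sketch below truncates a ZIP code to its three-digit prefix and falls back to "000" when the prefix does not qualify. The `PREFIX_POPULATION` table and its figures are placeholders invented for this example, and `generalize_zip` is a hypothetical helper; a real deployment would load populations from current Census-derived data.

```python
# Placeholder populations per 3-digit ZIP prefix (invented values for illustration).
PREFIX_POPULATION = {
    "021": 1_000_000,
    "999": 12_000,
}

POPULATION_THRESHOLD = 20_000


def generalize_zip(zip_code: str, populations: dict[str, int] = PREFIX_POPULATION) -> str:
    """Return the 3-digit prefix if its area has more than 20,000 people, else '000'."""
    prefix = zip_code.strip()[:3]
    # Unknown prefixes are treated as population 0 so the function fails closed.
    if populations.get(prefix, 0) > POPULATION_THRESHOLD:
        return prefix
    return "000"
```

Note the strict comparison: a prefix covering exactly 20,000 people does not qualify and is replaced with "000".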

Date Restrictions: What You Can and Cannot Keep

Dates are among the most commonly mishandled PHI identifiers. The rule:

Must remove: All elements of dates (except year) directly related to the individual — including birth date, admission date, discharge date, date of death, and exact ages over 89
May keep: Year only (e.g., "2024" instead of "March 15, 2024")
Special rule for age 90+: All ages 90 and above must be aggregated into a single category ("90 or older") — even the year of birth must be removed for these individuals
Date shifting: Does not satisfy Safe Harbor, because a shifted date still carries month and day elements and preserves exact intervals between events. Consider Expert Determination if date intervals are required.
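
The year-only and age 90+ rules can be sketched as follows. The function names and the returned dictionary shape are assumptions made for this example, not part of any particular API.

```python
from datetime import date


def year_only(d: date) -> str:
    """Keep only the year of a date, dropping month and day."""
    return str(d.year)


def age_on(birth: date, on: date) -> int:
    """Age in completed years on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def safe_harbor_birth_fields(birth: date, as_of: date) -> dict:
    """Return birth year and age, or the aggregated category for ages 90 and over."""
    age = age_on(birth, as_of)
    if age >= 90:
        # The birth year would reveal an age over 89, so it is withheld as well.
        return {"birth_year": None, "age": "90 or older"}
    return {"birth_year": year_only(birth), "age": str(age)}
```

For a patient born on March 15, 1941 and evaluated on February 14, 2026, this yields a birth year of "1941" and an age of "84".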

Automated Safe Harbor Implementation

The anonymize.solutions API provides a dedicated HIPAA Safe Harbor preset that automatically handles all 18 identifiers according to the regulatory specifications:

import requests

API_KEY = "your-api-key"

def hipaa_safe_harbor(text: str) -> dict:
    """De-identify text using HIPAA Safe Harbor method (45 CFR § 164.514(b)(2))."""
    response = requests.post(
        "https://api.anonymize.solutions/v1/anonymize",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "text": text,
            "preset": "hipaa",
            "method": "safe_harbor",
            "zip_rule": True,        # Apply 3-digit ZIP population rule
            "age_90_rule": True,     # Aggregate ages 90+ as "90 or older"
            "date_year_only": True   # Keep year, remove day/month
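        },
        timeout=30,  # Avoid hanging indefinitely on network problems
    )
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    return response.json()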

# Example input:
sample = """
Patient: John Michael Davis
DOB: 03/15/1941 (age 84)
SSN: 234-56-7890
MRN: MRN-78234
Address: 42 Maple Street, Boston, MA 02115
Admission: 02/14/2026
Discharge: 02/18/2026
Diagnosis: Type 2 diabetes with peripheral neuropathy
"""

result = hipaa_safe_harbor(sample)
print(result["text"])
# Patient: [PERSON_1]
# DOB: 1941 (age 84)
# SSN: [SSN_1]
# MRN: [MRN_1]
# Address: [ADDRESS_1], MA 021
# Admission: 2026
# Discharge: 2026
# Diagnosis: Type 2 diabetes with peripheral neuropathy

Note that the diagnosis, which is clinical information and not itself an identifier, is preserved. Only the 18 specified identifiers are removed. This is the key advantage of Safe Harbor over blanket redaction: clinical utility is maintained.

Documentation Requirements

To demonstrate Safe Harbor compliance in an audit, maintain the following documentation:

De-identification policy: Written policy specifying which method (Safe Harbor) is used, which entity types are removed, and who is responsible
Technical specification: Documentation of the tools used (e.g., "anonymize.solutions API, HIPAA preset, version X.Y"), including version history
Processing log: Record of each de-identification run — timestamp, document count, identifiers removed — for audit trail purposes
No actual knowledge attestation: For each dataset released, a brief documented analysis confirming no residual identification risk is known
Recipient agreements: While de-identified data is not PHI, document who receives it and for what purpose — good practice that supports your overall compliance posture

The anonymize.solutions Managed Private package includes automated compliance logging and a compliance export feature that generates a ready-to-file de-identification documentation report.
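
For teams keeping their own processing log, one entry per run might look like the sketch below. The `DeidRunRecord` schema and its field names are hypothetical, chosen to mirror the items listed above (timestamp, document count, identifiers removed); it is not the format of the anonymize.solutions compliance export.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DeidRunRecord:
    """One audit-log entry for a de-identification run (hypothetical schema)."""

    tool_version: str
    document_count: int
    identifiers_removed: dict[str, int]  # Counts per identifier type, never the values
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json_line(self) -> str:
        """Serialize as a single JSON line suitable for an append-only log file."""
        return json.dumps(asdict(self), sort_keys=True)
```

Logging only counts per identifier type keeps the audit trail itself free of PHI.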
