What is HIPAA Safe Harbor De-identification?

Under 45 CFR § 164.514(b), covered entities and their business associates may de-identify protected health information using one of two methods:

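  1. Safe Harbor Method (§ 164.514(b)(2)): Remove all 18 specified identifiers of the individual and of the individual's relatives, employers, and household members, and confirm that the covered entity has no actual knowledge that the remaining information could be used to identify an individual.
  2. Expert Determination Method (§ 164.514(b)(1)): A person with appropriate knowledge of and experience with generally accepted statistical and scientific methods determines that the risk of identification is "very small" and documents the methods and results that justify that determination.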

Safe Harbor is the more commonly used method because it provides a clear, bright-line rule. Once all 18 identifiers are removed and the covered entity has no actual knowledge that the remaining information could identify an individual, the data is treated as de-identified, and no statistical expertise is required. This guide focuses on Safe Harbor.

Key benefit: Properly de-identified health information is not PHI, so it falls outside the scope of HIPAA's Privacy Rule. HIPAA itself does not restrict sharing it for research, publishing it, using it to train AI models, or processing it with third-party tools, and no BAA is required.

The 18 PHI Identifiers Under 45 CFR § 164.514(b)(2)

Every one of the following identifiers must be removed or transformed to achieve Safe Harbor de-identification:

| # | Identifier | Examples | How anonymize.solutions Handles It |
|---|---|---|---|
| 1 | Names | Patient name, next of kin | NLP entity detection → Replace with [PERSON_N] |
| 2 | Geographic data smaller than state | Street, city, county, ZIP | Address detection → Replace; ZIP: first 3 digits (see below) |
| 3 | Dates (except year) | Birth date, admission date, discharge date, date of death | Date detection → Replace with [DATE_N] or year-only |
| 4 | Phone numbers | Home, mobile, work, fax | Regex + NLP → Replace with [PHONE_N] |
| 5 | Fax numbers | Provider fax, facility fax | Pattern detection → Replace with [FAX_N] |
| 6 | Email addresses | Patient email, provider email | Regex → Replace with [EMAIL_N] |
| 7 | Social Security numbers | Full or partial SSN | Regex (9-digit, hyphenated) → Replace with [SSN_N] |
| 8 | Medical record numbers | MRN, patient ID | Context-aware NLP → Replace with [MRN_N] |
| 9 | Health plan beneficiary numbers | Insurance ID, member ID | Pattern + context → Replace with [INSURANCE_ID_N] |
| 10 | Account numbers | Bank account, billing account | Pattern detection → Replace with [ACCOUNT_N] |
| 11 | Certificate/license numbers | NPI, DEA, medical license | Pattern + NLP → Replace with [LICENSE_N] |
| 12 | Vehicle identifiers | VIN, license plate | Pattern detection → Replace with [VEHICLE_ID_N] |
| 13 | Device identifiers | Serial numbers, MAC addresses | Pattern + NLP → Replace with [DEVICE_ID_N] |
| 14 | Web URLs | Personal profile URLs, patient portals | URL detection → Replace with [URL_N] |
| 15 | IP addresses | IPv4, IPv6 | Regex → Replace with [IP_N] |
| 16 | Biometric identifiers | Fingerprints, voiceprints, retina scans | Metadata detection → Flag for manual review |
| 17 | Full-face photos | Patient photos, intake photos | Metadata detection → Flag; image processing via separate API |
| 18 | Any other unique identifying number | Account codes, study IDs unique to individual | Custom entity patterns configurable per deployment |
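
To make the "Regex → Replace with [LABEL_N]" pattern in the table concrete, here is a minimal sketch in plain Python. It is not the anonymize.solutions implementation: the patterns, the `PATTERNS` table, and the `replace_identifiers` helper are illustrative assumptions, cover only four of the 18 identifier types, and would miss many real-world formats that NLP-based detection is meant to catch.

```python
import re

# Hypothetical, deliberately narrow patterns for illustration only.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def replace_identifiers(text: str) -> str:
    """Replace each match with a numbered placeholder such as [SSN_1].

    Repeated occurrences of the same value reuse the same number, so
    references to one identifier stay linked after replacement.
    """
    for label, pattern in PATTERNS.items():
        seen = {}  # original value -> placeholder number for this label

        def substitute(match, label=label, seen=seen):
            index = seen.setdefault(match.group(0), len(seen) + 1)
            return f"[{label}_{index}]"

        text = pattern.sub(substitute, text)
    return text


sample = "SSN 234-56-7890, phone 617-555-0123, email jdavis@example.com"
print(replace_identifiers(sample))
# SSN [SSN_1], phone [PHONE_1], email [EMAIL_1]
```

Numbering placeholders per value rather than per occurrence keeps a note readable when the same phone number or SSN appears more than once.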

Safe Harbor vs Expert Determination: When to Use Which

| Factor | Safe Harbor | Expert Determination |
|---|---|---|
| Expertise required | None (rule-based) | Qualified statistician required |
| Data utility | Lower: all 18 identifiers removed | Higher: only re-identification risk reduced |
| Cost | Low: automated implementation | High: expert fees + documentation |
| Audit defensibility | Very high: bright-line rule | High, but depends on expert methodology |
| Best for | Routine data sharing, AI training, research | Complex datasets where full removal destroys utility |

The "No Actual Knowledge" Requirement

Even after removing all 18 identifiers, § 164.514(b)(2)(ii) requires that the covered entity "does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual."

This means that small-cell data poses a risk. If your de-identified dataset contains only one patient with a particular rare condition in a particular age range in a particular geographic area, the combination of remaining data points may still be identifying — even with all 18 identifiers removed.

Practical guidance:

  • Apply cell suppression for demographic combinations with fewer than 5 individuals
  • Generalize age to 5-year bands rather than exact year when the dataset is small
  • Document your analysis showing no actual knowledge of residual identification risk
  • For sensitive research, consider Expert Determination despite the higher cost
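
The first two points above can be sketched in a few lines of Python. This is an illustrative assumption about how such a check might look, not a regulatory requirement or a vendor feature: the threshold of 5 and the 5-year bands come from the guidance above, while the function names and record layout are hypothetical.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # threshold taken from the practical guidance above


def age_band(age: int, width: int = 5) -> str:
    """Generalize an exact age to a band such as '40-44'.

    Ages 90 and above collapse into a single category, and no band
    is allowed to extend past 89.
    """
    if age >= 90:
        return "90 or older"
    low = (age // width) * width
    high = min(low + width - 1, 89)
    return f"{low}-{high}"


def suppress_small_cells(records, keys, min_size=MIN_CELL_SIZE):
    """Drop records whose combination of `keys` is shared by fewer than min_size rows."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if counts[tuple(r[k] for k in keys)] >= min_size]


# Hypothetical records: five similar patients and one outlier.
records = [{"state": "MA", "age": age_band(42), "dx": "type 2 diabetes"} for _ in range(5)]
records.append({"state": "MA", "age": age_band(93), "dx": "rare condition"})

kept = suppress_small_cells(records, keys=["state", "age", "dx"])
print(len(records), "->", len(kept))
# 6 -> 5
```

Dropping rows is the bluntest form of suppression; generalizing a column further (for example, wider age bands) is an alternative when too many rows would be lost.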

Geographic Data: The ZIP Code Rule

The geographic data rule under § 164.514(b)(2)(i)(B) is nuanced and frequently misunderstood. The regulation requires removal of all geographic subdivisions smaller than a state — except that the first three digits of a ZIP code may be retained if the geographic unit formed by all ZIP codes with the same three initial digits contains more than 20,000 people.

In practice:

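  • If the area formed by all ZIP codes sharing the same first three digits contains 20,000 or fewer people, those three digits must be replaced with "000"
  • Population counts must come from current publicly available Census Bureau data, so the list of restricted 3-digit prefixes must be maintained and updated as new data is released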
  • Street address, city, county, precinct, and other geographic units smaller than state must always be removed
  • State-level data may be retained

The anonymize.solutions HIPAA preset handles this automatically using a maintained list of qualifying 3-digit ZIP prefixes updated with each Census Bureau release.
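
As a rough illustration of the rule itself (not of how the preset is implemented), the following sketch applies the 20,000-person threshold to a lookup table. The `generalize_zip` helper is hypothetical, and the population figures are made-up placeholders rather than Census data; a real deployment would load totals derived from the current Census release.

```python
from typing import Mapping

POPULATION_THRESHOLD = 20_000


def generalize_zip(zip_code: str, prefix_population: Mapping[str, int]) -> str:
    """Return the 3-digit prefix if its area has more than 20,000 people, else '000'.

    Unknown or malformed prefixes are treated as restricted, which is
    the conservative choice.
    """
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit():
        return "000"
    if prefix_population.get(prefix, 0) > POPULATION_THRESHOLD:
        return prefix
    return "000"


# Placeholder populations for illustration only, NOT real Census figures.
populations = {"021": 1_000_000, "111": 15_000, "222": 20_000}

print(generalize_zip("02115", populations))  # 021
print(generalize_zip("11101", populations))  # 000
print(generalize_zip("22201", populations))  # 000 (exactly 20,000 is not "more than")
```

Note the boundary case: a prefix with exactly 20,000 people does not qualify, because the regulation requires more than 20,000.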

Date Restrictions: What You Can and Cannot Keep

Dates are among the most commonly mishandled PHI identifiers. The rule:

  • Must remove: All elements of dates (except year) directly related to the individual — including birth date, admission date, discharge date, date of death, and exact ages over 89
  • May keep: Year only (e.g., "2024" instead of "March 15, 2024")
  • Special rule for age 90+: All ages 90 and above must be aggregated into a single category ("90 or older") — even the year of birth must be removed for these individuals
  • Date shifting: Not sufficient under Safe Harbor, because shifted records still contain day- and month-level date elements and preserve the intervals between events. Consider Expert Determination if date intervals are required.
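
The year-only and age 90+ rules can be expressed as a few small helpers. This is a minimal sketch using only the standard library; the function names are hypothetical and chosen for illustration.

```python
from datetime import date
from typing import Optional


def age_on(birth: date, as_of: date) -> int:
    """Age in completed years on the given date."""
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    return as_of.year - birth.year - (0 if had_birthday else 1)


def safe_harbor_age(age: int) -> str:
    """Exact age below 90; a single aggregated category from 90 upward."""
    return "90 or older" if age >= 90 else str(age)


def safe_harbor_birth_year(birth: date, as_of: date) -> Optional[str]:
    """Birth year, or None when the year itself would reveal an age of 90+."""
    return None if age_on(birth, as_of) >= 90 else str(birth.year)


def safe_harbor_event_date(event: date) -> str:
    """Keep only the year of an admission, discharge, or similar date."""
    return str(event.year)


admission = date(2026, 2, 14)
print(safe_harbor_age(age_on(date(1941, 3, 15), admission)))  # 84
print(safe_harbor_birth_year(date(1941, 3, 15), admission))   # 1941
print(safe_harbor_age(age_on(date(1930, 1, 1), admission)))   # 90 or older
print(safe_harbor_birth_year(date(1930, 1, 1), admission))    # None
print(safe_harbor_event_date(admission))                      # 2026
```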

Automated Safe Harbor Implementation

The anonymize.solutions API provides a dedicated HIPAA Safe Harbor preset that automatically handles all 18 identifiers according to the regulatory specifications:

import requests

API_KEY = "your-api-key"  # Placeholder; load from a secret store in practice


def hipaa_safe_harbor(text: str) -> dict:
    """De-identify text using HIPAA Safe Harbor method (45 CFR § 164.514(b)(2))."""
    response = requests.post(
        "https://api.anonymize.solutions/v1/anonymize",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "text": text,
            "preset": "hipaa",
            "method": "safe_harbor",
            "zip_rule": True,        # Apply 3-digit ZIP population rule
            "age_90_rule": True,     # Aggregate ages 90+ as "90 or older"
            "date_year_only": True,  # Keep year, remove day/month
        },
        timeout=30,  # Avoid hanging indefinitely on a stalled connection
    )
    # Raise on HTTP errors so a failed call is never mistaken for de-identified text
    response.raise_for_status()
    return response.json()

# Example input:
sample = """
Patient: John Michael Davis
DOB: 03/15/1941 (age 84)
SSN: 234-56-7890
MRN: MRN-78234
Address: 42 Maple Street, Boston, MA 02115
Admission: 02/14/2026
Discharge: 02/18/2026
Diagnosis: Type 2 diabetes with peripheral neuropathy
"""

result = hipaa_safe_harbor(sample)
print(result["text"])
# Patient: [PERSON_1]
# DOB: 1941 (age 84)
# SSN: [SSN_1]
# MRN: [MRN_1]
# Address: [ADDRESS_1], MA 021
# Admission: 2026
# Discharge: 2026
# Diagnosis: Type 2 diabetes with peripheral neuropathy

Note that the diagnosis, which is clinical information and not itself an identifier, is preserved. The ZIP prefix 021 is retained because the area it covers has well over 20,000 residents. Only the 18 specified identifiers are removed, so clinical utility is maintained; this is the key advantage of Safe Harbor compared with blanket redaction.

Documentation Requirements

To demonstrate Safe Harbor compliance in an audit, maintain the following documentation:

  1. De-identification policy: Written policy specifying which method (Safe Harbor) is used, which entity types are removed, and who is responsible
  2. Technical specification: Documentation of the tools used (e.g., "anonymize.solutions API, HIPAA preset, version X.Y"), including version history
  3. Processing log: Record of each de-identification run — timestamp, document count, identifiers removed — for audit trail purposes
  4. No actual knowledge attestation: For each dataset released, a brief documented analysis confirming no residual identification risk is known
  5. Recipient agreements: While de-identified data is not PHI, document who receives it and for what purpose — good practice that supports your overall compliance posture
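
For item 3, a processing log can be as simple as one JSON object per run appended to a file. The sketch below is one possible shape, not the format used by any particular product; the field names and helper functions are assumptions. It records counts per identifier type only, so the log itself never contains PHI.

```python
import json
from datetime import datetime, timezone


def build_log_entry(tool_version: str, document_count: int, identifier_counts: dict) -> dict:
    """Describe one de-identification run using counts only, never identifier values."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "method": "safe_harbor",
        "tool_version": tool_version,
        "document_count": document_count,
        "identifiers_removed": dict(identifier_counts),
    }


def append_log_entry(path: str, entry: dict) -> None:
    """Append the entry as a single JSON line (JSON Lines format)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")


# Hypothetical run: three documents, with counts per identifier type.
entry = build_log_entry("X.Y", 3, {"PERSON": 4, "SSN": 3, "DATE": 9})
print(json.dumps(entry, indent=2))
```

An append-only, line-per-run file is easy to review in an audit and straightforward to load into other tools later.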

The anonymize.solutions Managed Private package includes automated compliance logging and a compliance export feature that generates a ready-to-file de-identification documentation report.
