AI SAFETY GUIDE

How to Protect PII in LLM Prompts: A Developer’s Guide to Zero-Knowledge AI

Every prompt you send to an LLM is potentially logged, used for model improvement, and accessible to provider employees. If that prompt contains a patient’s name, a customer’s SSN, or an employee’s performance review, you have a data breach — even if nothing was “hacked.” Here is how to solve it without giving up the productivity gains of AI.

Why Every Prompt Is a Potential Data Breach

When you send a prompt to ChatGPT, Claude, Gemini, or any commercial LLM API, that data is processed on servers you do not control. OpenAI, Anthropic, and Google all log prompts for abuse monitoring, safety evaluation, and — unless you opt out — model training. Your enterprise API key does not make your prompts private. A Business Associate Agreement (BAA) shifts legal liability, but it does not prevent the data from flowing to the provider’s infrastructure.

The practical risk is significant. A Cyberhaven study found that 11% of data pasted into ChatGPT by enterprise users includes sensitive information — customer PII, financial data, source code, and internal documents. For a healthcare organization, a legal firm, or any company handling regulated data, this is not a theoretical risk. It is a daily occurrence.

The 3 Ways PII Leaks Through AI Prompts

1. Training Data Contamination

If your LLM provider uses conversation data for training (and many do by default unless you opt out), real PII you included in prompts can end up embedded in the model’s weights. Cases have been documented where models reproduce fragments of real personal data that appeared in their training corpus. Once in a model’s weights, this data cannot be removed without full retraining.

2. Prompt Logging by AI Providers

All major LLM providers log prompts for abuse monitoring, safety research, and service improvement. These logs are retained for varying periods. Even with data processing agreements in place, your prompts are visible to provider employees with access to monitoring infrastructure. For HIPAA-covered entities, the mere transmission of PHI to an external system without adequate safeguards is a HIPAA violation, regardless of whether the data is subsequently misused.

3. Output Exposure in Conversation History

LLM conversations are frequently shared. A developer pastes an LLM response into a Slack message. A manager screenshots a conversation for a report. A support agent copies an AI-generated response into a ticket system. If the original prompt contained PII, the response may echo or reference it, and that response now propagates through your organization’s communication systems — often with weaker security controls than your primary data stores.

Solution 1 — MCP Server (For Developers)

The Model Context Protocol (MCP) is an open standard for connecting AI assistants to tools and data sources. anonymize.solutions provides an MCP server that intercepts all data flowing between your code and any LLM, anonymizing PII before transmission and optionally de-anonymizing responses for the end user.

To add the anonymize.solutions MCP server to Claude Desktop, Cursor, or any MCP-compatible AI assistant:

// claude_desktop_config.json
{
  "mcpServers": {
    "anonymize": {
      "command": "npx",
      "args": ["-y", "@anonymize-solutions/mcp-server"],
      "env": {
        "ANONYMIZE_API_KEY": "your-api-key-here",
        "ANONYMIZE_MODE": "replace",
        "ANONYMIZE_LANGUAGES": "en,de,fr,nl"
      }
    }
  }
}

Once configured, the MCP server exposes six operators to the AI assistant:

  • anonymize — Replace PII with consistent pseudonyms or type placeholders
  • detect — Identify PII entities without replacing them
  • analyze — Return entity counts, confidence scores, and entity type breakdown
  • de-anonymize — Reverse a previous anonymization using the session key
  • encrypt — AES-256-GCM encrypt named entities for reversible protection
  • decrypt — Reverse encryption using the original key

The MCP server runs locally and communicates with the anonymize.solutions API using your API key. The AI assistant never sees the real PII — it operates on anonymized text throughout the session.
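Under the MCP standard, the assistant invokes these operators through `tools/call` requests. A hypothetical payload for the `anonymize` operator might look like the following; the method and params structure come from the MCP specification, but the argument names here are illustrative assumptions, not the server's documented schema:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "anonymize",
    "arguments": {
      "text": "Email john.doe@example.com about invoice 4711",
      "languages": ["en"]
    }
  }
}
```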

Solution 2 — Chrome Extension (For Non-Technical Users)

For users who work directly in the ChatGPT, Claude, or Gemini web interface, the anonymize.solutions Chrome Extension provides transparent PII protection without requiring any API integration.

When a user types or pastes text into the AI chat interface, the extension:

  1. Detects PII entities in real-time using the same 317-recognizer engine as the API
  2. Replaces PII with consistent pseudonyms or encrypted tokens before the text is submitted
  3. Optionally de-anonymizes the AI’s response before displaying it to the user
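Conceptually, the detect-and-replace step works like the toy recognizer below. This is a minimal regex sketch for illustration only; the actual 317-recognizer engine combines many more patterns, checksums, and context models per entity type:

```python
import re

# Toy recognizers: a real engine uses hundreds of entity types,
# checksum validation, and contextual scoring.
RECOGNIZERS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def pseudonymize(text):
    """Replace each detected entity with a consistent placeholder."""
    mapping = {}
    for label, pattern in RECOGNIZERS.items():
        for match in pattern.findall(text):
            if match not in mapping:
                mapping[match] = f"{label}_{len(mapping) + 1}"
    for original, placeholder in mapping.items():
        text = text.replace(original, placeholder)
    return text, mapping

safe, mapping = pseudonymize("Contact jane@acme.com, SSN 123-45-6789.")
# `safe` now contains placeholders instead of the real values
```

Because the same entity always maps to the same placeholder within a session, the LLM can still reason about "EMAIL_1" consistently across the conversation.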

The key differentiator is reversible encryption. Unlike simple replacement (“John Smith” → “PERSON_1”), AES-256-GCM encryption preserves the ability to recover the original value. If the AI produces a response that references “PERSON_1” and you need to share it with a colleague who knows the real person, the extension can silently decrypt the response for authorized users while maintaining the encrypted form in transit and storage.

This is particularly important for legal and healthcare workflows where the AI output must be actionable. A legal assistant using AI to draft a settlement letter cannot use a letter that says “Dear PERSON_1” — but they also cannot send the AI a letter containing the real client’s personal details. Reversible encryption solves both problems simultaneously.

Solution 3 — REST API (For Application Builders)

If you are building an application that calls an LLM on behalf of users — a customer service bot, a document summarization tool, a code review assistant — you need to sanitize user-provided data before it reaches the LLM. The anonymize.solutions REST API provides a drop-in preprocessing step for your LLM call chain.

import requests

def safe_llm_query(user_text, api_key, llm_api):
    # Step 1: Anonymize user input
    anon_response = requests.post(
        "https://api.anonymize.solutions/v1/anonymize",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "text": user_text,
            "mode": "encrypt",  # reversible
            "languages": ["en"],
            "session_key": "user-session-id-123"
        }
    ).json()
    anonymized_text = anon_response["result"]

    # Step 2: Query LLM with anonymized text
    llm_response = llm_api.complete(anonymized_text)

    # Step 3: De-anonymize response for authorized user
    deanon_response = requests.post(
        "https://api.anonymize.solutions/v1/deanonymize",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "text": llm_response.text,
            "session_key": "user-session-id-123"
        }
    ).json()
    return deanon_response["result"]

The De-anonymization Workflow

The encrypt/decrypt workflow is what separates anonymize.solutions from simple redaction tools. Here is how a typical session works:

  1. User input: “Draft a letter to Maria Gonzalez (maria.g@acme.com) about her account #847391.”
  2. Encrypted by extension: “Draft a letter to [AES:maria-001] ([AES:email-001]) about her account [AES:acct-001].”
  3. LLM processes the request using encrypted tokens as entity references
  4. LLM response: “Dear [AES:maria-001], Regarding your account [AES:acct-001]…”
  5. Extension decrypts for authorized user: “Dear Maria Gonzalez, Regarding your account #847391…”

The LLM never saw real PII. The response is fully actionable for the authorized user. The audit log shows only encrypted tokens. The original values can only be recovered by someone with the session key.
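The session flow above can be sketched as a token vault. For illustration this substitutes a plain in-memory dict for real AES-256-GCM ciphertext, and the `[AES:...]` token format follows the example above rather than a documented wire format:

```python
class SessionVault:
    """Toy reversible-token store standing in for AES-256-GCM tokens.

    Real tokens would be ciphertext; here we just number them and keep
    the token-to-value mapping in memory, scoped to one session.
    """

    def __init__(self):
        self._tokens = {}  # token -> original value

    def protect(self, text, entities):
        """Replace each known entity with an [AES:...] style token."""
        for i, value in enumerate(entities, start=1):
            token = f"[AES:ent-{i:03d}]"
            self._tokens[token] = value
            text = text.replace(value, token)
        return text

    def restore(self, text):
        """Swap tokens back to the original values (de-anonymization)."""
        for token, value in self._tokens.items():
            text = text.replace(token, value)
        return text

vault = SessionVault()
prompt = vault.protect(
    "Draft a letter to Maria Gonzalez about account #847391.",
    ["Maria Gonzalez", "#847391"],
)
# The LLM only ever sees `prompt`; its reply is restored locally:
reply = vault.restore("Dear [AES:ent-001], regarding account [AES:ent-002]...")
```

The design choice to key the vault per session is what makes step 5 work: only the holder of that session's mapping (or, in the real system, its key) can turn tokens back into names.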

Zero-Knowledge Means Zero Risk

anonymize.solutions uses a Zero-Knowledge architecture: the session key used for AES-256-GCM encryption is derived from your password using Argon2id on your device and never transmitted to our servers. The data reaches our infrastructure only in encrypted form, and we cannot decrypt it. Even in the event of a server breach, encrypted tokens cannot be reversed without the user's key.

This is architecturally different from approaches where the anonymization service holds the key mapping. In those systems, a breach of the anonymization service exposes all pseudonym mappings. In a Zero-Knowledge system, the mapping exists only on the user’s device.
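Client-side key derivation can be sketched as follows. Python's standard library has no Argon2id, so this sketch substitutes scrypt, another memory-hard KDF; the salt handling and cost parameters are illustrative assumptions, not the product's actual settings:

```python
import hashlib
import os

def derive_session_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit key on the client; only ciphertext leaves the device.

    Stand-in for Argon2id: scrypt is also memory-hard and ships with
    Python's standard library. Parameters here are illustrative only.
    """
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14, r=8, p=1,  # CPU/memory cost parameters
        dklen=32,           # 32 bytes = 256-bit key for AES-256-GCM
    )

salt = os.urandom(16)  # stored locally alongside the encrypted vault
key = derive_session_key("correct horse battery staple", salt)
# The server never sees the password or the key, so it cannot decrypt tokens.
```

Because derivation is deterministic for a given password and salt, the same key can be recomputed on any of the user's devices without the key itself ever being transmitted.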

Compliance Impact: GDPR, HIPAA, EU AI Act

This workflow addresses multiple regulatory requirements simultaneously:

  • GDPR Article 25 (data protection by design): Technical measures applied before data leaves your environment
  • GDPR Article 28 (processor agreements): Encrypted tokens transmitted to LLM providers may fall outside the definition of personal data under GDPR Recital 26, in which case an Article 28 processor agreement may not be required
  • HIPAA Minimum Necessary: Only the minimum information needed for the AI task is transmitted — and even that is encrypted
  • EU AI Act Article 10: Inference pipelines for high-risk AI systems can demonstrate technical PII minimization

Conclusion: Anonymize First, Then Query

The productivity benefits of AI are real and significant. The data protection risks are equally real. The solution is not to avoid AI — it is to anonymize before querying. With MCP Server integration for developers, the Chrome Extension for direct users, and the REST API for application builders, anonymize.solutions provides the complete toolchain for Zero-Knowledge AI usage across every workflow in your organization.

Get started: The Chrome Extension takes 2 minutes to install. The MCP Server configuration above takes 5 minutes. The REST API integration is a 15-line code change to your existing LLM call. View all integration options →
