What GDPR Says About Data Minimization
Article 5(1)(c) — Data minimization: "Personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed ('data minimisation')."
This deceptively simple sentence has three operative tests that every DPO must apply to each data processing activity:
- Adequate: Does the data actually enable the stated purpose? Collecting data that does not contribute to the purpose fails this test.
- Relevant: Is there a rational connection between the data collected and the purpose? Name and address are relevant for delivery; date of birth generally is not, unless something specific to the purpose (such as age-restricted goods) requires it.
- Limited to what is necessary: Even if data is relevant, you must collect the minimum sufficient amount. If a postcode serves the purpose, you do not need the full street address.
Supervisory authorities consistently interpret "necessary" strictly. When in doubt, collect less.
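One way to apply the three tests in software is a per-purpose allowlist that drops every field not documented as necessary. The following is a minimal sketch, not a definitive implementation: the purposes, the field names, and the `ALLOWED_FIELDS` table are illustrative assumptions and would come from your own records of processing.

```python
from typing import Any

# Hypothetical purpose-to-field allowlist; purposes and field names
# are illustrative assumptions, not taken from any regulation or tool.
ALLOWED_FIELDS: dict[str, frozenset[str]] = {
    "delivery": frozenset({"name", "street", "postcode", "city"}),
    "newsletter": frozenset({"email"}),
}


def minimize(record: dict[str, Any], purpose: str) -> dict[str, Any]:
    """Return a copy of record holding only the fields allowed for purpose."""
    try:
        allowed = ALLOWED_FIELDS[purpose]
    except KeyError:
        # An undocumented purpose gives no basis for keeping any field.
        raise ValueError(f"no documented purpose: {purpose!r}") from None
    return {key: value for key, value in record.items() if key in allowed}


def rejected_fields(record: dict[str, Any], purpose: str) -> set[str]:
    """Return the submitted fields that are not allowed for purpose."""
    return set(record) - ALLOWED_FIELDS[purpose]
```

Calling `minimize({"email": "a@example.com", "date_of_birth": "1990-04-12"}, "newsletter")` keeps only the `email` field. Running the filter at the point of ingestion means unnecessary fields are never stored, rather than being cleaned up later.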
Article 5: The 7 Data Protection Principles
Data minimization sits alongside six other principles in Article 5: five more in Article 5(1), plus accountability in Article 5(2). Understanding them as a system helps DPOs build coherent compliance programs:
| Principle | Provision | Key Requirement |
|---|---|---|
| Lawfulness, fairness, transparency | Art. 5(1)(a) | Legal basis; inform data subjects |
| Purpose limitation | Art. 5(1)(b) | Collect for specified, explicit purposes only |
| Data minimization | Art. 5(1)(c) | Adequate, relevant, limited to what is necessary |
| Accuracy | Art. 5(1)(d) | Keep data accurate; rectify without delay |
| Storage limitation | Art. 5(1)(e) | Retain only as long as necessary |
| Integrity and confidentiality | Art. 5(1)(f) | Appropriate security measures |
| Accountability | Art. 5(2) | Demonstrate compliance |
Article 25: Privacy by Design and by Default
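Article 25 operationalizes data minimization at the system design level. It requires controllers to implement appropriate technical and organizational measures both when determining the means of processing (design time) and during the processing itself, and to ensure that, by default, only the data necessary for each purpose is processed.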
Privacy by Design (Art. 25(1)): When designing a new system, processing activity, or product, data minimization must be built in. In practice, this means:
- Conducting a Data Protection Impact Assessment (DPIA) before building, where the processing is likely to be high-risk
- Designing forms, APIs, and databases to collect only required fields
- Implementing automatic deletion schedules from day one
- Building anonymization or pseudonymization into data flows at ingestion
Privacy by Default (Art. 25(2)): Without any specific action by the data subject, the system must process only the personal data necessary for each specific purpose. This applies to the amount of data collected, the extent of processing, the storage period, and who can access the data. Pre-ticked opt-in boxes, maximally scoped data collection, and indefinite retention all conflict with this requirement.
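Privacy by default can be expressed directly in code by making the most restrictive value the default for every setting. This is a minimal sketch under stated assumptions: the `PrivacySettings` fields, the 30-day retention figure, and the `"on"` form convention are hypothetical and only illustrate the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacySettings:
    """Hypothetical per-user settings; each default is the most restrictive option."""

    marketing_emails: bool = False    # opt-in only, never pre-ticked
    analytics_tracking: bool = False  # off until the user enables it
    profile_public: bool = False      # not visible to others by default
    retention_days: int = 30          # assumed shortest period that serves the purpose


def settings_from_form(form: dict[str, str]) -> PrivacySettings:
    """Build settings from submitted form data.

    A setting is enabled only when the user explicitly sent "on";
    a missing or empty field falls back to the restrictive default.
    """
    return PrivacySettings(
        marketing_emails=form.get("marketing_emails") == "on",
        analytics_tracking=form.get("analytics_tracking") == "on",
        profile_public=form.get("profile_public") == "on",
    )
```

Because an absent field can never evaluate to "enabled", a form that omits a checkbox, or a client that sends nothing at all, always yields the restrictive configuration.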
Article 32: Security of Processing
Article 32 requires controllers and processors to implement "appropriate technical and organisational measures" to ensure security of processing, taking into account the state of the art, costs, and the risks posed by processing.
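Article 32(1)(a) explicitly lists pseudonymisation and encryption as examples of appropriate technical measures. The list applies "as appropriate", so it is not an unconditional mandate; in practice, organizations should evaluate these controls and, given the accountability principle in Article 5(2), document their reasoning if they decide not to apply them.
The connection to data minimization is direct: data that has been anonymized or pseudonymized at ingestion presents significantly lower risk in the event of a breach. Lower risk can, in turn, reduce or remove notification obligations. Article 33 requires notifying the supervisory authority without undue delay and, where feasible, within 72 hours, unless the breach is unlikely to result in a risk to individuals; Article 34 requires notifying data subjects only where the risk is high. Pseudonymized data is still personal data, so this assessment has to be made case by case.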
The 5 Data Minimization Techniques
Each technique is appropriate for different contexts and risk profiles:
- Not collecting: The most effective technique. Redesign forms, APIs, and processes to simply not request unnecessary data. A newsletter signup does not need date of birth.
- Pseudonymization: Replace identifying values with tokens. The original values can be restored with the mapping key, so pseudonymized data remains personal data under GDPR and the key must be stored separately and protected. Appropriate for operational data that may need re-identification for legitimate purposes (customer service, billing disputes).
- Anonymization: Remove or transform data so that re-identification is not reasonably possible. Appropriate for analytics, reporting, and AI training data. Once truly anonymized, GDPR no longer applies.
- Aggregation: Replace individual records with statistical summaries. "47 customers in Berlin" instead of a list of names and addresses. Effective for reporting and business intelligence.
- Truncation and generalization: Reduce precision. Store birth year instead of full date of birth. Store postal district instead of full postcode. Appropriate when approximate values serve the purpose.
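Three of the techniques above (pseudonymization, generalization, and aggregation) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not a production design: the `Pseudonymizer` class, the `tok_` prefix, the 16-character token length, and the small-group threshold of 5 are all hypothetical choices.

```python
import hashlib
import hmac
from collections import Counter


class Pseudonymizer:
    """Replace values with keyed tokens and keep the mapping for re-identification.

    The secret key and the token-to-value mapping should be stored separately
    from the pseudonymized data, under stricter access controls.
    """

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        self._mapping: dict[str, str] = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        # HMAC-SHA256 gives the same token for the same value and key.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256)
        token = "tok_" + digest.hexdigest()[:16]
        self._mapping[token] = value
        return token

    def restore(self, token: str) -> str:
        """Look up the original value; raises KeyError for unknown tokens."""
        return self._mapping[token]


def generalize_birth_date(iso_date: str) -> str:
    """Truncate a 'YYYY-MM-DD' date to the birth year only."""
    return iso_date[:4]


def aggregate_by_city(records: list[dict[str, str]], min_count: int = 5) -> dict[str, int]:
    """Count records per city, dropping groups smaller than min_count.

    Small groups are suppressed because a count of 1 or 2 can still
    point to an individual. The threshold is an assumption.
    """
    counts = Counter(record["city"] for record in records)
    return {city: n for city, n in counts.items() if n >= min_count}
```

Deterministic tokens keep joins across tables working, which is often why pseudonymization is chosen over anonymization for operational data. The trade-off is that equal inputs are visibly equal in the output, so the key and mapping need the protection described above.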
Implementation Checklist for DPOs
Use this 12-item checklist for each processing activity in your ROPA (Record of Processing Activities):
- ☐ Define the specific purpose of processing before designing the data flow
- ☐ List every data field collected and document why each is necessary for the purpose
- ☐ Remove or anonymize any field that cannot be justified against the purpose test
- ☐ Set a retention period for every data category — no open-ended retention
- ☐ Implement automatic deletion or anonymization at the end of the retention period
- ☐ Apply pseudonymization at data ingestion for operational data that may need re-identification
- ☐ Apply anonymization for data used in analytics, ML training, and reporting
- ☐ Conduct a DPIA (Data Protection Impact Assessment, Art. 35) for high-risk processing activities
- ☐ Document all technical measures (encryption, pseudonymization, access controls) in your ROPA
- ☐ Review data minimization compliance when onboarding new processors or changing purposes
- ☐ Scan existing databases with piisafe.eu to identify undocumented PII
- ☐ Test data subject deletion requests to verify data is actually removed from all systems
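The retention items in the checklist lend themselves to automation. The sketch below assumes a hypothetical `RETENTION_DAYS` schedule and records that carry a `category` and a `collected_at` timestamp; the category names and periods are placeholders, not recommendations.

```python
from datetime import datetime, timedelta

# Hypothetical retention schedule per data category, in days.
RETENTION_DAYS: dict[str, int] = {
    "support_ticket": 365,
    "web_log": 30,
}


def is_expired(category: str, collected_at: datetime, now: datetime) -> bool:
    """Return True when a record has outlived its category's retention period."""
    # A KeyError here is deliberate: a category without a retention period
    # is the open-ended retention the checklist rules out.
    limit = timedelta(days=RETENTION_DAYS[category])
    return now - collected_at > limit


def purge(records: list[dict], now: datetime) -> tuple[list[dict], list[dict]]:
    """Split records into (kept, expired) according to the schedule."""
    kept: list[dict] = []
    expired: list[dict] = []
    for record in records:
        if is_expired(record["category"], record["collected_at"], now):
            expired.append(record)
        else:
            kept.append(record)
    return kept, expired
```

Returning the expired records instead of deleting them in place lets the caller decide between deletion and anonymization, and makes the job easy to test before it touches real data. In production, timezone-aware UTC timestamps are the safer choice.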
How to Audit Existing Data with piisafe.eu
Before implementing data minimization, you need to know what PII you already hold. piisafe.eu provides a free scanning tool that detects PII across websites, exported database records, documents, and log files.
Step 1: Website scan. Enter your domain at piisafe.eu. The scanner checks for PII exposed in HTML, JavaScript, and API responses visible to public crawlers — a common source of inadvertent disclosure.
Step 2: Document scan. Use the anonymize.solutions analyze endpoint to scan exported database records, CSV files, and documents. It returns a compliance risk report showing which GDPR categories are present and at what volume.
```shell
# Scan a CSV export for GDPR-relevant PII.
# jq -Rs reads the whole file as one raw string and JSON-escapes it,
# so quotes and newlines in the CSV cannot break the request body.
jq -Rs '{text: ., preset: "gdpr", output: "compliance_report"}' customer-export.csv |
  curl -X POST https://api.anonymize.solutions/v1/analyze \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @-

# Returns:
# {
#   "entity_counts": {"PERSON": 1247, "EMAIL": 1247, "PHONE": 834, "ADDRESS": 612},
#   "gdpr_risk_level": "HIGH",
#   "articles_applicable": ["Art. 5(1)(c)", "Art. 25", "Art. 32"],
#   "recommended_actions": ["pseudonymize", "set_retention_policy"]
# }
```

Step 3: Map findings to your ROPA. For each PII category found, verify it is documented in your Record of Processing Activities with a lawful basis, retention period, and technical measures.
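The Step 3 mapping can be partly automated by comparing the `entity_counts` from a scan report against your ROPA entries. The sketch below is an assumption-laden illustration: the ROPA is modeled as a plain dictionary keyed by entity category, and the `REQUIRED_KEYS` names are hypothetical, not a format defined by any tool.

```python
# Hypothetical names for the details Step 3 asks for in each ROPA entry.
REQUIRED_KEYS = ("lawful_basis", "retention_days", "technical_measures")


def undocumented_findings(
    entity_counts: dict[str, int], ropa: dict[str, dict]
) -> dict[str, list[str]]:
    """Map each detected PII category to the ROPA details it is missing.

    Categories that were not detected, or that are fully documented,
    are left out of the result.
    """
    gaps: dict[str, list[str]] = {}
    for category, count in entity_counts.items():
        if count == 0:
            continue
        entry = ropa.get(category, {})
        # A key that is absent or empty counts as missing documentation.
        missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
        if missing:
            gaps[category] = missing
    return gaps
```

An empty result means every detected category has all three details recorded; anything else is a worklist for the DPO.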
Documenting Your Data Minimization for DPIAs
A Data Protection Impact Assessment (DPIA) is required under Article 35 for processing activities likely to result in a high risk to the rights and freedoms of individuals. The DPIA must assess the necessity and proportionality of the processing in relation to its purposes (Art. 35(7)(b)), which overlaps closely with the data minimization analysis.
Your DPIA documentation should include:
- Purpose specification: Exact description of what the data is used for
- Necessity assessment: Field-by-field justification for each data element collected
- Technical measures: Pseudonymization, encryption, access controls — with specific tool references (e.g., "anonymize.solutions API, GDPR preset, session-based pseudonymization")
- Retention schedules: Specific dates or event triggers for deletion/anonymization
- Residual risks: Documented residual risks after measures are applied, and acceptance rationale
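Keeping these elements in a structured record makes it easy to spot gaps before a DPIA is signed off. The following is a minimal sketch: the `DpiaMinimizationRecord` class and its field names are hypothetical and mirror the list above, not any official DPIA template.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DpiaMinimizationRecord:
    """Hypothetical container for the data minimization part of a DPIA."""

    purpose: str
    field_justifications: dict[str, str]  # field name -> why it is necessary
    technical_measures: list[str]
    retention_rule: str                   # a date or an event trigger
    residual_risks: list[str] = field(default_factory=list)
    risk_acceptance_rationale: str = ""

    def gaps(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the record for inclusion in DPIA documentation."""
        return json.dumps(asdict(self), indent=2)
```

A record created without residual risks or an acceptance rationale reports both names from `gaps()`, which can serve as a simple completeness gate in a review workflow.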
The anonymize.solutions Managed Private and Self-Managed packages include a compliance documentation export that generates a machine-readable and human-readable technical measures report — suitable for direct inclusion in DPIA documentation.