48 Languages for PII Detection
Comprehensive multilingual support with spaCy (25 languages), Stanza (7 languages), XLM-RoBERTa (16 languages), automatic language detection, and full RTL (right-to-left) script support.
25 Languages with spaCy NLP
Highest-accuracy Named Entity Recognition (NER) using language-specific statistical models. spaCy is the primary engine for major European and Asian languages.
About spaCy NLP
spaCy is an open-source library for industrial-strength Natural Language Processing developed by Explosion AI. Its language-specific statistical models recognize person names, organizations, locations, and other named entities with the highest accuracy, which makes spaCy the primary choice for major languages.
7 Languages with Stanza Neural NLP
Specialized neural network models from Stanford NLP for RTL scripts and Southeast Asian languages. Stanza excels at complex grammatical structures.
About Stanza NLP
Stanza is a Python NLP library developed by the Stanford NLP Group. It uses bidirectional LSTM neural networks with character-level embeddings, providing strong accuracy for languages with complex morphology, RTL scripts, and non-Latin writing systems.
16 Languages with XLM-RoBERTa Transformer
Cross-lingual transformer model for low-resource languages. XLM-RoBERTa enables NER through transfer learning from its pre-training on 100 languages.
About XLM-RoBERTa
XLM-RoBERTa is a cross-lingual transformer model developed by Facebook AI, pre-trained on 2.5TB of text in 100 languages. It enables Named Entity Recognition for low-resource languages through transfer learning, providing broad coverage where dedicated models don't exist.
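The division of labor among the three engines can be pictured as a lookup from language code to engine. This is a minimal sketch under stated assumptions: `ENGINE_BY_LANG` and `select_engine` are hypothetical names, and the few language codes in the table are examples chosen to match each engine's described focus, not the actual 25/7/16 language lists.

```python
# Illustrative routing sketch. The language-to-engine assignments below are
# hypothetical examples, not the product's actual language lists.
ENGINE_BY_LANG = {
    # spaCy: major European and Asian languages
    "en": "spacy", "de": "spacy", "fr": "spacy", "ja": "spacy", "zh": "spacy",
    # Stanza: RTL scripts and Southeast Asian languages
    "ar": "stanza", "fa": "stanza", "vi": "stanza",
    # XLM-RoBERTa: lower-resource languages without a dedicated model
    "sw": "xlm-roberta", "am": "xlm-roberta",
}


def select_engine(lang_code: str) -> str:
    """Return the NER engine for an ISO 639-1 code.

    Codes with no registered engine fall back to "regex", matching the
    regex fallback used when no NLP model is available.
    """
    return ENGINE_BY_LANG.get(lang_code.strip().lower(), "regex")


for code in ("de", "AR", "sw", "xx"):
    print(code, "->", select_engine(code))
```

A flat lookup keeps the routing decision trivial to audit; adding a language means adding one table entry.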
Multilingual Detection Features
Automatic Language Detection
No configuration needed. The system automatically detects the language of your text and selects the appropriate NLP model.
- Per-Document Detection: each document is analyzed independently
- Mixed Language Support: documents containing multiple languages are handled correctly
- Confidence Threshold: every language detection result carries a confidence score
- Fallback Strategy: regex patterns are used when no NLP model is available for the language
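Per-document detection with a confidence threshold and a regex fallback can be sketched in a few lines. This is not how the product identifies languages: it swaps in a simple Unicode-script heuristic (counting which script most letters belong to) in place of a trained language identifier, and `guess_language`, `choose_detector`, the 0.8 threshold, and the script-to-language table are all hypothetical.

```python
import unicodedata
from collections import Counter

# Swapped-in technique: a Unicode-script heuristic standing in for a real
# language identifier. It recognizes scripts, so it cannot tell apart
# languages that share a script (e.g. the many Latin-script languages).
SCRIPT_TO_LANG = {"ARABIC": "ar", "HEBREW": "he", "THAI": "th", "HANGUL": "ko"}


def guess_language(text: str, threshold: float = 0.8) -> tuple[str, float]:
    """Return (language_code, confidence); "unknown" if below threshold."""
    scripts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            # unicodedata.name gives e.g. "HEBREW LETTER ALEF"; the first word names the script.
            scripts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    if not scripts:
        return "unknown", 0.0
    script, count = scripts.most_common(1)[0]
    confidence = count / sum(scripts.values())
    if confidence < threshold:
        return "unknown", confidence
    return SCRIPT_TO_LANG.get(script, "unknown"), confidence


def choose_detector(text: str, available_models: set[str]) -> str:
    """Per-document choice: an NLP model if one covers the language, else regex."""
    lang, _confidence = guess_language(text)
    return lang if lang in available_models else "regex"
```

In this simplified sketch, a document that mixes scripts evenly scores below the threshold and falls back to regex patterns.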
Right-to-Left (RTL) Support
Full support for RTL scripts with proper text direction handling during detection and anonymization.
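Two details matter for RTL text during anonymization: detection can run on the string in logical (reading) order, and a left-to-right placeholder inserted into RTL text can visually reorder its neighbors unless it is isolated. The sketch below assumes plain Python strings and uses the Unicode bidirectional isolate characters U+2068 and U+2069; `has_rtl`, `mask_emails`, the `[EMAIL]` placeholder, and the simplified email pattern are illustrative, not the product's implementation.

```python
import re
import unicodedata

FSI, PDI = "\u2068", "\u2069"  # Unicode FIRST STRONG ISOLATE / POP DIRECTIONAL ISOLATE
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # simplified email pattern


def has_rtl(text: str) -> bool:
    """True if any character has a right-to-left bidi class ("R" or "AL")."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def mask_emails(text: str, placeholder: str = "[EMAIL]") -> str:
    # Python strings hold text in logical order, so regex offsets stay valid
    # for RTL text; visual reordering happens only when the text is displayed.
    # In RTL text, wrap the LTR placeholder in bidi isolates so it does not
    # visually reorder the surrounding words.
    replacement = f"{FSI}{placeholder}{PDI}" if has_rtl(text) else placeholder
    return EMAIL.sub(lambda _match: replacement, text)
```

Using a function as the `sub` replacement avoids any backslash escapes in the placeholder being interpreted as group references.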
What Gets Detected in Each Language
All 48 languages support the same core PII entity types through a combination of NLP models and regex patterns.
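One way to combine NLP and regex results is to collect character spans from both and resolve overlaps. The sketch below assumes a hypothetical `Span` record and a simple policy (keep the longer span; on a tie, prefer the regex match). The actual merge policy is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # e.g. "PERSON", "EMAIL"
    source: str  # "ner" or "regex" (hypothetical tags)


def merge_spans(ner: list[Span], regex: list[Span]) -> list[Span]:
    """Combine spans from both detectors, dropping any span that overlaps one already kept.

    Priority: longer spans first; on equal length, regex spans before NER spans.
    """
    candidates = sorted(
        ner + regex,
        key=lambda s: (-(s.end - s.start), s.source != "regex", s.start),
    )
    kept: list[Span] = []
    for span in candidates:
        # Half-open intervals [start, end) overlap unless one ends before the other starts.
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


text = "Contact Anna Schmidt at anna@example.com"
ner_spans = [Span(8, 20, "PERSON", "ner"), Span(24, 28, "PERSON", "ner")]
regex_spans = [Span(24, 40, "EMAIL", "regex")]
for s in merge_spans(ner_spans, regex_spans):
    print(s.label, text[s.start:s.end])
```

Here the NER model's spurious PERSON span on the email's local part is discarded because the regex EMAIL span covering it is longer.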
Person Names
First names, last names, full names, nicknames, and titles detected via NLP Named Entity Recognition.
Locations
Addresses, cities, countries, postal codes, and geographic locations identified by NLP models.
Contact Info
Email addresses, phone numbers, and URLs detected via regex patterns across all languages.
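Regex-based contact detection looks roughly like the following. These are simplified, hypothetical patterns (`PATTERNS`, `find_contacts`); real email, URL, and phone patterns need many more edge cases, and phone formats in particular vary by country.

```python
import re

# Simplified, illustrative patterns; production patterns need many more edge cases.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://[^\s<>\"]+"),
    # Optional "+", then 7 to 15 digits with optional single space, dot, or dash separators.
    "PHONE": re.compile(r"\+?\d(?:[\s.-]?\d){6,14}"),
}


def find_contacts(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs in order of appearance."""
    hits = [
        (m.start(), label, m.group())
        for label, rx in PATTERNS.items()
        for m in rx.finditer(text)
    ]
    return [(label, value) for _, label, value in sorted(hits)]


sample = "Écrivez à marie@exemple.fr ou appelez le +33 1 23 45 67 89. Site : https://exemple.fr/contact"
for label, value in find_contacts(sample):
    print(label, value)
```

Because Python's `\w` is Unicode-aware, the email pattern also accepts non-ASCII local parts, which matters for non-English text.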
Financial Data
Credit card numbers, IBANs, bank accounts, and financial identifiers via format-specific patterns.
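Format-specific patterns for financial data are commonly paired with a checksum that filters out random digit runs. The two checks below are standard algorithms: the Luhn mod-10 check used by payment card numbers and the ISO 13616 mod-97 check for IBANs. The function names are hypothetical, and this is a sketch rather than the product's code.

```python
def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) checksum used by payment card numbers; non-digit characters are ignored."""
    digits = [int(c) for c in number if c in "0123456789"]
    if not 12 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """ISO 13616 IBAN check: rearrange, convert letters to numbers, test mod 97 == 1."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34 and s.isascii() and s.isalnum()):
        return False
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    # int(c, 36) maps "0"-"9" to 0-9 and "A"-"Z" to 10-35, matching the IBAN letter encoding.
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


print(luhn_valid("4111 1111 1111 1111"), iban_valid("GB82 WEST 1234 5698 7654 32"))
```

A checksum pass does not prove an account exists; it only confirms the number is well-formed, which cuts false positives from arbitrary digit sequences.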
Dates
Dates of birth, other dates, and temporal expressions, in the formats specific to each locale.
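The same digits can mean different dates depending on locale, which is why formats are tracked per locale. The sketch assumes a hypothetical `DATE_FORMATS` table and `parse_date` helper built on Python's `datetime.strptime`; it shows `03/04/2021` resolving to March 4 under a US format and April 3 under a British one.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical per-locale format lists (strptime syntax); a production list would be longer.
DATE_FORMATS = {
    "en_US": ["%m/%d/%Y"],  # month first
    "en_GB": ["%d/%m/%Y"],  # day first
    "de_DE": ["%d.%m.%Y"],
    "ja_JP": ["%Y/%m/%d", "%Y年%m月%d日"],
}


def parse_date(text: str, locale: str) -> Optional[date]:
    """Try each format registered for the locale; return None if none matches."""
    for fmt in DATE_FORMATS.get(locale, []):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


print(parse_date("03/04/2021", "en_US"), parse_date("03/04/2021", "en_GB"))
```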
Government IDs
SSNs, passport numbers, driver's licenses, and national IDs via country-specific regex patterns.
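Country-specific ID patterns typically encode the issuing authority's structural rules as well as the digit layout. As one example, the sketch below matches US Social Security numbers in `AAA-GG-SSSS` form and rejects ranges the Social Security Administration does not assign. The pattern and `find_ssns` are illustrative; other countries' IDs would each need their own rules.

```python
import re

# US SSN in the common AAA-GG-SSSS form. The lookaheads reject ranges that are
# never assigned: area 000, 666, or 900-999; group 00; and serial 0000.
SSN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def find_ssns(text: str) -> list[str]:
    """Return every structurally valid SSN found in the text."""
    return [m.group() for m in SSN.finditer(text)]


print(find_ssns("IDs: 123-45-6789, 000-12-3456, 912-34-5678"))
```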
AIR-GAPPED DESKTOP APP
The Air-Gapped Desktop App includes local NLP models for 15+ languages with offline processing. No internet connection required.
Need a language not listed?
Contact us for custom language model integration or to discuss your specific multilingual requirements.