Precogs Priority: Intelligence for PII & Secret Detection
Advanced PII & Secrets Security
Precogs AI Priority is powered by Adaptive Intelligence, a precision-engineered system that outperforms traditional tools by intelligently combining pattern matching with context-aware machine learning. Stop choosing between speed and accuracy. Secure your production environment with both.
Precogs Priority eliminates this trade-off with Adaptive Intelligence, a multi-layer detection architecture that dynamically selects the optimal strategy for every content type.
| Metric | Precogs Priority | Industry Average |
|---|---|---|
| Precision | 99.2% | 75-85% |
| Recall | 98.3% | 80-90% |
| Speed | 0.002s/KB | 0.5-2s/KB |
| False Positive Rate | 1-3% | 10-25% |
State of the Art: Where We Fit
The Evolution of Sensitive Data Detection

Competitive Landscape
| Tool | Approach | Precision | Recall | Speed (per KB) | PII | Secrets |
|---|---|---|---|---|---|---|
| Precogs Priority | Adaptive Intelligence | 99.2% | 98.3% | 0.002s | ✅ | ✅ |
| TruffleHog v3 | Patterns + Verification | 95% | 88% | 0.05s | ❌ | ✅ |
| Gitleaks | Patterns | 92% | 85% | 0.01s | ❌ | ✅ |
| Microsoft Presidio | ML (spaCy) | 85% | 92% | 0.5s | ✅ | ⚠️ |
| AWS Macie | ML + Patterns | 90% | 90% | N/A | ✅ | ⚠️ |
| GitGuardian | Patterns + ML | 94% | 90% | SaaS | ⚠️ | ✅ |
Research Foundation
Our approach builds on peer-reviewed research:
- Adaptive Detection: Studies show multi-layer detection achieves 17% higher F1-score than pure ML (arXiv:2510.07551)
- Context-Aware Filtering: Reduces false positives by 60-80% vs pattern-only (Nature 2025)
- Entropy Thresholds: Optimized Shannon entropy cutoffs for secret detection with minimal noise
Overview
Our platform uniquely integrates three core technologies—Instant Pattern Recognition, Context-Aware Machine Learning, and High-Entropy Analysis—into a unified pipeline that achieves industry-leading precision (99.2%) and recall (98.3%).
Unlike single-method tools that sacrifice accuracy for speed or vice versa, Precogs Priority dynamically selects the optimal detection strategy based on content type, file format, and organizational requirements.
The Challenge
Modern organizations face an exponentially growing attack surface for sensitive data exposure:
| Challenge | Impact |
|---|---|
| Credential Leaks | 80% of breaches involve compromised credentials |
| PII Exposure | Average GDPR fine: €2.4M; HIPAA: $1.5M |
| False Positives | Security teams spend 25% of time on false alerts |
| Diverse Formats | Code, configs, documents, logs, images—all need scanning |
| Speed vs Accuracy | Traditional tools force a trade-off |
Precogs Priority solves these challenges with an intelligent, adaptive detection architecture.
Precogs Adaptive Intelligence: How it Works

Layer 1: Instant Pattern Discovery
Our pattern layer provides the foundation for fast, accurate detection of structured data.
Core Pattern Library (50+ Types)
PII Patterns:
- Personal identifiers: Names, emails, phone numbers (20+ country formats)
- Government IDs: SSN, passport, driver's license, UK NINO, EU national IDs
- Financial: Credit cards (with Luhn validation), bank accounts, IBAN, SWIFT
- Healthcare: Patient IDs, medical record numbers, insurance identifiers
- Technical: IP addresses (v4/v6), MAC addresses, device IDs
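The Luhn validation mentioned for credit cards can be sketched as follows. This is a minimal illustration of the standard checksum, not the Precogs implementation; the function name and the 13-digit lower bound are assumptions made for the example.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore separators such as spaces and dashes.
    digits = [int(ch) for ch in number if ch.isdigit()]
    # Assumption: treat anything shorter than 13 digits as not a card number.
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A checksum like this lets a pattern layer discard 16-digit strings that merely look like card numbers, such as order IDs or timestamps.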
Secret Patterns:
- Cloud credentials: AWS, GCP, Azure (access keys, service accounts)
- AI/ML platforms: OpenAI, Anthropic, HuggingFace, Replicate
- Version control: GitHub, GitLab, Bitbucket tokens
- Payment systems: Stripe, Square, PayPal, Braintree
- Communication: Slack, Discord, Telegram, Twilio
- Infrastructure: Database URLs, JWTs, private keys, SSH keys
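As a rough illustration of how prefix-based secret patterns work, the sketch below uses three regular expressions based on widely published token formats. The pattern set, the type names, and the `find_secrets` helper are assumptions for this example and do not reflect the actual Precogs pattern library.

```python
import re

# Illustrative patterns only; the prefixes follow widely published token formats.
SECRET_PATTERNS = {
    "AWS_ACCESS_KEY_ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "PRIVATE_KEY": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[str, int, int]]:
    """Return (secret_type, start, end) for every pattern match in `text`."""
    findings = []
    for secret_type, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((secret_type, match.start(), match.end()))
    # Report findings in the order they appear in the text.
    return sorted(findings, key=lambda finding: finding[1])
```

Reporting character offsets rather than the matched value keeps the secret itself out of downstream logs.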
Extended Pattern Database (761+ Patterns)
Our pattern database covers edge cases and emerging credential formats across:
- 100+ cloud services
- 50+ SaaS platforms
- Regional variations and legacy formats
- Custom enterprise patterns
Key Features
| Feature | Description |
|---|---|
| Format Validation | Luhn checksum for credit cards, phone number parsing |
| International Support | Phone numbers in 20+ country formats |
| JSON/YAML Aware | Correctly parses secrets in config file formats |
| Placeholder Filtering | Ignores "YOUR_API_KEY", "changeme", demo values |
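Placeholder filtering can be approximated with a small stoplist plus a few shape checks. The sketch below is an assumption about how such a filter might look; only "YOUR_API_KEY" and "changeme" come from the table above, and the remaining entries and the `is_placeholder` name are hypothetical.

```python
import re

# Hypothetical stoplist; a real deployment would maintain a longer one.
PLACEHOLDER_VALUES = {"your_api_key", "changeme", "example", "dummy", "test"}

# Shapes that usually indicate a template slot rather than a real secret:
# runs of "x" or "*", <angle-bracket> slots, and ${VARIABLE} references.
PLACEHOLDER_SHAPES = re.compile(r"^(?:x+|\*+|<[^>]+>|\$\{[^}]+\})$", re.IGNORECASE)


def is_placeholder(value: str) -> bool:
    """Return True if `value` looks like a demo or template value."""
    candidate = value.strip().strip("\"'")
    if candidate.lower() in PLACEHOLDER_VALUES:
        return True
    return bool(PLACEHOLDER_SHAPES.match(candidate))
```

Running a check like this before reporting keeps documentation snippets and sample config files from generating alerts.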
Layer 2: Intelligent Context Validation
For unstructured text where patterns alone are insufficient, our ML layer provides context-aware detection.
Transformer-Based NER
We employ state-of-the-art transformer models trained on large-scale datasets for named entity recognition (NER). Our ML approach:
- Understands context: "Contact John at..." → John is a person name
- Handles variations: Nicknames, misspellings, unconventional formats
- Multi-language support: Recognizes entities across languages
- Multi-model selection: Automatically routes code to StarPII and documentation to Piiranha for optimal results
- Adaptive thresholds: Per-entity-type confidence tuning
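The routing and threshold ideas in the list above can be sketched in a few lines. The model names come from the text; the extension list, the threshold values, and both function names are hypothetical and chosen only to make the example concrete.

```python
from pathlib import PurePath

# Hypothetical set of extensions treated as source code.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".java", ".go", ".rs", ".c", ".cpp"}

# Hypothetical per-entity confidence thresholds.
ENTITY_THRESHOLDS = {"PERSON": 0.85, "EMAIL": 0.60, "ADDRESS": 0.80}
DEFAULT_THRESHOLD = 0.75


def select_model(filename: str) -> str:
    """Route source code to StarPII and everything else to Piiranha."""
    suffix = PurePath(filename).suffix.lower()
    return "StarPII" if suffix in CODE_EXTENSIONS else "Piiranha"


def accept(entity_type: str, confidence: float) -> bool:
    """Keep a prediction only if it clears the threshold for its entity type."""
    return confidence >= ENTITY_THRESHOLDS.get(entity_type, DEFAULT_THRESHOLD)
```

Tuning thresholds per entity type lets noisy categories such as person names demand more confidence than well-structured ones such as emails.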
ML Detection Modes
| Mode | Use Case | Speed | Accuracy |
|---|---|---|---|
| Disabled | Real-time scanning, structured data | 0.002s/KB | 99.2% precision |
| Enabled | Batch processing, documents, emails | 0.1s/KB | +16.7% recall |
What ML Adds
ML detection advantages:
- Names in prose: "Please forward to John Smith..."
- Addresses in text: "Located at 123 Main Street, Suite 5"
- Spelled-out or obfuscated emails: "john dot smith at company dot com"
- Context-aware secrets: Variable names indicating keys
- Non-standard formats: Obfuscated or encoded data
Layer 3: High-Entropy Secret Detection
High-entropy strings often indicate randomly generated secrets that don't match known patterns.
Shannon Entropy Detection
Our entropy analyzer calculates the randomness of strings to identify:
- API keys with non-standard formats
- Randomly generated passwords
- Encrypted tokens
- Base64-encoded secrets
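Shannon entropy measures how evenly characters are distributed in a string, in bits per character. The sketch below shows the standard calculation; the 4.0-bit threshold and the 20-character minimum are assumed values for illustration, not the tuned cutoffs used in production.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 4.0  # assumed cutoff in bits per character
MIN_LENGTH = 20          # assumed minimum length worth scoring


def shannon_entropy(value: str) -> float:
    """Shannon entropy of `value` in bits per character."""
    if not value:
        return 0.0
    length = len(value)
    counts = Counter(value)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def looks_random(value: str) -> bool:
    """Flag strings that are long enough and close to uniformly distributed."""
    return len(value) >= MIN_LENGTH and shannon_entropy(value) >= ENTROPY_THRESHOLD
```

A string of one repeated character scores 0 bits, while a string of 32 distinct characters scores 5 bits, so natural-language identifiers tend to fall below the cutoff and generated tokens above it.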
Context-Aware Filtering
Not all high-entropy strings are secrets. Our analyzer filters out:
| Filtered | Example or Context |
|---|---|
| Base64 image data | data:image/png;base64,... |
| SVG path coordinates | M 10 20 L 30 40 |
| CSS color codes | #ff5500, rgba(255,0,0,0.5) |
| Version strings | 1.2.3.4, v2.0.0-beta |
| UUIDs in expected contexts | Logging, tracing |
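One way to express the filters in the table is a set of benign-shape checks that run before a high-entropy candidate is reported. This is a simplified sketch: the rule names and `benign_reason` are hypothetical, the SVG check only handles whitespace- or comma-separated path data, and the UUID rule ignores the surrounding context that the table says matters.

```python
import re
from typing import Optional

BENIGN_PATTERNS = {
    "base64_image": re.compile(r"^data:image/[a-z0-9.+-]+;base64,", re.IGNORECASE),
    "css_hex_color": re.compile(r"^#(?:[0-9a-f]{3,4}|[0-9a-f]{6}|[0-9a-f]{8})$", re.IGNORECASE),
    "css_rgb_color": re.compile(r"^rgba?\([\d\s.,%/]+\)$", re.IGNORECASE),
    "version_string": re.compile(r"^v?\d+(?:\.\d+){1,3}(?:-[0-9a-z.]+)?$", re.IGNORECASE),
    "uuid": re.compile(
        r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
    ),
}

# A path token is either a single command letter or a plain number.
SVG_TOKEN = re.compile(r"^(?:[MLHVCSQTAZmlhvcsqtaz]|-?\d+(?:\.\d+)?)$")


def is_svg_path(text: str) -> bool:
    """Return True for simple path data such as 'M 10 20 L 30 40'."""
    tokens = re.split(r"[\s,]+", text.strip())
    return (
        len(tokens) >= 3
        and tokens[0] in ("M", "m")
        and all(SVG_TOKEN.match(token) for token in tokens)
    )


def benign_reason(candidate: str) -> Optional[str]:
    """Return the name of the first benign rule that matches, or None."""
    text = candidate.strip()
    for reason, pattern in BENIGN_PATTERNS.items():
        if pattern.match(text):
            return reason
    return "svg_path" if is_svg_path(text) else None
```

Returning the rule name rather than a bare boolean makes it easier to audit why a candidate was suppressed.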
Layer 4: PrecisionShift™ Fusion & Validation
The final layer ensures high precision by validating and deduplicating findings.
False Positive Filtering (70+ Rules)
We maintain extensive filters for common false positives:
Name Filters:
- Job titles: "Admin", "Manager", "Director"
- Department labels: "Patient Services", "Customer Support"
- Documentation terms: "Example User", "Test Account"
- Geographic names: City names, street types
Date/Time Filters:
- Log timestamps: 2024-01-15 10:30:00
- ISO dates in code: datetime.now()
- Version numbers: 1.2.3.4
Technical Filters:
- IP-like version strings
- SVG coordinates and transforms
- CSS values and properties
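A rule-based false positive filter of this kind might look like the sketch below. The stoplist reuses the examples listed above; the entity type labels, the timestamp rule, and the `is_false_positive` name are assumptions made for illustration and cover only two of the filter families.

```python
import re

# Terms from the name filters above that should never be reported as a person.
NON_NAME_TERMS = {
    "admin", "manager", "director",
    "patient services", "customer support",
    "example user", "test account",
}

# A date followed by a time of day is more likely a log timestamp than a birth date.
LOG_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")


def is_false_positive(entity_type: str, value: str) -> bool:
    """Return True if a detection matches a known false positive rule."""
    text = value.strip()
    if entity_type == "PERSON":
        return text.lower() in NON_NAME_TERMS
    if entity_type == "DATE_OF_BIRTH":
        return bool(LOG_TIMESTAMP.match(text))
    return False
```

Keeping the rules keyed by entity type means a term such as "Admin" is only suppressed as a name and can still be reported in other roles.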
Intelligent Deduplication
When multiple detection layers find the same data:
Priority Order:
1. ML Detection (highest - context-aware)
2. Pattern Detection (high - precise format matching)
3. Entropy Detection (medium - catches unknowns)
Resolution:
- Same span, same type → Keep highest confidence
- Overlapping spans → Prefer more specific type
- Complementary detections → Merge and enhance
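The resolution rules above can be sketched as a single pass over findings sorted by position. This is a simplified illustration under stated assumptions: the `Finding` structure is hypothetical, layer priority stands in for "more specific type", and merging of complementary detections is not implemented.

```python
from dataclasses import dataclass

# Priority order from the text: ML > pattern > entropy.
LAYER_PRIORITY = {"ml": 3, "pattern": 2, "entropy": 1}


@dataclass(frozen=True)
class Finding:
    start: int
    end: int
    entity_type: str
    layer: str
    confidence: float


def overlaps(a: Finding, b: Finding) -> bool:
    return a.start < b.end and b.start < a.end


def preferred(a: Finding, b: Finding) -> Finding:
    """Pick the winner between two overlapping findings."""
    if (a.start, a.end, a.entity_type) == (b.start, b.end, b.entity_type):
        # Same span, same type: keep the highest confidence.
        return a if a.confidence >= b.confidence else b
    # Assumption: use layer priority as a proxy for the more specific type.
    rank_a = (LAYER_PRIORITY[a.layer], a.confidence)
    rank_b = (LAYER_PRIORITY[b.layer], b.confidence)
    return a if rank_a >= rank_b else b


def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Collapse overlapping findings, keeping one winner per region."""
    kept: list[Finding] = []
    for finding in sorted(findings, key=lambda f: (f.start, f.end)):
        if kept and overlaps(kept[-1], finding):
            kept[-1] = preferred(kept[-1], finding)
        else:
            kept.append(finding)
    return kept
```

Because findings are processed in order of start offset, each new finding only needs to be compared with the most recently kept one.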
Detection Capabilities
PII Detection (28+ Types)
| Category | Types |
|---|---|
| Personal | Name, Email, Phone, Address, Date of Birth |
| Government | SSN, Passport, Driver's License, National IDs |
| Financial | Credit Card, Bank Account, IBAN, Bitcoin |
| Healthcare | Patient ID, MRN, Insurance ID |
| Automotive | VIN, IMEI, ICCID, IMSI, EID, License Plates |
| Technical | IP Address, MAC Address, Device ID, Bluetooth ID |
Secret Detection (50+ Types)
| Category | Types |
|---|---|
| Cloud | AWS, GCP, Azure credentials |
| AI/ML | OpenAI, Anthropic, HuggingFace |
| DevOps | GitHub, GitLab, Docker, Kubernetes |
| Payment | Stripe, Square, PayPal |
| Communication | Slack, Discord, Twilio, SendGrid |
| Database | Connection strings, passwords |
| Crypto | Private keys, JWTs, SSH keys |
Compliance Framework Integration
Precogs Priority automatically maps findings to regulatory requirements:
Supported Frameworks
| Framework | Coverage |
|---|---|
| GDPR | EU personal data protection |
| HIPAA | US healthcare (18 PHI identifiers) |
| PCI-DSS | Payment card industry |
| SOX | Financial system controls |
| CCPA | California consumer privacy |
| FERPA | Education records |
| GLBA | Financial privacy |
Automated Mapping
{
  "finding": {
    "type": "CREDIT_CARD",
    "value": "4111-****-****-1111",
    "file": "payment_log.csv"
  },
  "compliance": {
    "pci_dss": {
      "applicable": true,
      "requirement": "3.4 - Render PAN unreadable",
      "action": "Tokenize or encrypt card data"
    },
    "gdpr": {
      "applicable": true,
      "category": "Financial data",
      "action": "Ensure lawful basis for processing"
    }
  }
}
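A mapping like the one shown can be driven by a lookup table keyed by finding type. The sketch below is hypothetical: the `COMPLIANCE_RULES` table and `map_compliance` function are illustrative names, and the single rule entry simply reuses the values from the example output above.

```python
# Hypothetical rule table; only the CREDIT_CARD entry from the example is filled in.
COMPLIANCE_RULES = {
    "CREDIT_CARD": {
        "pci_dss": {
            "requirement": "3.4 - Render PAN unreadable",
            "action": "Tokenize or encrypt card data",
        },
        "gdpr": {
            "category": "Financial data",
            "action": "Ensure lawful basis for processing",
        },
    },
}


def map_compliance(finding: dict) -> dict:
    """Attach the applicable framework entries to a finding."""
    rules = COMPLIANCE_RULES.get(finding["type"], {})
    compliance = {
        framework: {"applicable": True, **details}
        for framework, details in rules.items()
    }
    return {"finding": finding, "compliance": compliance}
```

Finding types with no entry in the table return an empty compliance section rather than raising, which keeps unknown types from breaking a scan report.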
Enterprise Features
Credential Enrichment
For detected cloud credentials, our enterprise module provides:
| Feature | Description |
|---|---|
| Identity Lookup | Who owns this credential? |
| Permission Analysis | What can it access? |
| Status Verification | Is it active or revoked? |
| Risk Scoring | 0-100 score with CRITICAL/HIGH/MEDIUM/LOW levels |
| Remediation Guidance | Specific steps to resolve |
Risk Scoring Factors
| Factor | Adjustment | Condition |
|---|---|---|
| Base Score | Type-specific | AWS = 90, API key = 60 |
| Active Status | +20 | Verified active |
| Admin Access | +30 | Elevated permissions |
| Production Env | +20 | Production indicators |
| File Location | +15 | In .env, .git, or config |
| Development Env | -10 | Dev/test indicators |
| MFA Enabled | -10 | Multi-factor auth present |
| Final Score | 0-100 | Mapped to risk level (CRITICAL/HIGH/MEDIUM/LOW) |
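The scoring factors above translate into a short additive calculation clamped to the 0-100 range. In this sketch the adjustments and the two base scores come from the text, while the default base score, the level cut-offs, and all names are assumptions, since the source does not state them.

```python
from dataclasses import dataclass

BASE_SCORES = {"AWS": 90, "API_KEY": 60}  # values from the text
DEFAULT_BASE_SCORE = 50                   # assumption for unlisted types


@dataclass
class CredentialContext:
    credential_type: str
    verified_active: bool = False
    admin_access: bool = False
    production: bool = False
    sensitive_location: bool = False  # found in .env, .git, or config
    development: bool = False
    mfa_enabled: bool = False


def risk_score(ctx: CredentialContext) -> int:
    """Apply the additive adjustments and clamp the result to 0-100."""
    score = BASE_SCORES.get(ctx.credential_type, DEFAULT_BASE_SCORE)
    score += 20 if ctx.verified_active else 0
    score += 30 if ctx.admin_access else 0
    score += 20 if ctx.production else 0
    score += 15 if ctx.sensitive_location else 0
    score -= 10 if ctx.development else 0
    score -= 10 if ctx.mfa_enabled else 0
    return max(0, min(100, score))


def risk_level(score: int) -> str:
    """Map a score to a level; these cut-offs are assumed, not documented."""
    if score >= 90:
        return "CRITICAL"
    if score >= 70:
        return "HIGH"
    if score >= 40:
        return "MEDIUM"
    return "LOW"
```

For instance, an active AWS credential with admin access sums to 140 and is clamped to 100, while an API key found in a development environment with MFA enabled drops to 40.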
Performance
Benchmarks
| Metric | Value |
|---|---|
| Precision | 99.2% (pattern mode), 95%+ (ML mode) |
| Recall | 98.3% (pattern mode), 99%+ (ML mode) |
| Speed (Pattern) | 0.002s per KB |
| Speed (ML) | 0.1s per KB |
| Large Repo (10K files) | 25s (pattern), 20min (ML) |
Accuracy by Data Type
| Data Type | Precision | Recall |
|---|---|---|
| Structured forms | 99.5% | 99.0% |
| Email content | 97.8% | 96.5% |
| Medical records | 98.5% | 98.0% |
| Source code | 99.0% | 98.5% |
| Config files | 99.5% | 99.5% |
Deployment Options
Web Application
Interactive scanning with real-time results, visualization, and export.
Command Line Interface
Batch processing for CI/CD integration and automation.
API Integration
RESTful endpoints for custom application integration.
Cloud Deployment
AWS, GCP, Azure with auto-scaling and high availability.
Why Precogs Priority?
| Differentiator | Benefit |
|---|---|
| Adaptive Intelligence | Multi-layer protection without the performance tax |
| Enterprise Context | Zero-noise results that matter to your business |
| Enterprise Ready | Risk scoring, compliance mapping, remediation |
| Fast by Default | Pattern mode for real-time, ML for batch |
| International | 20+ phone formats, multi-language names |
| Medical PII | HIPAA-specific identifiers |
| AI/ML Coverage | OpenAI, Anthropic, emerging AI platforms |
The Bottom Line
Precogs Priority is the standard for next-generation data protection:
✅ Adaptive Intelligence Engine for unmatched precision and recall
✅ Context-aware validation that understands file types and content structure
✅ Enterprise-grade risk scoring, compliance mapping, and remediation guidance
✅ Production-ready with 99.2% precision and 98.3% recall
✅ Flexible deployment via web UI, CLI, API, or cloud infrastructure
Whether you're securing source code, processing documents, or maintaining compliance, Precogs Priority provides the accuracy, speed, and intelligence your security program demands.
Explore the Precogs AI Data Security Series
- PII Detection Guide: Adaptive Intelligence vs. Static Patterns
- Secret Scanning Guide: Precogs Adaptive Intelligence vs. TruffleHog
- Automotive PII Detection: Securing VIN, IMEI, and Telematics Data
Getting Started with Precogs Priority
Find sensitive data before attackers do. Secure your repositories today.
Explore the Precogs Platform: Learn how Precogs AI-native security helps detect sensitive data, vulnerabilities, and risks across your code and repositories.
Access the Precogs App: Sign in to the Precogs platform and start scanning your repositories.
Connect Your Repositories: Integrate your GitHub, GitLab, or Bitbucket repositories and let Precogs automatically analyze your code, history, and artifacts for security and data risks.
Flexible Deployment Options: Precogs supports cloud, private cloud, and on-premise deployments for organizations with strict security or data residency requirements. Contact our team to learn more.
