
Precogs Priority: Intelligence for PII & Secret Detection

Advanced PII & Secrets Security

Yasi Zhou · Updated on 7th Mar, 2026


Precogs AI Priority is powered by Adaptive Intelligence, a precision-engineered system that outperforms traditional tools by combining pattern matching with context-aware machine learning. Stop choosing between speed and accuracy: secure production with both.

Precogs Priority eliminates this trade-off with Adaptive Intelligence, a multi-layer detection architecture that dynamically selects the optimal strategy for every content type.

Metric	Precogs Priority	Industry Average
Precision	99.2%	75-85%
Recall	98.3%	80-90%
Speed	0.002s/KB	0.5-2s/KB
False Positive Rate	1-3%	10-25%
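Precision and recall trade off against each other, so they are often combined into a single F1-score (their harmonic mean), the measure used by the adaptive-detection study cited in this article. A minimal Python sketch computing it from the headline figures; the helper names are illustrative, not part of any Precogs API:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported findings that are real."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of real sensitive items that were reported."""
    return tp / (tp + fn)

def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Headline precision and recall for Precogs Priority from the table.
print(round(f1_score(0.992, 0.983), 4))  # 0.9875
```

For the Precogs Priority row this works out to an F1 of roughly 98.7%.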

State of the Art: Where We Fit

The Evolution of Sensitive Data Detection

[Figure: Evolution of Sensitive Data Detection]

Competitive Landscape

Tool	Approach	Precision	Recall	Speed	PII	Secrets
Precogs Priority	Adaptive Intelligence	99.2%	98.3%	0.002s	✅	✅
TruffleHog v3	Patterns + Verification	95%	88%	0.05s	❌	✅
Gitleaks	Patterns	92%	85%	0.01s	❌	✅
Microsoft Presidio	ML (spaCy)	85%	92%	0.5s	✅	⚠️
AWS Macie	ML + Patterns	90%	90%	N/A	✅	⚠️
GitGuardian	Patterns + ML	94%	90%	SaaS	⚠️	✅

Research Foundation

Our approach builds on peer-reviewed research:

Adaptive Detection: Studies show multi-layer detection achieves 17% higher F1-score than pure ML (arXiv:2510.07551)
Context-Aware Filtering: Reduces false positives by 60-80% compared with pattern-only detection (Nature 2025)
Entropy Thresholds: Optimized Shannon entropy cutoffs for secret detection with minimal noise

Overview

Our platform uniquely integrates three core technologies—Instant Pattern Recognition, Context-Aware Machine Learning, and High-Entropy Analysis—into a unified pipeline that achieves industry-leading precision (99.2%) and recall (98.3%).

Unlike single-method tools that sacrifice accuracy for speed or vice versa, Precogs Priority dynamically selects the optimal detection strategy based on content type, file format, and organizational requirements.

The Challenge

Modern organizations face an exponentially growing attack surface for sensitive data exposure:

Challenge	Impact
Credential Leaks	80% of breaches involve compromised credentials
PII Exposure	Average GDPR fine: €2.4M; HIPAA: $1.5M
False Positives	Security teams spend 25% of time on false alerts
Diverse Formats	Code, configs, documents, logs, images—all need scanning
Speed vs Accuracy	Traditional tools force a trade-off

Precogs Priority solves these challenges with an intelligent, adaptive detection architecture.

Precogs Adaptive Intelligence: How It Works

[Figure: Precogs Adaptive Intelligence]

Layer 1: Instant Pattern Discovery

Our pattern layer provides the foundation for fast, accurate detection of structured data.

Core Pattern Library (50+ Types)

PII Patterns:

Personal identifiers: Names, emails, phone numbers (20+ country formats)
Government IDs: SSN, passport, driver's license, UK NINO, EU national IDs
Financial: Credit cards (with Luhn validation), bank accounts, IBAN, SWIFT
Healthcare: Patient IDs, medical record numbers, insurance identifiers
Technical: IP addresses (v4/v6), MAC addresses, device IDs
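Luhn validation is what lets the pattern layer reject a random 16-digit number that merely looks like a card. A minimal Python sketch of the two-step idea (regex candidate, then checksum); the candidate regex is our own simplified assumption, not the Precogs pattern:

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_cards(text: str) -> list[str]:
    """Pattern match first, then keep only Luhn-valid candidates."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]

print(find_cards("card 4111 1111 1111 1111, ref 4111 1111 1111 1112"))
# ['4111 1111 1111 1111']
```

The second number matches the regex but fails the checksum, so it never becomes a finding.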

Secret Patterns:

Cloud credentials: AWS, GCP, Azure (access keys, service accounts)
AI/ML platforms: OpenAI, Anthropic, HuggingFace, Replicate
Version control: GitHub, GitLab, Bitbucket tokens
Payment systems: Stripe, Square, PayPal, Braintree
Communication: Slack, Discord, Telegram, Twilio
Infrastructure: Database URLs, JWTs, private keys, SSH keys
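Many credential formats in this list are recognizable from a documented prefix or header. A minimal sketch with four widely known shapes (AWS access key IDs, GitHub classic personal access tokens, PEM private-key headers and JWTs); the dictionary and function names are hypothetical, and real pattern libraries use many more rules plus surrounding-context checks:

```python
import re

# A few widely documented credential shapes (illustrative, not a full library).
SECRET_PATTERNS = {
    "AWS_ACCESS_KEY_ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "GITHUB_PAT": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "PRIVATE_KEY": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "JWT": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}

def scan_secrets(text: str) -> list[tuple[str, str]]:
    """Return (type, matched value) pairs for every pattern hit."""
    hits = []
    for kind, pattern in SECRET_PATTERNS.items():
        hits.extend((kind, m.group()) for m in pattern.finditer(text))
    return hits

sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\n-----BEGIN RSA PRIVATE KEY-----'
for kind, value in scan_secrets(sample):
    print(kind, value)
```

`AKIAIOSFODNN7EXAMPLE` is the example key from AWS's own documentation, exactly the kind of demo value that placeholder filtering exists to suppress.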

Extended Pattern Database (761+ Patterns)

Our pattern database covers edge cases and emerging credential formats across:

100+ cloud services
50+ SaaS platforms
Regional variations and legacy formats
Custom enterprise patterns

Key Features

Feature	Description
Format Validation	Luhn checksum for credit cards, phone number parsing
International Support	Phone numbers in 20+ country formats
JSON/YAML Aware	Correctly parses secrets in config file formats
Placeholder Filtering	Ignores "YOUR_API_KEY", "changeme", demo values
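Placeholder filtering is essentially a deny-list applied after a pattern match. A minimal sketch, assuming a small hand-picked set of literals and shapes (the lists are illustrative, not Precogs' actual rules):

```python
import re

# Illustrative deny-list; a production list would be far larger.
PLACEHOLDER_LITERALS = {"changeme", "password", "secret", "example", "todo"}
PLACEHOLDER_SHAPES = [
    re.compile(r"^your_[a-z0-9_]+$", re.IGNORECASE),  # YOUR_API_KEY, your_token
    re.compile(r"^<[^<>]+>$"),                         # <api-key>
    re.compile(r"^\$\{[^}]+\}$"),                      # ${SECRET} template variable
    re.compile(r"^(.)\1{7,}$"),                        # xxxxxxxx, 00000000
]

def is_placeholder(value: str) -> bool:
    """Return True if a matched value looks like a demo or template value."""
    v = value.strip().strip("'\"")
    if v.lower() in PLACEHOLDER_LITERALS:
        return True
    return any(p.match(v) for p in PLACEHOLDER_SHAPES)

print(is_placeholder("YOUR_API_KEY"))        # True
print(is_placeholder("q8Zr3LmN2vX7pT1kW4"))  # False
```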

Layer 2: Intelligent Context Validation

For unstructured text where patterns alone are insufficient, our ML layer provides context-aware detection.

Transformer-Based NER

We employ state-of-the-art transformer models trained on large-scale datasets for named entity recognition (NER). Our ML approach:

Understands context: "Contact John at..." → John is a person name
Handles variations: Nicknames, misspellings, unconventional formats
Multi-language support: Recognizes entities across languages
Multi-model selection: Automatically routes code to StarPII and documentation to Piiranha for optimal results
Adaptive thresholds: Per-entity-type confidence tuning
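Model routing and per-entity thresholds amount to two small lookup steps around the NER call. A minimal sketch under stated assumptions: the extension sets, the `starpii`/`piiranha` labels and the threshold values are hypothetical, and the NER inference itself is omitted:

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical routing table: extensions and model labels are illustrative.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".go", ".java", ".rb", ".rs", ".c", ".cpp"}
DOC_EXTENSIONS = {".md", ".txt", ".rst", ".html", ".eml"}

# Hypothetical per-entity confidence cut-offs ("adaptive thresholds").
THRESHOLDS = {"PERSON": 0.85, "ADDRESS": 0.75, "EMAIL": 0.60}
DEFAULT_THRESHOLD = 0.80

def choose_ner_model(path: str) -> Optional[str]:
    """Pick an NER model family for a file, or None to skip the ML layer."""
    suffix = PurePath(path).suffix.lower()
    if suffix in CODE_EXTENSIONS:
        return "starpii"    # code-oriented PII model
    if suffix in DOC_EXTENSIONS:
        return "piiranha"   # prose-oriented PII model
    return None             # fall back to pattern and entropy layers only

def passes_threshold(entity_type: str, score: float) -> bool:
    """Keep an ML entity only if it clears its type-specific cut-off."""
    return score >= THRESHOLDS.get(entity_type, DEFAULT_THRESHOLD)
```

A stricter cut-off for `PERSON` than for `EMAIL` reflects that names are far more ambiguous in prose than email addresses.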

ML Detection Modes

Mode	Use Case	Speed	Accuracy
Disabled	Real-time scanning, structured data	0.002s/KB	99.2% precision
Enabled	Batch processing, documents, emails	0.1s/KB	+16.7% recall

What ML Adds

┌────────────────────────────────────────────────────────────────┐
│                    ML DETECTION ADVANTAGES                     │
├────────────────────────────────────────────────────────────────┤
│  ✅ Names in prose: "Please forward to John Smith..."          │ 
│  ✅ Addresses in text: "Located at 123 Main Street, Suite 5"   │
│  ✅ Emails with typos: "john dot smith at company dot com"     │
│  ✅ Context-aware secrets: Variable names indicating keys      │
│  ✅ Non-standard formats: Obfuscated or encoded data           │
└────────────────────────────────────────────────────────────────┘

Layer 3: High-Entropy Secret Detection

High-entropy strings often indicate randomly generated secrets that don't match known patterns.

Shannon Entropy Detection

Our entropy analyzer calculates the randomness of strings to identify:

API keys with non-standard formats
Randomly generated passwords
Encrypted tokens
Base64-encoded secrets
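Shannon entropy measures the average information per character, H = -Σ p_i·log2(p_i), where p_i is the relative frequency of character i in the string. A minimal sketch; the 20-character minimum and 4.0-bit threshold are placeholder assumptions, since the article does not publish its optimized cut-offs:

```python
import math
from collections import Counter

# Placeholder cut-offs (assumptions, not the article's tuned values).
MIN_LENGTH = 20
THRESHOLD_BITS = 4.0

def shannon_entropy(s: str) -> float:
    """Average bits per character: sum of p * log2(1/p) over distinct chars."""
    if not s:
        return 0.0
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())

def looks_random(token: str) -> bool:
    """Flag long tokens whose per-character entropy reaches the threshold."""
    return len(token) >= MIN_LENGTH and shannon_entropy(token) >= THRESHOLD_BITS

print(shannon_entropy("aaaaaaaa"))  # 0.0
print(shannon_entropy("abcd"))      # 2.0
```

Because a hex string can never exceed log2(16) = 4 bits per character, a single threshold tuned for Base64 will miss hex secrets, which is why some scanners keep separate thresholds per alphabet.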

Context-Aware Filtering

Not all high-entropy strings are secrets. Our analyzer filters:

Filtered	Example
Base64 image data	`data:image/png;base64,...`
SVG path coordinates	`M 10 20 L 30 40`
CSS color codes	`#ff5500`, `rgba(255,0,0,0.5)`
Version strings	`1.2.3.4`, `v2.0.0-beta`
UUIDs in expected contexts	Logging, tracing
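The filtering table maps naturally onto a list of benign-shape rules checked before an entropy hit is reported. A minimal sketch; the regexes and the logging keywords used for UUID context are our assumptions:

```python
import re
from typing import Optional

# Benign high-entropy shapes from the filtering table (regexes are assumptions).
BENIGN_RULES = {
    "data_uri": re.compile(r"^data:image/[a-z+.-]+;base64,", re.IGNORECASE),
    "svg_path": re.compile(r"^[Mm][\sMmLlHhVvCcSsQqTtAaZz0-9.,-]*$"),
    "css_color": re.compile(r"^(?:#[0-9a-fA-F]{3,8}|rgba?\([^)]*\))$"),
    "version": re.compile(r"^v?\d+(?:\.\d+)+(?:-[0-9A-Za-z.]+)?$"),
}
UUID = re.compile(r"^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$", re.IGNORECASE)
LOGGING_CONTEXT = re.compile(r"trace|span|request|log", re.IGNORECASE)

def benign_reason(token: str, context: str = "") -> Optional[str]:
    """Return why a high-entropy token is benign, or None if it should be reported."""
    for name, rule in BENIGN_RULES.items():
        if rule.match(token):
            return name
    # UUIDs are only benign where identifiers are expected, e.g. tracing fields.
    if UUID.match(token) and LOGGING_CONTEXT.search(context):
        return "uuid_in_logging_context"
    return None
```

The same UUID is dropped after `trace_id=` but still reported after `api_key=`, which is the "expected contexts" distinction in the table.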

Layer 4: PrecisionShift™ Fusion & Validation

The final layer ensures high precision by validating and deduplicating findings.

False Positive Filtering (70+ Rules)

We maintain extensive filters for common false positives:

Name Filters:

Job titles: "Admin", "Manager", "Director"
Department labels: "Patient Services", "Customer Support"
Documentation terms: "Example User", "Test Account"
Geographic names: City names, street types

Date/Time Filters:

Log timestamps: 2024-01-15 10:30:00
Date expressions in code: datetime.now()
Version numbers: 1.2.3.4

Technical Filters:

IP-like version strings
SVG coordinates and transforms
CSS values and properties
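Each filter family above is a rule keyed on the entity type. A minimal sketch; the term list, the timestamp regex and the version keywords are illustrative assumptions rather than the 70+ production rules:

```python
import re

# Illustrative subsets of the filter families above.
NON_PERSON_TERMS = {"admin", "manager", "director", "patient services",
                    "customer support", "example user", "test account"}
LOG_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")
VERSION_CONTEXT = re.compile(r"\b(?:version|release|ver)\b", re.IGNORECASE)

def is_false_positive(entity_type: str, text: str, context: str = "") -> bool:
    """Apply type-specific false-positive rules to a candidate finding."""
    value = text.strip()
    if entity_type == "PERSON":
        # Job titles, department labels and demo names are not people.
        return value.lower() in NON_PERSON_TERMS
    if entity_type == "DATE_OF_BIRTH":
        # A full log timestamp is almost never a birth date.
        return LOG_TIMESTAMP.match(value) is not None
    if entity_type == "IP_ADDRESS":
        # "1.2.3.4" parses as IPv4; nearby version keywords reveal a version string.
        return VERSION_CONTEXT.search(context) is not None
    return False
```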

Intelligent Deduplication

When multiple detection layers find the same data:

Priority Order:
1. ML Detection (highest - context-aware)
2. Pattern Detection (high - precise format matching)
3. Entropy Detection (medium - catches unknowns)

Resolution:
- Same span, same type → Keep highest confidence
- Overlapping spans → Prefer more specific type
- Complementary detections → Merge and enhance
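The priority order and resolution rules can be sketched as a two-pass reduction. This is a simplified sketch under our own assumptions: the `SPECIFICITY` ranks are hypothetical, specificity is applied before layer priority when spans overlap, and the "merge and enhance" step for complementary detections is omitted:

```python
from dataclasses import dataclass

LAYER_PRIORITY = {"ml": 0, "pattern": 1, "entropy": 2}  # lower number wins
# Hypothetical specificity ranks: a named credential beats a generic hit.
SPECIFICITY = {"AWS_ACCESS_KEY_ID": 2, "API_KEY": 1, "HIGH_ENTROPY": 0}

@dataclass(frozen=True)
class Finding:
    start: int
    end: int
    kind: str
    layer: str
    confidence: float

def deduplicate(findings: list[Finding]) -> list[Finding]:
    # Same span, same type: keep the highest-confidence finding.
    best: dict[tuple[int, int, str], Finding] = {}
    for f in findings:
        key = (f.start, f.end, f.kind)
        if key not in best or f.confidence > best[key].confidence:
            best[key] = f
    # Overlapping spans: more specific type first, then layer priority, then confidence.
    ranked = sorted(
        best.values(),
        key=lambda f: (-SPECIFICITY.get(f.kind, 1), LAYER_PRIORITY[f.layer], -f.confidence),
    )
    kept: list[Finding] = []
    for f in ranked:
        if all(f.end <= k.start or k.end <= f.start for k in kept):
            kept.append(f)
    return sorted(kept, key=lambda f: f.start)
```

With an entropy hit and an AWS pattern hit on the same span, only the AWS finding survives; non-overlapping findings pass through untouched.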

Detection Capabilities

PII Detection (28+ Types)

Category	Types
Personal	Name, Email, Phone, Address, Date of Birth
Government	SSN, Passport, Driver's License, National IDs
Financial	Credit Card, Bank Account, IBAN, Bitcoin Address
Healthcare	Patient ID, MRN, Insurance ID
Automotive	VIN, IMEI, ICCID, IMSI, EID, License Plates
Technical	IP Address, MAC Address, Device ID, Bluetooth ID

Secret Detection (50+ Types)

Category	Types
Cloud	AWS, GCP, Azure credentials
AI/ML	OpenAI, Anthropic, HuggingFace
DevOps	GitHub, GitLab, Docker, Kubernetes
Payment	Stripe, Square, PayPal
Communication	Slack, Discord, Twilio, SendGrid
Database	Connection strings, passwords
Crypto	Private keys, JWTs, SSH keys

Compliance Framework Integration

Precogs Priority automatically maps findings to regulatory requirements:

Supported Frameworks

Framework	Coverage
GDPR	EU personal data protection
HIPAA	US healthcare (18 PHI identifiers)
PCI-DSS	Payment card industry
SOX	Financial system controls
CCPA	California consumer privacy
FERPA	Education records
GLBA	Financial privacy

Automated Mapping

{
  "finding": {
    "type": "CREDIT_CARD",
    "value": "4111-****-****-1111",
    "file": "payment_log.csv"
  },
  "compliance": {
    "pci_dss": {
      "applicable": true,
      "requirement": "3.4 - Render PAN unreadable",
      "action": "Tokenize or encrypt card data"
    },
    "gdpr": {
      "applicable": true,
      "category": "Financial data",
      "action": "Ensure lawful basis for processing"
    }
  }
}
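The mapping above is essentially a lookup from finding type to per-framework guidance. A minimal sketch that reproduces the structure of the example output; the `COMPLIANCE_RULES` table and function names are hypothetical, and only the credit-card entry is filled in:

```python
import json

# Hypothetical rule table; only the credit-card entry from the example is filled in.
COMPLIANCE_RULES = {
    "CREDIT_CARD": {
        "pci_dss": {"requirement": "3.4 - Render PAN unreadable",
                    "action": "Tokenize or encrypt card data"},
        "gdpr": {"category": "Financial data",
                 "action": "Ensure lawful basis for processing"},
    },
}

def mask_pan(pan: str) -> str:
    """Keep only the first and last four digits, as in the example output."""
    digits = "".join(ch for ch in pan if ch.isdigit())
    return f"{digits[:4]}-****-****-{digits[-4:]}"

def map_compliance(finding_type: str, value: str, file: str) -> dict:
    """Attach every applicable framework's guidance to a finding."""
    rules = COMPLIANCE_RULES.get(finding_type, {})
    masked = mask_pan(value) if finding_type == "CREDIT_CARD" else "****"
    return {
        "finding": {"type": finding_type, "value": masked, "file": file},
        "compliance": {fw: {"applicable": True, **detail} for fw, detail in rules.items()},
    }

print(json.dumps(map_compliance("CREDIT_CARD", "4111 1111 1111 1111", "payment_log.csv"), indent=2))
```

Masking happens before the finding is serialized, so the raw card number never reaches the report.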

Enterprise Features

Credential Enrichment

For detected cloud credentials, our enterprise module provides:

Feature	Description
Identity Lookup	Who owns this credential?
Permission Analysis	What can it access?
Status Verification	Is it active or revoked?
Risk Scoring	0-100 score with CRITICAL/HIGH/MEDIUM/LOW levels
Remediation Guidance	Specific steps to resolve

Risk Scoring Factors

┌────────────────────────────────────────────────────────────────┐
│                      RISK CALCULATION                          │
├────────────────────────────────────────────────────────────────┤
│  Base Score: Type-specific (AWS = 90, API key = 60)            │
│  + Active Status: +20 if verified active                       │
│  + Admin Access: +30 if elevated permissions                   │
│  + Production Env: +20 if production indicators                │
│  + File Location: +15 if in .env, .git, or config              │
│  - Development Env: -10 if dev/test indicators                 │ 
│  - MFA Enabled: -10 if multi-factor auth present               │
│  ───────────────────────────────────────────────────────────── │
│  Final Score: 0-100 → Risk Level (CRITICAL/HIGH/MEDIUM/LOW)    │
└────────────────────────────────────────────────────────────────┘
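The risk calculation is an additive score clamped to 0-100. A minimal sketch using the weights from the table; the default base score of 50 for unlisted types and the level cut-offs (90/70/40) are our assumptions, since the article does not state them:

```python
# Weights from the risk table; the default base score and level cut-offs are assumptions.
BASE_SCORES = {"AWS": 90, "API_KEY": 60}
DEFAULT_BASE = 50

def risk_score(kind: str, *, active: bool = False, admin: bool = False,
               production: bool = False, sensitive_file: bool = False,
               development: bool = False, mfa: bool = False) -> int:
    """Additive risk score clamped to 0-100."""
    score = BASE_SCORES.get(kind, DEFAULT_BASE)
    if active:
        score += 20          # verified active
    if admin:
        score += 30          # elevated permissions
    if production:
        score += 20          # production indicators
    if sensitive_file:
        score += 15          # found in .env, .git or config
    if development:
        score -= 10          # dev/test indicators
    if mfa:
        score -= 10          # multi-factor auth present
    return max(0, min(100, score))

def risk_level(score: int) -> str:
    """Map a score to a level (hypothetical cut-offs)."""
    if score >= 90:
        return "CRITICAL"
    if score >= 70:
        return "HIGH"
    if score >= 40:
        return "MEDIUM"
    return "LOW"

print(risk_level(risk_score("API_KEY", development=True, mfa=True)))  # MEDIUM
```

Because the factors are additive, an active AWS key alone (90 + 20) already exceeds the ceiling, so the clamp decides its final value.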

Performance

Benchmarks

Metric	Value
Precision	99.2% (pattern mode), 95%+ (ML mode)
Recall	98.3% (pattern mode), 99%+ (ML mode)
Speed (Pattern)	0.002s per KB
Speed (ML)	0.1s per KB
Large Repo (10K files)	25s (pattern), 20min (ML)
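As a consistency check, the 10K-file timings agree with the per-KB speeds if the benchmark repository holds about 12,500 KB of scanned content, roughly 1.25 KB per file (our inference; the article does not state the repository size):

$$
\frac{25\ \text{s}}{0.002\ \text{s/KB}} = 12{,}500\ \text{KB},
\qquad
12{,}500\ \text{KB} \times 0.1\ \text{s/KB} = 1{,}250\ \text{s} \approx 20.8\ \text{min}
$$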

Accuracy by Data Type

Data Type	Precision	Recall
Structured forms	99.5%	99.0%
Email content	97.8%	96.5%
Medical records	98.5%	98.0%
Source code	99.0%	98.5%
Config files	99.5%	99.5%

Deployment Options

Web Application

Interactive scanning with real-time results, visualization, and export.

Command Line Interface

Batch processing for CI/CD integration and automation.

API Integration

RESTful endpoints for custom application integration.

Cloud Deployment

AWS, GCP, Azure with auto-scaling and high availability.

Why Precogs Priority?

Differentiator	Benefit
Adaptive Intelligence	Multi-layer protection without the performance tax
Enterprise Context	Zero-noise results that matter to your business
Enterprise Ready	Risk scoring, compliance mapping, remediation
Fast by Default	Pattern mode for real-time, ML for batch
International	20+ phone formats, multi-language names
Medical PII	HIPAA-specific identifiers
AI/ML Coverage	OpenAI, Anthropic, emerging AI platforms

The Bottom Line

Precogs Priority is the standard for next-generation data protection:

✅ Adaptive Intelligence Engine for unmatched precision and recall

✅ Context-aware validation that understands file types and content structure

✅ Enterprise-grade risk scoring, compliance mapping, and remediation guidance

✅ Production-ready with 99.2% precision and 98.3% recall

✅ Flexible deployment via web UI, CLI, API, or cloud infrastructure

Whether you're securing source code, processing documents, or maintaining compliance, Precogs Priority provides the accuracy, speed, and intelligence your security program demands.


Getting Started with Precogs Priority

Find sensitive data before attackers do. Secure your repositories today.

Explore the Precogs Platform: Learn how Precogs AI-native security helps detect sensitive data, vulnerabilities, and risks across your code and repositories.

Access the Precogs App: Sign in to the Precogs platform and start scanning your repositories.

Connect Your Repositories: Integrate your GitHub, GitLab, or Bitbucket repositories and let Precogs automatically analyze your code, history, and artifacts for security and data risks.

Flexible Deployment Options: Precogs supports cloud, private cloud, and on-premise deployments for organizations with strict security or data residency requirements. Contact our team to learn more.
