UnveilTech

UnveilScan Blog

38 secret patterns we hunt (and why we stopped there)

Posted 2026-05-04 · 8 min read · secrets · recon · attack surface

Run a credential-detection tool over a public GitHub repo today and you'll probably be using one of two tools. Trufflehog ships somewhere north of 700 detectors; Gitleaks has roughly 150 rules. Both will return a wall of findings, most of them noise, and you'll spend more time triaging false positives than rotating real secrets.

UnveilScan curates 38 patterns. They are hand-written, live in internal/secretpatterns/patterns.go, and are regularly tested against real corpus samples. This article walks through the curation philosophy, why fewer is more honest, the severity bands, and the redaction discipline that keeps us (and your secrets) out of the legal grey zone.

The 38 patterns, by family

Roughly grouped:

That's 38. No Twitter API keys (the platform pivoted and the keys are mostly dead). No LinkedIn, Discord or Reddit. No DigitalOcean (the format isn't distinctive enough to match without false positives). No JIRA, Confluence or Bitbucket: detection would be heuristic-only, and too noisy.

The curation philosophy: precision over recall

Two ways to write a secret detector. The recall-maximalist approach (Trufflehog) tries to detect every secret, accepts that some matches will be false, and pushes the triage burden onto the user. The precision-maximalist approach (us) only flags when the format is distinctive enough that a match is almost certainly a real secret.
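In the usual detection vocabulary, with TP the real secrets flagged, FP the harmless strings flagged, and FN the real secrets missed, the two approaches optimise different ratios:

```latex
\mathrm{precision} = \frac{TP}{TP + FP}
\qquad
\mathrm{recall} = \frac{TP}{TP + FN}
```

Recall maximalism drives FN toward zero and tolerates a large FP; precision maximalism drives FP toward zero and accepts some FN.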

Concretely: we'll match AKIA[0-9A-Z]{16} because Amazon designed access key IDs with a fixed prefix, a fixed length and a restricted alphabet, so a match is almost certainly an AWS access key ID. We won't match a generic 40-character base64-looking string, because half the world's CI/CD job IDs look like that.
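To make the contrast concrete, here is a minimal Go sketch (Go because patterns.go is Go; the variable names are illustrative, not taken from that file). It runs the quoted AKIA pattern and a hypothetical generic 40-character rule over two sample lines.

```go
package main

import (
	"fmt"
	"regexp"
)

// awsAccessKeyID is the pattern quoted above: the fixed "AKIA" prefix plus
// exactly 16 characters from [0-9A-Z]. The \b word boundaries are an added
// assumption so the rule does not fire inside a longer token.
var awsAccessKeyID = regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)

// genericBase64ish is the kind of rule we refuse to ship: a 40-character run
// of base64-alphabet characters. It is here only for contrast.
var genericBase64ish = regexp.MustCompile(`\b[A-Za-z0-9+/]{40}\b`)

func main() {
	samples := []string{
		"aws_access_key_id = AKIAIOSFODNN7EXAMPLE",          // AWS's documented example key ID
		"job_id: 3f9a1c0b7d2e4f6a8b9c0d1e2f3a4b5c6d7e8f90", // 40-char hex, like a CI job or commit ID
	}
	for _, s := range samples {
		fmt.Printf("%q precise=%v generic=%v\n",
			s, awsAccessKeyID.MatchString(s), genericBase64ish.MatchString(s))
	}
}
```

Only the precise rule fires on AWS's documented example key ID, and only the generic rule fires on the hex job ID: that second hit is exactly the false positive we refuse to pay for.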

The cost of recall maximalism in production:

The cost of precision maximalism:

The industry trend supports us: API providers increasingly issue keys with distinctive prefixes, and some have said outright that the point is to make leaked keys easy to detect (GitHub's 2021 token-format change is the clearest example). Stripe (sk_live_), GitHub (ghp_), OpenAI (sk-), Anthropic (sk-ant-) and Slack (xox) all use this pattern. Providers that still issue unprefixed "32 random chars" keys are becoming the exception, so the precision-maximalist approach gets stronger every year.
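A sketch of how prefixed formats translate into rules, assuming a simple slug-plus-regex table. The slugs, the Pattern type and the exact regexes are our illustrative approximations of the public formats, not copies of UnveilScan's production entries; Stripe key lengths in particular vary, hence the open-ended {24,}.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern pairs a slug with a compiled regex. Slugs and regexes here are
// illustrative approximations of public key formats, not the production
// entries in patterns.go.
type Pattern struct {
	Slug string
	Re   *regexp.Regexp
}

var prefixed = []Pattern{
	// Stripe live secret keys: "sk_live_" then a variable-length alphanumeric body.
	{"stripe-live-secret", regexp.MustCompile(`\bsk_live_[0-9A-Za-z]{24,}\b`)},
	// GitHub classic personal access tokens: "ghp_" then 36 alphanumerics.
	{"github-classic-pat", regexp.MustCompile(`\bghp_[0-9A-Za-z]{36}\b`)},
	// Slack tokens: "xox" plus a type letter, a dash, then dash-separated parts.
	{"slack-token", regexp.MustCompile(`\bxox[abprs]-[0-9A-Za-z-]{10,}`)},
}

// Match returns the slug of the first pattern that fires on s, or "" if none.
func Match(s string) string {
	for _, p := range prefixed {
		if p.Re.MatchString(s) {
			return p.Slug
		}
	}
	return ""
}

func main() {
	// Fake values are assembled from pieces so this file does not itself
	// trip a push-protection scanner.
	for _, s := range []string{
		"STRIPE_KEY=sk_live_" + "abcdefghijklmnopqrstuvwx",
		"token: ghp_" + "abcdefghijklmnopqrstuvwxyz0123456789",
		"build 9f8e7d6c5b4a",
	} {
		fmt.Printf("%q -> %q\n", s, Match(s))
	}
}
```

Every rule is anchored on a literal prefix, so an unprefixed string like the build ID never has a chance to match.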

Severity bands: what fires CRITICAL vs HIGH vs MEDIUM

A leaked PEM private key is not the same as a leaked Twilio account SID. Severity bands reflect the operational consequence of the credential being public.

Counts are approximate because the right severity for some patterns depends on context, and we don't currently differentiate. v1.5+ will add context-aware severity (a CRITICAL secret in a 5-year-old commit on a fork of a public dataset is a different alert than the same secret in last week's main-branch commit).
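For readers who think in code, a hypothetical sketch of static severity assignment. The Severity type, the slugs and the two band assignments are assumptions for illustration (the text only tells us a PEM key outranks a Twilio account SID); the point is that today's lookup takes a slug and nothing else.

```go
package main

import "fmt"

// Severity bands from the text, ordered so a larger value is more urgent.
type Severity int

const (
	Medium Severity = iota
	High
	Critical
)

func (s Severity) String() string {
	switch s {
	case Critical:
		return "CRITICAL"
	case High:
		return "HIGH"
	default:
		return "MEDIUM"
	}
}

// bandBySlug is a hypothetical two-entry assignment for illustration; the
// real per-pattern bands are not reproduced in this post.
var bandBySlug = map[string]Severity{
	"pem-private-key":    Critical,
	"twilio-account-sid": Medium,
}

// SeverityFor is deliberately static: one slug maps to one band no matter
// where or when the secret appeared, matching today's behaviour.
func SeverityFor(slug string) (Severity, bool) {
	s, ok := bandBySlug[slug]
	return s, ok
}

func main() {
	for _, slug := range []string{"pem-private-key", "twilio-account-sid", "unknown"} {
		if s, ok := SeverityFor(slug); ok {
			fmt.Println(slug, s)
		} else {
			fmt.Println(slug, "no band assigned")
		}
	}
}
```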

Redaction is not optional

We never store, log, or transmit the raw secret value. Ever. The discipline is implemented in secretpatterns.Redact() and applied at the earliest possible point in the pipeline — before the match leaves the package, before it hits the database, before it touches a log line.

The redaction format:

Why this matters operationally:

  1. The user's downstream tools never see the raw secret. Our API responses, alerting emails, webhook payloads, audit logs — all consume the redacted form. A user accidentally pasting an UnveilScan finding into a chat or ticket tracker doesn't propagate the leak.
  2. Database breach scenario. If our discovered_assets table leaks tomorrow, the attacker gets metadata but no usable credentials. A compromise of UnveilScan has a bounded blast radius precisely because of what we don't store.
  3. Legal positioning. Storing leaked third-party credentials is at best a grey area, and at worst unauthorized-access territory (the CFAA in the US, or its equivalents in some other jurisdictions). Storing only the redacted fingerprint and the file URL where it appeared keeps us strictly in the "we noticed and notified" lane, never the "we possess your credentials" lane.

This is non-negotiable. If you hear "we'll show you the raw secret in your dashboard", you're talking to a vendor that's not thinking clearly about the legal surface.

What we deliberately don't do

The line between "credential leak detection" and "active probing of someone else's account" is well-defined and we stay on the right side of it. Specifically:

Why curated wins over crowdsourced

Trufflehog and Gitleaks both accept community PRs for new detectors, so the pattern space grows organically. The honest reading: most community PRs are well-intentioned but under-tested. Picture the failure mode: a new "Discord webhook" detector ships, fires on 12% of all scanned files (Discord URLs are very common in code comments and examples), and its false-positive rate degrades the entire tool's utility.

Our curation discipline:

  1. Add a pattern only when the format is fingerprintable to > 99% precision.
  2. Test the new pattern against ~1000 known-clean files drawn from the Linux kernel, freeCodeCamp and a sample of well-managed enterprise repos; false positives must be 0.
  3. Test against ~50 deliberately leaked samples in our test corpus; true positives must be 100%.
  4. If either bar fails, the pattern doesn't ship. We'd rather miss a class of secret than degrade the tool's overall trustworthiness.

This is slow: we add roughly one pattern per quarter. It's also why an UnveilScan finding is one you can act on immediately. If we say it's a Stripe live key, it's a Stripe live key.

The roadmap: context-aware severity

What we'd like to ship next, in order:

  1. Commit age as a severity modifier. A CRITICAL secret in a commit from 2019 in a repo with no recent activity is probably long since rotated; the same finding in this morning's main push is an active incident. Same slug, different alert urgency.
  2. File path heuristics. A secret in tests/fixtures/ or example.env is more often a deliberate placeholder than a real leak. config/production.yml is the opposite.
  3. Cross-repo deduplication. The same AKIA appearing in 12 forks of a tutorial doesn't deserve 12 alerts.
  4. A few more high-precision patterns. Cloudflare Workers AI keys, Vercel deploy hooks and Notion integration tokens are all on the watchlist, to be added once their adoption crosses a threshold and their formats stabilise.

What we won't ship: a "submit your own pattern" UI. The curation discipline is the product.

Where to look at the actual list

The 38 patterns, with their regexes and severity, live in internal/secretpatterns/patterns.go. The file is closed source, but the table above is the entire inventory. If you want to verify a specific pattern's behaviour, intentionally commit a test credential to a public repo that mentions a domain you control, then run a Recon scan on that domain: the finding will tell you exactly which pattern matched, with the redacted sample for sanity-checking.

Find the secrets you forgot you committed

One Recon scan ingests every public GitHub repo that mentions your domain, runs the 38 patterns against the file content, and emails you within a minute if anything matches. We never store the raw value.
