EASM without a $50k tool: what crt.sh + DNS + GitHub actually gives you
Walk into a CISO meeting in 2026 and someone will pitch you External Attack Surface Management. UpGuard, Bitsight, Mandiant ASM, CyCognito, Censys ASM, Palo Alto Cortex Xpanse, Detectify EASM, Microsoft Defender EASM. Annual contracts in the $50,000 to $200,000 range. All "contact sales", no public pricing.
Strip the marketing and most EASM products sit on top of four data sources, all of them free or near-free for an organisation that wants to query them directly. This article walks through each source, what it actually covers, and where the paid vendors do still earn their keep.
Source 1: Certificate Transparency logs (crt.sh)
Every TLS certificate issued by a public Certificate Authority has been logged to a
Certificate Transparency log since 2018. Apple, Google, and Mozilla all require it for
browser trust. The logs are append-only, public, and queryable through aggregators
like crt.sh.
What you get:
- Every subdomain that has ever had a TLS cert issued, going back roughly 8 years.
- Including subdomains your team forgot existed: jenkins-internal-2021.example.com, old-staging.example.com, test-customer-acme.example.com.
- Issuer, validity dates, key info, serial. Useful for spotting unauthorised issuance.
What you don't get:
- Subdomains that never received a public cert (internal-only, self-signed, no TLS).
- Wildcard certs hide their children: *.example.com is one CT entry that covers an arbitrary number of hosts.
Cost: free. Rate-limited (crt.sh is a community service, behave). One curl
against https://crt.sh/?q=example.com&output=json typically returns
everything in under 5 seconds. We covered this in detail in
Reading a CT log.
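As a rough sketch of what that one request looks like in code: the snippet below builds a crt.sh query and flattens the JSON response into a set of hostnames. It assumes the response is a list of objects with a newline-separated `name_value` field, and it queries `%.example.com` rather than the bare apex so that names under the apex are matched explicitly. The function names are ours, purely for illustration.

```python
import json
import urllib.parse
import urllib.request


def crtsh_url(apex: str) -> str:
    """Build a crt.sh JSON query URL; '%.' asks for names under the apex."""
    query = urllib.parse.quote(f"%.{apex}", safe="")
    return f"https://crt.sh/?q={query}&output=json"


def extract_hostnames(entries: list, apex: str) -> set:
    """Collect unique hostnames at or under `apex` from crt.sh JSON entries."""
    apex = apex.lower().rstrip(".")
    found = set()
    for entry in entries:
        # name_value may hold several SAN entries separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if "@" in name:
                continue  # skip email identities that appear in some certs
            if name.startswith("*."):
                name = name[2:]  # a wildcard only proves the parent name exists
            if name == apex or name.endswith("." + apex):
                found.add(name)
    return found


def fetch_hostnames(apex: str, timeout: float = 30.0) -> set:
    """Fetch and parse in one go (network call; crt.sh is rate-limited)."""
    with urllib.request.urlopen(crtsh_url(apex), timeout=timeout) as resp:
        return extract_hostnames(json.load(resp), apex)
```

Calling `fetch_hostnames("example.com")` performs the actual request. The parsing is kept in its own function so you can run it against a saved response without touching the service.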
Source 2: DNS resolution + ASN/geo enrichment
Each subdomain from CT becomes a hostname. Resolve it. Most resolve to one or two
IPs. Each IP is a real production host (or a CDN edge). MaxMind's GeoLite2 databases
give you ASN, organisation, country, and city for every IP — free, downloaded as a
175MB .mmdb file you can ship with your tool.
What you learn:
- Which clouds the company actually uses (AS16509 = AWS, AS15169 = Google, AS8075 = Microsoft, AS13335 = Cloudflare).
- Whether the apex sits behind a CDN or talks to the origin directly. The origin_behind_cdn finding catches CDN-fronted apex with origin IP exposed via MX or SPF: the WAF is theatre if the origin is reachable.
- Geographic distribution: app.example.com resolves to 3 different continents = real anycast. app.example.com resolves to one rack in Roubaix = single point of failure.
- Bulletproof / sanctioned hosting: IP in a known abuse-friendly ASN (AS44066 Chang Way, AS9009 M247) or a sanctioned country (KP, IR, SY, CU, RU, BY) is a different conversation.
Cost: free. net.LookupHost in Go, or the equivalent resolver call in any standard library. MaxMind GeoLite2
updates weekly, so you just cron the download.
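A minimal sketch of the resolve-and-enrich step, using Python's `socket.getaddrinfo` as the resolver. The ASN lookup is passed in as a plain callable (`asn_of`, a hypothetical name) so the sketch does not depend on any particular GeoLite2 reader; the ASN labels are the four quoted above.

```python
import socket
from collections import Counter

# ASN labels quoted in the text above; extend with your own.
KNOWN_ASNS = {16509: "AWS", 15169: "Google", 8075: "Microsoft", 13335: "Cloudflare"}


def resolve_ipv4(hostname: str) -> list:
    """Return the sorted unique IPv4 addresses for a hostname, or [] on failure."""
    try:
        infos = socket.getaddrinfo(
            hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def provider_summary(hostnames, asn_of, resolve=resolve_ipv4) -> Counter:
    """Count unique IPs per provider label.

    `asn_of` maps an IP string to an AS number (or None if unknown);
    `resolve` maps a hostname to a list of IPs and can be stubbed in tests.
    """
    counts = Counter()
    seen = set()
    for host in hostnames:
        for ip in resolve(host):
            if ip in seen:
                continue  # count each IP once, however many names point at it
            seen.add(ip)
            asn = asn_of(ip)
            if asn is None:
                label = "unknown"
            else:
                label = KNOWN_ASNS.get(asn, f"AS{asn}")
            counts[label] += 1
    return counts
```

If you use MaxMind's `geoip2` package, `asn_of` could be a thin wrapper around the reader's `asn(ip).autonomous_system_number` for the GeoLite2 ASN database, returning None when the address is not found.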
Source 3: Public cloud bucket discovery
S3 buckets, Google Cloud Storage buckets, and Azure Blob containers all have predictable URL formats:
https://<name>.s3.amazonaws.com/
https://storage.googleapis.com/<name>/
https://<account>.blob.core.windows.net/<container>/
Take a domain (acmecorp.com) and probe a dozen common naming conventions:
acmecorp, acmecorp-prod, acmecorp-backup,
acmecorp-uploads, acmecorp-data, acmecorp-public,
acme-prod, acme-backup, etc. HTTPS HEAD against each on
each provider. About 36 requests total. Anything that returns 200, 301, 302, 401, or
403 exists; 404 doesn't.
What you get:
- Inventory of public-cloud buckets associated with the brand.
- The 401/403 ones are private (good). The 200s are public. The interesting ones are 200s with a directory listing enabled.
What this isn't: bucket discovery is not bucket exploitation. We list existence; we do
not enumerate keys. The deep-dive (object enumeration, ACL audit, CORS misconfig)
is a separate concern handled by the buckets checker on Extended scans,
not by Recon.
Cost: free. ~36 HEAD requests per scan, well under any rate limit.
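The probe itself is small enough to sketch. The snippet below generates candidate names from one or more brand stems, expands them into S3 and GCS URLs using the formats listed above, and maps status codes to the categories described in this section. Azure is left out of the sketch because its URL needs both an account and a container name. The stems, the suffix list, and the function names are illustrative assumptions.

```python
import http.client
from urllib.parse import urlsplit

# Suffixes taken from the naming conventions mentioned above.
SUFFIXES = ("", "-prod", "-backup", "-uploads", "-data", "-public")

URL_TEMPLATES = {
    "s3": "https://{name}.s3.amazonaws.com/",
    "gcs": "https://storage.googleapis.com/{name}/",
}


def candidate_names(stems, suffixes=SUFFIXES) -> list:
    """Combine brand stems with common suffixes, preserving order, no duplicates."""
    names = []
    for stem in stems:
        for suffix in suffixes:
            name = f"{stem.lower()}{suffix}"
            if name not in names:
                names.append(name)
    return names


def probe_urls(names) -> list:
    """Expand each candidate into (provider, name, url) tuples."""
    return [
        (provider, name, template.format(name=name))
        for name in names
        for provider, template in URL_TEMPLATES.items()
    ]


def classify(status: int) -> str:
    """Map an HTTP status to the categories used in this section."""
    if status == 200:
        return "public"
    if status in (401, 403):
        return "private"
    if status in (301, 302):
        return "exists"
    if status == 404:
        return "absent"
    return "unknown"


def head_status(url: str, timeout: float = 10.0) -> int:
    """Send one HEAD request without following redirects; return the status."""
    parts = urlsplit(url)
    conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()
```

A scan is then a loop over `probe_urls(candidate_names(["acmecorp", "acme"]))` that records `classify(head_status(url))` for each entry. `http.client` is used rather than `urllib.request` so that 301 and 302 responses are reported instead of silently followed.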
Source 4: GitHub code search
This is the source paid EASM tools rarely advertise but most quietly use. GitHub's code search API lets you query the entire universe of public repositories for any text. Search for your domain and you find every file that mentions it.
GET https://api.github.com/search/code?q=%22example.com%22+in:file
Authorization: Bearer ghp_xxx
Accept: application/vnd.github+json
Returns up to 1000 hits, paginated. Code search requires authentication; with a Personal Access Token (free,
scope public_repo is enough), the rate limit is 10 requests per minute
for code search (the other search endpoints allow 30), which is still comfortable for one scan per domain.
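A sketch of the request above in Python, assuming the standard code-search response shape (`total_count` plus an `items` list where each item carries `path` and `repository.full_name`). The helper names are ours; the token is whatever PAT you created.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com/search/code"


def search_url(domain: str, page: int = 1, per_page: int = 100) -> str:
    """Build the code-search URL for an exact-phrase match on the domain."""
    query = f'"{domain}" in:file'
    params = urllib.parse.urlencode({"q": query, "per_page": per_page, "page": page})
    return f"{API}?{params}"


def pages_needed(total_count: int, per_page: int = 100, cap: int = 1000) -> int:
    """Number of pages worth requesting, given that only the first `cap` hits are served."""
    reachable = min(total_count, cap)
    return (reachable + per_page - 1) // per_page


def repos_from_items(items: list) -> list:
    """Unique repository names (owner/repo) referenced by a page of results."""
    return sorted({item["repository"]["full_name"] for item in items})


def fetch_page(domain: str, token: str, page: int = 1) -> dict:
    """Fetch one page of results (network call, counts against the search rate limit)."""
    request = urllib.request.Request(
        search_url(domain, page),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

In practice you would call `fetch_page` for page 1, read `total_count`, then loop up to `pages_needed(total_count)` with a pause between requests to stay inside the search rate limit.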
What you find:
- The repos where your domain appears in code, config, or docs. (Code search indexes file contents and paths, not commit messages.)
- Sample paths: config/production.yml, terraform/main.tf, .env.example, scripts/deploy.sh. Each one is a hint about your stack.
- Most hits are noise: blog posts, dataset compilations, expired-domain registries, someone's college project that mentions your URL once. The signal-to-noise is about 1:5 in our experience.
And then — the reason this source matters more than the other three combined — you
can fetch the file content via the official content API
(/repos/{owner}/{repo}/contents/{path}?ref={branch} with Accept:
application/vnd.github.raw) and run it through a secret-pattern detector. Every
file that mentions your domain is a candidate for an AWS key, GitHub PAT, Stripe key,
Postgres URL with password, or PEM private key that someone committed by accident.
See our 38 regex patterns for what we actually look for.
Cost: free. One PAT, one search per Recon scan. Content fetches are the rate-limit cost — we cap at 60 file fetches per scan to keep one Recon comfortably inside the 5000 reqs/h authenticated GitHub budget.
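To make the fetch-then-scan step concrete, here is a small sketch. The five patterns are illustrative approximations of the secret types named above, not the 38 patterns referenced in this article, and real detectors usually add entropy checks and allow-lists on top. The function names are hypothetical.

```python
import re
import urllib.parse

# Illustrative patterns only; tune them and expect false positives.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe_live_secret": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
    "postgres_url_with_password": re.compile(
        r"postgres(?:ql)?://[^\s:/@]+:[^\s@]+@[^\s/]+"
    ),
    "pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def contents_url(owner: str, repo: str, path: str, ref: str) -> str:
    """Build the contents-API URL for one file at a given branch or commit."""
    quoted_path = urllib.parse.quote(path)  # keeps '/' between path segments
    quoted_ref = urllib.parse.quote(ref, safe="")
    return (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/contents/{quoted_path}?ref={quoted_ref}"
    )


def scan_text(text: str) -> list:
    """Return (pattern_name, line_number) for every line that matches a pattern.

    Only the location is reported, so the secret itself never lands in logs.
    """
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((name, line_number))
    return findings
```

Fetch each file with the raw media type mentioned above, pass the decoded body to `scan_text`, and stop once you hit your per-scan fetch cap.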
What the four sources together actually cover
Run all four against a typical company domain and you get:
- 20–200 subdomains (CT).
- 5–50 unique IPs across N hosting providers, with ASN/geo per IP (DNS + MaxMind).
- 0–5 public buckets matching naming conventions (HEAD probes).
- 1–30 GitHub repositories that reference the domain (code search).
- 0–N secret leaks inside those repos (regex on fetched content).
This is the inventory most EASM dashboards show on their hero card. The vendors repackage it with a UI, a risk score, and ongoing alerting. Whether $50k-200k/year is justified depends on what you're paying for beyond the data.
Where paid EASM still earns its keep
Honest list of what paid vendors actually add over the free sources:
1. Continuous monitoring + alerting. The free path requires you to run the scans, store the diffs, alert on changes. A vendor does it for you and keeps a year of history. UnveilScan has this in the alerting engine: channels (email/webhook/Slack) and triggers including new_secret_leak.
2. Internet-wide scan data (Censys / Shodan). Their crawlers run against the IPv4 internet on a regular schedule. You can ask "find me everything running nginx 1.18.0 with a certain header" globally. Powerful, but most organisations don't actually need that view; they need their own attack surface, not the world's.
3. Threat intelligence enrichment. Cross-reference your assets with known bad-actor infrastructure, leaked credentials, dark-web mentions. Real signal lives here for some buyers; a lot of "feeds" are noise.
4. Compliance reporting. If you need a polished PDF that maps your assets to NIS 2 art. 21, ISO 27001 A.5.20 (supplier security), or PCI-DSS 12.x, someone has to render it. Most EASMs include this.
5. Vendor risk dashboards. Your supply chain is also their attack surface. A vendor monitors hundreds of your suppliers continuously. This is the one place where the price tag tracks the value provided.
In other words: the data is free, the operational discipline isn't. You either build it (a Recon scan + a cron + a webhook is ~200 lines of code) or you buy it. The question worth asking your EASM vendor is which of the five points above they specifically deliver — most of them only do (1) plus a thin coat of (3) and (5).
Building it yourself: the pragmatic 80% path
If you want the data without the contract:
- Cron a daily script that hits crt.sh for your apex, diffs against yesterday's snapshot, alerts on new entries via Slack webhook. ~30 lines of bash.
- Cron a weekly resolve sweep over the CT-discovered subdomains, store IP + ASN + country in SQLite or a flat file. Diff for new IPs, new providers, sudden drops.
- Cron a monthly bucket-name sweep with whatever naming patterns match your brand.
- Cron a weekly GitHub code search for your domain, with a follow-up content fetch + secret regex scan. The 38 patterns we use are public knowledge; curate your own, or borrow ours.
Total infrastructure: one VM, one PAT, four cron entries, a Slack channel for the alerts. Time to first useful output: under a day. Annual cost: ~$60 in VPS rent.
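The diff-and-alert core of the daily job is the part worth getting right, so here is a sketch of it in Python rather than bash. It assumes a flat-file snapshot with one hostname per line and a Slack incoming webhook that accepts a JSON body with a `text` field. The file layout and function names are our own choices.

```python
import json
from pathlib import Path
from typing import Optional


def load_snapshot(path: Path) -> set:
    """Read the previous run's hostnames (one per line); empty set if no file yet."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def save_snapshot(path: Path, hostnames: set) -> None:
    """Write the current hostnames sorted, so the file diffs cleanly."""
    path.write_text("\n".join(sorted(hostnames)) + "\n")


def diff_snapshots(previous: set, current: set) -> tuple:
    """Return (added, removed) as sorted lists."""
    return sorted(current - previous), sorted(previous - current)


def slack_payload(apex: str, added: list) -> str:
    """JSON body for a Slack incoming webhook listing the new hostnames."""
    lines = [f"{len(added)} new CT hostname(s) for {apex}:"]
    lines += [f"- {host}" for host in added]
    return json.dumps({"text": "\n".join(lines)})


def run_once(path: Path, apex: str, current: set) -> Optional[str]:
    """Diff against the stored snapshot, store the new one, return a payload or None."""
    previous = load_snapshot(path)
    added, _removed = diff_snapshots(previous, current)
    save_snapshot(path, current)
    return slack_payload(apex, added) if added else None
```

On the very first run every hostname counts as new, so you may want to seed the snapshot once before wiring up the webhook. When `run_once` returns a payload, POST it to your webhook URL with `Content-Type: application/json`.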
This is exactly the pipeline UnveilScan ships under the Recon profile — except we run
it for you, dedup discoveries across scans (an asset stays "discovered" with its
original first_seen date even when re-confirmed later), and fire alerts on
new secret leaks immediately rather than the next morning's cron run.
The honest pitch
Don't pay $50k for the four sources above. Pay for what's hard to build: continuous discipline, deduplication across time, alerting that doesn't spam you with the same finding three times, and an auditable trail when the regulator asks how you knew. That's what we charge for, and it's not $50k — see our pricing.
Try Recon on your own domain
One Recon scan = subdomains via CT, IPs enriched with ASN/geo, public buckets, GitHub references, and any leaked secrets in code that mentions your domain. Free Basic scan first to verify ownership.