Deep dive · demand mining in practice

Mining "already-validated demand" from hosting platforms' default subdomains

Someone used a paid Similarweb account to sweep the default subdomains of 23 site-building platforms (vercel.app, netlify.app, pages.dev, github.io…) in a single pass, and came away with a "demand radar map" of roughly 77,000 landing pages, 47,600 sites, and 31.07 million visits. This deep dive takes that methodology apart and lays it out step by step. At its core it's a very concrete, engineered version of this handbook's "Niche discovery" and "Demand validation" chapters: you don't invent demand, you pick up demand that other people have already validated for you. At the end I add my own independent fact-check of the key conclusions, plus three traps you must watch for.

◆

Where this deep dive comes from, and how to read the numbers
The methodology comes from the "discover new demand from outbound domains / sites that already have traffic" approach championed by Gefei in the SEO / go-global site-building scene; the raw data and analysis come from a hands-on write-up based on a paid Similarweb export. Every traffic figure in this piece is a Similarweb estimate (landing-page-visit basis, with <50 approximated as 50), and a domain's "unregistered" status only reflects the snapshot at the time (2026-05-11); before you actually buy, you must re-check registration, trademarks, the SERP, and legal risk. The numbers are for judging whether demand exists and how strong it is, not for precise valuation.

1. The core insight: the smart move isn't "what do I want to build" but "what are users already looking for"

Most people are still picking projects on gut feel: AI tools are hot today, so they build an AI tool; directory sites are making money tomorrow, so they build a directory site — yanked around by the feed. This method flips the question around:

Don't ask "is there demand in this market" — go straight for the sites that already have real traffic but have terrible product and SEO.
A site whose owner didn't even bother to buy a domain, did no SEO, and shipped a crude page — yet still pulls a few thousand to a few hundred thousand visits — tells you the demand is rock-solid: users really are searching, really clicking, really using it.

When you rebuild it with a better domain, better pages, better SEO, and a better experience, you're not starting from zero; you're fighting an upgrade battle on a slot that has already been validated. This is exactly what this handbook keeps hammering: technology was never the bottleneck; distribution and "building the right thing" are.

2. Why focus on default subdomains like vercel.app / netlify.app / pages.dev / github.io

Mature sites bind their own brand domain. But huge numbers of indie developers, students, AI vibe coders, and one-off project authors deploy on a platform and then just take the default subdomain out into the world as-is:

xxx.vercel.app, xxx.netlify.app, xxx.pages.dev, xxx.github.io, xxx.lovable.app, xxx.replit.app

These default subdomains have one very valuable property: they usually represent early, temporary, un-branded products, often generated rapidly with AI. In other words, many of these sites weren't built by pros, yet they already have traffic. That's a "demand mine" the professional SEO teams haven't picked clean yet.

3. The full picture: 23 platforms, 77,000 landing pages, 31.07 million visits

This sweep covered 23 platform domains, roughly 77,000 landing-page records, corresponding to about 47,590 platform sites, with total estimated landing-page traffic of about 31.07 million. Broken down by traffic (the tiers are cumulative, so the ≥ 500 row includes every site counted in the rows below it):

Visit threshold	Sites	What it means
≥ 500	5,928	Real demand that's at least observable; the first-pass filter
≥ 5,000	704	Demand has taken shape; worth manual due diligence
≥ 10,000	336	Strong demand, but watch for gray-area / brand terms
≥ 50,000	57	High traffic, but the risk is most concentrated here too
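Because the tiers are cumulative, the 704 sites at ≥ 5,000 are also inside the 5,928 at ≥ 500. A minimal sketch of that bucketing, assuming you already have one visit total per host; the hosts and numbers below are made up for illustration:

```python
# Visit thresholds from the table above; tiers are cumulative (">= threshold").
THRESHOLDS = (500, 5_000, 10_000, 50_000)


def tier_counts(site_visits: dict[str, int], thresholds: tuple[int, ...] = THRESHOLDS) -> dict[int, int]:
    """Count how many sites sit at or above each visit threshold."""
    return {t: sum(1 for v in site_visits.values() if v >= t) for t in thresholds}


# Hypothetical per-host totals, for illustration only.
sample = {
    "a.vercel.app": 620,
    "b.netlify.app": 7_400,
    "c.pages.dev": 12_000,
    "d.github.io": 56_500,
    "e.web.app": 50,
}
print(tier_counts(sample))  # {500: 4, 5000: 3, 10000: 2, 50000: 1}
```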

Looking at total traffic by platform, the differences are huge — and you can't just stare at the totals, because every platform has its own "personality":

Platform	Landing-page traffic (est.)	Positioning & how to read it
github.io	≈ 18.838M	Mature demand library: lots of old projects, open source, docs, game tools. Opportunity exists, but not necessarily new, and plenty of copyright-risky sites.
pages.dev	≈ 2.844M	New-site radar: high share of new sites, lots of fresh traffic.
netlify.app	≈ 1.816M	New-site radar: dense with small tools, movie/TV, education, games, calculators, and region-lookup sites.
herokuapp.com	≈ 1.023M	Skews old, mostly legacy apps.
web.app	≈ 1.013M	Firebase family, mixed bag.
vercel.app	≈ 0.978M	Growth radar: this batch wasn't tagged "new," but the change field is almost entirely growth percentages, with many sites at 1,000%–5,000%+. Look at growth, not just "new."
lovable.app	≈ 0.449M	AI early-stage incubator: the Top 10 sites account for only about 4.9% of the platform's traffic; sites are extremely scattered and each one is small, but the variety of demand is rich.
onrender.com	≈ 0.341M	Skews backend / service-type.
replit.app	≈ 0.247M	Extreme-event type: the single top site gets about 177.9K visits, and that traffic is fresh. It can spike but is short-lived: good for event templates, bad as a long-term asset.
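Totals alone hide each platform's "personality"; concentration shows it. A low top-10 share (lovable.app's ~4.9%) means many small, varied sites, while replit.app's single ~177.9K site is roughly 72% of that platform's ~247K. A sketch that computes both numbers, assuming a host → visits mapping and a hand-maintained tuple of platform suffixes (both hypothetical here):

```python
from collections import defaultdict
from typing import Optional


def platform_of(host: str, platforms: tuple[str, ...]) -> Optional[str]:
    """Map 'foo.vercel.app' -> 'vercel.app'; return None for hosts on other domains."""
    for suffix in platforms:
        if host.endswith("." + suffix):
            return suffix
    return None


def platform_summary(site_visits: dict[str, int], platforms: tuple[str, ...], top_n: int = 10) -> dict[str, dict]:
    """Total visits, site count, and the share of traffic held by the top_n sites, per platform."""
    by_platform: dict[str, list[int]] = defaultdict(list)
    for host, visits in site_visits.items():
        suffix = platform_of(host, platforms)
        if suffix is not None:
            by_platform[suffix].append(visits)
    summary = {}
    for suffix, values in by_platform.items():
        total = sum(values)
        top = sum(sorted(values, reverse=True)[:top_n])
        summary[suffix] = {
            "total": total,
            "sites": len(values),
            "top_share": top / total if total else 0.0,
        }
    return summary


# Hypothetical data: one dominant replit.app site vs. evenly spread lovable.app sites.
data = {
    "big.replit.app": 900, "small.replit.app": 100,
    "a.lovable.app": 100, "b.lovable.app": 100, "c.lovable.app": 100, "d.lovable.app": 100,
}
print(platform_summary(data, ("replit.app", "lovable.app"), top_n=1))
```

With real data you would keep top_n=10; top_n=1 just makes the toy example readable.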

4. What actually matters isn't traffic, it's "signal" — four must-ask questions

When you look at this kind of data, "which site has the most traffic" is the most useless question. The real questions are these four — they matter ten times more than total traffic:

Must ask	What it tells you
Is this stable demand, or a one-off spike?	Oil field vs. fireworks. Event terms (a sports star's incident, an election result) flare up and die.
Is this a generic term, or a brand / piracy / borderline term?	Only generic tasks are your opportunity; for a term naming "a specific existing thing," grabbing the exact-match domain is usually just inviting a legal fight.
Is the page terrible, yet it ranks for a lot of keywords?	Keyword count = number of search entry points covered. 5,000 visits / 500 keywords is often more worth doing than 50,000 visits / 1 hot term.
Can you build a better version in 1–3 days with AI?	Feasibility decides whether it's actually "yours" to take.
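The first and third questions can be roughed out from export columns before you open a single page. A sketch, assuming you have visits, keyword count, and a growth percentage per site; the spike thresholds are hypothetical placeholders to tune, not values from the write-up:

```python
def visits_per_keyword(visits: int, keywords: int) -> float:
    """Average visits per ranking keyword; lower means traffic is spread over more entry points."""
    if keywords <= 0:
        raise ValueError("keywords must be positive")
    return visits / keywords


def looks_like_spike(change_pct: float, keywords: int,
                     min_change: float = 500.0, max_keywords: int = 5) -> bool:
    """Hypothetical heuristic: explosive growth riding on very few keywords looks like fireworks."""
    return change_pct >= min_change and keywords <= max_keywords


# The comparison from the table: broad long-tail coverage vs. one hot term.
print(visits_per_keyword(5_000, 500))   # 10.0
print(visits_per_keyword(50_000, 1))    # 50000.0
```

A low visits-per-keyword ratio is the "oil field" shape; a huge ratio combined with a spike flag is the "fireworks" shape.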

The most dangerous mistake: charging at anything with high traffic
The highest-traffic sites in this batch are very tempting: Anna's Archive, Libgen, ROM sites, all kinds of movie/anime/gambling/adult/piracy/login-portal pages. That's not opportunity, it's a minefield. Their value isn't that you copy them, it's that they let you observe the structure of demand: a user searching for some anime site may signal strong content demand for that language / region / genre; a search for some exam score calculator says there's a big pool of anxious users in that region around that exam date who need a simple tool. Mediocre people see traffic and copy it; smart people see traffic and take the demand apart.

5. The five categories best suited to indie developers

Once you strip out the gray-area and brand minefields, the things with real long-term value are almost all in the "small tool" family. The common thread: clear demand, a small functional boundary, the user arrives and solves one specific problem, and there's no market to educate.

1. Generic tools (the cleanest, best to start with)

AVIF→JPG (keyword "avif jpg 変換", Japanese for "AVIF JPG conversion"; about 22,900), SVG Path Editor (about 8,100 / 385 keywords), App Privacy Policy Generator (about 6,200 / 288), PDF Dark Mode Converter (about 5,800), Steganography Decoder (about 10,600), MD5 Checksum (about 7,400), Base32 Decoder, Image to Spectrogram, LaTeX Viewer (about 11,500).

2. Calculators and planners (naturally interactive, long dwell time)

Enchantment Calculator (about 45,600 / 3,750 keywords), PSA Calculator (about 28,200 / 1,084), Response Sheet Marks Calculator (about 9,500), Vernier Caliper Simulator (about 7,000), plus all kinds of game build planners / damage calculators.

3. Gamer tools (steady traffic, but IP risk)

Team Builder, Cheat Sheet, Fusion Calculator, Predictor, Pixel Art Generator, League Planner… many were thrown together by a programmer — good functionality, but bad SEO / UI / multilingual support / mobile. You don't need to be technically more complex; just be easier to use, easier to find, and cover more long tail, and you'll eat the traffic.

4. Exam / education / region-specific lookups (highly seasonal, but they come back every year)

The JEE series, CGPA/Grade Calculator, Vietnam college-entrance countdown, SAT question banks, etc. Too small for big companies to bother with, but a hard need for students. The right move is to build an exam-tool template and clone pages by exam × year × region, turning "seasonal" into "cyclical."
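Cloning by exam × year × region is a cross product. A sketch that turns it into page specs, assuming a hypothetical /exam/year/region URL scheme and made-up inputs; in practice you would drop combinations that don't exist (an exam not held in that region, say):

```python
import re
from itertools import product


def slugify(text: str) -> str:
    """Lowercase and collapse anything non-alphanumeric into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def exam_pages(exams: list[str], years: list[int], regions: list[str]) -> list[dict[str, str]]:
    """One page spec per exam × year × region combination (hypothetical URL scheme)."""
    return [
        {
            "slug": f"/{slugify(exam)}/{year}/{slugify(region)}",
            "title": f"{exam} {year} Score Calculator ({region})",
        }
        for exam, year, region in product(exams, years, regions)
    ]


pages = exam_pages(["JEE Main", "SAT"], [2025, 2026], ["India", "Vietnam"])
print(len(pages))         # 8
print(pages[0]["slug"])   # /jee-main/2025/india
```

Adding next year's pages is then one more entry in the years list, which is what turns "seasonal" into "cyclical."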

5. Developer micro-tools and documentation explainers (the highest-quality users)

Readme Generator, Transformer Explainer, API Explorer, Cheat Sheet, Markdown Viewer, and the like. High user quality, good monetization (ads, paid templates, API, sponsorship, email list). But brand-term risk is high, so build descriptive pages rather than brand-name domains.

6. Quantifying "is it worth doing": the opportunity-score formula

The biggest problem with the raw candidate list is: domain available ≠ opportunity doable. So you can't just buy from highest traffic down — you need a risk-adjusted score. This method collapses the judgment into one formula:

Final opportunity score = demand strength × scalability × feasibility × monetization potential − risk penalty
Demand strength looks at traffic / keyword count / growth rate / multiple sources; scalability looks at whether you can spin out 20+ long-tail pages; feasibility looks at whether you can ship an MVP in 48 hours; monetization potential looks at ads / affiliate / templates / API / paid; the risk penalty looks at brand, copyright, adult, piracy, login impersonation, short-lived events, and medical/financial/legal misinformation.
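The formula doesn't say what scale each factor uses. One reasonable reading, assumed here rather than taken from the source, is to score every factor from 0 to 1, so the product also lands in 0 to 1 and a penalty of the same size can wipe it out. The candidate names and scores below are illustrative only:

```python
from dataclasses import dataclass

FACTORS = ("demand", "scalability", "feasibility", "monetization", "risk_penalty")


@dataclass
class Candidate:
    name: str
    demand: float        # traffic / keyword count / growth rate / multiple sources
    scalability: float   # can it spin out 20+ long-tail pages?
    feasibility: float   # can an MVP ship in 48 hours?
    monetization: float  # ads / affiliate / templates / API / paid
    risk_penalty: float  # brand, copyright, adult, piracy, login impersonation, events, misinformation

    def __post_init__(self) -> None:
        # Assumed convention: every factor is normalized to [0, 1].
        for field_name in FACTORS:
            value = getattr(self, field_name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{field_name} must be in [0, 1], got {value}")

    def score(self) -> float:
        """demand × scalability × feasibility × monetization − risk penalty."""
        return (self.demand * self.scalability * self.feasibility * self.monetization
                - self.risk_penalty)


clean = Candidate("clean-tool", demand=0.6, scalability=0.8, feasibility=0.9, monetization=0.5, risk_penalty=0.05)
gray = Candidate("gray-term", demand=1.0, scalability=0.6, feasibility=0.9, monetization=0.7, risk_penalty=0.6)
for c in sorted([gray, clean], key=Candidate.score, reverse=True):
    print(c.name, round(c.score(), 3))
```

The exact numbers matter less than the shape: a heavy risk penalty can push a high-demand gray term below a modest clean tool.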

"Your current list only completed step one — finding demand. It hasn't done step two — judging whether it's worth doing. Mediocre people see traffic and charge in; smart people first ask whether that traffic bites."

7. Risk red lines: which to drop outright, and which to "steal the demand but not buy the exact match"

Tier	Type	How to handle
Red / Tier C	Movies/TV, streaming, piracy, ROM, adult, borderline, login-impersonation portals, one-off event terms	Drop outright, no regrets. Reject anything with terms like login / movie / stream / torrent / rom / pirate / porn / iptv / youtube-to-mp3. "Your goal is $100K a day, not a cease-and-desist a day."
Yellow / Tier B	Brand/platform terms (github, netlify, openai, microsoft…), game IP (Minecraft, Pokémon, Genshin, Arknights…)	Real demand, but don't buy the exact-match domain. Build an "adjacent, generalized site" instead: repo downloader, agent sdk examples, rpg fusion calculator, pixel art generator. Stand on the user's task route, not under the brand's house number.
Green / Tier A	Generic tools, clean calculators, exam templates, developer tools	Diligence first; pick your first batch of experiments from here.
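The "reject anything with these terms" rule is easy to automate as a first screen. A sketch using the red and yellow terms from the table above. It matches whole words, so "prompt" doesn't trip on "rom", and it only inspects the user-chosen first label of the host, since every github.io host would otherwise match "github". The function names are hypothetical, and whole-word matching misses compounds like "romhacks" or "streaming", so a human still reviews whatever passes:

```python
import re
import unicodedata

# Terms from the red and yellow tiers above; extend for your own market.
RED_TERMS = ("login", "movie", "stream", "torrent", "rom", "pirate", "porn", "iptv", "youtube-to-mp3")
YELLOW_TERMS = ("github", "netlify", "openai", "microsoft", "minecraft", "pokemon", "genshin", "arknights")


def normalize(text: str) -> str:
    """ASCII-fold ('Pokémon' -> 'pokemon'), lowercase, and pad words with single spaces."""
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return " " + re.sub(r"[^a-z0-9]+", " ", folded.lower()).strip() + " "


def term_hits(text: str, terms: tuple[str, ...]) -> list[str]:
    """Return the terms that appear in text as whole words or whole phrases."""
    haystack = normalize(text)
    return [t for t in terms if normalize(t) in haystack]


def risk_tier(host: str, keywords: list[str]) -> tuple[str, list[str]]:
    """Classify a candidate as 'C' (drop), 'B' (adjacent site only) or 'A' (clean)."""
    label = host.split(".", 1)[0]  # e.g. 'free-movie-stream' from 'free-movie-stream.pages.dev'
    text = " ".join([label, *keywords])
    red = term_hits(text, RED_TERMS)
    if red:
        return "C", red
    yellow = term_hits(text, YELLOW_TERMS)
    if yellow:
        return "B", yellow
    return "A", []


print(risk_tier("free-movie-stream.pages.dev", []))
print(risk_tier("repo-tools.vercel.app", ["github repo downloader"]))
print(risk_tier("prompt-helper.vercel.app", ["avif to jpg"]))
```

Red is checked before yellow on purpose: a candidate that hits both is dropped, not downgraded.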

8. The Tier-A candidate list (scored on risk / demand / MVP difficulty / SEO expansion combined)

Even clean terms with only 3,000–8,000 visits are often worth more than an 80K-visit gray-area term, because they're clean, stable, and compound over the long run. Before you buy, re-verify registration, trademark, SERP, and competitors.

Candidate	Traffic (est.)	Keywords	Verdict
enchantmentcalculator.com	45,600	3,750	One of the prettiest signals on the whole list — big long tail, can be a standalone site; the domain doesn't spell out a game name, so relatively safe.
avifjpg.com	22,900	140	Simple to build, low risk, easy to localize; good as the basis for a whole image-format tool matrix, leading with browser-side (local processing, no upload).
warpgenerator.com	56,500	502	Generic name, strong tool quality; worth checking search intent before diligence.
xdeltapatcher.com	19,500	219	Clear demand, but be careful not to touch ROM downloads.
latexviewer.com	11,500	452	Academic/developer crossover, clean, can extend to Markdown/BibTeX/citation.
steganographydecoder.com	10,600	293	An entry point for a security/CTF tool matrix, pair with MD5/Base32/Exif/QR.
responsesheetmarkscalculator.com	9,500	—	Exam tool, 105% growth; build templated per-year pages.
svgpatheditor.com	8,100	385	Design/frontend users, a long-term asset.
md5checksum.com	7,400	236	An old but stable need; can fold into a decoder tool site.
appprivacypolicygenerator.com	6,200	288	High commercial value (developers/publishers pay), must add a "not legal advice" disclaimer.
pdfdarkmodeconverter.com	5,800	192	Concrete pain point (PDFs are harsh on the eyes at night); emphasize local processing as a privacy selling point.
imagetospectrogram.com	5,300	113	Niche but clear.

9. The real play: copy the demand, not the site — then rebuild the structure

Copying a site is a low-level move. The right order is: see why it has traffic → take apart its keyword structure → figure out what task it solves → find where it's done badly → rebuild it with better SEO/experience/localization.

For example, when you spot a PDF Dark Mode Converter micro-tool with traffic, don't just build the same single button — build out the entire "topic cluster":

PDF Dark Mode Converter
Invert PDF Colors
PDF Night Mode Reader
PDF Contrast Enhancer
Make PDF Easier to Read at Night
How to Read PDFs in Dark Mode (tutorial)
FAQ / privacy notice / related tools

This is putting programmatic SEO exactly where it counts: one tool page + 3 long-tail conversion pages + 3 tutorial pages + FAQ + privacy notice + related-tools internal links + localized versions. A single button is a demo; a tool site is an asset.
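The cluster is easy to plan as data before writing any copy. A minimal sketch, assuming a flat slug scheme and hub-and-spoke internal links (every support page links to the tool, and the tool links to all of them); the page kinds follow the list above, while the function names and the "ja" locale are hypothetical:

```python
import re


def slugify(text: str) -> str:
    """Lowercase and collapse anything non-alphanumeric into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def topic_cluster(tool: str, long_tail: list[str], tutorials: list[str],
                  locales: tuple[str, ...] = ()) -> list[dict]:
    """Plan a tool hub plus supporting pages, with hub-and-spoke internal links."""
    def page(kind: str, title: str) -> dict:
        return {"kind": kind, "title": title, "slug": "/" + slugify(title)}

    hub = page("tool", tool)
    spokes = ([page("long-tail", t) for t in long_tail]
              + [page("tutorial", t) for t in tutorials]
              + [page("faq", f"{tool} FAQ"), page("privacy", "Privacy Notice"), page("related", "Related Tools")])
    hub["links_to"] = [p["slug"] for p in spokes]
    for p in spokes:
        p["links_to"] = [hub["slug"]]
    pages = [hub] + spokes
    # Localized copies live under a language subdirectory, with links rewritten to match.
    localized = [
        dict(p, slug=f"/{loc}{p['slug']}", links_to=[f"/{loc}{s}" for s in p["links_to"]])
        for loc in locales for p in pages
    ]
    return pages + localized


plan = topic_cluster(
    "PDF Dark Mode Converter",
    ["Invert PDF Colors", "PDF Night Mode Reader", "PDF Contrast Enhancer"],
    ["Make PDF Easier to Read at Night", "How to Read PDFs in Dark Mode"],
    locales=("ja",),
)
print(len(plan))  # 18: 9 base pages + 9 Japanese copies
```

With the two tutorial titles from the example, that's 9 base pages; a third tutorial brings it to the 10-page set.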

✦

What's actually scarce in the AI era
Building a small tool stopped being scarce a long time ago. What's scarce is: you know which tool to build, how to name it, how to expand it, how to dodge the risks, and how to make both search engines and users trust you more. This isn't a business won on inspiration; it's won by "putting the dumb grunt work into the right slots."

10. An executable SOP (copy it straight)

Build a platform-domain pool: vercel.app, netlify.app, pages.dev, github.io, web.app, firebaseapp.com, herokuapp.com, onrender.com, railway.app, replit.app, lovable.app, bolt.host, amplifyapp.com, azurestaticapps.net, deno.dev, fly.dev…
Export + clean: use Similarweb's keyword / landing-page tools to export CSVs, keeping URL, visits, change, keyword count, and top keywords; parse K/M/<50; aggregate by host, and strip out the platform's own pages separately (e.g. netlify.com's login/docs/form pages will pollute candidates).
First pass: keep ≥500 visits; loosen up for new / high-growth sites, prioritizing change >100% / >500% / >1000%.
Risk filter: directly exclude adult / gambling / piracy / movies-TV / cracks / brand impersonation / login portals / financial phishing / medical misinformation / obvious infringement.
Tag the demand: tool / calculator / generator / converter / viewer / game / education / exam / developer / AI / region lookup / event spike / gray-area.
Look at keyword count: more keywords = more search entry points, which matters more than raw visits.
Open the competitor pages manually: check Title/H1, whether the feature is complete, mobile, load speed, whether there's content/FAQ/internal links/localization, and whether it's just a throwaway demo.
Generate candidate domains: don't mechanically slap on .com; avoid brand/trademark/project names, favor generic descriptive ones.
Build the MVP: one need, one core action — the user finishes the task in 10 seconds; no login/admin/membership/complex systems. Do the first batch browser-side only.
Add the SEO structure: Title/Description/H1/FAQ/How-to/Schema/Sitemap/robots/internal links/related tools/localization — all of it.
Launch and wire up Search Console: weekly, look at impression terms, click terms, and high-impression/low-rank terms, and use real search queries to drive the next batch of pages — don't expand pages on a hunch.
Re-sweep every month: the power of this method isn't a one-time dig, it's a continuous radar — rerun it every 30 days and keep the new and high-growth demand.
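The export, clean, and first-pass steps are the part most worth scripting. A sketch under loud assumptions: the column names URL / Visits / Change, the value formats, and the set of platform-owned hosts are hypothetical stand-ins for whatever your actual export contains, and the 50-visit floor for new or fast-growing sites is a placeholder, not part of the write-up:

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical examples of the platform's own pages, which pollute candidates.
PLATFORM_OWN_HOSTS = {"netlify.com", "www.netlify.com", "app.netlify.com"}


def parse_visits(raw: str) -> int:
    """'22.9K' -> 22900, '1.2M' -> 1200000, '7,400' -> 7400, '<50' -> 50 (the write-up's basis)."""
    s = raw.strip().replace(",", "").upper()
    if s.startswith("<"):
        return int(float(s[1:]))
    multiplier = {"K": 1_000, "M": 1_000_000}.get(s[-1:], 1)
    if multiplier != 1:
        s = s[:-1]
    return round(float(s) * multiplier)


def aggregate_by_host(csv_text: str) -> dict[str, dict]:
    """Sum landing-page rows into one record per host, skipping platform-owned pages."""
    sites: dict[str, dict] = defaultdict(lambda: {"visits": 0, "pages": 0, "is_new": False, "change": None})
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = row["URL"].strip()
        host = urlsplit(url if "://" in url else "https://" + url).hostname or ""
        if not host or host in PLATFORM_OWN_HOSTS:
            continue
        site = sites[host]
        site["visits"] += parse_visits(row["Visits"])
        site["pages"] += 1
        change = (row.get("Change") or "").strip().rstrip("%").replace(",", "")
        if change.lower() == "new":
            site["is_new"] = True
        elif change:
            value = float(change)
            site["change"] = value if site["change"] is None else max(site["change"], value)
    return dict(sites)


def first_pass(sites: dict[str, dict], min_visits: int = 500,
               growth_pct: float = 100.0, loose_min_visits: int = 50) -> list[str]:
    """Keep hosts with >= min_visits, plus new or >growth_pct sites down to loose_min_visits."""
    kept = []
    for host, s in sites.items():
        growing = s["is_new"] or (s["change"] is not None and s["change"] > growth_pct)
        if s["visits"] >= min_visits or (growing and s["visits"] >= loose_min_visits):
            kept.append(host)
    return sorted(kept, key=lambda h: sites[h]["visits"], reverse=True)


SAMPLE = """URL,Visits,Change
https://avifjpg.pages.dev/,22.9K,+105%
https://avifjpg.pages.dev/ja,1.2K,
tiny.vercel.app/,<50,"+1,200%"
https://old.herokuapp.com/,320,-10%
https://app.netlify.com/login,5K,
"""
sites = aggregate_by_host(SAMPLE)
print(first_pass(sites))  # ['avifjpg.pages.dev', 'tiny.vercel.app']
```

The old herokuapp.com site drops out (under 500 visits and shrinking), while the tiny vercel.app site survives on growth alone.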
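For the weekly Search Console review, "high-impression / low-rank" can be made mechanical. A sketch, assuming you've exported the queries table from the Performance report and loaded it into rows; the QueryRow fields and the 100-impression, position 8 to 30 window are hypothetical choices to tune, not Search Console defaults:

```python
from dataclasses import dataclass


@dataclass
class QueryRow:
    query: str
    clicks: int
    impressions: int
    position: float  # average position; 1.0 is the top result


def next_page_ideas(rows: list[QueryRow], min_impressions: int = 100,
                    best_position: float = 8.0, worst_position: float = 30.0) -> list[str]:
    """Queries Google already shows you for, but not near the top: candidates for a dedicated page."""
    picks = [r for r in rows
             if r.impressions >= min_impressions and best_position <= r.position <= worst_position]
    return [r.query for r in sorted(picks, key=lambda r: r.impressions, reverse=True)]


rows = [  # hypothetical queries for an AVIF converter
    QueryRow("avif to jpg", 300, 5_000, 3.2),
    QueryRow("convert avif on iphone", 1, 900, 22.0),
    QueryRow("avif to jpg batch", 4, 1_200, 14.5),
    QueryRow("avif", 0, 40, 50.0),
]
print(next_page_ideas(rows))  # ['avif to jpg batch', 'convert avif on iphone']
```

Queries already in the top few positions are left alone, since the existing page is doing its job there.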

◆

Don't get greedy with the first batch: run only 3 experiments
Buying too many domains manufactures "fake achievement": nothing's launched, and you're still at zero. I'd suggest exactly three in the first batch: ① an image/file conversion tool set (start from avifjpg); ② a developer/security micro-tool site (Steganography / MD5 / Base32 / LaTeX…); ③ an exam score calculator template. Give each project just 48 hours for the MVP, and ship with at least 10 pages (main tool + 3 long-tail + 3 tutorials + FAQ + privacy + related tools).

11. My independent fact-check: what's credible, and which three traps to watch for

This method sounds ruthless, but as a research report it has to be checked against verifiable facts and known limitations:

✅ The methodology itself is real, checkable, and corroborated

"Gefei" is a publicly active SEO / AdSense practitioner in the go-global site-building scene, and his community's methodology can be summed up as a loop of 40% demand mining, 20% product building, and 40% promotion. His public courses really do include topics like "discovering new demand and new products from outbound domains" and "analyzing high-traffic pages to mine the demand others are already making money on," and this deep dive's subdomain-scanning method is precisely an engineered upgrade of that idea. The indie hacker community also broadly corroborates the observation that "a lot of top-ranking sites have bad technical SEO, and junk sites eat traffic anyway": a weak site that still has traffic means the demand is validated and there's room to outdo it.

Trap one: traffic ≠ paying demand. Cooler heads in the community warn: a lot of people treat "traffic" as "validation," but it's really just "attention." Whether search traffic converts into users willing to pay / willing to view ads is a separate assumption. This method holds best for the low-ticket, scale-driven "tool site + ads/affiliate" model; if what you want to build is a paid SaaS, subdomain traffic is only the first layer of signal — you still need to validate willingness to pay per this handbook's "Demand validation" chapter.

Trap two: Similarweb's subdomain granularity has a ceiling. Similarweb mostly gives estimates at the domain level, and a small site like a single xxx.vercel.app often falls below its sample threshold, so long-tail estimates are noisy. On top of that, the raw data approximates <50 as 50, which systematically inflates the volume of "long-tail small sites." So these numbers are good for relative ranking and a yes/no call on whether demand exists, not for precise traffic. Before you build, cross-check real search volume with keyword tools; after launch, Google Search Console shows what your own pages actually get, since it only reports on sites you've verified.

Trap three: compliance and security are hard constraints, not suggestions. ① "Unregistered" is a one-day snapshot; before buying you must re-check registration + trademark + SERP + competitors; ② privacy-policy / legal tools must state "not legal advice, no substitute for a lawyer"; ③ for game IP / brand terms, build adjacent generalized sites — don't pile on logos, don't pose as official, don't do infringing downloads; ④ if you also batch-query with scripts, be sure to check for and rotate any API Key written into the script (the original author stepped right on the trap of exposing a Key in analyze.py).
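On point ④: keys belong in the environment, not in the script. A minimal sketch, assuming a hypothetical variable name SIMILARWEB_API_KEY, plus a crude regex check you could run over your scripts before committing. It only catches obvious api_key = "..." style assignments, not every way a secret can leak, and rotating a key that has already leaked still has to happen in the provider's dashboard:

```python
import os
import re

# Flags lines like API_KEY = "abcd1234..." or token: 'xyz...' (8+ character literals).
SECRET_PATTERN = re.compile(r"""(?i)\b(api[_-]?key|token|secret)\b\s*[:=]\s*['"][^'"]{8,}['"]""")


def load_api_key(var_name: str = "SIMILARWEB_API_KEY") -> str:
    """Read the key from the environment instead of hard-coding it in analyze.py."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} in your environment (e.g. a .env file listed in .gitignore).")
    return key


def find_hardcoded_secrets(source: str) -> list[int]:
    """Return 1-based line numbers that look like a hard-coded secret."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if SECRET_PATTERN.search(line)]


script = 'import os\nAPI_KEY = "abcd1234efgh"\nkey = os.environ["SIMILARWEB_API_KEY"]\n'
print(find_hardcoded_secrets(script))  # [2]
```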

12. How it connects to the rest of this handbook

If you're someone who "can write code but lacks marketing instinct," this subdomain-mining method fills in exactly that first stretch from idea to traffic:

Handbook chapter	The problem this method solves for it
Niche discovery	Instead of gut feel, use "junk sites that already have traffic" as a demand signal source to find niche opportunities at scale.
Demand validation	Traffic is itself a layer of validation (note: attention-level validation), turning "prove someone wants it before you write code" into a data pipeline.
The seven acquisition channels	Leads with programmatic SEO: topic clusters + long-tail pages + internal links + localized subdirectories.
Growth · Pricing	Use a tool matrix for product compounding, with Search Console data driving iteration.

Don't build what you think is cool — build what someone has already proven with a junk site. Demand isn't dreamed up; it's mined out of user behavior.
The earlier dataset tells you "where the ore is"; the candidate domain list tells you "what's mixed into the ore." Now the move isn't to keep digging, it's to start smelting — build 3 first, 48 hours each, and let Search Console tell you what the next page should be.

◆

Sources for this deep dive
· Original methodology and hands-on write-up: the original newsletter post, "I spent a day sweeping hosting-platform subdomains with Similarweb…"
· Corroboration of Gefei's methodology: Indie Hacker Tools (Gefei); HardHacker EP57, "Learning SEO and site-building with Gefei"
· "Weak sites rank anyway = opportunity" and "traffic = attention ≠ paying": Indie Hackers, "Everything I Know About SEO"
· Similarweb domain-level data and subdomain-granularity limits: Similarweb Website Traffic Checker
All traffic figures are third-party estimates, and all "revenue/growth" numbers are publicly claimed by the relevant parties; they don't mean you can reproduce them, and they're not investment or legal advice.