Overview
The Keyword Miner is a full keyword research and content planning system built around real crawl data. Instead of guessing what to target, you extract keywords directly from live pages — your own site and your competitors' — then filter, cluster, and assign them to specific content goals.
Each domain in Site Auditor has its own isolated SQLite database. Keyword data, block lists, clusters, and tracking records are all stored per-domain. Scanning a competitor never contaminates your own site's data.
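To make the isolation concrete, here is a minimal sketch of a one-database-per-domain layout using Python's standard `sqlite3` module. The file naming scheme, the `db_path_for` and `open_domain_db` helpers, and the two table definitions are assumptions for illustration only; the guide does not document Site Auditor's actual schema.

```python
import re
import sqlite3
from pathlib import Path


def db_path_for(domain: str, data_dir: Path) -> Path:
    """Map a domain to its own database file (hypothetical naming scheme)."""
    safe = re.sub(r"[^a-z0-9.-]", "_", domain.lower())
    return data_dir / f"{safe}.sqlite"


def open_domain_db(domain: str, data_dir: Path) -> sqlite3.Connection:
    """Open (or create) the database that belongs to exactly one domain."""
    data_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path_for(domain, data_dir))
    # Hypothetical tables; the real schema is not described in this guide.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keywords (keyword TEXT PRIMARY KEY, state TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS block_list (keyword TEXT PRIMARY KEY)")
    return conn
```

Because every connection points at a different file, a keyword blocked while scanning one domain cannot appear in another domain's block list.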
Before you start
- Site Auditor: installed and licensed on your machine.
- Keywords Everywhere API key: entered in the app settings before fetching volume data.
- A domain to scan: start with your own site before moving to competitors.
- An AI tool: ChatGPT, Claude, Grok, or similar, for the final analysis step.
Configure and Run the Scan
The Keyword Miner runs as part of the standard crawl pipeline. You enable it through the Custom Scan options before starting.
- Enter the URL you want to scan in the main scan field.
- Click Custom Scan to open the scan configuration options.
- Check Keyword Analysis in the custom scan options. This enables the miner during the crawl.
- Set a page limit if you want to cap the crawl — useful for large sites or initial exploratory scans.
- Click Start Scan. The app crawls the site and runs keyword extraction in parallel.
The Alphabetical Sweep
After the scan completes, go to the Keywords tab. You'll find the raw keyword list — everything the miner extracted. Before anything else, clean this list using the Alphabetical Sweep.
The technique
- Click the keyword column header to sort alphabetically.
- Press Ctrl+A to select all keywords.
- Scroll down the list visually. For every keyword worth keeping, click to deselect it. Work fast — you're looking for obvious keepers, not making final decisions.
- At the bottom, everything still selected is junk. Press Delete to remove it. The app will ask whether to Delete & Block (prevents reappearing on future scans) or Delete Only (removes now but allows rediscovery). For the initial sweep, choose Delete & Block — you want the block list to grow.
Why alphabetical? Similar junk clusters together. All broken URL fragments, stop words, navigation text, and generic noise appear grouped — your eye moves quickly once it recognises a pattern. Keepers stand out against the noise.
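The select-all-then-deselect logic of the sweep can be sketched in a few lines. This is an illustration of the technique only, not the app's code; `alphabetical_sweep` and the sample keywords are made up for the example.

```python
def alphabetical_sweep(keywords, keepers):
    """Sort, select everything, deselect keepers; what stays selected is junk."""
    ordered = sorted(keywords, key=str.casefold)  # alphabetical, case-insensitive
    selected = set(ordered)                       # Ctrl+A: everything starts selected
    selected -= set(keepers)                      # clicking a keeper deselects it
    kept = [k for k in ordered if k not in selected]
    junk = [k for k in ordered if k in selected]
    return kept, junk


raw = [
    "seo audit tool",
    "click here",
    "broken link checker",
    "click to expand",
    "cookie policy",
]
kept, junk = alphabetical_sweep(raw, {"seo audit tool", "broken link checker"})
print(junk)  # the "click ..." and "cookie ..." noise sits together after sorting
```

Sorting is what makes the pass fast: the two "click" fragments end up adjacent, so one glance dismisses both.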
How the block list works
When you choose Delete & Block, the keyword is permanently added to that domain's block list. On all future scans of the same domain, blocked keywords are automatically suppressed — they never appear in the keyword tab again.
If you choose Delete Only, the keyword is removed from the current list but stays unblocked. It can reappear on the next scan if the crawler finds it again. Use this when you're unsure — you can always block it later if it keeps showing up.
The block list is the compounding advantage. By the third or fourth crawl of the same site, the noise floor has dropped significantly and the list is almost entirely signal.
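The difference between the two delete modes can be sketched as follows. The function names are hypothetical and the block list is modelled as a plain Python set; the app's internal representation may differ.

```python
def delete_keywords(current, block_list, to_delete, block):
    """Remove keywords from the current list; add them to the block list if asked."""
    remaining = [k for k in current if k not in to_delete]
    if block:
        new_block_list = set(block_list) | set(to_delete)  # Delete & Block
    else:
        new_block_list = set(block_list)                   # Delete Only
    return remaining, new_block_list


def apply_block_list(extracted, block_list):
    """On a later scan, suppress anything that is already blocked."""
    return [k for k in extracted if k not in block_list]
```

With Delete & Block, a later call to `apply_block_list` drops the keyword before it reaches the list. With Delete Only, the block list is unchanged, so the same keyword passes through again if the crawler rediscovers it.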
Auto-Refine — validate keywords and discover new ones
For large keyword lists, the Auto-Refine button validates "Found" keywords against Google and DuckDuckGo autocomplete APIs. Keywords that return no autocomplete suggestions are blocked and removed (noise). Keywords that survive validation are marked with a ✓ in the Found In column, and the autocomplete suggestions themselves are collected into the Research panel for you to review and save as new keywords.
- Click Auto-Refine on the Keywords toolbar.
- Choose which sources to check against (Google + DuckDuckGo recommended).
- The app processes up to 50 keywords per batch to keep the results manageable. For larger lists, run Auto-Refine multiple times.
- Keywords with no suggestions are automatically blocked and deleted.
- Keywords with suggestions appear in the Research panel for you to review and optionally save.
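The Auto-Refine steps above can be sketched as a single batch function. This is a simplified model, not the app's implementation: `fetch_suggestions` is a hypothetical callable that you would back with an autocomplete lookup, and no real Google or DuckDuckGo endpoint is called here.

```python
from typing import Callable, Iterable, List, Tuple

BATCH_SIZE = 50  # per-batch cap stated in this guide


def auto_refine(
    found: Iterable[str],
    fetch_suggestions: Callable[[str], List[str]],
    batch_size: int = BATCH_SIZE,
) -> Tuple[List[str], List[str], List[str]]:
    """Validate one batch of 'Found' keywords against autocomplete suggestions."""
    batch = list(found)[:batch_size]
    validated, blocked, research = [], [], []
    for keyword in batch:
        suggestions = fetch_suggestions(keyword)
        if not suggestions:
            blocked.append(keyword)      # no suggestions: treated as noise
            continue
        validated.append(keyword)        # survives validation
        for suggestion in suggestions:   # collect new ideas for the Research panel
            if suggestion != keyword and suggestion not in research:
                research.append(suggestion)
    return validated, blocked, research
```

Passing the lookup in as a function keeps the sketch testable with canned data. For a list longer than one batch, you would call it again on the remaining keywords, which mirrors running Auto-Refine multiple times.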
Get Volume for Selected Keywords
Once the sweep is done, you have a list of candidates. Now fetch volume, CPC, competition, and trend data from Keywords Everywhere.
- Do a quick review of what's left. Remove any obvious junk you missed in the sweep.
- Select the keywords you want volume data for. Only select keywords you genuinely intend to evaluate — each one costs API credits.
- Click Get Volume. The app fetches data for selected keywords only.
Understanding the columns
| Column | What it tells you |
|---|---|
| Volume | Average monthly US searches. Above 0 means real people search for this term. |
| CPC | What advertisers pay per click. Above $0.50 signals commercial intent — someone is making money from this keyword. |
| Competition | 0–1 score of advertiser competition. A higher score confirms commercial value beyond raw volume. |
| Trend | % change comparing last 3 months to prior 3 months. Positive = growing topic. Negative = proceed with caution. |
| Prominence | How strongly this keyword appears across your site. High prominence = you already have content for it. |
| Count | How many times it appears across all crawled pages. |
| Pages | How many pages the keyword was found on. |
| State | Found (unreviewed) · Kept (worth monitoring) · Tracked (actively tracking position). |
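As a worked example of the Trend column, the sketch below computes a percentage change between the last three months and the three months before them. The exact formula the data provider uses is not stated in this guide, so treat this as an assumption: a standard percent change on the summed monthly volumes.

```python
def trend_percent(monthly_volumes):
    """Percent change: last 3 months versus the 3 months before them.

    monthly_volumes is ordered oldest to newest and needs at least 6 entries.
    Returns None when the prior period is zero (no meaningful percentage).
    """
    if len(monthly_volumes) < 6:
        raise ValueError("need at least 6 months of data")
    prior = sum(monthly_volumes[-6:-3])
    recent = sum(monthly_volumes[-3:])
    if prior == 0:
        return None
    return (recent - prior) * 100 / prior


# Prior three months total 300, last three total 390: a 30% rise.
print(trend_percent([100, 100, 100, 120, 130, 140]))
```

A keyword whose recent total falls below the prior total produces a negative value, which is the "proceed with caution" case in the table.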
Research, Curate, and Mark Keepers
Now you have real data. Do a second pass — this time making deliberate decisions about each keyword's value, not just removing noise.
Delete these (use Delete & Block)
- Volume = 0, CPC = $0.00, no trend data — zero commercial or informational value
- Your own brand and product names you'll always rank #1 for without effort
- Generic terms with no connection to your products or audience
- Navigational fragments and UI text that slipped through the first sweep
Mark as Kept — your working keywords
- Volume above 0 with a positive or stable trend
- CPC above $0.50 — advertisers pay for these clicks, which means commercial intent exists
- Low prominence on your site — you don't have content for this yet (opportunity)
- Keywords describing problems your software solves
- Keywords your competitor ranks for that you have no page for
When you mark a keyword as Kept, the row changes colour in the list — making it visually distinct from unreviewed keywords. On re-scans, kept keywords are immediately identifiable, so subsequent sweeps only require evaluating new unknowns.
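The delete and keep criteria above can be expressed as a rough first-pass rule. This is only a sketch: the guide lists the criteria separately and leaves the final call to you, so the way they are combined here, the `KeywordRow` fields, and the treatment of a 0% trend as "stable" are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

CPC_THRESHOLD = 0.50  # commercial-intent threshold from this guide


@dataclass
class KeywordRow:
    keyword: str
    volume: int
    cpc: float
    trend: Optional[float]  # percent change; None means no trend data


def suggest_action(row: KeywordRow) -> str:
    """Suggest 'delete_and_block', 'keep', or 'review' for one keyword."""
    if row.volume == 0 and row.cpc == 0 and row.trend is None:
        return "delete_and_block"  # no commercial or informational value
    has_volume = row.volume > 0
    healthy_trend = row.trend is not None and row.trend >= 0
    commercial = row.cpc > CPC_THRESHOLD
    if has_volume and (healthy_trend or commercial):
        return "keep"
    return "review"  # e.g. zero volume but high CPC: worth a manual look
```

Prominence, relevance to your products, and competitor coverage are deliberately left out, because they depend on judgment that a numeric rule cannot capture.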
Build Clusters and Assign Keywords
A cluster connects keywords to a specific purpose — a product, article, guide, or service. Clusters turn a flat list of keywords into an organised content plan.
Start with a domain cluster
Create a cluster for the domain you scanned. This anchors keywords to their source — useful when scanning multiple competitor sites and you need to know where a keyword came from.
Create product and content clusters
Then create a separate cluster for each thing you want keywords to serve:
- An existing product you sell (e.g. Site Auditor, TomsBGRemover)
- A product or feature you're planning to build
- An article or guide you want to write
- A service offering or landing page
Assign keywords and set their type
| Keyword Type | When to use it |
|---|---|
| Primary | The main keyword a page is built around. One primary keyword per page target. |
| Supporting | Variations, related terms, and long-tail phrases that reinforce the primary. Multiple per page is fine. |
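One way to picture a cluster and its keyword types is the small data structure below. It is a hypothetical model, assuming each cluster corresponds to a single page target; the `Cluster` class and its `assign` method are not part of the app.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cluster:
    """A content target (product, article, guide, service) and its keywords."""
    name: str
    primary: Optional[str] = None
    supporting: List[str] = field(default_factory=list)

    def assign(self, keyword: str, keyword_type: str) -> None:
        if keyword_type == "Primary":
            # One primary keyword per page target.
            if self.primary is not None and self.primary != keyword:
                raise ValueError(f"{self.name!r} already has primary {self.primary!r}")
            self.primary = keyword
        elif keyword_type == "Supporting":
            # Multiple supporting keywords per page are fine.
            if keyword not in self.supporting:
                self.supporting.append(keyword)
        else:
            raise ValueError(f"unknown keyword type: {keyword_type!r}")


guide = Cluster("Technical SEO audit guide")
guide.assign("technical seo audit", "Primary")
guide.assign("technical seo audit checklist", "Supporting")
guide.assign("how to do a technical seo audit", "Supporting")
```

Rejecting a second primary keyword makes the "one primary per page target" rule explicit instead of leaving it as a convention.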
Scan Competitor Sites
Once you've processed your own site, scan competitors using the exact same workflow. The goal is to find keywords they rank for that you don't yet have content for.
- Enter the competitor's domain and run a Custom Scan with Keyword Analysis enabled.
- Apply the Alphabetical Sweep — looking for keywords relevant to your audience, not theirs.
- Fetch volume for candidates that look promising.
- Mark keepers and assign them to your own product or article clusters.
Because each domain has its own database, a competitor's block list is trained separately from yours. Junk on a competitor's site doesn't pollute your block list, and vice versa.
What to look for in competitor scans
- Keywords they use prominently that you have no content for (high prominence, absent from your site)
- High-CPC terms you're not targeting — commercial keywords they've already validated
- Growing trend keywords where you could enter early
- Long-tail variations of terms you already rank for — extend your coverage
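The first item in that list, keywords a competitor uses prominently that are absent from your site, amounts to a filtered set difference. The sketch below assumes you have the competitor's keywords with a numeric prominence value and a list of your own keywords; the function name and sample figures are illustrative, and the scale of the Prominence column is not documented here.

```python
def content_gaps(competitor, own_keywords, min_prominence=0.0):
    """Competitor keywords you have no match for, most prominent first.

    competitor maps keyword -> prominence on the competitor's site.
    """
    own = {k.casefold() for k in own_keywords}
    gaps = [
        (keyword, prominence)
        for keyword, prominence in competitor.items()
        if keyword.casefold() not in own and prominence >= min_prominence
    ]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


competitor = {"log file analyzer": 8.0, "site audit": 9.5, "redirect checker": 3.0}
print(content_gaps(competitor, ["Site Audit"]))
```

Raising `min_prominence` narrows the result to terms the competitor clearly invests in, which is usually a better signal than a keyword that appears once in a footer.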
The AI Handoff
When you've processed enough domains and built a solid set of kept and clustered keywords, export the data as CSV from the Keywords tab, then feed it into your AI tool using the analysis prompt.
The prompt instructs the AI to identify content opportunities, surface keywords you already cover but could strengthen, flag low-value keywords for permanent deletion, and produce a prioritised monthly action list of specific articles to write or pages to improve.
The full analysis prompt is included in the appendix at the bottom of this guide.
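If you prepare the handoff often, assembling the prompt can be scripted. The sketch below is one possible approach, assuming you add two placeholder markers, `{{BUSINESS_CONTEXT}}` and `{{CSV_DATA}}`, to your own copy of the prompt; those markers and the `build_prompt` helper are hypothetical and do not appear in the app or in the original prompt.

```python
import csv
import io


def build_prompt(template: str, csv_text: str, business_context: str) -> str:
    """Fill a prompt template with the exported CSV and your business context."""
    # Parse the export first so an empty or broken file fails early.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV export contains no keyword rows")
    return (
        template
        .replace("{{BUSINESS_CONTEXT}}", business_context.strip())
        .replace("{{CSV_DATA}}", csv_text.strip())
    )
```

Checking for an empty export before submitting avoids spending an AI request on a prompt that contains no keyword data.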
Position Tracking
Auto-tracking every keyword across multiple search engines would consume API credits continuously. Instead, Site Auditor v3 uses a deliberate manual tracking model — you check the keywords you've consciously decided are worth monitoring.
The 5-engine SERP check
- In the Keywords tab, find a keyword you want to track.
- Double-click the keyword. The app opens search results in a new browser tab across Google, Bing, Yahoo, DuckDuckGo, and Brave.
- Scan page 1 and page 2 to find your ranking position.
- Right-click the keyword → Update Position. Enter your position (1–100) or tick "Not Ranking" for each engine.
- Click Save & Next — the app saves your entries, jumps to the next tracked keyword, opens its search results, and reopens the position dialog. Work through all your tracked keywords in one sitting.
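For readers who want to reproduce the five-engine check outside the app, the sketch below builds one search URL per engine and validates a position entry. The URL patterns are the engines' public search URLs, not something taken from Site Auditor, and `serp_urls` and `parse_position` are illustrative names.

```python
from urllib.parse import quote_plus

# Public search URL patterns; {q} is the URL-encoded keyword.
SEARCH_URLS = {
    "Google": "https://www.google.com/search?q={q}",
    "Bing": "https://www.bing.com/search?q={q}",
    "Yahoo": "https://search.yahoo.com/search?p={q}",
    "DuckDuckGo": "https://duckduckgo.com/?q={q}",
    "Brave": "https://search.brave.com/search?q={q}",
}


def serp_urls(keyword: str) -> dict:
    """Return one search URL per engine for the given keyword."""
    q = quote_plus(keyword)
    return {engine: url.format(q=q) for engine, url in SEARCH_URLS.items()}


def parse_position(value):
    """Return a ranking position from 1 to 100, or None for 'Not Ranking'."""
    if value is None:
        return None
    position = int(value)
    if not 1 <= position <= 100:
        raise ValueError("position must be between 1 and 100")
    return position
```

To open the results, you could pass each URL to `webbrowser.open_new_tab` from the standard library. Positions still have to be read off the results pages by eye, which matches the manual model described above.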
When you publish new content targeting a keyword, manually add the page to the tracking system. The app monitors it from the date you add it — giving you a clear before/after record of content impact.
Quick Reference
The complete process at a glance:
| # | Action | Key Point |
|---|---|---|
| 01 | Configure scan | Enable Keyword Analysis in Custom Scan options |
| 02 | Alphabetical Sweep + Auto-Refine | Sweep removes visual noise; Auto-Refine validates 50 at a time via autocomplete |
| 03 | Delete & Block vs Delete Only | Block trains the filter for future scans; Delete Only for keywords you're unsure about |
| 04 | Get Volume | Select only the candidates you intend to evaluate; each one costs API credits |
| 05 | Curate keepers | Marked rows change colour and survive re-scan sweeps |
| 06 | Build clusters | Assign keywords to products, articles, or guides |
| 07 | Scan competitors | Per-domain databases never cross-contaminate |
| 08 | Export CSV + AI prompt | Edit the business context section before submitting |
| 09 | Track positions | Manual 5-engine SERP check; Save & Next cycles through all tracked keywords |
Tips for Getting the Most Value
- The first scan of any domain is the most labour-intensive. Subsequent scans are progressively faster as the block list grows.
- Scan your own site before competitors — it calibrates your eye for what's signal vs noise in your niche.
- Don't skip the delete step. Choose Delete & Block for obvious junk — the block list compounds over time. Use Delete Only when you're unsure.
- Auto-Refine is most effective after a manual sweep. Remove the worst noise by eye first, then let Auto-Refine validate the borderline terms against autocomplete.
- A keyword with zero volume but high CPC is worth investigating — it may have real commercial value in a narrow niche.
- Competitor scans are most useful when the competitor ranks for terms you're not targeting — look for prominence gaps, not just volume.
- Update the AI prompt's business context regularly. What you need to prioritise changes as your products and traffic evolve.
- When content goes live, add the page to tracking immediately. You want a baseline position before the page ages.
Appendix — AI Analysis Prompt
Copy this prompt, paste your CSV data where indicated, update the business context section, then submit to your AI tool of choice.