Keyword Miner Feature

The Complete Keyword Workflow

From first crawl to published content plan. A practical end-to-end process for finding, filtering, clustering, and acting on keyword data from your own site and competitors.

Overview

The Keyword Miner is a full keyword research and content planning system built around real crawl data. Instead of guessing what to target, you extract keywords directly from live pages — your own site and your competitors' — then filter, cluster, and assign them to specific content goals.

Each domain in Site Auditor has its own isolated SQLite database. Keyword data, block lists, clusters, and tracking records are all stored per-domain. Scanning a competitor never contaminates your own site's data.
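
As a rough illustration of the per-domain isolation described above, the sketch below gives each domain its own SQLite file. The file naming scheme, table names, and columns are assumptions made for this example; the guide does not document Site Auditor's actual schema.

```python
import sqlite3
import tempfile
from pathlib import Path


def db_path_for(domain: str, root: Path) -> Path:
    """Map a domain to its own database file (hypothetical naming scheme)."""
    safe = "".join(c if c.isalnum() or c in ".-" else "_" for c in domain.lower())
    return root / f"{safe}.sqlite"


def open_domain_db(domain: str, root: Path) -> sqlite3.Connection:
    """Open (or create) the isolated database for one domain."""
    conn = sqlite3.connect(db_path_for(domain, root))
    # Hypothetical tables: one for keywords, one for the block list.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keywords ("
        "keyword TEXT PRIMARY KEY, state TEXT NOT NULL DEFAULT 'Found')"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS blocked (keyword TEXT PRIMARY KEY)")
    return conn


def blocked_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM blocked").fetchone()[0]


# Demo: blocking a keyword for a competitor leaves your own block list untouched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    mine = open_domain_db("example.com", root)
    theirs = open_domain_db("competitor.example", root)
    theirs.execute("INSERT INTO blocked (keyword) VALUES (?)", ("cookie policy",))
    theirs.commit()
    own_blocked = blocked_count(mine)
    competitor_blocked = blocked_count(theirs)
    mine.close()
    theirs.close()
```

Because the two connections point at different files, nothing written for one domain can show up in a query against the other.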

Before you start

⚙️
Site Auditor v3

Installed and licensed on your machine.

🔑
Keywords Everywhere API key

Entered in the app settings before fetching volume data.

🌐
A domain to scan

Start with your own site before moving to competitors.

🤖
An AI tool

For the final analysis step — ChatGPT, Claude, Grok, or similar.


01
Step One

Configure and Run the Scan

The keyword miner runs as part of the standard crawl pipeline. You enable it through the Custom Scan options before starting.

  1. Enter the URL you want to scan in the main scan field.
  2. Click Custom Scan to open the scan configuration options.
  3. Check Keyword Analysis in the custom scan options. This enables the miner during the crawl.
  4. Set a page limit if you want to cap the crawl — useful for large sites or initial exploratory scans.
  5. Click Start Scan. The app crawls the site and runs keyword extraction in parallel.
💡
Pro Tip
For your first scan of a new domain, run without a page limit to get complete coverage. For large competitor sites, 50–100 pages is usually sufficient to surface the most prominent keywords.
ℹ️
No API credits used during crawl
Keywords Everywhere credits are only consumed when you explicitly click Get Volume. The crawl itself extracts keywords from page content at no cost.
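
To make "extracting keywords from page content" concrete, here is a minimal sketch that counts one- to three-word phrases across crawled pages. It is not the miner's actual algorithm, which the guide does not describe; the function names and the n-gram approach are assumptions chosen for illustration. It tracks the two numbers that later appear as the Count and Pages columns.

```python
import re
from collections import Counter
from typing import Dict, Iterator, Tuple


def phrases(text: str, max_words: int = 3) -> Iterator[str]:
    """Yield every run of 1..max_words consecutive words, lower-cased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def mine(pages: Dict[str, str]) -> Tuple[Counter, Counter]:
    """Return (total occurrences per phrase, number of pages per phrase)."""
    count: Counter = Counter()
    page_hits: Counter = Counter()
    for _url, text in pages.items():
        found = Counter(phrases(text))
        count.update(found)             # adds per-page occurrence totals
        page_hits.update(found.keys())  # adds 1 for each phrase seen on this page
    return count, page_hits


pages = {
    "/": "SEO audit tool. Free SEO audit for small sites",
    "/blog": "Run an SEO audit today",
}
count, page_hits = mine(pages)
```

A raw list produced this way is full of fragments such as "for small" or "an", which is exactly the noise the next step is designed to remove.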

02
Step Two

The Alphabetical Sweep

After the scan completes, go to the Keywords tab. You'll find the raw keyword list — everything the miner extracted. Before anything else, clean this list using the Alphabetical Sweep.

The technique

  1. Click the keyword column header to sort alphabetically.
  2. Press Ctrl+A to select all keywords.
  3. Scroll down the list visually. For every keyword worth keeping, click to deselect it. Work fast — you're looking for obvious keepers, not making final decisions.
  4. At the bottom, everything still selected is junk. Press Delete to remove it. The app will ask whether to Delete & Block (prevents reappearing on future scans) or Delete Only (removes now but allows rediscovery). For the initial sweep, choose Delete & Block — you want the block list to grow.

Why alphabetical? Similar junk clusters together. All broken URL fragments, stop words, navigation text, and generic noise appear grouped — your eye moves quickly once it recognises a pattern. Keepers stand out against the noise.

What to Deselect (Keep)
Keywords directly related to your products, terms describing problems your audience has, competitor heading/title keywords, recognisable search queries, and anything with obvious commercial or informational intent.

How the block list works

When you choose Delete & Block, the keyword is permanently added to that domain's block list. On all future scans of the same domain, blocked keywords are automatically suppressed — they never appear in the keyword tab again.

If you choose Delete Only, the keyword is removed from the current list but stays unblocked. It can reappear on the next scan if the crawler finds it again. Use this when you're unsure — you can always block it later if it keeps showing up.

The block list is the compounding advantage. By the third or fourth crawl of the same site, the noise floor has dropped significantly and the list is almost entirely signal.

⚠️
Always Delete, Never Ignore
Users who skip the delete step and just scroll past junk lose the compounding benefit of the block list. They face the same cleanup work every scan. Delete & Block now, save time forever.
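
The difference between the two delete options can be sketched in a few lines. This is an in-memory model under assumed names (`DomainKeywords`, `ingest_scan`, `delete`), not the app's code: blocked keywords are filtered out when a new scan is ingested, while keywords removed with Delete Only are free to come back.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class DomainKeywords:
    """Hypothetical per-domain keyword list plus its block list."""
    keywords: Set[str] = field(default_factory=set)
    blocked: Set[str] = field(default_factory=set)

    def ingest_scan(self, extracted: Iterable[str]) -> None:
        # Blocked keywords are suppressed before they reach the list.
        self.keywords |= {k for k in extracted if k not in self.blocked}

    def delete(self, selected: Iterable[str], block: bool) -> None:
        selected = set(selected)
        self.keywords -= selected
        if block:  # "Delete & Block"; otherwise "Delete Only"
            self.blocked |= selected


scan = ["seo audit", "click here", "read more"]
site = DomainKeywords()
site.ingest_scan(scan)
site.delete(["click here"], block=True)   # Delete & Block
site.delete(["read more"], block=False)   # Delete Only
site.ingest_scan(scan)                    # the next crawl finds the same text
after_rescan = sorted(site.keywords)
```

After the second scan, "read more" is back for another decision and "click here" never reappears, which is why blocking reduces the cleanup work on each later crawl.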

Auto-Refine — validate keywords and discover new ones

For large keyword lists, the Auto-Refine button validates "Found" keywords against Google and DuckDuckGo autocomplete APIs. Keywords that return no autocomplete suggestions are blocked and removed (noise). Keywords that survive validation are marked with a ✓ in the Found In column, and the autocomplete suggestions themselves are collected into the Research panel for you to review and save as new keywords.

  1. Click Auto-Refine on the Keywords toolbar.
  2. Choose which sources to check against (Google + DuckDuckGo recommended).
  3. The app processes up to 50 keywords per batch to keep the results manageable. For larger lists, run Auto-Refine multiple times.
  4. Keywords with no suggestions are automatically blocked and deleted.
  5. Keywords with suggestions appear in the Research panel for you to review and optionally save.
ℹ️
50 at a Time
Auto-Refine processes keywords in batches of 50 to prevent flooding the Research panel with thousands of suggestions. If more remain, the completion message tells you how many. Just run Auto-Refine again for the next batch.
💡
Best approach: combine both methods
Do a quick Alphabetical Sweep first to remove the most obvious junk (broken fragments, navigation text), then run Auto-Refine on what's left. The sweep handles visual noise; Auto-Refine handles terms that look plausible but have no real search presence.
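
The Auto-Refine logic described above can be sketched as follows. The autocomplete lookup is passed in as a function so the example makes no network calls; `fetch_suggestions`, `RefineResult`, and the other names are hypothetical, and the real feature may differ in detail.

```python
from typing import Callable, List, NamedTuple

BATCH_SIZE = 50  # batch size stated in the guide


class RefineResult(NamedTuple):
    validated: List[str]  # keywords that returned suggestions (marked with a tick)
    blocked: List[str]    # keywords with no suggestions (blocked and deleted)
    research: List[str]   # suggestions collected for review
    remaining: int        # keywords left for the next run


def auto_refine(
    found: List[str],
    fetch_suggestions: Callable[[str], List[str]],
    batch_size: int = BATCH_SIZE,
) -> RefineResult:
    batch = found[:batch_size]
    validated: List[str] = []
    blocked: List[str] = []
    research: List[str] = []
    for keyword in batch:
        suggestions = fetch_suggestions(keyword)
        if not suggestions:
            blocked.append(keyword)
            continue
        validated.append(keyword)
        for s in suggestions:
            if s != keyword and s not in research:
                research.append(s)
    return RefineResult(validated, blocked, research, len(found) - len(batch))


# Stand-in for an autocomplete source, used only for this demo.
fake_autocomplete = {"seo audit": ["seo audit tool", "seo audit checklist"]}
small = auto_refine(["seo audit", "hdr nav lnk"], lambda k: fake_autocomplete.get(k, []))
large = auto_refine([f"term {i}" for i in range(120)], lambda k: [])
```

In the second call, 50 keywords are processed and 70 are reported as remaining, matching the "run it again for the next batch" behaviour.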

03
Step Three

Get Volume for Selected Keywords

Once the sweep is done, you have a list of candidates. Now fetch volume, CPC, competition, and trend data from Keywords Everywhere.

  1. Do a quick review of what's left. Remove any obvious junk you missed in the sweep.
  2. Select the keywords you want volume data for. Only select keywords you genuinely intend to evaluate — each one costs API credits.
  3. Click Get Volume. The app fetches data for selected keywords only.
ℹ️
Selection Required
The Get Volume button is only active when keywords are selected. This is intentional: it prevents accidentally fetching the entire list and burning credits on junk.

Understanding the columns

Column | What it tells you
Volume | Average monthly US searches. Above 0 means real people search for this term.
CPC | What advertisers pay per click. Above $0.50 signals commercial intent: someone is making money from this keyword.
Competition | 0–1 score of advertiser competition. Higher confirms strong commercial value, not just volume.
Trend | % change comparing last 3 months to prior 3 months. Positive = growing topic. Negative = proceed with caution.
Prominence | How strongly this keyword appears across your site. High prominence = you already have content for it.
Count | How many times it appears across all crawled pages.
Pages | How many pages the keyword was found on.
State | Found (unreviewed) · Kept (worth monitoring) · Tracked (actively tracking position).
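
The Trend column is defined as the percentage change between the last three months and the three months before them. One plausible way to compute that from monthly volumes is sketched below; whether the underlying data uses sums, averages, or another method is not stated in the guide, so treat the formula as an assumption.

```python
from typing import Optional, Sequence


def trend_percent(monthly_volumes: Sequence[int]) -> Optional[float]:
    """Percent change of the last 3 months versus the prior 3 months.

    `monthly_volumes` is ordered oldest to newest. Returns None when there
    is too little history or the prior period is zero ("no trend data").
    """
    if len(monthly_volumes) < 6:
        return None
    prior = sum(monthly_volumes[-6:-3])
    recent = sum(monthly_volumes[-3:])
    if prior == 0:
        return None
    return (recent - prior) * 100 / prior


growing = trend_percent([100, 100, 100, 120, 130, 110])   # 300 then 360
declining = trend_percent([200, 200, 200, 150, 150, 150])  # 600 then 450
no_data = trend_percent([0, 0, 0, 10, 10, 10])
```

Here the first keyword is up 20% and the second is down 25%, the kind of negative value the table flags as "proceed with caution".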

04
Step Four

Research, Curate, and Mark Keepers

Now you have real data. Do a second pass — this time making deliberate decisions about each keyword's value, not just removing noise.

Delete these (use Delete & Block)

  • Volume = 0, CPC = $0.00, no trend data — zero commercial or informational value
  • Your own brand and product names you'll always rank #1 for without effort
  • Generic terms with no connection to your products or audience
  • Navigational fragments and UI text that slipped through the first sweep
💡
When to use Delete Only
Use Delete Only for keywords you're not sure about yet. They'll disappear from the current list but can reappear on the next scan. If they keep coming back and you keep skipping them, that's a signal to block them permanently.

Mark as Kept — your working keywords

  • Volume above 0 with a positive or stable trend
  • CPC above $0.50 — advertisers pay for these clicks, which means commercial intent exists
  • Low prominence on your site — you don't have content for this yet (opportunity)
  • Keywords describing problems your software solves
  • Keywords your competitor ranks for that you have no page for

When you mark a keyword as Kept, the row changes colour in the list — making it visually distinct from unreviewed keywords. On re-scans, kept keywords are immediately identifiable, so subsequent sweeps only require evaluating new unknowns.

💡
Tip
If a keyword is borderline, mark it Kept and decide later. You can always delete it. It's harder to recover a keyword you deleted too hastily.
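
The delete and keep criteria in this step can be expressed as a small triage function. This is only a sketch of the rules of thumb listed above: the $0.50 CPC threshold comes from the guide, while the prominence cut-off and its 0-to-1 scale are assumptions, and judgement calls such as "relevant to my products" are left to you.

```python
from typing import List, Optional, Tuple

CPC_THRESHOLD = 0.50   # from the guide
LOW_PROMINENCE = 0.2   # hypothetical cut-off; the guide gives no number


def triage(
    volume: int, cpc: float, trend: Optional[float], prominence: float
) -> Tuple[str, List[str]]:
    """Suggest an action for one keyword, with the reasons behind it."""
    if volume == 0 and cpc == 0 and trend is None:
        return "Delete & Block", ["no volume, CPC, or trend data"]

    reasons: List[str] = []
    if volume > 0 and trend is not None and trend >= 0:
        reasons.append("volume with a positive or stable trend")
    if cpc > CPC_THRESHOLD:
        reasons.append("CPC above $0.50")
    if not reasons:
        return "Review", []  # no demand signal; decide by hand
    if prominence < LOW_PROMINENCE:
        reasons.append("low prominence (content gap)")
    return "Kept", reasons


junk = triage(0, 0.0, None, 0.0)
opportunity = triage(320, 1.20, 12.0, 0.05)
unclear = triage(40, 0.10, -30.0, 0.9)
niche = triage(0, 2.50, None, 0.5)
```

The last case shows a zero-volume keyword with a high CPC landing in Kept rather than being deleted, since only keywords with no signal at all are treated as junk.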

05
Step Five

Build Clusters and Assign Keywords

A cluster connects keywords to a specific purpose — a product, article, guide, or service. Clusters turn a flat list of keywords into an organised content plan.

Start with a domain cluster

Create a cluster for the domain you scanned. This anchors keywords to their source — useful when scanning multiple competitor sites and you need to know where a keyword came from.

Create product and content clusters

Then add a cluster for each thing you want keywords to serve:

  • An existing product you sell (e.g. Site Auditor, TomsBGRemover)
  • A product or feature you're planning to build
  • An article or guide you want to write
  • A service offering or landing page

Assign keywords and set their type

Keyword Type | When to use it
Primary | The main keyword a page is built around. One primary keyword per page target.
Supporting | Variations, related terms, and long-tail phrases that reinforce the primary. Multiple per page is fine.
💡
Pro Tip
A keyword can support multiple clusters if it's genuinely relevant to more than one product or article. Don't force exclusivity if the keyword has natural range.
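
A cluster can be modelled as a named target with one primary keyword and any number of supporting keywords. The sketch below assumes each cluster corresponds to a single page target; the `Cluster` class and its `assign` method are illustrative names, not part of the app.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Cluster:
    """Hypothetical cluster: one page target with primary and supporting keywords."""
    name: str
    primary: Optional[str] = None
    supporting: Set[str] = field(default_factory=set)

    def assign(self, keyword: str, kind: str) -> None:
        if kind == "primary":
            # One primary keyword per page target.
            if self.primary is not None and self.primary != keyword:
                raise ValueError(f"{self.name!r} already has primary {self.primary!r}")
            self.primary = keyword
            self.supporting.discard(keyword)
        elif kind == "supporting":
            if keyword != self.primary:
                self.supporting.add(keyword)
        else:
            raise ValueError(f"unknown keyword type: {kind!r}")


auditor = Cluster("Site Auditor")
guide = Cluster("Technical SEO guide")
auditor.assign("seo audit tool", "primary")
auditor.assign("website crawler", "supporting")
guide.assign("technical seo checklist", "primary")
guide.assign("website crawler", "supporting")  # same keyword, second cluster

try:
    auditor.assign("site checker", "primary")
except ValueError:
    second_primary_rejected = True
else:
    second_primary_rejected = False
```

Keeping membership on each cluster, rather than storing a single cluster on each keyword, is what lets "website crawler" support both targets.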

06
Step Six

Scan Competitor Sites

Once you've processed your own site, scan competitors using the exact same workflow. The goal is to find keywords they rank for that you don't yet have content for.

  1. Enter the competitor's domain and run a Custom Scan with Keyword Analysis enabled.
  2. Apply the Alphabetical Sweep — looking for keywords relevant to your audience, not theirs.
  3. Fetch volume for candidates that look promising.
  4. Mark keepers and assign them to your own product or article clusters.

Because each domain has its own database, each block list trains separately. Junk on a competitor's site doesn't pollute your own block list, and vice versa.

What to look for in competitor scans

  • Keywords they use prominently that you have no content for (high prominence, absent from your site)
  • High-CPC terms you're not targeting — commercial keywords they've already validated
  • Growing trend keywords where you could enter early
  • Long-tail variations of terms you already rank for — extend your coverage
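
The first item in the list, keywords a competitor uses prominently that are absent from your site, amounts to a set difference with a prominence filter. The sketch assumes prominence is a number between 0 and 1 and uses a hypothetical threshold; the guide does not define the scale.

```python
from typing import Dict, List, Set, Tuple


def prominence_gaps(
    own_keywords: Set[str],
    competitor: Dict[str, float],
    min_prominence: float = 0.5,  # hypothetical threshold
) -> List[Tuple[str, float]]:
    """Competitor keywords at or above the threshold that your site lacks."""
    gaps = [
        (keyword, prominence)
        for keyword, prominence in competitor.items()
        if prominence >= min_prominence and keyword not in own_keywords
    ]
    # Strongest competitor signals first; ties broken alphabetically.
    return sorted(gaps, key=lambda item: (-item[1], item[0]))


own = {"seo audit tool", "broken link checker"}
theirs = {
    "seo audit tool": 0.9,       # already covered
    "site speed test": 0.8,      # gap
    "redirect checker": 0.6,     # gap
    "cookie settings": 0.1,      # low prominence, ignored
}
gaps = prominence_gaps(own, theirs)
```

The resulting short list is a sensible set of candidates to select before clicking Get Volume.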

07
Step Seven

The AI Handoff

When you've processed enough domains and built a solid set of kept and clustered keywords, export the data as CSV from the Keywords tab, then feed it into your AI tool using the analysis prompt.

The prompt instructs the AI to identify content opportunities, surface keywords you already cover but could strengthen, flag low-value keywords for permanent deletion, and produce a prioritised monthly action list of specific articles to write or pages to improve.

✏️
Edit the Business Context Section First
The prompt contains a context block describing your products and audience. Update it before each session. This single step dramatically improves recommendation quality. The AI cannot make good suggestions without knowing your actual situation.
💡
Useful context to include
Your products, target audience, whether you're in growth mode or conversion mode, and any content you already have live. Consider adding: "I have zero paying customers currently. Prioritise conversion-intent keywords over pure volume plays."

The full analysis prompt is included in the appendix at the bottom of this guide.
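
If you prefer to assemble the prompt with a script rather than by copy and paste, a sketch like the following would do it. The `[PASTE YOUR CSV DATA HERE]` placeholder is taken from the prompt itself; the CSV column names, including `State`, are assumptions about the export format, so check them against your own file.

```python
import csv
import io
from typing import Optional, Sequence

PLACEHOLDER = "[PASTE YOUR CSV DATA HERE]"


def build_prompt(
    template: str, csv_text: str, states: Optional[Sequence[str]] = None
) -> str:
    """Insert CSV rows into the prompt, optionally keeping only some states."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [r for r in reader if states is None or r.get("State") in states]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return template.replace(PLACEHOLDER, out.getvalue().strip())


template = "Here is my data:\n[PASTE YOUR CSV DATA HERE]\n## 1. Identify My Best Content Opportunities"
export = "Keyword,Volume,State\nseo audit,320,Kept\nclick here,0,Found\n"

full_prompt = build_prompt(template, export)
kept_only = build_prompt(template, export, states=("Kept", "Tracked"))
```

Passing every row lets the AI flag keywords for deletion as well; filtering to Kept and Tracked gives a shorter prompt focused on content planning.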


08
Step Eight

Position Tracking

Auto-tracking every keyword across multiple search engines would consume API credits continuously. Instead, Site Auditor v3 uses a deliberate manual tracking model — you check the keywords you've consciously decided are worth monitoring.

The 5-engine SERP check

  1. In the Keywords tab, find a keyword you want to track.
  2. Double-click the keyword. The app opens the search results for it in new browser tabs, one each for Google, Bing, Yahoo, DuckDuckGo, and Brave.
  3. Scan page 1 and page 2 to find your ranking position.
  4. Right-click the keyword → Update Position. Enter your position (1–100) or tick "Not Ranking" for each engine.
  5. Click Save & Next — the app saves your entries, jumps to the next tracked keyword, opens its search results, and reopens the position dialog. Work through all your tracked keywords in one sitting.

When you publish new content targeting a keyword, manually add the page to the tracking system. The app monitors it from the date you add it — giving you a clear before/after record of content impact.

ℹ️
Save & Next flow
You don't need to close and reopen the position dialog for each keyword. Save & Next automatically cycles through all tracked keywords: opening browser tabs, scrolling the list, and pre-filling the dialog with the most recent data. When you reach the last tracked keyword, the app tells you there are no more to check.
ℹ️
Manual = Intentional
Manually checking forces prioritisation. You only track keywords you've actively decided are worth watching, which means your tracking data is meaningful signal, not a dump of everything the tool found.
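
For readers who want to reproduce the five-engine check outside the app, the sketch below builds one search URL per engine and validates a position entry. The URL patterns are the engines' ordinary public search URLs; whether Site Auditor opens exactly these is an assumption, and `serp_urls` and `normalise_position` are illustrative names.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# (base URL, query parameter name) for each engine named in the guide.
ENGINES = {
    "Google": ("https://www.google.com/search", "q"),
    "Bing": ("https://www.bing.com/search", "q"),
    "Yahoo": ("https://search.yahoo.com/search", "p"),
    "DuckDuckGo": ("https://duckduckgo.com/", "q"),
    "Brave": ("https://search.brave.com/search", "q"),
}


def serp_urls(keyword: str) -> Dict[str, str]:
    """Build one results-page URL per engine for a keyword."""
    return {
        name: f"{base}?{urlencode({param: keyword})}"
        for name, (base, param) in ENGINES.items()
    }


def normalise_position(value: Optional[int]) -> Optional[int]:
    """Accept a position from 1 to 100, or None for 'Not Ranking'."""
    if value is None:
        return None
    if not 1 <= value <= 100:
        raise ValueError("position must be between 1 and 100")
    return value


urls = serp_urls("seo audit tool")
# To open them in browser tabs, pass each URL to webbrowser.open_new_tab().
```

Storing `None` for "Not Ranking" keeps it distinct from any numeric position, so a later before/after comparison does not mistake it for a rank.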

Quick Reference

The complete process at a glance:

# | Action | Key Point
01 | Configure scan | Enable Keyword Analysis in Custom Scan options
02 | Alphabetical Sweep + Auto-Refine | Sweep removes visual noise; Auto-Refine validates 50 at a time via autocomplete
03 | Delete & Block vs Delete Only | Block trains the filter for future scans; Delete Only for keywords you're unsure about
04 | Get Volume | Select kept candidates only; API credits cost money
05 | Curate keepers | Marked rows change colour and survive re-scan sweeps
06 | Build clusters | Assign keywords to products, articles, or guides
07 | Scan competitors | Per-domain databases never cross-contaminate
08 | Export CSV + AI prompt | Edit the business context section before submitting
09 | Track positions | Manual 5-engine SERP check; Save & Next cycles through all tracked keywords

Tips for Getting the Most Value

  • The first scan of any domain is the most labour-intensive. Subsequent scans are progressively faster as the block list grows.
  • Scan your own site before competitors — it calibrates your eye for what's signal vs noise in your niche.
  • Don't skip the delete step. Choose Delete & Block for obvious junk — the block list compounds over time. Use Delete Only when you're unsure.
  • Auto-Refine is most effective after a manual sweep. Remove the worst noise by eye first, then let Auto-Refine validate the borderline terms against autocomplete.
  • A keyword with zero volume but high CPC is worth investigating — it may have real commercial value in a narrow niche.
  • Competitor scans are most useful when the competitor ranks for terms you're not targeting — look for prominence gaps, not just volume.
  • Update the AI prompt's business context regularly. What you need to prioritise changes as your products and traffic evolve.
  • When content goes live, add the page to tracking immediately. You want a baseline position before the page ages.

Appendix — AI Analysis Prompt

Copy this prompt, paste your CSV data where indicated, update the business context section, then submit to your AI tool of choice.

keyword-analysis-prompt.md
# Keyword Analysis Strategy Prompt
# Drop this prompt into ChatGPT, Claude, or Grok along with your exported keyword CSV.

I have a CSV export from my SEO tool containing keyword data for my website.
Each keyword has the following columns:

- Volume: Average monthly search volume (US market)
- CPC: Cost per click, what advertisers pay for this term in Google Ads
- Competition: 0 to 1 score, how many advertisers bid on this term
- Trend: % change comparing last 3 months vs the prior 3 months (positive = growing)
- Prominence: How strongly this keyword appears across my site
- Count: How many times this keyword appears across all pages
- Pages: How many pages this keyword was found on
- State: Found / Kept / Tracked

Here is my data:

[PASTE YOUR CSV DATA HERE]

## 1. Identify My Best Content Opportunities

Which keywords should I write new content around? Rank by opportunity, considering:

- Volume above 0
- CPC above $0.50 (commercial intent)
- Trend is positive or stable
- Low prominence / few pages (I don't have a strong page for it yet)

For each opportunity, suggest a specific article title and angle.

## 2. Identify Keywords to Strengthen

Which keywords do I already have content for (high prominence, multiple pages) but could rank better? Suggest specific page improvements.

## 3. Identify Keywords to Block/Delete

Which keywords have zero value and should be permanently removed?

- Volume = 0 AND CPC = $0.00 AND no trend data
- Brand terms I'll always rank #1 for
- Generic terms unrelated to my products

## 4. Flag Declining Keywords

Which keywords have negative trends I should monitor?

- How steep is the decline?
- Worth continuing to invest in this topic?
- Should I pivot the content angle?

## 5. Monthly Priority List

Top 5 actions I should take this month, in order of impact. Be specific: tell me exactly what to write, improve, or delete.

## Context About My Business

[EDIT THIS SECTION TO MATCH YOUR SITUATION]

I am a solo indie software developer.

My products are:

- Tom's Site Auditor: a desktop SEO crawler and auditing tool for Windows
- TomsBGRemover: a background removal app
- IndexNow Submitter: a free tool for submitting URLs to search engines
- Directory Manager: a tool for managing software directory submissions

My website is tomdahne.com. I sell one-time purchase desktop software with no subscriptions.

My target audience is freelance web designers, small business owners doing their own SEO, and indie developers.

Keep your analysis practical and actionable. I'm a one-person operation.