1) What this app does
Tom's Site Auditor crawls a website starting from a URL you provide, discovers pages via links, runs 25+ deterministic health checks, and generates a self-contained HTML report with scores, prioritized issues, and fix guidance.
Checks include:
- Missing or duplicate page titles
- Missing meta descriptions
- Missing or multiple H1 tags
- Broken internal links (4xx/5xx)
- Broken external links (optional)
- Missing image alt text
- Missing canonical tags
- Missing OpenGraph tags
- Noindex/nofollow detection
- Low word count pages
- Redirect chains
- And more...
What this app is not:
- Not a cloud service or SaaS
- Not a keyword research tool
- Not a ranking predictor
- Not a vulnerability / pentest scanner
- Not dependent on external APIs
- No account required
2) System requirements
Required:
- Windows 10 or Windows 11 (64-bit)
- Internet connection (to crawl websites)
- Enough disk space for scan data (~1–50 MB per scan)
Recommended:
- 8 GB RAM for large scans (1,000+ pages)
- SSD for faster scan storage
- Modern multi-core CPU for responsiveness
3) First-time setup
- Extract the ZIP
Right-click the downloaded ZIP → Extract All. Do not run the app from inside the ZIP.
- Keep the folder structure intact
The app creates a data folder next to the EXE for scan storage. Don't move files around after extraction.
- Run the app
Double-click TomsSiteAuditor.exe. No installer or admin rights needed.
4) Quick start
- Enter a URL
Paste a website address (e.g. https://example.com) into the URL field on the Overview tab.
- Set max pages
Start with a small number like 50–100 for your first scan to see how it works.
- Click Start
The crawler will begin discovering pages, checking links, and collecting data.
- Review results
Switch between the Pages, Issues, and Diagnostics tabs as the scan runs. When complete, export an HTML report.
5) Overview tab
The Overview tab is where you configure and launch scans. It shows the URL input, scan settings, and live status during a crawl.
Scan settings:
- Start URL — the website to crawl
- Max pages — hard cap to prevent runaway crawls
- Crawl delay — time between requests (Normal/Slow)
- Deterministic mode — stable crawl order for comparison
- External link checking — on/off toggle
Live status:
- Pages discovered vs. pages scanned
- Current URL being fetched
- Error/warning counts
- Log panel with crawl messages
6) Pages tab
Lists every page discovered during the crawl. Each row shows the URL, HTTP status code, page title, meta description presence, word count, and other extracted signals.
7) Issues tab
Shows all detected problems grouped by type and severity. Each issue includes:
- What the issue is — clear description
- Why it matters — impact on site health/SEO
- How to fix it — actionable guidance
- Affected URL(s) — which pages are affected
8) Diagnostics tab
Shows aggregate scan statistics including total pages scanned, failed pages, redirect counts, response code distribution, rate limiting events (429s), and scan timing.
9) Reports & exports
After a scan completes, you can export results in multiple formats:
- HTML report:
  - Self-contained file with inlined CSS
  - Site health score and label
  - "Fix These First" priority section
  - Issues by type with fix guidance
  - Full pages table
  - Open in any browser, email to clients
- CSV — pages.csv and issues.csv for spreadsheets
- XML Sitemap — generated from discovered pages
- NDJSON — raw scan data for advanced use (see the sketch after this list)
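If you want to script your own analysis of the NDJSON export, a minimal sketch is below. It assumes the standard NDJSON layout of one JSON object per line; the file name scan.ndjson and the status field are illustrative placeholders, so check your actual export for the real names.

    # Minimal sketch: summarize HTTP status codes from an NDJSON export.
    # "scan.ndjson" and the "status" field are illustrative placeholders.
    import json
    from collections import Counter

    status_counts = Counter()
    with open("scan.ndjson", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue                    # skip blank lines
            record = json.loads(line)       # one JSON object per line
            status_counts[record.get("status")] += 1

    for status, count in status_counts.most_common():
        print(status, count)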
10) Scan history & comparison
All scans are saved locally in the data/scans/ folder. You can:
- Browse and reload previous scans
- Compare two scans to see what changed
- Detect regressions (new issues that appeared since the last scan)
- Export reports from any saved scan
11) Advanced settings
Crawl delay presets
Controls how fast the crawler makes requests. Use "Slow" if you're getting rate-limited (429 errors) or if you want to be polite to smaller servers.
Deterministic mode
When enabled, the crawler processes URLs in a stable sorted order. This means repeated scans of the same site with the same settings will produce the same crawl order, making comparison results more meaningful.
External link checking
When enabled, the crawler validates outbound links to other domains (checking for broken external links). This increases scan time but gives a more complete picture.
Max pages
Sets a hard cap on how many pages to crawl. For large sites, start with 500–1000 and increase as needed. The scan will stop cleanly when the limit is reached and note it in the report.
URL ignore rules
Pattern-based rules to exclude certain URLs from crawling (e.g. admin paths, login pages, infinite calendars).
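The exact pattern syntax is app-specific, but as an illustration, rules shaped like these (hypothetical examples, not guaranteed to match your version's syntax) would exclude admin areas, login pages, session URLs, and calendar archives:

    /wp-admin/*
    */login*
    /events/calendar/*
    *?sessionid=*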
Robots.txt
The crawler can optionally respect robots.txt rules. This is configurable per scan.
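For reference, robots.txt is a plain-text file served at the site root (e.g. https://example.com/robots.txt). When the option is enabled, the crawler skips paths the file disallows. A typical file looks like this:

    User-agent: *
    Disallow: /admin/
    Disallow: /search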
TLS certificate validation
By default, the crawler allows connections to sites with expired or self-signed certificates (common for SEO tools that need to audit all sites). You can enable strict mode if you prefer to fail on bad certificates.
12) Known limitations (current version)
- HTML parsing: The parser is heuristic/string-based, not a full browser DOM. Some edge cases with complex JavaScript-rendered content may produce false positives/negatives.
- JavaScript-heavy sites: The crawler fetches raw HTML and does not execute JavaScript. Single-page applications (SPAs) may not be fully crawlable.
- Memory usage: All pages and issues are stored in RAM during a scan. Very large sites (10,000+ pages) will use significant memory.
- External link checking: Adds scan time and is subject to network variability (timeouts, rate limiting by third-party servers).
- Deterministic mode: Guarantees stable crawl ordering, but cannot guarantee identical results if the site's content or server behavior changes between scans.
13) Troubleshooting & support
If you're running into problems, try the quick fixes below before submitting a support request.
- Close and reopen the app
- Re-extract the ZIP (corrupted extract can cause issues)
- Make sure you are not running the app from inside the ZIP
- Try scanning a different website to confirm if the issue is site-specific
- Check that you have an internet connection (the app needs to reach the target site)
Common issues
The app won't start or Windows blocks it:
- Click More info → Run anyway
- Ensure the ZIP is extracted first
- If your antivirus quarantined files, restore them and add the app folder as an exception
Scans are slow or pages keep failing:
- The target site may be rate-limiting you — increase the crawl delay
- Disable external link checking for faster scans
- Reduce max pages to start with a smaller scan
- Check the Diagnostics tab for 429 (rate limited) responses
A reported issue looks wrong:
- Some issues may be false positives due to JavaScript-rendered content
- Verify by opening the affected URL in a browser and viewing source
- If the issue is genuinely wrong, please report it
Scans or reports won't save:
- Make sure the app has write access to its data folder
- Don't run the app from a read-only location
How to submit a support request (copy/paste template)
Please copy this template and fill it out:
Tom's Site Auditor Support Request
1) App Version: (Shown in the About section)
2) Windows Version: (Windows 10/11 + edition if known)
3) CPU + RAM: (Example: Ryzen 5, 16GB)
4) What URL were you scanning?
5) What settings were you using? (Max pages, crawl delay, deterministic mode, etc.)
6) What did you expect to happen?
7) What actually happened? (Error message? Crash? Wrong results? Hang?)
8) Steps to reproduce:
1.
2.
3.
9) Any screenshots or exported reports: (Attach if available)