Tom's Site Auditor – Instructions

Offline Website Scanner for Windows 10/11

This guide explains how to use Tom's Site Auditor to scan websites, detect SEO issues, generate reports, compare scans over time, and troubleshoot common problems.

Offline-first: All processing stays on your PC.
Portable: Single EXE, no installer needed.
Deterministic: Repeatable scans for reliable comparisons.
25+ checks: SEO, links, tags, structure, and more.

1) What this app does

Tom's Site Auditor crawls a website starting from a URL you provide, discovers pages via links, runs 25+ deterministic health checks, and generates a self-contained HTML report with scores, prioritized issues, and fix guidance.

What it checks
  • Missing or duplicate page titles
  • Missing meta descriptions
  • Missing or multiple H1 tags
  • Broken internal links (4xx/5xx)
  • Broken external links (optional)
  • Missing image alt text
  • Missing canonical tags
  • Missing OpenGraph tags
  • Noindex/nofollow detection
  • Low word count pages
  • Redirect chains
  • And more...
What it is NOT
  • Not a cloud service or SaaS
  • Not a keyword research tool
  • Not a ranking predictor
  • Not a vulnerability / pentest scanner
  • Not dependent on external APIs
  • No account required
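Example (illustrative): to make the idea of a heuristic check concrete, here is a minimal Python sketch of how a missing-title, multiple-H1, and missing-meta-description check could be detected in raw HTML. This is a sketch only, not the app's actual parser.

# Minimal sketch of heuristic HTML checks (illustrative only, not the
# app's actual code). Flags a missing <title>, multiple <h1> tags, and
# a missing meta description using simple regular expressions.
import re

def check_page(html: str) -> list[str]:
    issues = []
    if not re.search(r"<title[^>]*>\s*\S", html, re.IGNORECASE | re.DOTALL):
        issues.append("Missing or empty page title")
    if len(re.findall(r"<h1\b", html, re.IGNORECASE)) > 1:
        issues.append("Multiple H1 tags")
    if not re.search(r'<meta[^>]+name=["\']description["\']', html, re.IGNORECASE):
        issues.append("Missing meta description")
    return issues

print(check_page("<html><h1>A</h1><h1>B</h1></html>"))
# ['Missing or empty page title', 'Multiple H1 tags', 'Missing meta description']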

2) System requirements

Required
  • Windows 10 or Windows 11 (64-bit)
  • Internet connection (to crawl websites)
  • Enough disk space for scan data (~1–50 MB per scan)
Recommended
  • 8 GB RAM for large scans (1,000+ pages)
  • SSD for faster scan storage
  • Modern multi-core CPU for responsiveness
Note: The app itself is a single portable EXE with zero external dependencies. No .NET, no Java, no runtimes needed.

3) First-time setup

  1. Extract the ZIP
    Right-click the downloaded ZIP → Extract All. Do not run the app from inside the ZIP.
  2. Keep the folder structure intact
    The app creates a data folder next to the EXE for scan storage. Don't move files around after extraction.
  3. Run the app
    Double-click TomsSiteAuditor.exe. No installer or admin rights needed.
Windows SmartScreen: If you see "Windows protected your PC", click More info → Run anyway. This happens because the app is new and not yet widely distributed.

4) Quick start

  1. Enter a URL
    Paste a website address (e.g. https://example.com) into the URL field on the Overview tab.
  2. Set max pages
    Start with a small number like 50–100 for your first scan to see how it works.
  3. Click Start
    The crawler will begin discovering pages, checking links, and collecting data.
  4. Review results
    Switch between the Pages, Issues, and Diagnostics tabs as the scan runs. When complete, export an HTML report.
Tip: For your very first scan, try scanning your own website or a small site you're familiar with. This makes it easier to verify the results make sense.

5) Overview tab

The Overview tab is where you configure and launch scans. It shows the URL input, scan settings, and live status during a crawl.

Configuration
  • Start URL — the website to crawl
  • Max pages — hard cap to prevent runaway crawls
  • Crawl delay — time between requests (Normal/Slow)
  • Deterministic mode — stable crawl order for comparison
  • External link checking — on/off toggle
Live status
  • Pages discovered vs. pages scanned
  • Current URL being fetched
  • Error/warning counts
  • Log panel with crawl messages
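Example (illustrative): a rough sketch of how the max pages cap and crawl delay typically interact during a crawl. The function and parameter names here are made up for illustration; this is not the app's real scheduler.

# Illustrative crawl loop showing how "max pages" and "crawl delay"
# interact; names and timings are assumptions, not the app's settings.
import time
import urllib.request

def crawl(start_url: str, max_pages: int = 50, delay_seconds: float = 1.0):
    queue, seen, results = [start_url], {start_url}, []
    while queue and len(results) < max_pages:   # hard cap prevents runaway crawls
        url = queue.pop(0)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                results.append((url, resp.status))
                # ...parse HTML, extract links, add unseen ones to `queue`...
        except Exception as exc:
            results.append((url, f"error: {exc}"))
        time.sleep(delay_seconds)               # "Normal"/"Slow" presets = longer delays
    return results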

6) Pages tab

Lists every page discovered during the crawl. Each row shows the URL, HTTP status code, page title, meta description presence, word count, and other extracted signals.

Tip: Click any URL in the list to open it in your default browser. The list uses virtual scrolling and can handle thousands of pages efficiently.

7) Issues tab

Shows all detected problems grouped by type and severity. Each issue includes:

  • What the issue is — clear description
  • Why it matters — impact on site health/SEO
  • How to fix it — actionable guidance
  • Affected URL(s) — which pages are affected
Prioritization: The "Fix These First" section highlights the highest-impact issues to address before anything else.
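Example (illustrative): an issue record can be pictured as a small structure with those four fields plus a severity used for prioritization. The field names below are hypothetical, not the app's actual schema.

# Hypothetical shape of a single issue record; field names are
# illustrative only.
from dataclasses import dataclass, field

@dataclass
class Issue:
    what: str        # clear description of the problem
    why: str         # impact on site health / SEO
    fix: str         # actionable guidance
    urls: list[str] = field(default_factory=list)  # affected pages
    severity: str = "warning"                      # used for prioritization

example = Issue(
    what="Missing meta description",
    why="Search engines may generate a poor snippet for this page",
    fix="Add a unique <meta name=\"description\"> of roughly 50-160 characters",
    urls=["https://example.com/about"],
)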

8) Diagnostics tab

Shows aggregate scan statistics including total pages scanned, failed pages, redirect counts, response code distribution, rate limiting events (429s), and scan timing.

Note: Diagnostics data is useful for understanding how the crawl went — for example, a high 429 count means the server was rate-limiting you. Try increasing the crawl delay.
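Example (illustrative): the kind of aggregation shown on this tab amounts to tallying response codes across fetched pages. The (url, status) data shape below is assumed for the sketch.

# Tally response codes from crawl results and flag rate limiting.
from collections import Counter

results = [("https://example.com/", 200),
           ("https://example.com/a", 200),
           ("https://example.com/b", 429),
           ("https://example.com/c", 404)]

codes = Counter(status for _, status in results)
print(codes)                      # Counter({200: 2, 429: 1, 404: 1})
if codes.get(429, 0) > 0:
    print("Server is rate-limiting you; try a slower crawl delay")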

9) Reports & exports

After a scan completes, you can export results in multiple formats:

HTML Report
  • Self-contained file with inlined CSS
  • Site health score and label
  • "Fix These First" priority section
  • Issues by type with fix guidance
  • Full pages table
  • Open in any browser, email to clients
Other exports
  • CSV — pages.csv and issues.csv for spreadsheets
  • XML Sitemap — generated from discovered pages
  • NDJSON — raw scan data for advanced use
Tip: The HTML report is designed to be shared with non-technical clients. The score, priority list, and plain-language guidance make it easy to understand without SEO expertise.
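Example (illustrative): a sitemap export is conceptually just the list of discovered pages wrapped in sitemap XML. A rough sketch of the idea, not the app's exact output:

# Build a simple XML sitemap from a list of discovered page URLs.
from xml.sax.saxutils import escape

def build_sitemap(urls: list[str]) -> str:
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in sorted(set(urls)))
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>\n")

print(build_sitemap(["https://example.com/", "https://example.com/about"]))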

10) Scan history & comparison

All scans are saved locally in the data/scans/ folder. You can:

  • Browse and reload previous scans
  • Compare two scans to see what changed
  • Detect regressions (new issues that appeared since the last scan)
  • Export reports from any saved scan
Deterministic mode + scan comparison is the recommended workflow for ongoing monitoring. Run periodic scans with the same settings and compare results to catch regressions early.
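Example (illustrative): if you prefer to script the comparison yourself from the CSV exports, a regression check can be as simple as diffing (URL, issue) pairs from two issues.csv files. The column names and file paths below are assumptions; match them to the actual CSV headers.

# Sketch of a regression check between two exported issues.csv files.
import csv

def load_issue_keys(path: str) -> set[tuple[str, str]]:
    with open(path, newline="", encoding="utf-8") as f:
        return {(row["url"], row["issue"]) for row in csv.DictReader(f)}

old = load_issue_keys("scan_old/issues.csv")    # hypothetical file paths
new = load_issue_keys("scan_new/issues.csv")

for url, issue in sorted(new - old):
    print(f"REGRESSION: {issue} on {url}")      # issues that appeared since the last scan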

11) Advanced settings

Crawl delay presets

Controls how fast the crawler makes requests. Use "Slow" if you're getting rate-limited (429 errors) or if you want to be polite to smaller servers.

Deterministic mode

When enabled, the crawler processes URLs in a stable sorted order. This means repeated scans of the same site with the same settings will produce the same crawl order, making comparison results more meaningful.
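The underlying idea is simply that a sorted frontier does not depend on the order in which links happen to be discovered. A tiny illustration (not the app's scheduling code):

discovered = ["https://example.com/blog", "https://example.com/", "https://example.com/about"]
print(sorted(discovered))
# ['https://example.com/', 'https://example.com/about', 'https://example.com/blog']
# The output is the same regardless of which link was found first, so two
# scans of an unchanged site visit pages in the same sequence.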

External link checking

When enabled, the crawler validates outbound links to other domains (checking for broken external links). This increases scan time but gives a more complete picture.

Max pages

Sets a hard cap on how many pages to crawl. For large sites, start with 500–1000 and increase as needed. The scan will stop cleanly when the limit is reached and note it in the report.

URL ignore rules

Pattern-based rules to exclude certain URLs from crawling (e.g. admin paths, login pages, infinite calendars).
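Example (illustrative): conceptually, an ignore rule is a pattern matched against each discovered URL before it is queued. A minimal sketch using glob-style patterns; the patterns and matching syntax here are illustrative, not necessarily what the app's ignore-rule field accepts.

# Glob-style URL ignore rules (illustrative patterns).
from fnmatch import fnmatch

IGNORE_PATTERNS = ["*/wp-admin/*", "*/login*", "*/calendar/*"]

def should_skip(url: str) -> bool:
    return any(fnmatch(url, pattern) for pattern in IGNORE_PATTERNS)

print(should_skip("https://example.com/wp-admin/options.php"))  # True
print(should_skip("https://example.com/blog/post-1"))           # False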

Robots.txt

The crawler can optionally respect robots.txt rules. This is configurable per scan.
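Example (illustrative): if you want to check by hand what a site's robots.txt allows, Python's standard library can answer the same kind of question the crawler asks when this option is enabled.

# Check whether a URL is allowed by robots.txt.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()                                                      # fetch and parse robots.txt
print(rp.can_fetch("*", "https://example.com/private/page"))   # True/False per the rules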

TLS certificate validation

By default, the crawler allows connections to sites with expired or self-signed certificates (common for SEO tools that need to audit all sites). You can enable strict mode if you prefer to fail on bad certificates.
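Example (illustrative): the difference between lenient and strict validation looks like this when expressed with Python's standard library. The app exposes this as a simple setting; the sketch only shows what each mode tolerates.

# Strict vs. lenient TLS validation (illustrative).
import ssl
import urllib.request

strict_ctx = ssl.create_default_context()      # fails on expired/self-signed certificates

lenient_ctx = ssl.create_default_context()
lenient_ctx.check_hostname = False
lenient_ctx.verify_mode = ssl.CERT_NONE        # accepts any certificate

urllib.request.urlopen("https://expired.badssl.com/", context=lenient_ctx)    # succeeds
# urllib.request.urlopen("https://expired.badssl.com/", context=strict_ctx)   # raises SSLError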


12) Known limitations (current version)

  • HTML parsing: The parser is heuristic/string-based, not a full browser DOM. Some edge cases with complex JavaScript-rendered content may produce false positives/negatives.
  • JavaScript-heavy sites: The crawler fetches raw HTML and does not execute JavaScript. Single-page applications (SPAs) may not be fully crawlable.
  • Memory usage: All pages and issues are stored in RAM during a scan. Very large sites (10,000+ pages) will use significant memory.
  • External link checking: Adds scan time and is subject to network variability (timeouts, rate limiting by third-party servers).
  • Deterministic mode: Guarantees stable crawl ordering, but cannot guarantee identical results if the site's content or server behavior changes between scans.

13) Troubleshooting & support

If you're running into problems, try these quick fixes before submitting a support request:
  • Close and reopen the app
  • Re-extract the ZIP (corrupted extract can cause issues)
  • Make sure you are not running the app from inside the ZIP
  • Try scanning a different website to confirm if the issue is site-specific
  • Check that you have an internet connection (the app needs to reach the target site)

Common issues

Problem: "The app won't open" or "Windows protected your PC"
  • Click More info → Run anyway
  • Ensure the ZIP is extracted first
  • If your antivirus quarantined files, restore them and add the app folder as an exception
Problem: Scan seems stuck or very slow
  • The target site may be rate-limiting you — increase the crawl delay
  • Disable external link checking for faster scans
  • Reduce max pages to start with a smaller scan
  • Check the Diagnostics tab for 429 (rate limited) responses
Problem: Report shows unexpected issues
  • Some issues may be false positives due to JavaScript-rendered content
  • Verify by opening the affected URL in a browser and viewing source
  • If the reported issue turns out to be a false positive, please report it as a bug
Problem: Settings don't persist between sessions
  • Make sure the app has write access to its data folder
  • Don't run the app from a read-only location

How to submit a support request (copy/paste template)

Please copy this template and fill it out:

Tom's Site Auditor Support Request

1) App Version: (Shown in the About section)
2) Windows Version: (Windows 10/11 + edition if known)
3) CPU + RAM: (Example: Ryzen 5, 16GB)
4) What URL were you scanning?
5) What settings were you using? (Max pages, crawl delay, deterministic mode, etc.)
6) What did you expect to happen?
7) What actually happened? (Error message? Crash? Wrong results? Hang?)
8) Steps to reproduce:
   1.
   2.
   3.
9) Any screenshots or exported reports: (Attach if available)

Best possible bug report: App version + URL + settings used + exact steps + a screenshot of any error or unexpected behavior.