Tom's Log Intelligence

User Guide — Full Reference
Version 1.2

1. What Is Log Intelligence?

Tom's Log Intelligence turns raw web server access logs into clear, actionable insight. Instead of scrolling through thousands of lines of log text, you get an organised breakdown of bot traffic, AI crawlers, security threats, download activity, and crawl coverage gaps.

The app parses Apache, Nginx, and IIS log files (including compressed .gz files from cPanel), classifies every request using a database of 83+ bot signatures, and presents the results across eight purpose-built tabs.

Everything runs locally on your machine. No data is sent anywhere. No subscription. No internet connection required.

2. Requirements

Windows 7 or later (64-bit recommended). No installation needed — the app is a single portable EXE. Just copy the folder anywhere and run TomLogIntelligence.exe.

The following files should be in the same folder as the EXE:

bot_signatures.json: Bot identification patterns (83 signatures). The app works without it using built-in defaults, but the JSON version is more complete and user-editable.
threat_signatures.json: Security threat URL patterns (85 signatures). Same fallback behaviour — works without it, but the JSON is editable and more comprehensive.

3. Getting Started

First run

When you first open the app, no project is loaded. You have two options:

Option A — Create a project first: Go to File → New Project, choose a location and name for your .logdb file, then import logs into it.

Option B — Just import: Go to the Import tab and click Import Log File(s). If no project is open, the app will automatically create one based on the log filename. A file named tomdahne_com-ssl_log-Apr-2026.gz will create a project called tomdahne.com.logdb in a projects folder next to the EXE.

Tip You can also drag and drop .log or .gz files directly onto the app window to start an import.

Project files

Projects are saved as .logdb files. These are standard SQLite databases. You can copy them for backup, open them in any SQLite browser for custom queries, or share them with colleagues. To reopen a project, use File → Open Project or drag the .logdb file onto the app window.
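Because a .logdb file is plain SQLite, you can query it from any scripting language. Here is a minimal Python sketch; note that the app's actual table and column names are not documented here, so the schema below is a stand-in — always list the tables first and adapt your queries to what you find.

```python
import sqlite3

# A .logdb project is a standard SQLite file, so sqlite3 can read it.
# This sketch builds a tiny stand-in database; with a real project you
# would connect to the .logdb path instead (close the app first).
conn = sqlite3.connect(":memory:")  # e.g. sqlite3.connect("tomdahne.com.logdb")
conn.execute("CREATE TABLE events (url TEXT, status INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("/index.html", 200), ("/missing", 404)])

# Discover the schema first -- table names vary by application version.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Then run any custom query against the tables you actually find.
errors = conn.execute(
    "SELECT url FROM events WHERE status >= 400").fetchall()
conn.close()
```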

4. Importing Log Files

Supported formats

Apache Combined: The most common format, including cPanel logs with a response time suffix
Apache Common: Simpler format without referrer and user-agent fields
Nginx: Same combined format as Apache
W3C Extended (IIS): Space-delimited with a #Fields: header line
Auto-detect: Tries all formats automatically (default)
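For reference, the Apache Combined format is a well-defined standard. The Python sketch below shows how one line of it breaks into fields; this is illustrative, not the app's own parser, and the sample log line is made up.

```python
import re

# Apache Combined Log Format: IP, identity, user, [time], "request",
# status, bytes, "referrer", "user-agent". cPanel logs may append a
# response-time suffix after the user-agent field.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('203.0.113.7 - - [01/Apr/2026:12:00:00 +0000] '
        '"GET /index.html HTTP/1.1" 200 5120 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1)"')

entry = COMBINED.match(line).groupdict()
```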

Files can be plain text (.log, .txt) or gzip compressed (.gz). The app handles multi-stream gzip files, which are common with cPanel log rotation.
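A multi-stream gzip file is simply several gzip members concatenated into one file. If you ever need to read one yourself, Python's gzip module handles the concatenation transparently, as this small sketch shows:

```python
import gzip
import io

# Two gzip members concatenated, as cPanel log rotation can produce.
member1 = gzip.compress(b"line one\n")
member2 = gzip.compress(b"line two\n")
blob = member1 + member2

# gzip reads multi-member (multi-stream) files transparently:
# both members come back as one continuous decompressed stream.
with gzip.open(io.BytesIO(blob), "rt") as fh:
    lines = fh.read().splitlines()
```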

Multi-file import

You can select multiple files in the import dialog. They will be imported one at a time in sequence, with progress shown for each file. The status bar shows the current filename and how many files remain.

Duplicate detection

The app tracks each imported file by its path, size, and last modified timestamp. If you try to import the same unchanged file again, it will be skipped with a "File already imported" message. If you modify the file (e.g., it grows from new log entries), it will be treated as a new import.
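The fingerprint described above can be sketched in a few lines of Python. This mirrors the guide's description (path, size, last-modified time); the app's exact key is an assumption here.

```python
import os
import tempfile

def import_fingerprint(path):
    # Mirrors the guide's description: path + size + last-modified
    # time. (The app's exact dedup key is an assumption.)
    st = os.stat(path)
    return (os.path.abspath(path), st.st_size, int(st.st_mtime))

# Demo with a throwaway file standing in for a log file.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write("203.0.113.7 - - ...\n")
    log_path = f.name

first = import_fingerprint(log_path)
second = import_fingerprint(log_path)   # unchanged file -> same key, skipped

with open(log_path, "a") as f:          # file grows with new entries
    f.write("another line\n")
changed = import_fingerprint(log_path)  # size changed -> treated as new
os.unlink(log_path)
```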

Note Very large log files (hundreds of MB) will use significant RAM during import because the file is decompressed fully into memory before parsing. For typical cPanel logs (10–50 MB compressed), this is not a problem.

5. Overview Tab

The Overview tab is your dashboard. It shows three main components:

Key Findings panel

A rules-based insight panel at the top that surfaces the most important conclusions from your data in plain English. Findings include:

Bot traffic percentage: shown when more than 40% of requests are from identified bots
Uncrawled SA pages: shown when you've loaded a Site Auditor export and some pages haven't been visited by bots
AI crawler activity: shown when any AI bots were detected; includes count, pages accessed, and the most active crawler
Suspicious requests: shown when any threat URL patterns were matched (WordPress probes, config scans, etc.)
Top bandwidth consumer: always shown; identifies which bot uses the most bandwidth
File downloads: shown when any successful downloads of .zip/.exe files were found

If no findings trigger (e.g., very little data), the panel simply doesn't appear and the layout stays clean.

Summary cards

Seven metric cards showing total events, unique bots, unique URLs, errors, AI bot hits, total bandwidth, and download count.

Charts

A daily event trend line chart (last 90 days) on the left and a top-10 bots bar chart on the right. Bot bars are colour-coded by category: Search, AI, SEO, Monitoring, and Unknown.

6. Bots Tab

A full list of every bot detected in your logs, with columns for name, category, known status, hit count, bandwidth, and last seen date. Click any column header to sort.

Filtering

Use the text filter to search by bot name, or the category dropdown to show only search engines, AI bots, SEO tools, monitoring, social, or unknown bots.

Bot drill-down

Click any bot to see its crawled URLs in the lower panel, showing URL, hits, HTTP status, and the top IP address for each URL. This is useful for spotting patterns — for example, a bot that only hits your images, or one that focuses on admin pages.

"Known" column

Shows Yes if the bot matched a signature in bot_signatures.json, or No if it was identified heuristically from its user-agent string. This does not indicate DNS verification — it simply means the bot was recognised by name.

7. AI Bots Tab

A focused view of AI crawler activity, with a summary panel showing total AI crawlers detected, total requests, bandwidth consumed, and the most active AI bot.

robots.txt generation

Each AI bot has a checkbox. Select the bots you want to block, click Generate robots.txt Rules, and the app produces ready-to-use User-agent / Disallow directives. Click Copy to Clipboard to paste them directly into your robots.txt file.

The Select All checkbox toggles all bots at once.
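The generated directives follow the standard robots.txt shape: one User-agent line per bot token, each followed by a blanket Disallow. This sketch shows the idea; the bot tokens are real AI-crawler user-agent names, but the app's exact output formatting is an assumption.

```python
# Bots ticked in the AI Bots tab (illustrative selection).
blocked = ["GPTBot", "ClaudeBot", "CCBot"]

# One User-agent / Disallow pair per bot, blank line between groups.
rules = "\n".join(f"User-agent: {bot}\nDisallow: /\n" for bot in blocked)
```

Pasting these lines into your site's robots.txt asks each named crawler to stay away from the whole site; note that compliance is voluntary on the crawler's side.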

URL drill-down

Click any AI bot to see which pages it crawled most, with hit counts and status codes. This helps you understand what AI crawlers are targeting — are they scraping your blog content, hitting your API endpoints, or downloading files?

8. URLs Tab

Every URL found in your logs, with summary cards (total URLs, downloads, 200 OK, errors) and a sortable table showing hits, unique bots, status codes, top referrer, and average response time.

Downloads filter

Tick the Downloads checkbox to show only URLs matching download file extensions (.zip, .exe, .msi, .dmg, .pkg) with a 200 status. This gives you a quick view of what's being downloaded and how often.

Top Referrer column

Shows the most common referrer for each URL (excluding empty/dash values). This tells you how bots discovered each page — via external backlinks, your sitemap, or direct access.

Avg RT column

Average response time for each URL, sourced from cPanel's response time field. Values are displayed in microseconds (μs), milliseconds (ms), or seconds (s) depending on magnitude. This helps identify slow pages that may be causing crawler timeouts.

9. Errors Tab

All 4xx and 5xx responses grouped by URL and status code. Columns show the URL, status code, hit count, and which bots triggered the error. Use the text filter to search by URL.

Common things to look for: 404s on pages you've moved (set up redirects), 403s on resources bots shouldn't access (expected), and 500s that indicate server problems (investigate).

Click Export CSV to save the error list for further analysis or to share with your hosting provider.

10. Cross-Ref Tab

Compares your log data against a Tom's Site Auditor crawl export to find gaps between what bots visit and what's actually on your site.

How to use

Run a crawl in Tom's Site Auditor and export the pages list as CSV. In Log Intelligence, click Load SA Export CSV, pick the file, then click Run Comparison.

Understanding the results

The comparison produces two lists:

Orphaned URLs (left panel) — URLs that bots are crawling but which don't appear in your Site Auditor crawl. These might be old pages, external link targets, attack probes, or URLs generated by CMS plugins. High hit counts on orphaned URLs suggest bots are wasting resources on pages you may want to block or redirect.

Uncrawled pages (right panel) — Pages that Site Auditor found in your site structure but no bot has visited yet. These might be new content that hasn't been discovered, pages blocked by robots.txt, or pages buried too deep in your site's internal link structure.

Tip If you see "0 uncrawled pages" and a large number of orphaned URLs, your site structure is well-linked but bots are also finding a lot of pages outside it. Consider whether those orphaned URLs need redirects or robots.txt blocks.

Both lists can be exported to CSV using the buttons below each panel.

Trailing slash handling

The comparison normalises trailing slashes automatically. /about and /about/ are treated as the same URL, preventing false results when log entries and SA exports use different conventions.
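The slash rule is simple enough to state in code. This is a sketch of the behaviour described above, not the app's actual implementation:

```python
def normalise(path: str) -> str:
    # Collapse trailing slashes so /about and /about/ compare equal,
    # but keep the bare site root "/" intact.
    return path.rstrip("/") or "/"

same = normalise("/about/") == normalise("/about")
```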

11. Import Tab

Shows your import history — every log file you've imported into this project, with the filename, number of events imported, date range covered, and when the import happened.

The progress bar and status label show real-time feedback during imports. When importing multiple files, the status shows which file is being processed and how many remain.

12. Security Tab

Scans your log data for suspicious URL patterns and presents them in a structured, actionable format.

Summary cards

Five cards showing suspicious URL count, total threat hits, and breakdowns for WordPress probes, config exposure, and injection/XSS attempts.

Top offender panel

Directly below the cards, a highlighted panel shows the single IP address responsible for the most suspicious requests. This is the first IP to consider blocking in your firewall or .htaccess rules.

Threat categories

WordPress Probes: /wp-admin, /wp-login.php, /xmlrpc.php, /wp-config, and other WordPress-specific paths
Config Exposure: /.env, /.git, /web.config, .bak files, composer.json, credentials files
Admin Panels: /phpmyadmin, /cpanel, /adminer, /administrator, database admin paths
Path Traversal: ../ sequences, /etc/passwd, /proc/self — attempts to read system files
Injection/XSS: SQL injection (UNION SELECT, 1=1), script tags, javascript: URIs, event handlers
Backdoor Probes: /shell.php, /c99.php, /webshell — known web shell filenames
Scanner Activity: /actuator, /jenkins, /phpinfo, /cgi-bin — automated vulnerability scanners

Filtering

Three filters work together: text filter (by URL), category dropdown, and IP filter. Type a partial IP address to see only threats from that source — useful for investigating whether a specific IP is probing your site.

Top IP column

Each threat URL shows the IP address that hit it most often. This helps you build targeted firewall rules for specific attack patterns.

Note The app detects and reports threats — it does not block them. Use the information to update your firewall rules, .htaccess, or hosting provider's security settings.

13. HTML Reports

Go to Export → Generate HTML Report to create a self-contained, single-file HTML report that you can open in any browser, email to a client, or archive.

The report includes:

Overview: Summary metric cards matching the Overview tab
Key Findings: The same rules-based insight sentences from the Overview tab, presented as a highlighted panel
Bots: Full bot table with category badges, sortable columns, and text filter
URLs: URL table with hits, unique bots, status codes, and top referrer
Errors: Error table with status codes and responsible bots

All tables are sortable (click headers) and filterable (text box above each table). Sections can be collapsed or expanded. The report uses a sticky navigation bar for quick jumping between sections.

14. CSV Exports

CSV exports are available from the Export menu and from individual tabs:

Bots to CSV: Export menu — bot name, category, known status, hits, bandwidth, last seen
URLs to CSV: Export menu — URL, hits, unique bots, status codes
Errors to CSV: Errors tab Export button — URL, status, hits, bots
All Events to CSV: Export menu — raw event data (can be large)
Threats to CSV: Security tab Export button — URL, threat type, status, hits, sources, top IP
Orphans / Uncrawled: Cross-Ref tab Export buttons — separate CSVs for each list

All CSVs include a UTF-8 BOM for correct display in Excel.
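If you generate companion CSVs yourself and want the same Excel-friendly behaviour, writing with a UTF-8 BOM is a one-liner in most languages. In Python, the "utf-8-sig" encoding prepends the BOM automatically:

```python
import csv
import os
import tempfile

# "utf-8-sig" prepends the UTF-8 BOM that Excel uses to detect the
# encoding. The temp file here just stands in for a real export path.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(path, "w", newline="", encoding="utf-8-sig") as fh:
    writer = csv.writer(fh)
    writer.writerow(["bot", "hits"])
    writer.writerow(["Googlebot", 1234])

raw = open(path, "rb").read()   # first three bytes are the BOM
os.unlink(path)
```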

15. Customising Signatures

Bot signatures

Edit bot_signatures.json in the same folder as the EXE. Each bot entry has:

{
  "pattern": "MyCustomBot",
  "name": "My Custom Bot",
  "category": "seo_tool",
  "owner": "My Company",
  "verify_domain": "crawl.mycompany.com"
}

The pattern field is matched as a case-insensitive substring against the user-agent string. Longer patterns are matched first automatically, so "Googlebot-Image" will always match before "Googlebot".

Valid categories: search_engine, ai_bot, seo_tool, monitoring, social. Anything unrecognised is classified as unknown.

The verify_domain field is reserved for future bot verification via reverse DNS lookup.
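The matching rule described above (case-insensitive substring, longest pattern wins) can be sketched like this. The signature entries are illustrative examples in the bot_signatures.json shape, not the app's internal code:

```python
# Illustrative signatures in the bot_signatures.json shape.
signatures = [
    {"pattern": "Googlebot", "name": "Googlebot"},
    {"pattern": "Googlebot-Image", "name": "Googlebot Images"},
    {"pattern": "GPTBot", "name": "OpenAI GPTBot"},
]

def classify(user_agent: str):
    ua = user_agent.lower()
    # Longest pattern first, so "Googlebot-Image" wins over "Googlebot".
    for sig in sorted(signatures, key=lambda s: len(s["pattern"]), reverse=True):
        if sig["pattern"].lower() in ua:
            return sig["name"]
    return None  # no match -> the app falls back to heuristics

hit = classify("Mozilla/5.0 (compatible; Googlebot-Image/1.0)")
```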

Threat signatures

Edit threat_signatures.json to add custom threat patterns. Each entry has:

{
  "pattern": "/my-admin-panel",
  "category": "admin",
  "description": "Custom admin panel probe"
}

The pattern is matched as a case-insensitive substring against each URL, with percent-encoded characters (like %2F) decoded before matching. Valid categories: wordpress, config, admin, traversal, injection, backdoor, scanner.
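The decode-then-match rule means attackers can't hide a probe behind percent-encoding. A sketch of the behaviour, using an illustrative signature in the threat_signatures.json shape:

```python
from urllib.parse import unquote

# Illustrative signature in the threat_signatures.json shape.
sig = {"pattern": "/.env", "category": "config",
       "description": "Environment file probe"}

def is_threat(url: str) -> bool:
    # Percent-decode first, then case-insensitive substring match.
    decoded = unquote(url).lower()
    return sig["pattern"].lower() in decoded

flag_encoded = is_threat("/%2Eenv")      # %2E decodes to "." and matches
flag_normal = is_threat("/index.html")   # harmless URL, no match
```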

Tip If you run WordPress and want to suppress the WordPress probe detections (since those URLs are legitimate on your site), remove the wordpress entries from threat_signatures.json.

Fallback behaviour

If either JSON file is missing, corrupted, or empty, the app falls back to a built-in default set of signatures. This means the app always works — the JSON files are optional enhancements, not requirements.

16. FAQ & Troubleshooting

Where do I get my log files?

In cPanel, go to Metrics → Raw Access and download your access logs. They'll be .gz files. You can import them directly without unzipping.

Can I import the same log file twice?

No. The app tracks each imported file by its path, size, and modification date. If you try to import the same unchanged file again, it will be skipped. If the file has been modified (e.g., new entries appended), it will be treated as a new import.

My log file is very large. Will it work?

The app handles files up to 2 GB (uncompressed). However, large files use significant RAM during import because the entire file is decompressed into memory before parsing. For files over 100 MB compressed, make sure you have at least 4 GB of free RAM. If you run into issues, split the log file into smaller chunks before importing.

Why are some bots showing as "Unknown"?

The app identifies bots using substring matching against a signature database. If a bot's user-agent string doesn't match any known pattern, but contains bot-like keywords (bot, crawler, spider, etc.), it's classified as unknown with a name extracted from its user-agent string. You can add custom signatures to bot_signatures.json to classify these properly.

The Overview shows no Key Findings. Why?

The Key Findings panel only appears when specific thresholds are met (e.g., bot traffic exceeds 40%, AI crawlers are present, threats are detected). If your dataset is very small or very clean, no findings may trigger. This is by design — no findings means nothing noteworthy to report.

Security threats detected — what should I do?

The Security tab shows suspicious URL patterns in your logs. These are requests that happened — not vulnerabilities that were exploited. Common actions:

For WordPress probes on a non-WordPress site, consider blocking the top offender IPs in your .htaccess or firewall. For config file probes (/.env, /.git), verify those files are not actually accessible on your server. For injection attempts, check that your application properly sanitises input. For persistent scanners, report the IP to your hosting provider's abuse team.

Cross-Ref shows orphaned URLs that look normal. Why?

Orphaned URLs are pages bots crawled that weren't in your Site Auditor export. This can include: URLs from before a site redesign, image/CSS/JS asset URLs that SA doesn't list, URLs generated by JavaScript that SA's crawler didn't execute, or legitimate pages that SA missed due to crawl depth limits. Not all orphaned URLs are problems — they're just worth reviewing.

How do I update bot or threat signatures?

Edit the JSON files in a text editor, save, and restart the app (or open a new project). Changes take effect when the Security tab or bot classifier loads the file. No recompilation needed.

Can I use this with Nginx or IIS logs?

Yes. Nginx typically uses the same combined log format as Apache and works out of the box. For IIS, export your logs in W3C Extended format (the default). The auto-detect parser will handle both.

The .logdb file is getting large. Can I shrink it?

Project files are SQLite databases. They grow with each import. You can open them in a SQLite browser and run VACUUM to reclaim space, but generally the file size is proportional to your log data and is not a problem. If you want to start fresh, simply create a new project.
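Running VACUUM doesn't require a GUI tool. Any SQLite client works; here is a Python sketch (the in-memory database stands in for a real .logdb path, and the app should be closed before you vacuum its file):

```python
import sqlite3

# VACUUM rebuilds the database file and reclaims free pages left
# behind by deletes. It works on any SQLite file, .logdb included.
conn = sqlite3.connect(":memory:")  # e.g. sqlite3.connect("site.logdb")
conn.execute("CREATE TABLE t (x)")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("DELETE FROM t")       # leaves free pages behind
conn.commit()
conn.execute("VACUUM")              # must run outside a transaction
count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()
```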

Can I open .logdb files in other tools?

Yes. The .logdb file is a standard SQLite database. You can open it in DB Browser for SQLite, DBeaver, or any SQLite-compatible tool to run custom queries against your data.