Tom's Log Intelligence turns raw web server access logs into clear, actionable insight. Instead of scrolling through thousands of lines of log text, you get an organised breakdown of bot traffic, AI crawlers, security threats, download activity, and crawl coverage gaps.
The app parses Apache, Nginx, and IIS log files (including compressed .gz files from cPanel), classifies every request using a database of 83+ bot signatures, and presents the results across eight purpose-built tabs.
Everything runs locally on your machine. No data is sent anywhere. No subscription. No internet connection required.
Windows 7 or later (64-bit recommended). No installation needed — the app is a single portable EXE. Just copy the folder anywhere and run TomLogIntelligence.exe.
The following files should be in the same folder as the EXE:
| File | Purpose |
|---|---|
| bot_signatures.json | Bot identification patterns (83 signatures). The app works without it using built-in defaults, but the JSON version is more complete and user-editable. |
| threat_signatures.json | Security threat URL patterns (85 signatures). Same fallback behaviour — works without it, but the JSON is editable and more comprehensive. |
When you first open the app, no project is loaded. You have two options:
Option A — Create a project first: Go to File → New Project, choose a location and name for your .logdb file, then import logs into it.
Option B — Just import: Go to the Import tab and click Import Log File(s). If no project is open, the app will automatically create one based on the log filename. A file named tomdahne_com-ssl_log-Apr-2026.gz will create a project called tomdahne.com.logdb in a projects folder next to the EXE.
You can also drag .log or .gz files directly onto the app window to start an import.

Projects are saved as .logdb files. These are standard SQLite databases. You can copy them for backup, open them in any SQLite browser for custom queries, or share them with colleagues. To reopen a project, use File → Open Project or drag the .logdb file onto the app window.
| Format | Description |
|---|---|
| Apache Combined | The most common format, including cPanel logs with a response-time suffix |
| Apache Common | Simpler format without referrer and user-agent fields |
| Nginx | Same combined format as Apache |
| W3C Extended (IIS) | Space-delimited with a #Fields: header line |
| Auto-detect | Tries all formats automatically (default) |
Files can be plain text (.log, .txt) or gzip compressed (.gz). The app handles multi-stream gzip files, which are common with cPanel log rotation.
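For reference, a typical Apache Combined line looks like this (all values are illustrative):

```
203.0.113.7 - - [12/Apr/2026:09:15:32 +0000] "GET /blog/post HTTP/1.1" 200 5120 "https://example.com/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```

The fields are client IP, identity, user, timestamp, request line, status code, response bytes, referrer, and user-agent. cPanel variants may append a response-time value after the user-agent field.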
You can select multiple files in the import dialog. They will be imported one at a time in sequence, with progress shown for each file. The status bar shows the current filename and how many files remain.
The app tracks each imported file by its path, size, and last modified timestamp. If you try to import the same unchanged file again, it will be skipped with a "File already imported" message. If you modify the file (e.g., it grows from new log entries), it will be treated as a new import.
The Overview tab is your dashboard. It shows three main components:
A rules-based insight panel at the top that surfaces the most important conclusions from your data in plain English. Findings include:
| Finding | Triggers when |
|---|---|
| Bot traffic percentage | More than 40% of requests are from identified bots |
| Uncrawled SA pages | You've loaded a Site Auditor export and some pages haven't been visited by bots |
| AI crawler activity | Any AI bots were detected, showing count, pages accessed, and the most active crawler |
| Suspicious requests | Any threat URL patterns were matched (WordPress probes, config scans, etc.) |
| Top bandwidth consumer | Always shown — identifies which bot uses the most bandwidth |
| File downloads | Any successful downloads of .zip/.exe files were found |
If no findings trigger (e.g., very little data), the panel simply doesn't appear and the layout stays clean.
Seven metric cards showing total events, unique bots, unique URLs, errors, AI bot hits, total bandwidth, and download count.
A daily event trend line chart (last 90 days) on the left and a top-10 bots bar chart on the right. Bot bars are colour-coded by category: Search, AI, SEO, Monitoring, Unknown.
A full list of every bot detected in your logs, with columns for name, category, known status, hit count, bandwidth, and last seen date. Click any column header to sort.
Use the text filter to search by bot name, or the category dropdown to show only search engines, AI bots, SEO tools, monitoring, social, or unknown bots.
Click any bot to see its crawled URLs in the lower panel, showing URL, hits, HTTP status, and the top IP address for each URL. This is useful for spotting patterns — for example, a bot that only hits your images, or one that focuses on admin pages.
Shows Yes if the bot matched a signature in bot_signatures.json, or No if it was identified heuristically from its user-agent string. This does not indicate DNS verification — it simply means the bot was recognised by name.
A focused view of AI crawler activity, with a summary panel showing total AI crawlers detected, total requests, bandwidth consumed, and the most active AI bot.
Each AI bot has a checkbox. Select the bots you want to block, click Generate robots.txt Rules, and the app produces ready-to-use User-agent / Disallow directives. Click Copy to Clipboard to paste them directly into your robots.txt file.
The Select All checkbox toggles all bots at once.
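For example, selecting two AI bots might produce rules like these (the exact bot names depend on what was detected in your logs):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```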
Click any AI bot to see which pages it crawled most, with hit counts and status codes. This helps you understand what AI crawlers are targeting — are they scraping your blog content, hitting your API endpoints, or downloading files?
Every URL found in your logs, with summary cards (total URLs, downloads, 200 OK, errors) and a sortable table showing hits, unique bots, status codes, top referrer, and average response time.
Tick the Downloads checkbox to show only URLs matching download file extensions (.zip, .exe, .msi, .dmg, .pkg) with a 200 status. This gives you a quick view of what's being downloaded and how often.
Shows the most common referrer for each URL (excluding empty/dash values). This tells you how bots discovered each page — via external backlinks, your sitemap, or direct access.
Average response time for each URL, sourced from cPanel's response time field. Values are displayed in microseconds (μs), milliseconds (ms), or seconds (s) depending on magnitude. This helps identify slow pages that may be causing crawler timeouts.
All 4xx and 5xx responses grouped by URL and status code. Columns show the URL, status code, hit count, and which bots triggered the error. Use the text filter to search by URL.
Common things to look for: 404s on pages you've moved (set up redirects), 403s on resources bots shouldn't access (expected), and 500s that indicate server problems (investigate).
Click Export CSV to save the error list for further analysis or to share with your hosting provider.
Compares your log data against a Tom's Site Auditor crawl export to find gaps between what bots visit and what's actually on your site.
Run a crawl in Tom's Site Auditor and export the pages list as CSV. In Log Intelligence, click Load SA Export CSV, pick the file, then click Run Comparison.
The comparison produces two lists:
Orphaned URLs (left panel) — URLs that bots are crawling but which don't appear in your Site Auditor crawl. These might be old pages, external link targets, attack probes, or URLs generated by CMS plugins. High hit counts on orphaned URLs suggest bots are wasting resources on pages you may want to block or redirect.
Uncrawled pages (right panel) — Pages that Site Auditor found in your site structure but no bot has visited yet. These might be new content that hasn't been discovered, pages blocked by robots.txt, or pages buried too deep in your site's internal link structure.
Both lists can be exported to CSV using the buttons below each panel.
The comparison normalises trailing slashes automatically. /about and /about/ are treated as the same URL, preventing false results when log entries and SA exports use different conventions.
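A minimal sketch of that normalisation rule in Python (an illustration, not the app's actual code):

```python
def normalise(url: str) -> str:
    # Treat "/about" and "/about/" as the same URL; keep the root "/" intact.
    return url if url == "/" else url.rstrip("/")

assert normalise("/about/") == normalise("/about")
```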
Shows your import history — every log file you've imported into this project, with the filename, number of events imported, date range covered, and when the import happened.
The progress bar and status label show real-time feedback during imports. When importing multiple files, the status shows which file is being processed and how many remain.
Scans your log data for suspicious URL patterns and presents them in a structured, actionable format.
Five cards showing suspicious URL count, total threat hits, and breakdowns for WordPress probes, config exposure, and injection/XSS attempts.
Directly below the cards, a highlighted panel shows the single IP address responsible for the most suspicious requests. This is the first IP to consider blocking in your firewall or .htaccess rules.
| Category | What it detects |
|---|---|
| WordPress Probes | /wp-admin, /wp-login.php, /xmlrpc.php, /wp-config, and other WordPress-specific paths |
| Config Exposure | /.env, /.git, /web.config, .bak files, composer.json, credentials files |
| Admin Panels | /phpmyadmin, /cpanel, /adminer, /administrator, database admin paths |
| Path Traversal | ../ sequences, /etc/passwd, /proc/self — attempts to read system files |
| Injection/XSS | SQL injection (UNION SELECT, 1=1), script tags, javascript: URIs, event handlers |
| Backdoor Probes | /shell.php, /c99.php, /webshell — known web shell filenames |
| Scanner Activity | /actuator, /jenkins, /phpinfo, /cgi-bin — automated vulnerability scanners |
Three filters work together: text filter (by URL), category dropdown, and IP filter. Type a partial IP address to see only threats from that source — useful for investigating whether a specific IP is probing your site.
Each threat URL shows the IP address that hit it most often. This helps you build targeted firewall rules for specific attack patterns.
Go to Export → Generate HTML Report to create a self-contained, single-file HTML report that you can open in any browser, email to a client, or archive.
The report includes:
| Section | Contents |
|---|---|
| Overview | Summary metric cards matching the Overview tab |
| Key Findings | The same rules-based insight sentences from the Overview tab, presented as a highlighted panel |
| Bots | Full bot table with category badges, sortable columns, and text filter |
| URLs | URL table with hits, unique bots, status codes, and top referrer |
| Errors | Error table with status codes and responsible bots |
All tables are sortable (click headers) and filterable (text box above each table). Sections can be collapsed or expanded. The report uses a sticky navigation bar for quick jumping between sections.
CSV exports are available from the Export menu and from individual tabs:
| Export | Source |
|---|---|
| Bots to CSV | Export menu — bot name, category, known status, hits, bandwidth, last seen |
| URLs to CSV | Export menu — URL, hits, unique bots, status codes |
| Errors to CSV | Errors tab Export button — URL, status, hits, bots |
| All Events to CSV | Export menu — raw event data (can be large) |
| Threats to CSV | Security tab Export button — URL, threat type, status, hits, sources, top IP |
| Orphans / Uncrawled | Cross-Ref tab Export buttons — separate CSVs for each list |
All CSVs include a UTF-8 BOM for correct display in Excel.
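If you post-process these CSVs with your own scripts, use a BOM-aware encoding so the first column name isn't corrupted. A small Python sketch (the filename and column name are illustrative):

```python
import csv

# "utf-8-sig" strips the BOM the app writes for Excel compatibility.
with open("bots.csv", encoding="utf-8-sig", newline="") as f:
    for row in csv.DictReader(f):
        print(row["name"])  # check the actual CSV header for real column names
```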
Edit bot_signatures.json in the same folder as the EXE. Each bot entry has:
```json
{
  "pattern": "MyCustomBot",
  "name": "My Custom Bot",
  "category": "seo_tool",
  "owner": "My Company",
  "verify_domain": "crawl.mycompany.com"
}
```
The pattern field is matched as a case-insensitive substring against the user-agent string. Longer patterns are matched first automatically, so "Googlebot-Image" will always match before "Googlebot".
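A sketch of that matching behaviour (case-insensitive substring, longest pattern first; an illustration, not the app's actual implementation):

```python
def classify(user_agent: str, signatures: list[dict]) -> dict | None:
    ua = user_agent.lower()
    # Sorting longest-first ensures "Googlebot-Image" wins over "Googlebot".
    for sig in sorted(signatures, key=lambda s: len(s["pattern"]), reverse=True):
        if sig["pattern"].lower() in ua:
            return sig
    return None
```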
Valid categories: search_engine, ai_bot, seo_tool, monitoring, social. Anything unrecognised is classified as unknown.
The verify_domain field is reserved for future bot verification via reverse DNS lookup.
Edit threat_signatures.json to add custom threat patterns. Each entry has:
```json
{
  "pattern": "/my-admin-panel",
  "category": "admin",
  "description": "Custom admin panel probe"
}
```
The pattern is matched as a case-insensitive substring against each URL, with percent-encoded characters (like %2F) decoded before matching. Valid categories: wordpress, config, admin, traversal, injection, backdoor, scanner.
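The equivalent sketch for threat matching, with percent-decoding applied first (again an illustration, not the shipped code):

```python
from urllib.parse import unquote

def match_threat(url: str, signatures: list[dict]) -> dict | None:
    # Decode e.g. "%2e%2e%2f" so encoded traversal attempts still match "../".
    decoded = unquote(url).lower()
    for sig in signatures:
        if sig["pattern"].lower() in decoded:
            return sig
    return None
```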
For example, a request for /xmlrpc.php would match one of the wordpress entries from threat_signatures.json.

If either JSON file is missing, corrupted, or empty, the app falls back to a built-in default set of signatures. This means the app always works — the JSON files are optional enhancements, not requirements.
In cPanel, go to Metrics → Raw Access and download your access logs. They'll be .gz files. You can import them directly without unzipping.
No. The app tracks each imported file by its path, size, and modification date. If you try to import the same unchanged file again, it will be skipped. If the file has been modified (e.g., new entries appended), it will be treated as a new import.
The app handles files up to 2 GB (uncompressed). However, large files use significant RAM during import because the entire file is decompressed into memory before parsing. For files over 100 MB compressed, make sure you have at least 4 GB of free RAM. If you run into issues, split the log file into smaller chunks before importing.
The app identifies bots using substring matching against a signature database. If a bot's user-agent string doesn't match any known pattern, but contains bot-like keywords (bot, crawler, spider, etc.), it's classified as unknown with a name extracted from its user-agent string. You can add custom signatures to bot_signatures.json to classify these properly.
The Key Findings panel only appears when specific thresholds are met (e.g., bot traffic exceeds 40%, AI crawlers are present, threats are detected). If your dataset is very small or very clean, no findings may trigger. This is by design — no findings means nothing noteworthy to report.
The Security tab shows suspicious URL patterns in your logs. These are requests that happened — not vulnerabilities that were exploited. Common actions:
- For WordPress probes on a non-WordPress site, consider blocking the top offender IPs in your .htaccess or firewall (see the example below).
- For config file probes (/.env, /.git), verify those files are not actually accessible on your server.
- For injection attempts, check that your application properly sanitises input.
- For persistent scanners, report the IP to your hosting provider's abuse team.
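As a starting point, a deny rule for a single offending IP looks like this in .htaccess (Apache 2.4 syntax; the address is a placeholder for the IP you identified):

```
<RequireAll>
    Require all granted
    Require not ip 203.0.113.45
</RequireAll>
```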
Orphaned URLs are pages bots crawled that weren't in your Site Auditor export. This can include: URLs from before a site redesign, image/CSS/JS asset URLs that SA doesn't list, URLs generated by JavaScript that SA's crawler didn't execute, or legitimate pages that SA missed due to crawl depth limits. Not all orphaned URLs are problems — they're just worth reviewing.
Edit the JSON files in a text editor, save, and restart the app (or open a new project). Changes take effect when the Security tab or bot classifier loads the file. No recompilation needed.
Yes. Nginx typically uses the same combined log format as Apache and works out of the box. For IIS, export your logs in W3C Extended format (the default). The auto-detect parser will handle both.
Project files are SQLite databases. They grow with each import. You can open them in a SQLite browser and run VACUUM to reclaim space, but generally the file size is proportional to your log data and is not a problem. If you want to start fresh, simply create a new project.
Yes. The .logdb file is a standard SQLite database. You can open it in DB Browser for SQLite, DBeaver, or any SQLite-compatible tool to run custom queries against your data.
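For example, a quick query for the busiest bots might look like the sketch below. The table and column names here are assumptions for illustration; inspect the actual schema in your SQLite tool first:

```sql
-- Table and column names are assumed; check the real schema first.
SELECT bot_name, COUNT(*) AS hits
FROM events
GROUP BY bot_name
ORDER BY hits DESC
LIMIT 10;
```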