What it does

Tom's AI Discovery Kit audits the signals AI crawlers actually consume, scores where you stand, and generates a clean, reviewable package you can upload as-is.

AI-readiness score

A light crawl checks your site against the AI-discovery and crawler-control endpoints and gives you one clear percentage — so you know where you stand before you change anything.
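As a rough illustration of how a single percentage could be derived from a set of endpoint checks, the sketch below weights each check and reports the share of the total weight that passed. The check names and weights are assumptions made for this example; they are not the kit's actual scoring formula.

```python
# Hypothetical weights: the kit's real scoring formula is not documented here.
WEIGHTS = {
    "robots.txt": 3,
    "sitemap.xml": 3,
    "structured-data": 2,
    "tdmrep.json": 1,
    "llms.txt": 1,
}

def readiness_score(found: set) -> int:
    """Return the percentage of total weight covered by the checks that passed."""
    total = sum(WEIGHTS.values())
    earned = sum(weight for name, weight in WEIGHTS.items() if name in found)
    return round(100 * earned / total)

print(readiness_score({"robots.txt", "sitemap.xml"}))  # 60
```

Weighting the checks lets the signals crawlers rely on most count for more than the optional convention files.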

ADP endpoint audit

Probes the standard endpoints — robots.txt AI rules, sitemap.xml, ai-discovery.json, llms.txt and the rest — and tells you exactly which you already have, which are missing, and which are misconfigured.
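One way to picture the have / missing / misconfigured distinction is as a small decision over the HTTP response for each endpoint. The sketch below is a minimal illustration under assumed rules: the `Probe` type, the `EXPECTED_TYPE` table and the thresholds are hypothetical, not the kit's own logic.

```python
from dataclasses import dataclass

@dataclass
class Probe:
    path: str          # endpoint that was requested, e.g. "/robots.txt"
    status: int        # HTTP status code returned
    content_type: str  # value of the Content-Type header
    body: str          # response body as text

# Hypothetical expectation per endpoint: a fragment the Content-Type should contain.
EXPECTED_TYPE = {
    "/robots.txt": "text/plain",
    "/sitemap.xml": "xml",
    "/ai-discovery.json": "json",
    "/llms.txt": "text/plain",
}

def classify(probe: Probe) -> str:
    """Label a probe result as 'found', 'missing', or 'misconfigured'."""
    if probe.status in (404, 410):
        return "missing"
    if probe.status != 200 or not probe.body.strip():
        return "misconfigured"   # redirects, server errors, or an empty file
    expected = EXPECTED_TYPE.get(probe.path, "")
    if expected not in probe.content_type.lower():
        return "misconfigured"   # e.g. an HTML error page served with status 200
    return "found"
```

The content-type check matters because many servers answer unknown paths with a styled HTML page and a 200 status, which would otherwise look like a valid file.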

robots.txt AI-bot rules

Generates explicit allow/deny rules for the AI crawlers that actually respect robots.txt — GPTBot, ClaudeBot, Google-Extended, PerplexityBot and more. Merges into an existing file additively, never overwriting your own blocks.
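An additive merge can be sketched as follows: read the user agents the existing file already names, then append a group only for AI crawlers that are not yet covered. This is a minimal sketch under that assumption; the function name and the single allow/deny switch are hypothetical, and the kit may merge differently.

```python
AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]

def merge_ai_rules(existing: str, allow: bool) -> str:
    """Append a rule group for each AI bot the existing robots.txt does not name."""
    named = set()
    for line in existing.splitlines():
        # Drop trailing comments, then split "Field: value".
        field, _, value = line.split("#", 1)[0].partition(":")
        if field.strip().lower() == "user-agent":
            named.add(value.strip().lower())

    rule = "Allow: /" if allow else "Disallow: /"
    groups = [f"User-agent: {bot}\n{rule}" for bot in AI_BOTS
              if bot.lower() not in named]
    if not groups:
        return existing  # every bot is already covered, so change nothing

    parts = ([existing.rstrip("\n")] if existing.strip() else []) + groups
    return "\n\n".join(parts) + "\n"
```

Because existing lines are never edited or removed, a block you wrote by hand for GPTBot stays exactly as it was, and only the missing crawlers gain new groups.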

Structured data & knowledge graph

Builds a schema.org structured-data starter and a knowledge-graph.json from what it finds on your pages — the markup AI engines actually parse when they read your HTML.
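For readers unfamiliar with schema.org markup, a starter block is usually a JSON-LD object embedded in a script tag. The sketch below builds one for an organisation; the function name and the choice of the `Organization` type are assumptions for illustration, and the kit's generated starter may use other types and fields.

```python
import json

def organization_jsonld(name: str, url: str, description: str) -> str:
    """Return a JSON-LD script tag describing an organisation."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
    }
    # Escape "</" so text inside the JSON cannot close the script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"

print(organization_jsonld("Example Ltd", "https://example.com", "Sample site."))
```

The output is meant to be pasted into the page's HTML head, then extended by hand with the relationships that only you know about.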

Training opt-out (tdmrep)

Declares your text-and-data-mining position in a standard tdmrep.json, tied to your training choice so robots.txt and tdmrep agree. Your stance, stated unambiguously.
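Keeping the two files in agreement is easiest when both are derived from the same single choice. The sketch below shows that idea, assuming the TDM Reservation Protocol field names `location`, `tdm-reservation` and `tdm-policy`; the `/*` location pattern, the function name and the policy URL parameter are illustrative assumptions, so check them against the TDMRep specification before relying on them.

```python
import json
from typing import Optional, Tuple

def training_files(allow_training: bool,
                   policy_url: Optional[str] = None) -> Tuple[str, str]:
    """Derive tdmrep.json content and a robots.txt rule from one choice."""
    # tdm-reservation is 1 when rights are reserved (training not allowed).
    entry = {"location": "/*", "tdm-reservation": 0 if allow_training else 1}
    if policy_url and not allow_training:
        entry["tdm-policy"] = policy_url  # optional pointer to licensing terms

    robots_rule = "Allow: /" if allow_training else "Disallow: /"
    return json.dumps([entry], indent=2), robots_rule

tdmrep, rule = training_files(allow_training=False)
print(tdmrep)
print(rule)  # Disallow: /
```

Since both outputs come from the same boolean, a reservation in tdmrep.json can never sit next to a robots.txt rule that says the opposite.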

Sitemaps

Produces a clean sitemap.xml and an ai-sitemap.xml from the crawl, with per-URL last-modified dates. After robots.txt, the sitemap is the discovery file AI crawlers fetch most.
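A sitemap with per-URL last-modified dates follows the standard sitemaps.org schema. The sketch below builds one from a list of crawled pages; the function name and the input shape are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages) -> str:
    """pages is a list of (absolute URL, last-modified date as YYYY-MM-DD)."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # special characters are escaped
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

print(build_sitemap([("https://example.com/", "2024-01-15")]))
```

Building the file through an XML library rather than string concatenation means characters such as `&` in query strings are escaped correctly.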

The llms.txt family — honestly

Generates llms.txt, llms-lite.txt, ai-discovery.json and ai-discovery.md too — included as forward-looking extras, with a plain note on what current adoption data actually shows.
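The llms.txt convention is a small Markdown file: a title, a one-line summary in a blockquote, then sections of annotated links. The sketch below writes that shape; the function name and input structure are assumptions, and the kit's own output may include more detail.

```python
def build_llms_txt(site_name: str, summary: str, sections: dict) -> str:
    """sections maps a heading to a list of (title, url, note) tuples."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"

print(build_llms_txt(
    "Example",
    "A sample site.",
    {"Docs": [("Guide", "https://example.com/guide", "Start here")]},
))
```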

Page classification

Labels every page it finds — product, article, guide, utility, legal, or index — so the generated files describe your site accurately instead of treating every URL the same.
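To give a feel for how pages might be sorted into those six labels, the sketch below classifies by URL path alone. The hint tables and the fallback are assumptions made for this example; the kit's classifier is not described here and may also use page content.

```python
from urllib.parse import urlparse

# Hypothetical hints keyed on the first path segment.
SECTION_LABELS = {
    "products": "product", "shop": "product",
    "blog": "article", "news": "article",
    "guides": "guide", "docs": "guide",
}
SINGLE_PAGES = {
    "privacy": "legal", "terms": "legal",
    "contact": "utility", "search": "utility",
}

def classify_page(url: str) -> str:
    """Return product, article, guide, utility, legal, or index for a URL."""
    segments = [s for s in urlparse(url).path.lower().split("/") if s]
    if not segments:
        return "index"  # the site home page
    first = segments[0]
    if first in SINGLE_PAGES:
        return SINGLE_PAGES[first]
    if first in SECTION_LABELS:
        # A section root such as /blog/ lists other pages, so treat it as an index.
        return "index" if len(segments) == 1 else SECTION_LABELS[first]
    return "utility"  # fallback when no hint matches

print(classify_page("https://example.com/blog/first-post"))  # article
```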

Private by design

The crawl runs from your machine and the package is written to a folder next to the EXE. No account, no telemetry, nothing uploaded. The only network traffic is the crawl of the site you point it at.

An honest word on llms.txt. Server-log evidence, including my own, shows that the major AI crawlers reach sites through robots.txt, sitemaps, and your HTML, and rarely fetch the llms.txt-style files. This kit leads with the signals that work and includes the convention files as cheap, static insurance, not as a magic visibility lever.

A walk through the kit

Three tabs take you from a URL to an upload-ready package.

ADP Status

The first tab after a scan. Every discovery and control file is marked found, missing, or blocked, alongside your AI-readiness score and a breakdown of the pages found by type — product, article, guide, utility, legal, or index.

Pages Found

Every page the crawl reached, with its title, classification, word count, and the schema.org types detected on it. Sort by any column to see how your site is structured and where the schema gaps are.

Generate

Confirm your details, set your AI training, citation and summarisation permissions, then generate. One click writes the full package and a zip — with a README.txt and an upload-map.txt telling you exactly where each file goes on your server.

How it works

Four steps from a cold URL to a reviewed, upload-ready set of files.

Scan

Point it at your site. A light crawl maps your pages and checks which discovery and control files already exist.

Score

Read your AI-readiness score and the file-by-file status. Now you know what's missing instead of guessing.

Configure

Confirm your details and set your training, citation and summarisation permissions — all on one screen.

Generate

One click produces the package and zip. Review the files, back up anything you're replacing, and upload.

What's in the package

Every file is plain text or JSON, safe to read, and yours to review before it goes anywhere near your server.

File	What it is
`robots.txt.NEW`	AI-crawler allow/deny rules — compare with your existing robots.txt and back up first
`sitemap.xml`	Standard sitemap from the crawl — review against any CMS/plugin sitemap first
`ai-sitemap.xml`	AI-oriented sitemap, ready to upload
`ai-discovery.json`	Machine-readable site summary and permission declarations
`ai-discovery.md`	Human-readable companion — review then upload
`knowledge-graph.json`	Structured-data starter — add relationships manually
`llms-lite.txt`	Curated content map for the llms.txt convention
`tdmrep.json`	Text-and-data-mining reservation — goes in `.well-known/`
`README.txt` & `upload-map.txt`	Deployment guide and a file-to-server-path reference

The technical bit

Built to the same rules as every other tool on this site: offline-first, zero dependencies, single portable EXE.

Platform

Windows 10 and 11, x64. Built with C++17 and the Win32 API. No MFC, no Qt, no frameworks, no .NET.

Storage

No database. Settings live in an INI file next to the EXE; the generated package is written to an output\ folder beside it.

Network

WinHTTP for the crawl and endpoint checks. No third-party HTTP libraries, no telemetry of any kind.

Rendering

Owner-drawn Win32 UI with a dark theme, DPI-aware. Consistent look without external UI toolkits.

Install

None. Unzip and run. Delete the folder to uninstall — nothing is written outside it except the files you choose to upload.

Licence

Free for personal and commercial use. Source not distributed. No warranties.

See your site the way AI crawlers do. Then fix it.

Download Tom's AI Discovery Kit