Available URLs
| URL | Description |
|---|---|
| https://yourdomain.com/page/?format=markdown | Markdown version of any page or post |
| https://yourdomain.com/?format=markdown-sitemap | Overview of all available pages for AI agents |
| https://yourdomain.com/llms.txt | Instructions and context for AI agents |
Link tag in header
If the Link tag in header option is enabled, the plugin inserts the following link tags into the <head> of each page:
<link rel="alternate" type="text/markdown" href=".../llms.txt" title="Site Context" />
<link rel="alternate" type="text/markdown" href=".../?format=markdown" title="Page Context" />
<link rel="alternate" type="text/markdown" href=".../?format=markdown-sitemap" title="Markdown Sitemap" />
AI agents can use these tags for autodiscovery: an agent that supports them detects that a Markdown version of the page exists and can fetch it instead of parsing the HTML.
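To illustrate autodiscovery from the agent's side, here is a minimal sketch of how a client could find the advertised Markdown alternates in a page. It is not part of the plugin; the function name example_find_markdown_alternates() is hypothetical, and the sample HTML simply mirrors the tags listed above.

```php
<?php
/**
 * Hypothetical client-side helper (not part of the plugin): returns the
 * text/markdown alternates advertised in an HTML document, keyed by title.
 */
function example_find_markdown_alternates( string $html ): array {
    $doc = new DOMDocument();
    // Real-world HTML is rarely strict; collect parser warnings silently.
    $previous = libxml_use_internal_errors( true );
    $doc->loadHTML( $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $found = array();
    foreach ( $doc->getElementsByTagName( 'link' ) as $link ) {
        $rel  = strtolower( $link->getAttribute( 'rel' ) );
        $type = strtolower( $link->getAttribute( 'type' ) );
        if ( 'alternate' === $rel && 'text/markdown' === $type ) {
            $found[ $link->getAttribute( 'title' ) ] = $link->getAttribute( 'href' );
        }
    }
    return $found;
}

$html = '<html><head>'
    . '<link rel="alternate" type="text/markdown" href="https://yourdomain.com/llms.txt" title="Site Context" />'
    . '<link rel="alternate" type="text/markdown" href="https://yourdomain.com/page/?format=markdown" title="Page Context" />'
    . '<link rel="stylesheet" href="/style.css" />'
    . '</head><body></body></html>';

$alternates = example_find_markdown_alternates( $html );
```

Here $alternates holds the two Markdown URLs keyed by their title attribute; the stylesheet link is ignored.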
Access log colors
- Green: OpenAI / GPTBot
- Orange: Anthropic / ClaudeBot
- Blue: Google / Googlebot
- Purple: Bing / Microsoft
- Pink: Other AI bots (Perplexity, Cohere, Amazon SearchBot, ByteSpider…)
- Gray: Unknown visitor
WordPress filters for developers (Overview)
| Filter | Description |
|---|---|
| toret_markdown_post_frontmatter | Modification of the post’s YAML header before closing |
| toret_markdown_post_output | Final Markdown of the page / product |
| toret_markdown_term_output | Final Markdown of the taxonomy (category, tag) |
| toret_markdown_archive_output | Final Markdown of the archive, homepage, shop |
| toret_markdown_sitemap_pages_args | Arguments for get_pages() in the sitemap |
| toret_markdown_sitemap_posts_args | Arguments for get_posts() in the sitemap |
| toret_markdown_sitemap_products_args | Arguments for wc_get_products() in the sitemap |
| toret_markdown_sitemap_products | Array of products after loading (object filtering) |
| toret_markdown_sitemap_output | The entire final sitemap text |
What the plugin does
Toret Markdown adds an alternative output in Markdown format to every WordPress page. It is primarily designed for AI language models and agents (ChatGPT, Claude, Perplexity…), which can generally process clean Markdown more accurately and efficiently than full HTML.
The plugin generates three types of outputs:
- Page / post / product: ?format=markdown
- Sitemap for AI: ?format=markdown-sitemap
- llms.txt: instructions and context at the website root
Quick start
1. Enable the plugin
In the WordPress admin, go to Toret plugins > AI Markdown (Settings tab) and make sure the Plugin is active toggle is checked.
2. Test the output
Add ?format=markdown to the end of any URL on your website, e.g. https://yourdomain.com/page/?format=markdown
3. Set up llms.txt
In the plugin admin, go to the llms.txt tab and adjust the instructions for AI agents. The file will then be publicly available at https://yourdomain.com/llms.txt.
4. Check the sitemap
The AI sitemap is available at https://yourdomain.com/?format=markdown-sitemap. It contains an overview of all public pages with their descriptions.
Settings description
Link tag in header
When enabled, the plugin inserts <link rel="alternate" type="text/markdown"> tags into the <head> of every page. AI agents that recognize these tags can fetch the Markdown version instead of HTML, without any action required on your part.
HTTP timeout
The plugin generates Markdown by internally fetching HTML pages and converting them. The timeout determines how long to wait for a response. For slow hosting or exceptionally large pages, we recommend increasing the value to 30–60 s.
Slug blacklist
One slug per line. Pages on the blacklist will not return Markdown output and will not appear in the sitemap. This typically includes technical pages without relevant content: cart, checkout, my-account, thank-you.
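As an illustration of the "one slug per line" format, the following sketch parses such a setting and checks a slug against it. The function names are hypothetical, and the normalization (trimming slashes, lowercasing) is an assumption; the plugin's actual matching rules may differ.

```php
<?php
/**
 * Hypothetical helper: turns the raw "one slug per line" setting into a
 * clean list. Trimming slashes and lowercasing are assumptions.
 */
function example_parse_blacklist( string $raw ): array {
    $slugs = array();
    // \R matches any line ending (\n, \r\n, \r).
    foreach ( preg_split( '/\R/', $raw ) as $line ) {
        $slug = strtolower( trim( $line, " \t/" ) );
        if ( '' !== $slug ) {
            $slugs[] = $slug;
        }
    }
    return array_values( array_unique( $slugs ) );
}

/** Hypothetical helper: true when the slug is on the parsed blacklist. */
function example_is_blacklisted( string $slug, array $blacklist ): bool {
    return in_array( strtolower( trim( $slug, " \t/" ) ), $blacklist, true );
}

$blacklist = example_parse_blacklist( "cart\r\ncheckout\n\n /My-Account/ \nthank-you" );
```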
Access logging
The plugin logs every access to ?format=markdown and ?format=markdown-sitemap, including bot identification (GPTBot, ClaudeBot, Googlebot…). IP addresses are automatically anonymized. You can view the records in the Access log tab.
Max. log records
When the limit is exceeded, the oldest records are automatically deleted. The recommended value is 500–1000 for regular websites.
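The trimming behavior can be pictured with a short sketch. It assumes new records are appended to the end of the array, so the oldest entries sit at the front; example_trim_log() is a hypothetical name, and treating a limit below 1 as "no trimming" is also an assumption.

```php
<?php
/**
 * Hypothetical helper: keeps only the newest $max records.
 * Assumes the newest record is the last element of $records.
 */
function example_trim_log( array $records, int $max ): array {
    if ( $max < 1 || count( $records ) <= $max ) {
        return $records; // Nothing to trim.
    }
    // A negative offset keeps the last $max elements and reindexes them.
    return array_slice( $records, -$max );
}

$trimmed = example_trim_log( array( 1, 2, 3, 4, 5, 6, 7 ), 5 );
```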
Markdown Editor
How the cache works
The plugin generates Markdown automatically the first time a page is requested and saves the result in the cache (as post meta), so repeated requests are fast and do not overload the server. When a post is edited, the cache is cleared and rebuilt on the next access.
Manual editing
In the Editor tab, you can manually edit and save the Markdown of any page. Once saved, the cache is locked: the plugin will no longer automatically overwrite it. The lock can be removed at any time using the button directly in the editor.
Locked cache
Pages with a locked cache are marked with a lock icon in the editor. AI agents are then served exactly the content you set manually; the plugin will not overwrite it, even when the post is updated later.
AI Sitemap
What the sitemap contains
The sitemap is an overview of all public pages, posts, and products in Markdown format. For each item, it contains the title, URL, modification date, and a short description. AI agents use it to discover what exists on the website and then load details of specific pages.
Manual override
In the Sitemap tab, you can override the dynamically generated content with your own text and save it. Use the Restore dynamic sitemap button to easily revert to automatic generation.
Limits and ordering
In the Settings tab, you can set the maximum number of items and their sorting method for each content type (pages, posts, products). A value of -1 or 0 means unlimited.
llms.txt
What it is for
llms.txt is a proposed convention (similar in spirit to robots.txt) for a file that tells AI agents how to interact with your website. We recommend covering: who you are, what you offer, which pages are most important for AI, and any restrictions that apply.
Default content
Using the Restore default content button in the llms.txt tab, the plugin automatically generates basic content from your WordPress data (site name, description, contact, sitemap URL). You then just need to add to it or manually adjust it.
Metabox on pages and products
Where it appears
The Toret Markdown metabox appears in the right column of the editor for every post, page, and product. It shows the current cache status (up to date / not generated / excluded) and the exact date of the last generation.
Generate / Regenerate
Use the Generate (or Regenerate) button to trigger immediate Markdown cache generation for the given page directly from the admin, without needing to visit it. Once generated, a Preview link will appear to check the output.
Exclude from Markdown
The page will not return any Markdown output; AI agents will be served regular HTML instead. At the same time, the <link> tag will not be inserted into the header, and the page will not be included in the sitemap.
Lock cache
Prevents automatic cache regeneration when the page is saved. This is useful if you have manually edited the Markdown in the editor and don’t want the plugin to accidentally overwrite it. The lock can be unchecked or unlocked directly in the editor at any time.
Hide from sitemap
Although the page will not appear in ?format=markdown-sitemap, its Markdown version remains fully accessible via ?format=markdown. The difference from Exclude from Markdown is that the content still exists, it just doesn’t appear in the global overview for AI.
Link to editor
The Edit Markdown in editor link at the bottom of the metabox will take you directly to the central Editor tab with the current page preselected.
Category settings
Where it appears
The Toret Markdown panel appears on the post category edit page (Posts → Categories). If WooCommerce is active, it works exactly the same for product categories (Products → Categories).
Exclude from Markdown
The category page will not return Markdown output and the <link> tag will not be inserted into its header. It works identically to the option of the same name in the individual post metabox.
Hide from sitemap
The category will not appear in ?format=markdown-sitemap, but its Markdown version remains accessible via ?format=markdown. There is also a direct Markdown Preview link on the category edit page for quickly viewing the output.
WooCommerce
Products in the sitemap
If WooCommerce is active, the sitemap automatically includes products, along with their price, stock status, and category. In the settings, you can filter products by category or limit their total number.
Product frontmatter
The Markdown output of a product page has an extended YAML header with the price (price), currency (currency), stock status (in_stock), and SKU. AI agents can read the current state of the product from these fields without having to parse the HTML.
Product categories
On the product category page in the admin (Products → Categories), you will find a metabox for optionally excluding the entire category from Markdown output and the sitemap.
Plugin Architecture
md.php
The core of the entire plugin. It contains all constants, helper functions, the HTML → Markdown converter (toret_html_to_markdown()), and output generators for individual page types.
markdown-handler.php
Handles HTTP requests. It intercepts the ?format=markdown and ?format=markdown-sitemap parameters, determines the page type, and calls the appropriate generator. It also includes bot detection and access logging.
metabox.php
Handles the metabox in the post, page, and product editors. Registers the metabox for all public post types, manages AJAX cache generation, and handles term meta for categories.
admin.php
Provides the administration interface. Registers the Settings API, renders all tabs, and handles saving settings.
Constants
| Constant | Value | Description |
|---|---|---|
| TORET_MARKDOWN_CACHE_KEY | _toret_markdown_cache | Post meta key for saved Markdown |
| TORET_MARKDOWN_META_EXCLUDE | _toret_markdown_exclude | Post meta: exclusion from Markdown output |
| TORET_MARKDOWN_META_LOCK | _toret_markdown_lock | Post meta: automatic cache regeneration lock |
| TORET_MARKDOWN_META_SITEMAP_EXCLUDE | _toret_markdown_sitemap_exclude | Post meta: hide from sitemap |
| TORET_MARKDOWN_TERM_EXCLUDE | _toret_markdown_exclude | Term meta: exclude category from Markdown |
| TORET_MARKDOWN_TERM_SITEMAP_EXCLUDE | _toret_markdown_sitemap_exclude | Term meta: hide category from sitemap |
| TORET_MARKDOWN_LOG_OPTION | toret_markdown_access_log | Options key for access log |
| TORET_MARKDOWN_LOG_MAX | 500 | Default max. number of log records |
WordPress Filters (Hooks)
toret_markdown_post_frontmatter
Allows modifying the YAML frontmatter of a post/page before the closing ---. Parameters: string $frontmatter, WP_Post $post.
add_filter( 'toret_markdown_post_frontmatter', function( $fm, $post ) {
    // Append the author's display name as an extra YAML field.
    $fm .= 'author: ' . get_the_author_meta( 'display_name', $post->post_author ) . "\n";
    return $fm;
}, 10, 2 );
toret_markdown_post_output
The final Markdown of the post/page including frontmatter. Parameters: string $markdown, WP_Post $post.
add_filter( 'toret_markdown_post_output', function( $md, $post ) {
    // Append a footer note after the generated Markdown.
    return $md . "\n\n---\n*Generated automatically.*\n";
}, 10, 2 );
toret_markdown_term_output
The final Markdown of the taxonomy page (category, tag). Parameters: string $markdown, WP_Term $term.
toret_markdown_archive_output
The final Markdown of the archive page (homepage, shop, archive, search). Parameters: string $markdown, string $type, string $title.
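A minimal sketch of a callback for this filter, mainly to show the three-parameter signature. The callback name example_archive_footer() is hypothetical, and the sketch makes no assumption about which strings the plugin passes in $type; it simply echoes whatever it receives.

```php
<?php
/**
 * Hypothetical callback: appends a footer naming the archive.
 * $type is used as received; its possible values are not assumed here.
 */
function example_archive_footer( string $markdown, string $type, string $title ): string {
    return rtrim( $markdown ) . "\n\n---\n*Archive: {$title} ({$type})*\n";
}

// Register only inside WordPress, so the function can also be tested standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'toret_markdown_archive_output', 'example_archive_footer', 10, 3 );
}
```

The last argument of add_filter() must be 3, otherwise WordPress passes only the first parameter to the callback.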
toret_markdown_sitemap_pages_args
Arguments passed to the get_pages() function when generating the sitemap. Parameter: array $args.
toret_markdown_sitemap_posts_args
Arguments passed to the get_posts() function when generating the sitemap. Parameter: array $args.
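For example, a callback could keep posts from one category out of the sitemap. This is a sketch under assumptions: the category ID 12 and the function name are made up for illustration, while category__not_in is a standard query argument accepted by get_posts().

```php
<?php
/**
 * Hypothetical callback: excludes category ID 12 (an example value)
 * from the posts listed in the sitemap.
 */
function example_sitemap_posts_args( array $args ): array {
    $excluded = isset( $args['category__not_in'] ) ? (array) $args['category__not_in'] : array();
    // Merge with any exclusions that are already set, without duplicates.
    $args['category__not_in'] = array_values( array_unique( array_merge( $excluded, array( 12 ) ) ) );
    return $args;
}

// Register only inside WordPress, so the function can also be tested standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'toret_markdown_sitemap_posts_args', 'example_sitemap_posts_args' );
}
```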
toret_markdown_sitemap_products_args
Arguments passed to the wc_get_products() function when generating the sitemap. Parameter: array $args.
toret_markdown_sitemap_products
Array of WooCommerce products after loading, before rendering them into the sitemap. Allows filtering or custom sorting of objects. Parameter: array $products.
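As a sketch of object filtering, the callback below keeps only products that report being in stock. It assumes the array holds WooCommerce product objects, which expose is_in_stock(); the function name is hypothetical.

```php
<?php
/**
 * Hypothetical callback: drops products that are not in stock.
 * Assumes each element is a product object with an is_in_stock() method.
 */
function example_only_in_stock( array $products ): array {
    $kept = array_filter(
        $products,
        function ( $product ) {
            return is_object( $product )
                && method_exists( $product, 'is_in_stock' )
                && $product->is_in_stock();
        }
    );
    // Reindex so the result is a plain list again.
    return array_values( $kept );
}

// Register only inside WordPress, so the function can also be tested standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'toret_markdown_sitemap_products', 'example_only_in_stock' );
}
```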
toret_markdown_sitemap_output
The entire final text of the dynamically generated sitemap. Parameter: string $output.
Cache Logic
Storage
The generated Markdown is saved as post meta with the key _toret_markdown_cache along with a timestamp in _toret_markdown_generated_at. The cache is only used for posts/pages/products; for terms (categories) and archives, the output is always generated dynamically.
Invalidation
The cache is automatically deleted upon saving or deleting a post (save_post, before_delete_post), unless a lock (_toret_markdown_lock) is set on it. Manual deletion of the entire cache for the whole site is available in the Settings tab.
Generation
The plugin generates Markdown by internally downloading (fetching) HTML pages via wp_remote_get() and parsing the result with its own DOM converter (toret_html_to_markdown()). A special parameter _tmcb={timestamp} is added to the URL to effectively bypass CDN caching (Cloudflare, etc.).
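The cache-busting idea can be sketched in a few lines. This is an illustration only: example_cache_busted_url() is a hypothetical name, and the sketch ignores URL fragments. Inside WordPress, add_query_arg() would typically do the same job before the URL is handed to wp_remote_get().

```php
<?php
/**
 * Hypothetical helper: appends _tmcb={timestamp} so that a CDN treats
 * the request as a new URL. Does not handle URLs containing a #fragment.
 */
function example_cache_busted_url( string $url, int $timestamp ): string {
    // Use "&" when the URL already has a query string, "?" otherwise.
    $separator = ( false === strpos( $url, '?' ) ) ? '?' : '&';
    return $url . $separator . '_tmcb=' . $timestamp;
}

$fetch_url = example_cache_busted_url( 'https://yourdomain.com/page/', 1700000000 );
```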
HTML → Markdown Converter
Implementation
The plugin does not use any external library for conversion. The conversion is handled by native PHP DOMDocument combined with a recursive walker toret_dom_node_to_markdown(). The output finally goes through the cleaning function toret_cleanup_markdown().
Discarded tags
The content of <script>, <style>, <form>, <noscript>, <canvas>, and <svg> tags is completely ignored and is not transferred to the final output at all.
Complex lists
The plugin automatically detects “complex” <ul> elements (e.g., blog listings, product grids). Items longer than 100 characters or those containing headings are then rendered as clear sections separated by --- instead of standard bullet points.
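The per-item rule described above could look roughly like this. It is a sketch, not the plugin's code: example_is_complex_item() is a hypothetical name, and how whitespace is normalized before the 100-character check is an assumption.

```php
<?php
/**
 * Hypothetical check for a single <li>: "complex" when its text is longer
 * than 100 characters or when it contains a heading (h1 to h6).
 */
function example_is_complex_item( DOMElement $li ): bool {
    // Collapse whitespace so indentation in the HTML does not count.
    $text   = trim( preg_replace( '/\s+/', ' ', $li->textContent ) );
    $length = function_exists( 'mb_strlen' ) ? mb_strlen( $text ) : strlen( $text );
    if ( $length > 100 ) {
        return true;
    }
    foreach ( array( 'h1', 'h2', 'h3', 'h4', 'h5', 'h6' ) as $tag ) {
        if ( $li->getElementsByTagName( $tag )->length > 0 ) {
            return true;
        }
    }
    return false;
}

$doc      = new DOMDocument();
$previous = libxml_use_internal_errors( true );
$doc->loadHTML( '<ul><li>Short item</li><li><h3>Post title</h3><p>Excerpt</p></li></ul>' );
libxml_clear_errors();
libxml_use_internal_errors( $previous );

$items             = $doc->getElementsByTagName( 'li' );
$first_is_complex  = example_is_complex_item( $items->item( 0 ) );
$second_is_complex = example_is_complex_item( $items->item( 1 ) );
```

In this sample the first item is a plain bullet, while the second contains a heading and would be rendered as its own section.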
Cloudflare email
The plugin recognizes Cloudflare email protection (/cdn-cgi/l/email-protection#…) and automatically decodes the real email address directly into readable output.
AJAX Endpoints
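Cloudflare's obfuscation is commonly described as a hex string whose first byte is an XOR key for all following bytes. The sketch below decodes a link on that assumption; it is not the plugin's own implementation, and example_decode_cf_email() is a hypothetical name.

```php
<?php
/**
 * Hypothetical decoder for Cloudflare-protected email links.
 * Assumes the part after "#" is hex: first byte = XOR key, rest = address.
 * Returns an empty string when the input does not fit that shape.
 */
function example_decode_cf_email( string $href ): string {
    $pos = strpos( $href, '#' );
    if ( false === $pos ) {
        return '';
    }
    $hex = substr( $href, $pos + 1 );
    if ( strlen( $hex ) < 4 || 0 !== strlen( $hex ) % 2 || ! ctype_xdigit( $hex ) ) {
        return '';
    }
    $key   = hexdec( substr( $hex, 0, 2 ) );
    $email = '';
    for ( $i = 2; $i < strlen( $hex ); $i += 2 ) {
        // XOR each encoded byte with the key to recover the character.
        $email .= chr( hexdec( substr( $hex, $i, 2 ) ) ^ $key );
    }
    return $email;
}
```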
| Action | Nonce | Description |
|---|---|---|
| toret_markdown_generate | toret_markdown_ajax_generate | Generates or regenerates the Markdown cache for a given post_id. Requires the edit_posts capability. |
| toret_markdown_unlock_cache | toret_markdown_unlock_cache | Unlocks the locked post cache by deleting the _toret_markdown_lock meta. Requires the edit_post capability for the given post. |
Bot Detection
Recognized agents
Detection compares the User-Agent request header against known signatures (case-insensitive). The recognized types, each shown in its own color in the access log, are:
- gpt: GPTBot, ChatGPT-User, OAI-SearchBot
- claude: ClaudeBot, Claude-User, Claude-SearchBot, Anthropic
- google: Googlebot, Google-Extended, Gemini, GoogleOther
- bing: Bingbot, BingPreview, MSNBot
- other_ai: PerplexityBot, Cohere-AI, YouBot, Diffbot, ByteSpider, AmazonBot and others
- unknown: unrecognized User-Agent
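The matching described above can be sketched as a case-insensitive substring lookup. The signature lists are copied from the list above (the "and others" entries are not filled in); the function name and the order in which groups are checked are assumptions.

```php
<?php
/**
 * Hypothetical detector: returns the bot type for a User-Agent string.
 * Groups are checked in the order listed; the first match wins.
 */
function example_detect_bot( string $user_agent ): string {
    $signatures = array(
        'gpt'      => array( 'GPTBot', 'ChatGPT-User', 'OAI-SearchBot' ),
        'claude'   => array( 'ClaudeBot', 'Claude-User', 'Claude-SearchBot', 'Anthropic' ),
        'google'   => array( 'Googlebot', 'Google-Extended', 'Gemini', 'GoogleOther' ),
        'bing'     => array( 'Bingbot', 'BingPreview', 'MSNBot' ),
        'other_ai' => array( 'PerplexityBot', 'Cohere-AI', 'YouBot', 'Diffbot', 'ByteSpider', 'AmazonBot' ),
    );
    foreach ( $signatures as $type => $needles ) {
        foreach ( $needles as $needle ) {
            // stripos() is a case-insensitive substring search.
            if ( false !== stripos( $user_agent, $needle ) ) {
                return $type;
            }
        }
    }
    return 'unknown';
}

$type = example_detect_bot( 'Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)' );
```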
Database: Saved Values
| Option / Meta key | Type | Description |
|---|---|---|
| toret_markdown_options | option | All plugin settings as a serialized array |
| toret_markdown_access_log | option | Access log as a JSON array (max. records according to settings) |
| toret_markdown_sitemap_content | option | Manually overridden sitemap content (empty = dynamic generation) |
| toret_markdown_llms_content | option | Content of the llms.txt file |
| _toret_markdown_cache | post meta | Generated Markdown content of the post |
| _toret_markdown_generated_at | post meta | Date and time of the last cache generation |
| _toret_markdown_exclude | post/term meta | Exclusion from Markdown output |
| _toret_markdown_lock | post meta | Automatic cache regeneration lock |
| _toret_markdown_sitemap_exclude | post/term meta | Hide from sitemap |