Toret AI Markdown plugin documentation

🔗 Available URLs

| URL | Description |
| --- | --- |
| https://yourdomain.com/page/?format=markdown | Markdown version of any page or post |
| https://yourdomain.com/?format=markdown-sitemap | Overview of all available pages for AI agents |
| https://yourdomain.com/llms.txt | Instructions and context for AI agents |

🏷️ Link tag in header

If the Link tag in header option is enabled, the plugin inserts the following link tags into the <head> of each page:

<link rel="alternate" type="text/markdown" href=".../llms.txt" title="Site Context" />
<link rel="alternate" type="text/markdown" href=".../?format=markdown" title="Page Context" />
<link rel="alternate" type="text/markdown" href=".../?format=markdown-sitemap" title="Markdown Sitemap" />

AI agents use this tag for autodiscovery – they automatically detect that a Markdown version of the page exists and use it for a better and more accurate understanding of the content.
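
To illustrate what autodiscovery could look like from the agent's side, here is a minimal sketch that scans a page's HTML for the link tags shown above. The helper name find_markdown_alternates() is hypothetical and is not part of the plugin; real agents may implement discovery differently.

```php
/**
 * Return the href of every <link rel="alternate" type="text/markdown">
 * found in $html, keyed by the tag's title attribute.
 * Hypothetical helper for illustration; not part of the plugin.
 */
function find_markdown_alternates( string $html ): array {
    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors( true ); // tolerate non-strict HTML
    $doc->loadHTML( $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $found = array();
    foreach ( $doc->getElementsByTagName( 'link' ) as $link ) {
        $rel  = strtolower( $link->getAttribute( 'rel' ) );
        $type = strtolower( $link->getAttribute( 'type' ) );
        if ( 'alternate' === $rel && 'text/markdown' === $type ) {
            $found[ $link->getAttribute( 'title' ) ] = $link->getAttribute( 'href' );
        }
    }
    return $found;
}

$html = '<html><head>'
    . '<link rel="alternate" type="text/markdown" href="https://yourdomain.com/llms.txt" title="Site Context" />'
    . '<link rel="alternate" type="text/markdown" href="https://yourdomain.com/page/?format=markdown" title="Page Context" />'
    . '<link rel="stylesheet" href="/style.css" />'
    . '</head><body></body></html>';

$alternates = find_markdown_alternates( $html );
```

The stylesheet link is ignored because its rel and type do not match, so $alternates holds only the two Markdown entries.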

📊 Access log colors

  • Green – OpenAI / GPTBot
  • Orange – Anthropic / ClaudeBot
  • Blue – Google / Googlebot
  • Purple – Bing / Microsoft
  • Pink – Other AI bots (Perplexity, Cohere, Amazon SearchBot, ByteSpider…)
  • Gray – Unknown visitor

🔧 WordPress filters for developers (Overview)

| Filter | Description |
| --- | --- |
| toret_markdown_post_frontmatter | Modification of the post’s YAML header before closing |
| toret_markdown_post_output | Final Markdown of the page / product |
| toret_markdown_term_output | Final Markdown of the taxonomy (category, tag) |
| toret_markdown_archive_output | Final Markdown of the archive, homepage, shop |
| toret_markdown_sitemap_pages_args | Arguments for get_pages() in the sitemap |
| toret_markdown_sitemap_posts_args | Arguments for get_posts() in the sitemap |
| toret_markdown_sitemap_products_args | Arguments for wc_get_products() in the sitemap |
| toret_markdown_sitemap_products | Array of products after loading (object filtering) |
| toret_markdown_sitemap_output | The entire final sitemap text |

🤖 What the plugin does

Toret Markdown adds an alternative output in Markdown format to every WordPress page. This is primarily designed for AI language models and agents (ChatGPT, Claude, Perplexity…), which process Markdown text more accurately and efficiently than standard HTML.

The plugin generates three types of outputs:

  • Page / post / product – ?format=markdown
  • Sitemap for AI – ?format=markdown-sitemap
  • llms.txt – instructions and context at the website root

🚀 Quick start

1. Enable the plugin

In the WordPress admin, go to Toret plugins > AI Markdown (Settings tab) and make sure the Plugin is active toggle is checked.

2. Test the output

Add ?format=markdown to the end of any URL on your website, e.g.:
https://yourdomain.com/page/?format=markdown

3. Set up llms.txt

In the plugin admin, go to the llms.txt tab and adjust the instructions for AI agents. The file will then be publicly available at https://yourdomain.com/llms.txt.

4. Check the sitemap

The AI sitemap is available at https://yourdomain.com/?format=markdown-sitemap. It contains an overview of all public pages with their descriptions.

βš™οΈ Settings description

Link tag in header

When enabled, the plugin inserts <link rel="alternate" type="text/markdown"> tags into the <head> of every page. AI agents that support this tag can detect it and fetch the Markdown version instead of HTML, with no further action required on your part.

HTTP timeout

The plugin generates Markdown by internally fetching HTML pages and converting them. The timeout determines how long to wait for a response. For slow hosting or exceptionally large pages, we recommend increasing the value to 30–60 s.

Slug blacklist

One slug per line. Pages on the blacklist will not return Markdown output and will not appear in the sitemap. This typically includes technical pages without relevant content: cart, checkout, my-account, thank-you.
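
As a rough illustration of the "one slug per line" format, the sketch below parses such a setting into a list and checks a slug against it. Both function names are hypothetical, and the plugin's own parsing may differ (for example in how it treats whitespace or duplicates).

```php
/**
 * Split a textarea value into a clean list of slugs:
 * one per line, trimmed, with empty lines and duplicates removed.
 * Hypothetical helper; not the plugin's actual implementation.
 */
function sketch_parse_slug_blacklist( string $raw ): array {
    $lines = preg_split( '/\R/', $raw ); // \R matches \n, \r\n and \r
    $slugs = array_map( 'trim', $lines );
    $slugs = array_filter(
        $slugs,
        static function ( $slug ) {
            return '' !== $slug;
        }
    );
    return array_values( array_unique( $slugs ) );
}

/** Check whether a slug appears on the parsed blacklist. */
function sketch_is_blacklisted( string $slug, array $blacklist ): bool {
    return in_array( $slug, $blacklist, true );
}

$blacklist = sketch_parse_slug_blacklist( "cart\ncheckout\r\n\n  my-account \nthank-you\n" );
```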

Access logging

The plugin logs every access to ?format=markdown and ?format=markdown-sitemap, including bot identification (GPTBot, ClaudeBot, Googlebot…). IP addresses are automatically anonymized. You can view the records in the Access log tab.

Max. log records

When the limit is exceeded, the oldest records are automatically deleted. The recommended value is 500–1000 for regular websites.

✏️ Markdown Editor

How the cache works

The plugin generates Markdown automatically the first time a page is requested. The result is saved in the cache (as post meta), so repeated requests are fast and do not overload the server. The cache is automatically refreshed whenever a post is edited.

Manual editing

In the Editor tab, you can manually edit and save the Markdown of any page. Once saved, the cache is locked: the plugin will no longer overwrite it automatically. The lock can be removed at any time using the button directly in the editor.

Locked cache

Pages with a locked cache are marked with a 🔒 icon in the editor. This means the website will show AI agents exactly the content you manually set; the plugin will not overwrite it even upon subsequent post updates.

πŸ—ΊοΈ AI Sitemap

What the sitemap contains

The sitemap is an overview of all public pages, posts, and products in Markdown format. For each item, it contains the title, URL, modification date, and a short description. AI agents use it to discover what exists on the website and then load details of specific pages.

Manual override

In the Sitemap tab, you can override the dynamically generated content with your own text and save it. Use the Restore dynamic sitemap button to easily revert to automatic generation.

Limits and ordering

In the Settings tab, you can set the maximum number of items and their sorting method for each content type (pages, posts, products). A value of -1 or 0 means unlimited.

🤖 llms.txt

What it is for

llms.txt is a proposed convention (similar in spirit to robots.txt): a plain-text file that tells AI agents how to interact with your website. We recommend describing who you are, what you offer, which pages are most important for AI, and any restrictions that apply.

Default content

Using the Restore default content button in the llms.txt tab, the plugin automatically generates basic content from your WordPress data (site name, description, contact, sitemap URL). You then just need to add to it or manually adjust it.

📄 Metabox on pages and products

Where it appears

The Toret Markdown metabox appears in the right column of the editor for every post, page, and product. It shows the current cache status (up to date / not generated / excluded) and the exact date of the last generation.

Generate / Regenerate

Use the Generate (or Regenerate) button to trigger immediate Markdown cache generation for the given page directly from the admin, without needing to visit it. Once generated, a Preview link will appear to check the output.

Exclude from Markdown

The page will not return any Markdown output; AI agents will be served regular HTML instead. At the same time, the <link> tag will not be inserted into the header, and the page will not be included in the sitemap.

Lock cache

Prevents automatic cache regeneration when the page is saved. This is useful if you have manually edited the Markdown in the editor and don’t want the plugin to accidentally overwrite it. The lock can be unchecked or unlocked directly in the editor at any time.

Hide from sitemap

Although the page will not appear in ?format=markdown-sitemap, its Markdown version remains fully accessible via ?format=markdown. The difference from Exclude from Markdown is that the content still exists, it just doesn’t appear in the global overview for AI.

Link to editor

The Edit Markdown in editor link at the bottom of the metabox will take you directly to the central Editor tab with the current page preselected.

πŸ—‚οΈ Category settings

Where it appears

The Toret Markdown panel appears on the post category edit page (Posts → Categories). If WooCommerce is active, it works exactly the same for product categories (Products → Categories).

Exclude from Markdown

The category page will not return Markdown output and the <link> tag will not be inserted into its header. It works identically to the option of the same name in the individual post metabox.

Hide from sitemap

The category will not appear in ?format=markdown-sitemap, but its Markdown version remains accessible via ?format=markdown. There is also a direct Markdown Preview link on the category edit page for quickly viewing the output.

🛒 WooCommerce

Products in the sitemap

If WooCommerce is active, the sitemap automatically includes products, along with their price, stock status, and category. In the settings, you can filter products by category or limit their total number.

Product frontmatter

The Markdown output of a product page has an extended YAML header with the price (price), currency (currency), stock status (in_stock), and SKU. AI agents can therefore read the current state of the product without having to parse the HTML.

Product categories

On the product category page in the admin (Products → Categories), you will find a metabox for optionally excluding the entire category from Markdown output and the sitemap.

πŸ—οΈ Plugin Architecture

md.php

The core of the entire plugin. It contains all constants, helper functions, the HTML → Markdown converter (toret_html_to_markdown()), and output generators for individual page types.

markdown-handler.php

Handles HTTP requests. It intercepts the ?format=markdown and ?format=markdown-sitemap parameters, determines the page type, and calls the appropriate generator. It also includes bot detection and access logging.

metabox.php

Handles the metabox in the post, page, and product editors. Registers the metabox for all public post types, manages AJAX cache generation, and handles term meta for categories.

admin.php

Provides the administration interface. Registers the settings with the Settings API, renders all tabs, and handles saving settings.

📌 Constants

| Constant | Value | Description |
| --- | --- | --- |
| TORET_MARKDOWN_CACHE_KEY | _toret_markdown_cache | Post meta key for saved Markdown |
| TORET_MARKDOWN_META_EXCLUDE | _toret_markdown_exclude | Post meta – exclusion from Markdown output |
| TORET_MARKDOWN_META_LOCK | _toret_markdown_lock | Post meta – automatic cache regeneration lock |
| TORET_MARKDOWN_META_SITEMAP_EXCLUDE | _toret_markdown_sitemap_exclude | Post meta – hide from sitemap |
| TORET_MARKDOWN_TERM_EXCLUDE | _toret_markdown_exclude | Term meta – exclude category from Markdown |
| TORET_MARKDOWN_TERM_SITEMAP_EXCLUDE | _toret_markdown_sitemap_exclude | Term meta – hide category from sitemap |
| TORET_MARKDOWN_LOG_OPTION | toret_markdown_access_log | Options key for access log |
| TORET_MARKDOWN_LOG_MAX | 500 | Default max. number of log records |

πŸͺ WordPress Filters (Hooks)

toret_markdown_post_frontmatter

Allows modifying the YAML frontmatter of a post/page before the closing ---. Parameters: string $frontmatter, WP_Post $post.

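add_filter( 'toret_markdown_post_frontmatter', function( $fm, $post ) {
    // Append an "author" line to the YAML frontmatter (before the closing ---).
    $fm .= 'author: ' . get_the_author_meta( 'display_name', $post->post_author ) . "\n";
    return $fm;
}, 10, 2 ); // priority 10, callback receives 2 arguments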

toret_markdown_post_output

The final Markdown of the post/page including frontmatter. Parameters: string $markdown, WP_Post $post.

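add_filter( 'toret_markdown_post_output', function( $md, $post ) {
    // Append a horizontal rule and a short note to the finished Markdown.
    return $md . "\n\n---\n*Generated automatically.*\n";
}, 10, 2 ); // priority 10, callback receives 2 arguments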

toret_markdown_term_output

The final Markdown of the taxonomy page (category, tag). Parameters: string $markdown, WP_Term $term.
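
A minimal sketch of how this filter might be used, assuming the callback receives the parameters listed above. The function name is hypothetical; it only reads the public count property that WP_Term objects carry.

```php
/**
 * Append the number of items in the term to its Markdown.
 * $term is expected to be a WP_Term; only ->count is read.
 * Hypothetical callback for illustration.
 */
function example_term_output_count( string $markdown, $term ): string {
    $count = isset( $term->count ) ? (int) $term->count : 0;
    return rtrim( $markdown ) . "\n\n*Items in this term: " . $count . "*\n";
}

// Register only when WordPress is loaded, so the file can also run on its own.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'toret_markdown_term_output', 'example_term_output_count', 10, 2 );
}
```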

toret_markdown_archive_output

The final Markdown of the archive page (homepage, shop, archive, search). Parameters: string $markdown, string $type, string $title.
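
The sketch below shows one possible use of the three parameters: replacing the output for search archives with a short notice. The documentation does not list the exact strings passed in $type, so the literal 'search' is an assumption, as is the function name.

```php
/**
 * Replace the Markdown of search archives with a short notice and
 * leave every other archive type untouched.
 * Assumption: the plugin passes the literal string 'search' as $type
 * for search result pages; verify this against the plugin source.
 */
function example_archive_output_search( string $markdown, string $type, string $title ): string {
    if ( 'search' !== $type ) {
        return $markdown;
    }
    return '# ' . $title . "\n\nSearch results are not available as Markdown.\n";
}

// Register only when WordPress is loaded, so the file can also run on its own.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'toret_markdown_archive_output', 'example_archive_output_search', 10, 3 );
}
```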

toret_markdown_sitemap_pages_args

Arguments passed to the get_pages() function when generating the sitemap. Parameter: array $args.
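
As an illustration, this sketch uses the exclude argument of get_pages() to keep specific pages out of the sitemap. The page IDs and the function name are hypothetical, and the sketch assumes that any existing exclude value is an array of IDs.

```php
/**
 * Add page IDs to the "exclude" argument of get_pages().
 * The IDs 12 and 34 are placeholders for illustration.
 */
function example_sitemap_pages_exclude( array $args ): array {
    $hidden   = array( 12, 34 ); // hypothetical page IDs
    $existing = isset( $args['exclude'] ) ? array_map( 'intval', (array) $args['exclude'] ) : array();

    // Merge with any IDs that are already excluded and drop duplicates.
    $args['exclude'] = array_values( array_unique( array_merge( $existing, $hidden ) ) );
    return $args;
}

// Register only when WordPress is loaded, so the file can also run on its own.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'toret_markdown_sitemap_pages_args', 'example_sitemap_pages_exclude' );
}
```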

toret_markdown_sitemap_posts_args

Arguments passed to the get_posts() function when generating the sitemap. Parameter: array $args.
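
A possible use is limiting the sitemap to recent posts. The sketch relies on the date_query argument that get_posts() passes through to WP_Query; the one-year window and the function name are arbitrary choices for illustration.

```php
/**
 * Restrict the sitemap to posts published within the last year.
 * Hypothetical callback; other arguments are left unchanged.
 */
function example_sitemap_posts_recent( array $args ): array {
    $args['date_query'] = array(
        array(
            'after'     => '1 year ago',
            'inclusive' => true,
        ),
    );
    return $args;
}

// Register only when WordPress is loaded, so the file can also run on its own.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'toret_markdown_sitemap_posts_args', 'example_sitemap_posts_recent' );
}
```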

toret_markdown_sitemap_products_args

Arguments passed to the wc_get_products() function when generating the sitemap. Parameter: array $args.
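
For example, the query could be narrowed to products that are in stock, newest first. stock_status, orderby, and order are standard wc_get_products() arguments; the function name is hypothetical.

```php
/**
 * List only in-stock products in the sitemap, newest first.
 * Hypothetical callback; other arguments are left unchanged.
 */
function example_sitemap_products_in_stock( array $args ): array {
    $args['stock_status'] = 'instock';
    $args['orderby']      = 'date';
    $args['order']        = 'DESC';
    return $args;
}

// Register only when WordPress is loaded, so the file can also run on its own.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'toret_markdown_sitemap_products_args', 'example_sitemap_products_in_stock' );
}
```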

toret_markdown_sitemap_products

Array of WooCommerce products after loading, before rendering them into the sitemap. Allows filtering or custom sorting of objects. Parameter: array $products.
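
A sketch of object-level filtering and sorting, assuming the array holds WC_Product objects. It uses only the get_price() and get_name() methods; the rule "drop products without a positive price" and the function name are illustrative choices.

```php
/**
 * Drop products without a positive price and sort the rest by name.
 * Hypothetical callback; expects objects exposing get_price() and
 * get_name(), as WC_Product does.
 */
function example_sitemap_products_priced( array $products ): array {
    $products = array_filter(
        $products,
        static function ( $product ) {
            return is_object( $product ) && (float) $product->get_price() > 0;
        }
    );

    // usort() re-indexes the array, so the result has keys 0..n-1.
    usort(
        $products,
        static function ( $a, $b ) {
            return strcasecmp( $a->get_name(), $b->get_name() );
        }
    );
    return $products;
}

// Register only when WordPress is loaded, so the file can also run on its own.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'toret_markdown_sitemap_products', 'example_sitemap_products_priced' );
}
```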

toret_markdown_sitemap_output

The entire final text of the dynamically generated sitemap. Parameter: string $output.
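
As a small illustration, the callback below appends a footer that points agents to llms.txt. The wording of the footer and the function name are hypothetical.

```php
/**
 * Append a short footer to the generated sitemap text.
 * Hypothetical callback for illustration.
 */
function example_sitemap_output_footer( string $output ): string {
    return rtrim( $output ) . "\n\n---\nSee /llms.txt for usage instructions.\n";
}

// Register only when WordPress is loaded, so the file can also run on its own.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'toret_markdown_sitemap_output', 'example_sitemap_output_footer' );
}
```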

💾 Cache Logic

Storage

The generated Markdown is saved as post meta with the key _toret_markdown_cache along with a timestamp in _toret_markdown_generated_at. The cache is used only for posts, pages, and products; for terms (categories) and archives, Markdown is always generated dynamically.

Invalidation

The cache is automatically deleted upon saving or deleting a post (save_post, before_delete_post), unless a lock (_toret_markdown_lock) is set on it. Manual deletion of the entire cache for the whole site is available in the Settings tab.

Generation

The plugin generates Markdown by fetching the rendered HTML page internally via wp_remote_get() and parsing the result with its own DOM converter (toret_html_to_markdown()). A _tmcb={timestamp} parameter is added to the URL to bypass CDN caching (Cloudflare, etc.).
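
The cache-busting step could look roughly like the sketch below. The helper sketch_add_cache_buster() is hypothetical and only shows how a _tmcb parameter might be appended; the timeout value of 30 is an example, and the plugin's actual request code may differ.

```php
/**
 * Append _tmcb={timestamp} to a URL, keeping any existing query
 * string and #fragment intact. Hypothetical helper.
 */
function sketch_add_cache_buster( string $url, int $timestamp ): string {
    $fragment = '';
    $hash_pos = strpos( $url, '#' );
    if ( false !== $hash_pos ) {
        $fragment = substr( $url, $hash_pos );
        $url      = substr( $url, 0, $hash_pos );
    }
    $separator = ( false === strpos( $url, '?' ) ) ? '?' : '&';
    return $url . $separator . '_tmcb=' . $timestamp . $fragment;
}

$url = sketch_add_cache_buster( 'https://yourdomain.com/page/', 1700000000 );

// Inside WordPress, the page could then be fetched like this.
if ( function_exists( 'wp_remote_get' ) ) {
    $response = wp_remote_get( $url, array( 'timeout' => 30 ) );
    $html     = is_wp_error( $response ) ? '' : wp_remote_retrieve_body( $response );
}
```

Because the timestamp changes on every call, a CDN sees each request as a new URL and forwards it to the origin server.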

βš™οΈ HTML β†’ Markdown Converter

Implementation

The plugin does not use any external library for conversion. The conversion is handled by native PHP DOMDocument combined with a recursive walker toret_dom_node_to_markdown(). The output finally goes through the cleaning function toret_cleanup_markdown().

Discarded tags

The content of <script>, <style>, <form>, <noscript>, <canvas>, and <svg> tags is completely ignored and is not transferred to the final output at all.
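
To make the approach concrete, here is a heavily simplified sketch of a recursive DOMDocument walker that skips the discarded tags listed above. It is not the plugin's toret_dom_node_to_markdown(); the function names are hypothetical, and only headings, paragraphs, links, and basic emphasis are handled.

```php
/** Convert one DOM node (and its children) to Markdown. Simplified sketch. */
function sketch_node_to_markdown( DOMNode $node ): string {
    if ( $node instanceof DOMText ) {
        return (string) preg_replace( '/\s+/', ' ', $node->nodeValue ); // collapse whitespace
    }
    if ( ! ( $node instanceof DOMElement ) ) {
        return ''; // comments and other node types
    }

    $tag = strtolower( $node->nodeName );
    if ( in_array( $tag, array( 'script', 'style', 'form', 'noscript', 'canvas', 'svg' ), true ) ) {
        return ''; // discarded tags: the whole subtree is ignored
    }

    $inner = '';
    foreach ( $node->childNodes as $child ) {
        $inner .= sketch_node_to_markdown( $child );
    }

    if ( preg_match( '/^h([1-6])$/', $tag, $m ) ) {
        return "\n\n" . str_repeat( '#', (int) $m[1] ) . ' ' . trim( $inner ) . "\n\n";
    }
    switch ( $tag ) {
        case 'p':
            return "\n\n" . trim( $inner ) . "\n\n";
        case 'strong':
        case 'b':
            return '**' . trim( $inner ) . '**';
        case 'em':
        case 'i':
            return '*' . trim( $inner ) . '*';
        case 'a':
            return '[' . trim( $inner ) . '](' . $node->getAttribute( 'href' ) . ')';
        case 'br':
            return "\n";
        default:
            return $inner; // unknown tags: keep only their content
    }
}

/** Parse an HTML string and return cleaned-up Markdown. Simplified sketch. */
function sketch_html_to_markdown( string $html ): string {
    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors( true ); // tolerate non-strict HTML
    $doc->loadHTML( $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $markdown = sketch_node_to_markdown( $doc->documentElement );
    // Cleanup step: collapse runs of 3+ newlines and trim the ends.
    return trim( (string) preg_replace( "/\n{3,}/", "\n\n", $markdown ) );
}

$markdown = sketch_html_to_markdown(
    '<h1>Title</h1><p>Hello <strong>world</strong>, see <a href="/docs">docs</a>.</p><script>alert(1)</script>'
);
```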

Complex lists

The plugin automatically detects “complex” <ul> elements (e.g., blog listings, product grids). Items longer than 100 characters or those containing headings are then rendered as clear sections separated by --- instead of standard bullet points.
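
One way such a heuristic could be written is sketched below, using the two criteria named above (items longer than 100 characters, or items containing a heading). The function name is hypothetical, the sketch inspects only the first <ul> in the given HTML, and the plugin's real detection logic may weigh items differently.

```php
/**
 * Return true if the first <ul> in $html has at least one item that is
 * longer than 100 characters or contains a heading. Hypothetical sketch.
 */
function sketch_html_has_complex_list( string $html ): bool {
    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors( true ); // tolerate non-strict HTML
    $doc->loadHTML( $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $lists = $doc->getElementsByTagName( 'ul' );
    if ( 0 === $lists->length ) {
        return false;
    }

    foreach ( $lists->item( 0 )->childNodes as $item ) {
        if ( ! ( $item instanceof DOMElement ) || 'li' !== strtolower( $item->nodeName ) ) {
            continue;
        }
        // Length is measured in bytes here; mb_strlen() would count characters.
        $text = trim( (string) preg_replace( '/\s+/', ' ', $item->textContent ) );
        if ( strlen( $text ) > 100 ) {
            return true;
        }
        foreach ( array( 'h1', 'h2', 'h3', 'h4', 'h5', 'h6' ) as $heading ) {
            if ( $item->getElementsByTagName( $heading )->length > 0 ) {
                return true;
            }
        }
    }
    return false;
}

$is_simple  = sketch_html_has_complex_list( '<ul><li>One</li><li>Two</li></ul>' );
$is_complex = sketch_html_has_complex_list( '<ul><li><h3>Post title</h3><p>Teaser</p></li></ul>' );
```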

Cloudflare email

The plugin recognizes Cloudflare email protection (/cdn-cgi/l/email-protection#…) and automatically decodes the real email address directly into readable output.
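
Cloudflare's obfuscation stores the address as a hex string in which the first byte is an XOR key for all following bytes. The sketch below decodes that format; the function name is hypothetical and the plugin's own decoder may be structured differently.

```php
/**
 * Decode a Cloudflare-protected email address.
 * Accepts either the bare hex payload or a full
 * /cdn-cgi/l/email-protection#... link. Returns '' on invalid input.
 * Hypothetical helper for illustration.
 */
function sketch_decode_cf_email( string $encoded ): string {
    $hash_pos = strrpos( $encoded, '#' );
    if ( false !== $hash_pos ) {
        $encoded = substr( $encoded, $hash_pos + 1 ); // keep only the hex payload
    }
    $length = strlen( $encoded );
    if ( $length < 4 || 0 !== $length % 2 || ! ctype_xdigit( $encoded ) ) {
        return '';
    }

    $key   = hexdec( substr( $encoded, 0, 2 ) ); // first byte is the XOR key
    $email = '';
    for ( $i = 2; $i < $length; $i += 2 ) {
        $email .= chr( hexdec( substr( $encoded, $i, 2 ) ) ^ $key );
    }
    return $email;
}

// Key 0x2a followed by the XOR-ed bytes of "a@b.c".
$email = sketch_decode_cf_email( '/cdn-cgi/l/email-protection#2a4b6a480449' );
```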

🔌 AJAX Endpoints

| Action | Nonce | Description |
| --- | --- | --- |
| toret_markdown_generate | toret_markdown_ajax_generate | Generates or regenerates the Markdown cache for a given post_id. Requires the edit_posts capability. |
| toret_markdown_unlock_cache | toret_markdown_unlock_cache | Unlocks the locked post cache by deleting the _toret_markdown_lock meta. Requires the edit_post capability for the given post. |

🤖 Bot Detection

Recognized agents

Detection matches known bot names against the User-Agent request header (case-insensitive). The recognized types, which also determine the color shown in the access log, are:

  • gpt – GPTBot, ChatGPT-User, OAI-SearchBot
  • claude – ClaudeBot, Claude-User, Claude-SearchBot, Anthropic
  • google – Googlebot, Google-Extended, Gemini, GoogleOther
  • bing – Bingbot, BingPreview, MSNBot
  • other_ai – PerplexityBot, Cohere-AI, YouBot, Diffbot, ByteSpider, AmazonBot and others
  • unknown – unrecognized User-Agent
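
The classification above can be sketched as a case-insensitive substring match. The function name is hypothetical, the signature list contains only the names given in this section (the plugin recognizes "and others" for other_ai), and the order in which the plugin checks the groups is an assumption.

```php
/**
 * Map a User-Agent string to one of the bot types listed above.
 * Hypothetical sketch; the plugin's signature list is longer.
 */
function sketch_detect_bot( string $user_agent ): string {
    $signatures = array(
        'gpt'      => array( 'GPTBot', 'ChatGPT-User', 'OAI-SearchBot' ),
        'claude'   => array( 'ClaudeBot', 'Claude-User', 'Claude-SearchBot', 'Anthropic' ),
        'google'   => array( 'Googlebot', 'Google-Extended', 'Gemini', 'GoogleOther' ),
        'bing'     => array( 'Bingbot', 'BingPreview', 'MSNBot' ),
        'other_ai' => array( 'PerplexityBot', 'Cohere-AI', 'YouBot', 'Diffbot', 'ByteSpider', 'AmazonBot' ),
    );

    foreach ( $signatures as $type => $needles ) {
        foreach ( $needles as $needle ) {
            // stripos() performs a case-insensitive substring search.
            if ( false !== stripos( $user_agent, $needle ) ) {
                return $type;
            }
        }
    }
    return 'unknown';
}

$type = sketch_detect_bot( 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)' );
```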

πŸ—„οΈ Database – Saved Values

Option / Meta keyTypeDescription
toret_markdown_optionsoptionAll plugin settings as a serialized array
toret_markdown_access_logoptionAccess log as a JSON array (max. records according to settings)
toret_markdown_sitemap_contentoptionManually overridden sitemap content (empty = dynamic generation)
toret_markdown_llms_contentoptionContent of the llms.txt file
_toret_markdown_cachepost metaGenerated Markdown content of the post
_toret_markdown_generated_atpost metaDate and time of the last cache generation
_toret_markdown_excludepost/term metaExclusion from Markdown output
_toret_markdown_lockpost metaAutomatic cache regeneration lock
_toret_markdown_sitemap_excludepost/term metaHide from sitemap