Will New LLM Standards Shape the Future of Documentation?

· 6 min read

Large language models (LLMs) have forever changed how users interact with API documentation. Blindly following LLM-generated instructions rather than reading the documentation is becoming the norm...

Increasingly, documentation maintainers must cater to this shift toward LLM-driven development, while still providing material useful to human readers. The problem is that LLMs ingest documentation during training, so by the time they are deployed, LLMs often spit out responses based on documentation that is long out of date. Some LLMs attempt to solve this issue by scraping the latest version of a documentation site for current information, but those sites are built for humans, not AI. Sites often block bot traffic, or are too large to be parsed in their entirety, which can cause LLMs to hallucinate. This creates a gap between how humans and LLMs understand the context behind documentation—and there still isn't a standard way of bridging that gap. Or is there?

llms.txt

A proposed solution for creating docs for consumption by both LLMs and humans is to include an llms.txt file in the root of a website, for example, https://docs.anthropic.com/llms.txt. According to llmstxt.org, llms.txt is a Markdown file that “offers brief background information, guidance, and links to detailed Markdown files”. The detailed Markdown files are duplicates of each page, served at the same URL with an .md extension added (for example, docs.freestyle.sh/git becomes docs.freestyle.sh/git.md).
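For illustration, a minimal llms.txt might look like the following. The project name, URLs, and page descriptions here are all invented; only the overall shape (an H1 title, a blockquote summary, then sections of links to Markdown pages) comes from the llmstxt.org proposal:

```markdown
# Acme Docs

> Acme is a hypothetical service for deploying widgets. The links below
> point to Markdown versions of our documentation for LLM consumption.

## Docs

- [Quickstart](https://docs.acme.example/quickstart.md): install Acme and deploy a first widget
- [API Reference](https://docs.acme.example/api.md): full endpoint reference

## Optional

- [Changelog](https://docs.acme.example/changelog.md): release history
```

An LLM reading this file at inference time can follow the links it needs instead of crawling the whole site.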

Why go through the trouble of serving two versions of the same page?

LLMs can get confused by the mess of HTML, CSS, and JavaScript that make up the human-readable part of documentation sites. Markdown is a lot easier for an LLM to parse. So why call it llms.txt if under the hood the file is actually a Markdown file (.md)?

llms.txt was inspired by robots.txt, a file originally included in websites to advise web crawlers (and, more recently, generative AI bots) which pages they should or should not access. In other words, a robots.txt file directs bot traffic like a traffic cop with no real authority, while llms.txt works within those traffic rules as a city map for LLMs to read at inference time, when a user requests assistance and an LLM scans a website for the latest documentation.

In short, the benefits for documentation owners adopting the llms.txt standard are:

  1. LLMs gain better context for providing solutions directly from documentation, possibly cutting down on hallucinations.
  2. LLMs can easily display up-to-date documentation by directly handing a user Markdown versions of pages.
  3. Documentation can be continually optimized for human readers without the need for unnatural wording or formatting.

Implementing the llms.txt standard is simple for documentation websites built on static site generators like Docusaurus or MkDocs, where docs are written in Markdown. Just add routes for the Markdown (.md) versions of pages in the site’s routing logic. For sites built on MadCap Flare using XML files, the good news is that converting XML to Markdown is straightforward. Tools like Pandoc can automate conversion between file formats.
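For a site that handles its own routing, the rule itself is small. Here is a minimal sketch in Python; the resolve_doc helper, the docs layout, and the assumption that every page is rendered from a Markdown source of the same name are illustrative, not part of any particular framework:

```python
from pathlib import Path


def resolve_doc(docs_root: str, url_path: str) -> Path:
    """Map a request path to its Markdown source file.

    A path ending in .md (e.g. '/git.md') serves the raw Markdown for
    LLMs; the plain path (e.g. '/git') goes through the normal
    HTML-rendering pipeline, starting from the same .md source.
    """
    root = Path(docs_root)
    relative = url_path.lstrip("/")
    if relative.endswith(".md"):
        # Raw Markdown route for LLMs: serve the source file as-is.
        return root / relative
    # Human-readable route: render HTML from the matching .md source.
    return root / (relative + ".md")


# Both routes resolve to the same underlying file.
assert resolve_doc("docs", "/git") == resolve_doc("docs", "/git.md")
```

Because the human-facing route and the LLM-facing .md route share one source file, the two versions of a page cannot drift apart.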

llms.txt is a standard for providing context to all LLMs that scan the web. There’s also a new standard for providing context for documentation in AI-powered code editors like Cursor.

MCP - Model Context Protocol

Model Context Protocol (MCP) has been called the “USB-C port for AI applications” by Anthropic, the protocol’s corporate backer and the company behind Claude. Introduced in November 2024, MCP extends the capabilities of an LLM by allowing a standard connection for an LLM to access additional tools, data, and templates. Simply put, Model Context Protocol provides an API for LLMs to do much more than just search the web and generate text output.

How can MCP make a documentation site better? MCP servers can give greater context to client LLMs through Resources—any data that gives an LLM more background information to help process the current query. Resources could be anything from database schema files to server logs, and in the case of documentation, Resources could be docs in Markdown format (much like the llms.txt standard).
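As a sketch of what that looks like on the wire: MCP is JSON-RPC based, and a server answers a resources/list request with a list of resource descriptors. The URIs and names below are invented, but exposing each docs page with a text/markdown MIME type mirrors the llms.txt approach:

```python
import json

# Hypothetical result body for an MCP `resources/list` request,
# exposing two documentation pages as Markdown resources.
resources_list_result = {
    "resources": [
        {
            "uri": "docs://git",
            "name": "Git integration guide",
            "mimeType": "text/markdown",
        },
        {
            "uri": "docs://quickstart",
            "name": "Quickstart",
            "mimeType": "text/markdown",
        },
    ]
}

print(json.dumps(resources_list_result, indent=2))
```

The client LLM can then request the resource it needs and receive clean Markdown rather than scraping rendered HTML.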

MCP also introduces tools, or AI actions, which can be used to test API calls and code snippets. These actions allow developers to quickly validate new integrations, and provide helpful suggestions when a user encounters errors.

While it sounds futuristic, using MCP in the real world requires a significant investment in development and testing. Take Freestyle docs as an example. It took the company two iterations to successfully launch two simple MCP tools: listDocs, which lists all available documentation, and getDocById, which fetches a docs file by its ID. Additionally, with multiple code editors supporting MCP differently, documentation teams may need to create separate MCP implementations for each code editor (Cursor, VSCode, Void, etc.). After all that, the user still has to install your MCP integration; if you don't advertise it well, it may never be used.
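The logic behind tools like these is often modest. Here is a hedged Python sketch of handlers equivalent to listDocs and getDocById over an in-memory store; the store, its contents, and the snake_case function names are invented for illustration, and a real MCP server would register such handlers through an MCP SDK rather than call them directly:

```python
# Illustrative in-memory doc store; a real server would read pages
# from disk or a database.
DOCS = {
    "git": "# Git\nHow to work with repositories...",
    "deploy": "# Deploy\nHow to ship your app...",
}


def list_docs() -> list[str]:
    """Tool 1: return the IDs of all available documentation pages."""
    return sorted(DOCS)


def get_doc_by_id(doc_id: str) -> str:
    """Tool 2: fetch one page's Markdown by ID, raising an error
    message the LLM can relay back to the user."""
    try:
        return DOCS[doc_id]
    except KeyError:
        raise KeyError(f"No doc with id {doc_id!r}; try one of {list_docs()}")
```

The hard part is not this logic but everything around it: transport, editor-specific quirks, and getting users to install the integration at all.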

To Conclude

New LLM standards relevant to documentation teams essentially do the same thing: they provide additional context to LLMs through extra Markdown content. In a way, these standards encourage stewards of documentation to spoon-feed LLMs raw data, without any guarantee that their users will enjoy better interactions with the documentation. LLM standards are marketed as an extension of search engine optimization (SEO), a way to increase "discoverability in AI-powered search experiences", but they are really a reaction to the fact that web scraping is hard.

Supporting these standards, especially Model Context Protocol, isn't cheap, and much of the effort amounts to free labor for LLM vendors. That cost is unlikely to drop soon, because the standards discussed here are fairly new (less than a year old as of this writing) and mature tooling has yet to grow up around them.

LLM standards are primarily adopted by organizations with deep pockets and high LLM usage. For more modest documentation efforts, the return on investment is likely small, or nonexistent. However, as these standards mature and more tooling is built around them, smaller teams may soon find it practical to adopt llms.txt or the Model Context Protocol.

Despite recent leaps in LLM performance, human expertise is still needed to ensure accuracy and clarity in documentation. LLM standards may eventually redefine the role of documentation maintainers, shifting their focus towards curating context and guiding LLM outputs. But for now, documentation experts can still do what they do best: write for humans.