What is llm.txt?

  • llm.txt is a proposed, not yet standardized, text file placed at the root of a website (e.g., https://example.com/llm.txt).

  • It would declare permissions and restrictions specifically for LLM training, data ingestion, and retrieval-augmented generation (RAG) systems.

  • Its goal is to tell AI data collectors, content scrapers, and indexing services clearly which content may or may not be used for training or response generation.
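
Like robots.txt, the file would live at a fixed path on the site root. The Python sketch below shows how a collector might derive that URL from any page on a site. It assumes the file is always served at /llm.txt on the same scheme and host, and the helper name llm_txt_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def llm_txt_url(page_url: str) -> str:
    """Return the root-level llm.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and network location (host plus any port);
    # replace the path and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llm.txt", "", ""))


print(llm_txt_url("https://example.com/blog/post-1?ref=home#top"))
# https://example.com/llm.txt
```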

Why is llm.txt needed?

  • Unlike traditional search crawlers, which index pages so that people can find them, AI systems are trained on large datasets scraped or licensed from the web.

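  • Currently, there is no universal, machine-readable way to opt in to or out of LLM use. robots.txt can block individual crawlers by user-agent, but it was designed to control crawling, not to state terms for how content is used.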

  • llm.txt could provide a transparent way for content owners to set terms specifically about AI usage without impacting human users or traditional search indexing.

What Should Be Included in llm.txt?

The format would be simple and text-based, similar to robots.txt, but tailored to AI and data usage. Here are sections and directives it might include:

1. User-agent / Model

  • Specify which LLMs, data collectors, or AI agents the rules apply to.

  • Use * for all AI/data agents.

Example:

User-agent: *
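
No matching rules have been specified for llm.txt. One plausible assumption is to borrow the robots.txt convention: compare agent names case-insensitively and fall back to the * group when no name matches. A minimal Python sketch of that selection step (select_group and the agent name ExampleBot are hypothetical):

```python
def select_group(groups: dict[str, list[str]], agent: str) -> list[str]:
    """Pick the rule group for `agent`, falling back to the '*' group."""
    # Compare names case-insensitively, as robots.txt crawlers do.
    by_name = {name.lower(): rules for name, rules in groups.items()}
    # Exact name match first; otherwise the wildcard group; otherwise no rules.
    return by_name.get(agent.lower(), by_name.get("*", []))


groups = {
    "*": ["Allow: /public/"],
    "ExampleBot": ["Disallow: /"],  # hypothetical agent name
}
print(select_group(groups, "examplebot"))  # ['Disallow: /']
print(select_group(groups, "OtherBot"))    # ['Allow: /public/']
```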

2. Allow / Disallow

  • Specify which paths or content areas AI agents can or cannot use for training, indexing, or retrieval.

Example:

Allow: /public-content/
Disallow: /private-data/
Disallow: /paid-content/
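
There is no agreed matching algorithm for llm.txt paths. One reasonable assumption is to reuse robots.txt semantics (RFC 9309): the longest matching path prefix wins, Allow wins a tie, and a path matched by no rule is allowed. A Python sketch under those assumptions, using a hypothetical is_allowed helper:

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide whether `path` may be used, given (directive, prefix) rules."""
    best_len = -1
    allowed = True  # no matching rule: treat the path as allowed
    for directive, prefix in rules:
        # An empty prefix matches nothing, as with "Disallow:" in robots.txt.
        if prefix and path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_goes_to_allow = len(prefix) == best_len and directive == "Allow"
            if longer or tie_goes_to_allow:
                best_len = len(prefix)
                allowed = directive == "Allow"
    return allowed


rules = [
    ("Allow", "/public-content/"),
    ("Disallow", "/private-data/"),
    ("Disallow", "/paid-content/"),
]
print(is_allowed("/public-content/faq.html", rules))  # True
print(is_allowed("/paid-content/report.pdf", rules))  # False
print(is_allowed("/about", rules))                    # True
```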

3. Purpose / Usage

  • Indicate how the data may be used, e.g., for training, live retrieval, or display in answers.

Example:

Usage: training, retrieval
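
A collector could check a Usage value against its own purpose before using a page. This Python sketch assumes the value is a comma-separated list and that any purpose not listed is not permitted. Both are assumptions, since the proposal does not define them. The function names are hypothetical.

```python
def parse_usage(value: str) -> set[str]:
    """Split a comma-separated Usage value into normalized purpose names."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def permits(value: str, purpose: str) -> bool:
    """True if `purpose` (e.g. 'training') appears in the Usage value."""
    return purpose.lower() in parse_usage(value)


print(permits("training, retrieval", "retrieval"))  # True
print(permits("training, retrieval", "display"))    # False
```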

4. Data Licensing

  • State what license applies to the data to clarify usage rights (e.g., Creative Commons, proprietary).

Example:

License: CC-BY-4.0

5. Attribution / Citation Requirements

  • Specify whether attribution or citation is required when the content is used or referenced by an AI.

Example:

Attribution: required

6. Last Updated

  • Indicate when the llm.txt file was last updated, so agents can tell whether a cached copy may be out of date.

Example:

Last-Updated: 2025-07-09
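
Because Last-Updated records a date, a collector could compare it with the current date to decide when to re-fetch the file. A minimal Python sketch, assuming the value is an ISO 8601 calendar date (YYYY-MM-DD) as in the example. The is_stale name and the 30-day threshold are hypothetical choices.

```python
from datetime import date


def is_stale(last_updated: str, today: date, max_age_days: int = 30) -> bool:
    """Return True if the Last-Updated date is more than max_age_days old."""
    updated = date.fromisoformat(last_updated)  # expects YYYY-MM-DD
    return (today - updated).days > max_age_days


print(is_stale("2025-07-09", today=date(2025, 7, 20)))  # False (11 days old)
print(is_stale("2025-07-09", today=date(2025, 9, 1)))   # True (54 days old)
```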

7. Contact

  • Provide contact info for inquiries about data usage policies.

Example:

Contact: legal@example.com

Sample llm.txt File

User-agent: *
Allow: /blog/
Allow: /public/
Disallow: /private/
Disallow: /subscriptions/
Usage: training, retrieval
License: CC-BY-4.0
Attribution: required
Last-Updated: 2025-07-09
Contact: legal@example.com
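
A file in this "Key: value" style is straightforward to parse. The Python sketch below assumes one directive per line, splits each line on its first colon, and starts a new group at every User-agent line. The Policy structure and parse_llm_txt name are hypothetical, not part of any published specification.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Directives collected for one User-agent group (hypothetical structure)."""
    agent: str
    rules: list[tuple[str, str]] = field(default_factory=list)  # ("Allow" or "Disallow", path)
    directives: dict[str, str] = field(default_factory=dict)    # Usage, License, Contact, ...


def parse_llm_txt(text: str) -> list[Policy]:
    """Parse 'Key: value' lines; each User-agent line starts a new group."""
    policies: list[Policy] = []
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)  # split on the first colon only
        key, value = key.strip(), value.strip()
        if key.lower() == "user-agent":
            policies.append(Policy(agent=value))
        elif not policies:
            continue  # ignore directives that appear before any User-agent line
        elif key.lower() in ("allow", "disallow"):
            policies[-1].rules.append((key.capitalize(), value))
        else:
            policies[-1].directives[key] = value
    return policies


SAMPLE = """\
User-agent: *
Allow: /blog/
Allow: /public/
Disallow: /private/
Disallow: /subscriptions/
Usage: training, retrieval
License: CC-BY-4.0
Attribution: required
Last-Updated: 2025-07-09
Contact: legal@example.com
"""

policy = parse_llm_txt(SAMPLE)[0]
print(policy.agent)                  # *
print(policy.rules[:2])              # [('Allow', '/blog/'), ('Allow', '/public/')]
print(policy.directives["License"])  # CC-BY-4.0
```

This sketch attaches every directive other than Allow and Disallow to the current group. A real specification might instead treat Last-Updated and Contact as file-wide settings.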

Additional Notes

  • llm.txt is an emerging proposal. It is not yet a standard and is not widely adopted.

  • Adoption and enforcement would require cooperation from AI platform providers, data collectors, and webmasters.

  • It complements, but does not replace, existing files like robots.txt or sitemap files.

  • It could eventually be extended to support structured formats such as JSON or YAML, or be linked via HTTP headers.
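
If the format were extended to JSON, the sample llm.txt file might map onto a structure like the one below. The key names are hypothetical and do not come from any published schema. The sketch only shows that the same directives fit naturally into nested JSON.

```python
import json

# Hypothetical JSON form of the sample llm.txt file; key names are illustrative.
policy = {
    "user_agent": "*",
    "allow": ["/blog/", "/public/"],
    "disallow": ["/private/", "/subscriptions/"],
    "usage": ["training", "retrieval"],
    "license": "CC-BY-4.0",
    "attribution": "required",
    "last_updated": "2025-07-09",
    "contact": "legal@example.com",
}

text = json.dumps(policy, indent=2)
print(text)
assert json.loads(text) == policy  # serializing and re-parsing loses nothing
```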