What is llm.txt?
llm.txt is a proposed, not yet standardized, text file placed at the root of a website (e.g., https://example.com/llm.txt). It serves as a declaration of permissions and restrictions specifically for LLM training, data ingestion, and retrieval-augmented generation (RAG) systems.
Its goal is to communicate clearly to AI data collectors, content scrapers, and indexing services what content is allowed or disallowed for use in training or response generation.
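Because the file is meant to sit at the site root, an agent could derive its location from any page URL on the same site. The following is a minimal sketch of that step; the function name llm_txt_url is hypothetical and not part of any specification.

```python
from urllib.parse import urlsplit, urlunsplit


def llm_txt_url(page_url: str) -> str:
    """Return the root-level llm.txt URL for the site hosting page_url.

    Hypothetical helper: assumes the file always lives at /llm.txt,
    as described in the text above.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host; drop the path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llm.txt", "", ""))


print(llm_txt_url("https://example.com/blog/post?id=1"))
# prints https://example.com/llm.txt
```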
Why is llm.txt needed?
Unlike traditional search crawlers, which index pages so that users can find them, AI models are trained on large datasets scraped or licensed from the web.
Currently, there is no universal, machine-readable way for a site to opt in to or out of having its content used by LLMs.
llm.txt could provide a transparent way for content owners to set terms specifically about AI usage without impacting human users or traditional search indexing.
What Should Be Included in llm.txt?
The format would be simple and text-based, similar to robots.txt, but tailored for AI/data usage. Here are common sections and directives it might include:
1. User-agent / Model
Specify which LLMs, data collectors, or AI agents the rules apply to.
Use * for all AI/data agents.
Example:
User-agent: *
2. Allow / Disallow
Specify which paths or content areas AI agents can or cannot use for training, indexing, or retrieval.
Example:
Allow: /public-content/
Disallow: /private-data/
Disallow: /paid-content/
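To illustrate how an agent might evaluate these rules, here is a minimal sketch. The proposal does not define how conflicts are resolved, so this example assumes the convention used for robots.txt: the longest matching path prefix wins, and a path that matches no rule is allowed by default. The function name is_allowed and both behaviors are assumptions.

```python
def is_allowed(path: str, rules: list[tuple[str, str]], default: bool = True) -> bool:
    """Decide whether an agent may use `path`.

    rules is a list of (directive, prefix) pairs, e.g. ("Disallow", "/private-data/").
    Assumption: the longest matching prefix wins; `default` applies when
    no rule matches.
    """
    best_len = -1
    verdict = default
    for directive, prefix in rules:
        if path.startswith(prefix) and len(prefix) > best_len:
            best_len = len(prefix)
            verdict = directive.lower() == "allow"
    return verdict


rules = [
    ("Allow", "/public-content/"),
    ("Disallow", "/private-data/"),
    ("Disallow", "/paid-content/"),
]

print(is_allowed("/public-content/article.html", rules))  # True
print(is_allowed("/private-data/users.csv", rules))       # False
print(is_allowed("/about/", rules))                       # True (no rule matches)
```

A stricter site policy could pass default=False so that anything not explicitly allowed is treated as off limits.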
3. Purpose / Usage
Indicate how the data may be used, e.g., for training, live retrieval, or display in answers.
Example:
Usage: training, retrieval
4. Data Licensing
State what license applies to the data to clarify usage rights (e.g., Creative Commons, proprietary).
Example:
License: CC-BY-4.0
5. Attribution / Citation Requirements
Specify whether attribution or citation is required when the content is used or referenced by an AI.
Example:
Attribution: required
6. Update Frequency
Indicate how recently the llm.txt file was updated, for example with a last-updated date.
Example:
Last-Updated: 2025-07-09
7. Contact
Provide contact info for inquiries about data usage policies.
Example:
Contact: legal@example.com
Sample llm.txt File
User-agent: *
Allow: /blog/
Allow: /public/
Disallow: /private/
Disallow: /subscriptions/
Usage: training, retrieval
License: CC-BY-4.0
Attribution: required
Last-Updated: 2025-07-09
Contact: legal@example.com
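A file in this shape could be read with a few lines of code. The sketch below assumes one "Key: value" directive per line, case-insensitive keys, and repeatable directives such as Allow and Disallow; none of this is fixed by the proposal, and the function name parse_llm_txt is hypothetical.

```python
def parse_llm_txt(text: str) -> dict[str, list[str]]:
    """Parse llm.txt-style text into {lowercased key: [values, ...]}.

    Assumptions: one "Key: value" pair per line; lines without a colon
    are ignored; a key may appear more than once.
    """
    policy: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, value = line.split(":", 1)
        policy.setdefault(key.strip().lower(), []).append(value.strip())
    return policy


SAMPLE = """\
User-agent: *
Allow: /blog/
Allow: /public/
Disallow: /private/
Disallow: /subscriptions/
Usage: training, retrieval
License: CC-BY-4.0
Attribution: required
Last-Updated: 2025-07-09
Contact: legal@example.com
"""

policy = parse_llm_txt(SAMPLE)
print(policy["disallow"])  # ['/private/', '/subscriptions/']

# Comma-separated values such as Usage can be split further if needed.
usages = [u.strip() for u in policy["usage"][0].split(",")]
print(usages)  # ['training', 'retrieval']
```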
Additional Notes
This is an emerging proposal, not yet a widely adopted standard.
Adoption and enforcement would require cooperation from AI platform providers, data collectors, and webmasters.
It complements, but does not replace, existing files like robots.txt or sitemap files.
It could eventually be extended to support machine-readable formats (JSON, YAML) or be linked via HTTP headers.