What is llm.txt?
llm.txt is a proposed, not yet standardized, text file placed at the root of a website (e.g., https://example.com/llm.txt). It serves as a declaration of permissions and restrictions specifically for LLM training, data ingestion, and retrieval-augmented generation (RAG) systems.
Its goal is to communicate clearly to AI data collectors, content scrapers, and indexing services what content is allowed or disallowed for use in training or response generation.
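Because the file lives at a fixed location, a collector can derive its URL from any page on the site. A minimal Python sketch, assuming the file is always served at /llm.txt on the same scheme and host as the page (the function name llm_txt_url is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def llm_txt_url(page_url: str) -> str:
    """Return the root-level llm.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the path with
    # /llm.txt and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llm.txt", "", ""))


print(llm_txt_url("https://example.com/blog/post?id=7#top"))
# https://example.com/llm.txt
```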
Why is llm.txt needed?
Traditional bots such as search crawlers index pages so people can find them; AI models, by contrast, are trained on large datasets scraped or licensed from the web.
Currently, there is no universal, machine-readable way to opt in to or out of use by LLMs.
llm.txt could give content owners a transparent way to set terms specifically for AI usage without affecting human visitors or traditional search indexing.
What Should Be Included in llm.txt?
The format would be simple and text-based, similar to robots.txt, but tailored for AI/data usage. Here are common sections and directives it might include:
1. User-agent / Model
Specify which LLMs, data collectors, or AI agents the rules apply to.
Use the wildcard * to address all AI/data agents.
Example:
```text
User-agent: *
```
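How an agent finds the rules that apply to it is not specified here. One plausible approach, borrowed from robots.txt (RFC 9309), is to match the agent's name case-insensitively against the User-agent lines and fall back to the * group. The Python sketch below assumes that behavior; rules_for_agent and its (field, value) input format are hypothetical:

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (lowercased field name, value)


def rules_for_agent(records: List[Record], agent: str) -> List[Record]:
    """Return the lines of the group naming `agent`, else the '*' group."""
    groups: Dict[str, List[Record]] = {}
    current: List[str] = []  # agent tokens of the group being read
    in_rules = False         # True once the current group has a non-agent line
    for field, value in records:
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])  # repeated agents are merged
        elif current:
            in_rules = True
            for token in current:
                groups[token].append((field, value))
    return groups.get(agent.lower(), groups.get("*", []))


records = [("user-agent", "ExampleBot"), ("disallow", "/"),
           ("user-agent", "*"), ("allow", "/blog/")]
print(rules_for_agent(records, "OtherBot"))  # [('allow', '/blog/')]
```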
2. Allow / Disallow
Specify which paths or content areas AI agents can or cannot use for training, indexing, or retrieval.
Example:
```text
Allow: /public-content/
Disallow: /private-data/
Disallow: /paid-content/
```
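The text does not say how overlapping Allow and Disallow paths interact. A minimal sketch, assuming llm.txt reuses the robots.txt conventions from RFC 9309 (plain prefix matching without wildcards, longest match wins, Allow wins a tie, unmatched paths are allowed); is_path_allowed is a hypothetical helper:

```python
from typing import List, Tuple


def is_path_allowed(rules: List[Tuple[str, str]], path: str) -> bool:
    """Decide Allow/Disallow for `path` using the longest matching prefix."""
    best_len, allowed = -1, True  # assumed default: no matching rule means allowed
    for field, prefix in rules:
        # Ignore non-path fields and empty values (an empty Disallow blocks nothing).
        if field not in ("allow", "disallow") or not prefix:
            continue
        if path.startswith(prefix):
            is_allow = field == "allow"
            # A longer prefix is more specific; on equal length, Allow wins.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


rules = [("allow", "/public-content/"), ("disallow", "/private-data/"),
         ("disallow", "/paid-content/")]
print(is_path_allowed(rules, "/paid-content/issue-1"))  # False
```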
3. Purpose / Usage
Indicate how the data may be used, e.g., for training, live retrieval, or display in answers.
Example:
```text
Usage: training, retrieval
```
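A collector might check its intended purpose against the comma-separated Usage list before using a page. This sketch assumes purposes are case-insensitive tokens such as training and retrieval, and that a missing Usage line denies every purpose; both are assumptions, and usage_permits is hypothetical:

```python
from typing import Optional


def usage_permits(usage_value: Optional[str], purpose: str) -> bool:
    """Return True if `purpose` appears in a comma-separated Usage value."""
    if usage_value is None:
        return False  # hypothetical conservative default when Usage is absent
    allowed = {item.strip().lower() for item in usage_value.split(",") if item.strip()}
    return purpose.strip().lower() in allowed


print(usage_permits("training, retrieval", "retrieval"))  # True
```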
4. Data Licensing
State what license applies to the data to clarify usage rights (e.g., Creative Commons, proprietary).
Example:
```text
License: CC-BY-4.0
```
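Since CC-BY-4.0 is an SPDX license identifier, one option is to treat the License value as an SPDX identifier and compare it with the licenses a collector has decided it can accept; SPDX identifiers are matched case-insensitively. The accepted set and license_accepted are hypothetical:

```python
from typing import Iterable, Optional


def license_accepted(license_value: Optional[str], accepted_ids: Iterable[str]) -> bool:
    """Return True if the declared license is one the collector accepts."""
    if not license_value:
        return False  # no declared license: treat as not accepted (hypothetical policy)
    # Compare SPDX identifiers case-insensitively.
    return license_value.strip().lower() in {i.lower() for i in accepted_ids}


print(license_accepted("CC-BY-4.0", {"CC-BY-4.0", "CC0-1.0"}))  # True
```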
5. Attribution / Citation Requirements
Specify whether attribution or citation is required when the content is used or referenced by an AI.
Example:
```text
Attribution: required
```
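When Attribution is required, an AI system that quotes or summarizes the page could append a citation to its answer. The "Source:" line format and with_attribution are hypothetical; the sketch only shows where the check would happen:

```python
from typing import Optional


def with_attribution(answer: str, source_url: str, attribution: Optional[str]) -> str:
    """Append a source citation when the site's llm.txt requires attribution."""
    if attribution is not None and attribution.strip().lower() == "required":
        return f"{answer}\n\nSource: {source_url}"
    return answer


print(with_attribution("Answer.", "https://example.com/blog/post", "required"))
```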
6. Last Updated
Indicate when the llm.txt file was last updated, for example as an ISO 8601 date (YYYY-MM-DD).
Example:
```text
Last-Updated: 2025-07-09
```
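Because Last-Updated records a date, a collector that stores when it last evaluated a site's rules could compare the two and re-check content only when the file is newer. A sketch, assuming the value is an ISO 8601 date (YYYY-MM-DD); policy_changed_since is hypothetical:

```python
from datetime import date


def policy_changed_since(last_updated: str, checked_on: date) -> bool:
    """True if the file reports an update after the date the rules were last checked."""
    return date.fromisoformat(last_updated.strip()) > checked_on


print(policy_changed_since("2025-07-09", date(2025, 7, 1)))  # True
```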
7. Contact
Provide contact info for inquiries about data usage policies.
Example:
```text
Contact: legal@example.com
```
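Every directive above follows the same "Field: value" shape, so a consumer needs only a small line parser. A minimal Python sketch; treating # as a comment marker and field names as case-insensitive are assumptions borrowed from robots.txt, and parse_llm_txt is a hypothetical name:

```python
from typing import List, Tuple


def parse_llm_txt(text: str) -> List[Tuple[str, str]]:
    """Split llm.txt content into ordered (field, value) pairs."""
    records = []
    for raw in text.splitlines():
        # Assumption borrowed from robots.txt: '#' starts a comment.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue  # skip blank, comment-only, and malformed lines
        field, value = line.split(":", 1)
        # Field names are compared case-insensitively; values keep their case.
        records.append((field.strip().lower(), value.strip()))
    return records


print(parse_llm_txt("User-agent: *\nDisallow: /private/\n# a comment\n"))
# [('user-agent', '*'), ('disallow', '/private/')]
```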
Sample llm.txt File
```text
User-agent: *
Allow: /blog/
Allow: /public/
Disallow: /private/
Disallow: /subscriptions/
Usage: training, retrieval
License: CC-BY-4.0
Attribution: required
Last-Updated: 2025-07-09
Contact: legal@example.com
```
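Putting the pieces together, a collector could evaluate this sample for a given agent, path, and purpose. The sketch assumes robots.txt-style grouping and longest-prefix path matching, and requires both the path rules and the Usage list to permit the request; SAMPLE, parse, group_for, and may_use are hypothetical names:

```python
from typing import Dict, List, Tuple

SAMPLE = """\
User-agent: *
Allow: /blog/
Allow: /public/
Disallow: /private/
Disallow: /subscriptions/
Usage: training, retrieval
License: CC-BY-4.0
Attribution: required
Last-Updated: 2025-07-09
Contact: legal@example.com
"""


def parse(text: str) -> List[Tuple[str, str]]:
    """Turn 'Field: value' lines into (lowercased field, value) pairs."""
    pairs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumed robots.txt-style comments
        if ":" in line:
            field, value = line.split(":", 1)
            pairs.append((field.strip().lower(), value.strip()))
    return pairs


def group_for(pairs: List[Tuple[str, str]], agent: str) -> List[Tuple[str, str]]:
    """Collect the lines of the group naming `agent`, else the '*' group."""
    groups: Dict[str, List[Tuple[str, str]]] = {}
    current: List[str] = []
    in_rules = False
    for field, value in pairs:
        if field == "user-agent":
            if in_rules:  # User-agent after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif current:
            in_rules = True
            for token in current:
                groups[token].append((field, value))
    return groups.get(agent.lower(), groups.get("*", []))


def may_use(text: str, agent: str, path: str, purpose: str) -> bool:
    """Hypothetical end-to-end check: path rules first, then the Usage list."""
    rules = group_for(parse(text), agent)
    best_len, allowed = -1, True  # unmatched paths are allowed, as in robots.txt
    for field, prefix in rules:
        if field in ("allow", "disallow") and prefix and path.startswith(prefix):
            is_allow = field == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    usage = next((v for f, v in rules if f == "usage"), "")
    purposes = {p.strip().lower() for p in usage.split(",") if p.strip()}
    return allowed and purpose.lower() in purposes


print(may_use(SAMPLE, "ExampleBot", "/blog/post-1", "training"))    # True
print(may_use(SAMPLE, "ExampleBot", "/private/notes", "retrieval"))  # False
```

Keeping the path check and the purpose check separate makes it easy to swap in a different matching rule if the proposal settles on one.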
Additional Notes
llm.txt is an emerging proposal and is not yet widely adopted.
Adoption and enforcement would require cooperation from AI platform providers, data collectors, and webmasters.
It complements, but does not replace, existing files such as robots.txt and sitemaps.
It could eventually be extended to support structured formats (JSON, YAML) or be linked via HTTP headers.
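As an illustration of the JSON idea, the same directives could be converted into a structured document. The layout below (an ordered rules list plus a policy object) and the to_json helper are hypothetical and not part of any proposal:

```python
import json


def to_json(text: str) -> str:
    """Convert llm.txt lines into a hypothetical JSON layout."""
    doc = {"rules": [], "policy": {}}
    for raw in text.splitlines():
        if ":" not in raw:
            continue  # skip blank or malformed lines
        field, value = (s.strip() for s in raw.split(":", 1))
        key = field.lower()
        if key in ("user-agent", "allow", "disallow"):
            doc["rules"].append({key: value})  # keep order: it defines the groups
        elif key == "usage":
            doc["policy"][key] = [v.strip() for v in value.split(",")]
        else:
            doc["policy"][key] = value
    return json.dumps(doc, indent=2)


print(to_json("User-agent: *\nDisallow: /private/\nUsage: training, retrieval\n"))
```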