
How to configure llms.txt and why one file is not enough


llms.txt is a text file at the root of the site where you list which pages are important for language models. The idea is simple: help the bot not guess. In practice, the file alone changes little, but a single wrong path sends the model to the wrong place.

Below is how llms.txt is structured, what to write there, typical mistakes when setting it up yourself, and why after a “quick fix” visibility sometimes drops rather than increases.

What is llms.txt and why is it used?

The format was proposed as an analogue of robots.txt, but for LLM crawlers. The file usually contains:

a brief description of the company or project;
a list of URLs with explanations: services, prices, FAQ, contacts;
sometimes, links to markdown versions of pages or to sitemaps for models.

The file does not guarantee inclusion in recommendations. It reduces the chance that the bot picks a random page: a blog post from 2019, an outdated promotion, a draft in a test folder.

Without llms.txt, the model can still find you. With a broken llms.txt, it can "learn" from garbage faster than it would without the file.

Who reads the file: GPTBot from OpenAI, ClaudeBot, PerplexityBot, and other LLM crawlers. Google relies on its own index, but clean site structure helps everyone in practice. The file doesn't replace Google indexing, but lowers the chance a Western model picks the wrong page when answering a user.

Don't confuse llms.txt with llms-full.txt or with GitHub mirrors. Take the specification from the official format repository, but write the content for your own domain. Templates from articles often include author and version fields, which you don't need and which only add noise.

Five signs of a broken llms.txt

Paths copied from someone else's template lead to 404 on your domain.
50+ URLs: the model cannot tell what is important.
Prices are given only in a PDF: the bot sees the link but not the numbers.
robots.txt blocks pages that the file lists: the site contradicts itself.
The file was not updated after a service URL changed: dead links.

A broken pointer is worse than no pointer: the bot follows it, finds garbage, and registers the site as unstable.
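Two of these signs can be caught offline before the file is published. A minimal Python sketch, assuming the file's text is already in a string. The function name `lint_llms_txt`, the URL pattern, and the default limit of 15 URLs are illustrative choices, not part of any llms.txt specification:

```python
import re

# Illustrative pattern: anything that starts with http(s):// up to whitespace.
URL_RE = re.compile(r"https?://[^\s)>\]]+")


def lint_llms_txt(text: str, max_urls: int = 15) -> list[str]:
    """Return warnings for an oversized URL list and for links to PDF files."""
    urls = URL_RE.findall(text)
    warnings = []
    if len(urls) > max_urls:
        warnings.append(
            f"{len(urls)} URLs listed; more than {max_urls} blurs what is important"
        )
    for url in urls:
        # Ignore query string and fragment when looking at the file extension.
        path = url.split("?", 1)[0].split("#", 1)[0]
        if path.lower().endswith(".pdf"):
            warnings.append(f"PDF link, the numbers may be invisible to the bot: {url}")
    return warnings


if __name__ == "__main__":
    sample = "# Clinic\n- https://example.com/prices.pdf\n- https://example.com/faq/\n"
    for warning in lint_llms_txt(sample):
        print(warning)
```

Dead links and robots.txt conflicts need network access and a copy of robots.txt, so they are outside the scope of this check.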

Minimal working example

A structure that makes no claim to being a standard (there is no single agreed one yet):

# Company title
> Briefly: what do you do, city, for whom.

## Services and prices
- https://example.com/services/ — pricing and timelines
- https://example.com/faq/ — customer answers

## About the company
- https://example.com/about/ — licenses, team

## Contact
- https://example.com/contacts/

Each URL must open, return 200, and match what is on the site and in the map listings. A redirect to another domain, http instead of https, or a typo in the path already counts as an error.
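These requirements can be checked with a short script. The sketch below uses only the Python standard library. The function names, the `site_host` parameter, and the User-Agent string are assumptions made for illustration. `static_problems` works offline, while `live_problems` performs a real request:

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def static_problems(url: str, site_host: str) -> list[str]:
    """Offline checks: https scheme and the expected host."""
    parts = urlparse(url)
    problems = []
    if parts.scheme != "https":
        problems.append("not https")
    if parts.hostname != site_host:
        problems.append(f"host {parts.hostname!r} is not {site_host!r}")
    return problems


def live_problems(url: str, timeout: float = 10.0) -> list[str]:
    """Fetch the URL; report a non-200 status or a redirect to another host."""
    # Hypothetical User-Agent, used only to identify this check.
    request = Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            problems = []
            if response.status != 200:
                problems.append(f"status {response.status}")
            # urlopen follows redirects; geturl() is the final address.
            final_host = urlparse(response.geturl()).hostname
            if final_host != urlparse(url).hostname:
                problems.append(f"redirects to {final_host}")
            return problems
    except HTTPError as error:  # 4xx and 5xx responses
        return [f"status {error.code}"]
    except URLError as error:  # DNS failure, refused connection, timeout
        return [f"unreachable: {error.reason}"]


if __name__ == "__main__":
    print(static_problems("http://example.com/services", "example.com"))
```

A script like this does not replace opening each page by hand: it confirms that the page responds, not that the prices on it are current.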

Comments after the URL are optional, but they help: “price is valid for May”, “clinic license”, “FAQ for registration”. One line of context reduces the chance that the model will take the page for the wrong purpose.

Why is one file nowhere near enough?

llms.txt does not replace:

proper commercial content on the pages;
Schema.org markup and an FAQ with real questions;
consistent listings on Google Business and Yelp;
access for the necessary bots left open in robots.txt;
reviews and mentions on external sites.

The file is a pointer. If the pointer leads to an empty room, the model won't recommend you. Many owners spend an evening on llms.txt and never touch site prices. Copilot names whoever has matching prices and address everywhere.

For the overall picture of AIO, see the article about AIO vs SEO. For Copilot, see Copilot recommendations.

Where the file lives: only at /llms.txt in the domain root, not in a subfolder and not renamed to something like txt.pdf. For example.com and a subdomain such as blog.example.com, publish separate files if you want both to be visible. On site builders you may need a dedicated "root file" block; otherwise the editor creates a page instead of a text resource.
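The placement rule is easy to express in code. A small Python sketch with hypothetical helper names; requiring https here is an assumption carried over from the URL requirements earlier in the article:

```python
from urllib.parse import urlparse


def is_root_llms_txt(url: str) -> bool:
    """True only for https://<host>/llms.txt with nothing extra in the path."""
    parts = urlparse(url)
    return parts.scheme == "https" and parts.path == "/llms.txt" and not parts.query


def expected_llms_urls(hosts: list[str]) -> list[str]:
    """One file per host: the main domain and each subdomain get their own."""
    return [f"https://{host}/llms.txt" for host in hosts]


if __name__ == "__main__":
    for url in expected_llms_urls(["example.com", "blog.example.com"]):
        print(url, is_root_llms_txt(url))
```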

Update llms.txt whenever the price list, promotions, or service URLs change. An old file with dead links is worse than none: the bot follows the links, gets 404, and registers the site as unstable. Reconciling the file with the sitemap once a quarter is a reasonable minimum for an active business.

On a single-page landing site, the file is almost meaningless: the bot sees only that page anyway. On a multi-page site without llms.txt, the model more often picks a blog post, a press release, or an outdated promotion, because you never indicated what is important.

Markdown versions of pages help if they match the HTML and include the prices. A file with links to empty markdown without numbers is, again, a pointer to an empty room.

llms.txt structure for language models: the index works only if the pages behind the links are live and consistent with the map listings.

Checking llms.txt

Is the file there, and are bots reading it?

The audit shows technical access, robots.txt, and critical gaps around llms.txt.

Errors during self-configuration

Copy-paste from someone else's site

You downloaded an llms.txt from a SaaS template. Its paths /docs/, /api/, /pricing/ lead to 404 on your domain. The bot registers garbage, and trust in the domain does not grow.

Listed everything indiscriminately

Fifty URLs, including utility pages, tag pages, and per-city duplicates. The model cannot tell what is important. Five strong pages are better than fifty noisy ones.

Forgot about https and the trailing slash

example.com/services and example.com/services/ may be the same to you but different to a crawler. Use the canonical URL from your sitemap.
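One way to catch this is to compare the URLs in llms.txt with the `<loc>` entries of the sitemap. A minimal Python sketch, assuming a standard sitemap in the sitemaps.org 0.9 namespace that is already loaded as a string. The function names are hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Optional

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text}


def _loose_key(url: str) -> str:
    """Drop the scheme and trailing slash so near-duplicates compare equal."""
    return url.split("://", 1)[-1].rstrip("/")


def non_canonical(llms_urls: list[str], canonical: set[str]) -> dict[str, Optional[str]]:
    """Map each URL missing from the sitemap to its canonical near-match, or None."""
    by_key = {_loose_key(url): url for url in canonical}
    return {
        url: by_key.get(_loose_key(url))
        for url in llms_urls
        if url not in canonical
    }


if __name__ == "__main__":
    xml = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/services/</loc></url>"
        "</urlset>"
    )
    print(non_canonical(["https://example.com/services"], sitemap_urls(xml)))
```

A value other than `None` means the URL differs from a sitemap entry only by scheme or trailing slash, so the fix is to copy the canonical form. `None` means the page is not in the sitemap at all.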

Published and not checked

The file exists, but nginx returns 404 for /.well-known/, or a cache serves the old version. To verify, open the URL in an incognito window and run it through the Leadsy audit.

Bots closed after publication

llms.txt says "read us", robots.txt says "not allowed". The AI doesn't argue, it just walks away. This is how visibility gets scared off in one evening.
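This conflict can be detected with `urllib.robotparser` from the Python standard library. In the sketch below, the agent list repeats the crawlers named earlier in the article, and the function name `blocked_pairs` is an assumption. The robots.txt content is passed in as a string, so no network access is needed:

```python
from urllib.robotparser import RobotFileParser

# Crawler names as mentioned in the article; extend as needed.
LLM_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_pairs(
    robots_txt: str, urls: list[str], agents: tuple[str, ...] = LLM_AGENTS
) -> list[tuple[str, str]]:
    """Return (agent, url) pairs where robots.txt forbids a URL listed in llms.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, url)
        for url in urls
        for agent in agents
        if not parser.can_fetch(agent, url)
    ]


if __name__ == "__main__":
    robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow: /admin/\n"
    for agent, url in blocked_pairs(robots, ["https://example.com/services/"]):
        print(f"{agent} is not allowed to read {url}")
```

An empty result means that none of the listed pages is closed to these agents by robots.txt. It says nothing about blocks at other levels, such as a firewall.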

Version for the English site

Two languages mean two URL sets in one file or two files on different domains. Mixed locales without labels confuse geography and currency. For a US company with an en version, label the sections explicitly.

Auto-generation from sitemap

A script dumped 500 URLs into llms.txt. Formally tidy, useless in substance. Manual selection is better than "let the robot figure it out."

Work order if you do it yourself

Audit before any changes: find out whether bots can see you and what is already broken.
Make a list of 5–10 canonical URLs: services, prices, FAQ, about the company, contacts.
Check each URL manually and in the report.
Publish /llms.txt in the root as UTF-8 without BOM.
Reconcile with robots.txt: the required user-agents must not be under Disallow.
Repeat the audit in a week, not earlier, to give the crawlers time.

If the express audit shows a red zone for technical access or schema, fix that first. llms.txt is the second stage, not the first.

When is it better to order the work?

When there is no developer, the site sits on a builder with messy URLs, there are several subdomains, or an old WordPress install is full of duplicates, a do-it-yourself setup ends up costing more than a professional fix. One incorrect Disallow in robots.txt cancels out a neat llms.txt.

Leadsy does a full audit with technical specifications: which URLs belong in llms.txt, what to edit in schema, what to open to bots. Turnkey implementation is available if you have no one to deploy changes to the server.

Where to start today

Do not copy someone else's llms.txt. Do not publish the file before the audit. Don't close bots "just in case."

First, take a visibility snapshot using the form below. Then pick 5–10 URLs with live content and reconcile them with robots.txt. Repeat the audit in a week, not earlier.

Example: a dental clinic listed five URLs but pricing was in PDF. The bot saw the link, not the price. Copilot named the clinic with prices in HTML. Moving pricing to a page shifted results without expanding the file.

Typical: an evening on llms.txt, site prices untouched. Copilot names whoever has numbers in HTML and listings that match. The file doesn't replace a price list.

On WordPress, upload via FTP or serve it as a static file from the root. On Bitrix, put it in the public part. On Tilda, use a zero block "document". Only one check counts: the text opens via a direct link in an incognito window. "Uploaded in the admin panel" does not count.

UTF-8 without BOM. If a 404 HTML page is returned instead of text, the bots leave and trust drops.
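Both conditions can be checked against the raw bytes the server returns. A Python sketch with a hypothetical function name; the HTML detection is a simple heuristic based on the start of the body and on the Content-Type header:

```python
import codecs


def file_body_problems(body: bytes, content_type: str = "") -> list[str]:
    """Check raw response bytes for a BOM, invalid UTF-8, or an HTML page."""
    problems = []
    if body.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["body is not valid UTF-8"]
    # Look only at the beginning of the document, ignoring BOM and whitespace.
    head = text.lstrip("\ufeff \t\r\n")[:200].lower()
    looks_like_html = head.startswith("<!doctype html") or head.startswith("<html")
    if looks_like_html or "text/html" in content_type.lower():
        problems.append("HTML returned instead of plain text")
    return problems


if __name__ == "__main__":
    print(file_body_problems(b"\xef\xbb\xbf# Company\n", "text/plain"))
```

The `body` and `content_type` values are expected to come from a real request to /llms.txt, made from outside the admin panel.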

Priority without a developer: (1) audit, (2) phone number and address on the site matching the map listings, (3) prices and FAQ in HTML, (4) robots.txt, (5) schema, (6) llms.txt, (7) repeat audit. Jumping to step 6 without steps 2–4 is the most common failure.

If the file is already there and AI traffic is zero, do not delete it in a panic. Find what distorts the picture more. Removing the file without diagnostics leaves the holes in place and takes away one pointer.

llms.txt amplifies what is already in place. The file will not save an empty website. Pricing: see the tariffs page. Questions: see the FAQ.

A free audit will show whether the file exists, whether bots are reading pages from it, and what to fix first. Without that snapshot, one evening with a broken file can cost a month of visibility.

Who in the company maintains llms.txt: the developer deploys the file, marketing supplies the list of URLs, and the owner approves the price list and services. One person who "threw in some links" without checking is a typical failure.

Don't delete the file in a panic if it "didn't work in a week." First check the map listings, schema, and robots.txt. Then look at the pointer again.

A separate note on hosting with a WAF: an allowlist for LLM crawlers matters more than a new paragraph on the main page. Whether crawlers are blocked is visible in the technical access audit.

Typical: published llms.txt, forgot pricing in HTML. Copilot names the clinic with prices on the page. The file didn't replace content—it only pointed where to look.

No more than 10–15 lines per file for a multi-page site. The rest belongs in the sitemap. Otherwise, you blur the main thing.

Connection to schema and FAQ: the pointer should lead to pages with facts, not fluff.

Before edits

Visibility snapshot before setting up llms.txt

Free of charge. Don't risk shutting bots out with a careless robots.txt.