The robots.txt service file contains instructions for search robots. Using this file, you can prohibit or restrict search robots' access to certain pages or to the entire site, and set different restrictions for different types of robots. Reducing the permissible frequency of search robots' requests lowers the load on the site and the server. On the other hand, blocking robots completely can lower the site's position in search results, so it is important to configure robots.txt correctly. The file is located in the root directory of the site: /public_html/robots.txt.
- Each rule in the robots.txt file is written on a new line in the form Directive: value.
- A new block of rules in the robots.txt file starts with the User-agent directive. Blank lines are not allowed inside a block; a blank line is used to separate one block from the next.
- Comments start with the # character.
- The file name must be lowercase: ROBOTS.TXT and Robots.txt are incorrect names.
- Some robots require their own directives. For example, YandexDirect follows only the rules created specifically for it, and Googlebot ignores the Host and Crawl-delay directives.
It is recommended to validate the robots.txt file with dedicated tools, such as those provided by Yandex and Google.
Each block starts with a User-agent directive, which specifies the robot the rules apply to.
For example, to set a rule for the Yandex index bot, enter:
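A minimal example (the Yandex token matches Yandex robots):

```
User-agent: Yandex
```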
To apply the rule to all Yandex and Google bots, enter:
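For example, one block may list several User-agent lines; the rules that follow then apply to every robot listed:

```
User-agent: Yandex
User-agent: Googlebot
```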
To apply the rule to all robots:
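The * wildcard matches any robot:

```
User-agent: *
```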
Disallow и Allow
The Disallow and Allow directives deny or allow access to sections of the site.
For example, to deny access to the entire site, enter:
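For example:

```
User-agent: *
Disallow: /
```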
To deny access to the entire site but at the same time allow access to its catalog1 directory, enter:
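A sketch of such a block (assuming catalog1 sits in the site root; search engines apply the most specific matching rule, so Allow overrides Disallow for this path):

```
User-agent: *
Disallow: /
Allow: /catalog1
```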
If you want to deny indexing of the /catalog1/* pages but allow the /catalog1/catalog12 pages, enter:
User-agent: * # or bot_name
Disallow: /catalog1
Allow: /catalog1/catalog12
Each rule is written on a new line and can refer to only one folder; specify a separate rule for each additional folder.
It is recommended to restrict certain bots' access to the site; this reduces the load. For example, the majestic.com service uses the MJ12bot crawler, and ahrefs.com uses AhrefsBot. To deny access to several bots, enter:
User-agent: MJ12bot # rule works for bot MJ12bot
User-agent: AhrefsBot # rule works for bot AhrefsBot
User-agent: DotBot # rule works for bot DotBot
User-agent: SemrushBot # rule works for bot SemrushBot
Disallow: / # denying access to the entire site
- Disallow: with an empty value prohibits nothing.
- Allow: / allows access to everything.
- $ denotes an exact match of the parameter. For example, the directive Disallow: /catalog$ denies access only to /catalog; access to /catalog1 or /catalog-best remains allowed.
If the sitemap.xml file is used to describe the site structure, you can specify the path to it:
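For example (domain.com here is a placeholder for your own domain):

```
Sitemap: https://domain.com/sitemap.xml
```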
The Host directive tells Yandex robots the location of the main mirror of the site.
For example, if the site is also located on the https://domain2.com domain, enter:
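```
Host: https://domain2.com
```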
The robot accepts only the first Host directive specified in the file, the rest are ignored.
If http is used, the mirror can be specified without the protocol - domain2.com. If https is used, the protocol must be specified - https://domain2.com.
The Host directive is specified after Disallow and Allow.
The Crawl-delay directive sets the minimum interval between a robot's requests to the site. This reduces the load on the site.
The value is specified in seconds (with a point as the decimal separator).
The Crawl-delay directive is specified after Disallow and Allow.
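For example, to ask robots to wait at least 2.5 seconds between requests (2.5 is an arbitrary value chosen for illustration):

```
User-agent: *
Disallow:
Crawl-delay: 2.5
```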
The crawl rate for Google bots is set in Search Console.
The Clean-param directive is intended for Yandex and excludes pages with dynamic URL parameters from indexing. The robot will not repeatedly re-index duplicate page content, which avoids extra load.
For example, the site has pages:
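For instance (hypothetical URLs; parm1 and parm2 stand for the dynamic parameters):

```
https://domain.com/news.html?parm1=1&parm2=2
https://domain.com/news.html?parm1=5&parm2=7
```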
In fact, these are two copies of the same page with different dynamic parameters. So that Yandex does not index every copy of this page, enter the directive:
Clean-param: parm1&parm2&parm3 /news.html
The parameters the robot should ignore are listed first, separated by &, followed by the path prefix of the pages to which the directive applies.
If you have any questions, please create a ticket to technical support.