The robots.txt file is one of the most important components of any website, and Shopify stores are no exception. It tells search engine crawlers (Google, Bing, Yahoo, and others) which parts of a site they should and shouldn't crawl. However, the Shopify robots.txt works slightly differently from the one on an ordinary website.
For Shopify store owners, the difference comes from the fact that their website is hosted on Shopify's servers. Because Shopify controls the hosting environment, handling robots.txt works differently than it does on a site you host yourself.
In this guide, we will cover everything you need to know about the Shopify robots.txt, including how to change robots.txt on Shopify and what to do if pages get blocked by robots.txt on Shopify.
Robots.txt is essentially a standard that websites use to communicate with search engine crawlers and other automated robots.
It is part of the Robots Exclusion Protocol (REP), a group of standards that regulate how robots crawl the web, how website content is indexed, and how that content is presented to users.
On most websites, robots.txt is used to keep crawlers out of private or low-value areas, prevent duplicate content from being crawled, and point search engines to the sitemap.
As we previously mentioned, the most common use of robots.txt is giving directives to web crawlers. The file contains directives that control crawlers' access to specific areas of your website. Here are the most commonly used directives in a robots.txt file:
The User-agent directive specifies which web crawler the following rules apply to. If you want your rules to apply to all web crawlers, you can use an asterisk (*) as a wildcard.
Example:
User-agent: * — This applies the rules to all web crawlers.
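You can also target a single crawler by naming it. In this sketch, the rule applies only to Google's crawler and leaves all other bots unrestricted (the /internal-reports/ path is a hypothetical example):
User-agent: Googlebot
Disallow: /internal-reports/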
The Disallow directive tells the crawler not to access certain parts of your site. If you want to block access to a specific folder or page, you should use this directive.
Example:
Disallow: /private/ — This prevents crawlers from accessing anything in the "/private/" directory.
The Allow directive is used to override a Disallow directive, indicating that a crawler can access a specific file or folder within a disallowed directory. This is useful for allowing access to certain content in a directory that is otherwise blocked.
Example:
Disallow: /private/
Allow: /private/public-file.html — This configuration blocks all content in the "/private/" directory except for "public-file.html".
The Sitemap directive points search engines to your XML sitemap, a file that lists all the important pages on your site. This can help crawlers discover pages they might otherwise miss.
Example:
Sitemap: http://www.example.com/sitemap.xml — This tells crawlers where to find your sitemap.
The Crawl-delay directive is used to limit how quickly a crawler can request content from your site, preventing server overload. However, not all search engines honor it: Bing respects Crawl-delay, while Google ignores it.
Example:
Crawl-delay: 10 — This asks crawlers to wait 10 seconds between hits to your server.
Comments can be added to a robots.txt file using the hash symbol (#). These are for human readers and are ignored by crawlers.
Example:
# This is a comment
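Putting these directives together, a minimal robots.txt for a hypothetical store might look like this (the domain and paths are placeholders, not real recommendations for your store):
# Rules for all crawlers
User-agent: *
# Keep crawlers out of the checkout flow
Disallow: /checkout/
# But allow the shipping policy page inside it
Allow: /checkout/shipping-policy.html
# Point crawlers to the XML sitemap
Sitemap: https://www.example.com/sitemap.xml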
There are a few things to remember about robots.txt directives and how you use them: the file must live at the root of your domain (for example, https://www.example.com/robots.txt), paths are case-sensitive, and the directives are advisory rather than enforceable, so well-behaved crawlers follow them while malicious bots may simply ignore them.
As we previously mentioned, the Shopify robots.txt file differs from the robots.txt files found on most other websites. Its purpose is the same, helping search engines crawl and index your pages efficiently, but Shopify ships it with default settings that can be limiting for some merchants.
The Shopify robots.txt is optimized to prevent search engines from crawling pages that would be duplicate content, private, or irrelevant to search engine users, such as admin, checkout, and cart pages. Here is what the default robots.txt setup looks like on Shopify:
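The exact contents vary by store, but an abbreviated version of Shopify's default rules looks roughly like this (shortened for readability; your store's file contains more entries, and the sitemap URL is a placeholder for your own domain):
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /account
Disallow: /collections/*sort_by*
Disallow: /*preview_theme_id*
Sitemap: https://your-store.myshopify.com/sitemap.xml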
Before June 2021, Shopify didn't allow its users to modify or edit these default settings. This inability to change robots.txt on Shopify caused many issues.
As a result, merchants turned to third-party apps to work around Shopify's restrictions, sometimes with unfortunate results, including accidentally blocking entire websites from search engines. Shopify received harsh criticism from some users for doing nothing to solve the issue.
The limitations of this system were clear. The most obvious was the inability to edit or customize the file at all. The default setup was also not optimal for every store, especially large ones with many products, where it could lead to over-indexing of near-duplicate pages and hurt SEO.
Without editing access, store owners couldn't block specific content from being crawled, such as certain product pages or collections they didn't want to appear in search results.
Finally, in June 2021, Shopify released an update that allows editing and customization of the robots.txt file. As a Shopify store owner, you now have better control over your website and can tell crawlers which pages you want crawled and which ones to skip.
This update completely changed the game: there is no longer any need to install third-party software to bypass Shopify's default robots.txt settings. You can now change robots.txt on Shopify in a straightforward way, described below.
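Shopify exposes the file through a theme template called robots.txt.liquid. In the admin, go to Online Store > Themes, open the code editor for your live theme (Edit code), click Add a new template, and choose robots.txt. Shopify then generates a robots.txt.liquid file that renders the default rules, which you can extend with Liquid. The sketch below follows the pattern Shopify documents for this template: it keeps all default rules and appends one extra Disallow line (the /sale-preview path is a hypothetical example):
{%- for group in robots.default_groups -%}
  {{- group.user_agent }}
  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}
  {%- comment -%} Append a custom rule to the wildcard (*) group only {%- endcomment -%}
  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /sale-preview' }}
  {%- endif -%}
  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{%- endfor -%}
Because the loop re-renders every default group and rule, the customization is additive: you get Shopify's protections plus your own rules, rather than replacing the file wholesale.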
If you decide to edit the Shopify robots.txt file manually, you need to be careful. Mistakes in this file can lead to SEO problems and can even get pages blocked by robots.txt on Shopify. The most common mistakes are:
1) Using "Disallow: /" (a bare slash with no specific path). This blocks crawlers from your entire site; note that an empty "Disallow:" line, by contrast, blocks nothing at all.
2) Using the Disallow directive for sensitive pages. Robots.txt is publicly readable and does not prevent access, so it is better to protect sensitive content with other methods, such as password protection.
3) Overusing wildcards. Broad patterns can accidentally block or allow crawler access to far more pages than intended (see the sketch after this list).
4) Not testing changes. Always verify your edits with a tool such as the robots.txt report in Google Search Console (the successor to Google's Robots Testing Tool).
5) Forgetting the sitemap. Leaving the Sitemap directive out of your robots.txt file can hurt your SEO, since the sitemap helps search engines crawl your website far more efficiently.
6) Using comments incorrectly. Comments are added with the "#" symbol; if you misuse or misplace them, your directives can be misread. Make sure your comments are clearly separated from your directives.
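As an illustration of mistake 3, here is a hypothetical over-broad wildcard rule and its side effect (the paths are examples, not rules to copy):
User-agent: *
# Intended: block internal search result pages like /search?q=shoes
# Actual effect: blocks every URL containing a query string,
# including paginated collections such as /collections/all?page=2
Disallow: /*?*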
The Shopify robots.txt file obviously has a massive impact on SEO, since it directs search engine bots on how to crawl a website.
This is especially important for Shopify stores, since they are built for potential buyers. Bringing more of those buyers to your shop is critical, and SEO plays a key role in that quest.
The main benefits of a correctly optimized robots.txt are keeping duplicate and non-public pages out of the index, making your key selling pages visible, and, to some extent, improving the security and speed of your website.
Managing the robots.txt of your Shopify store is key to your SEO efforts. It guides search engines toward your most important content, potentially increasing the traffic and sales of your shop.
However, optimizing it incorrectly can do more harm than good. Shopify's default settings will be enough for most users, but if you need to customize the file further, always test the changes you make.