If you are using WordPress and looking for SEO methods for your website, you have probably heard of the Robots.txt file. But do you really understand what it is, how it works, and how it benefits the SEO process? If you haven’t explored the Robots.txt in WordPress yet, take a moment to read today’s blog.
An overview of the Robots.txt in WordPress
What is Robots.txt in WordPress?
Robots, in this case, are web bots (also known as web crawlers) that search engines use to scan and crawl web pages, then store and categorize them, so that relevant results appear on search engines when anyone searches for a keyword. However, there is a lot of information that website owners do not want these bots to scan and display in search results. Therefore, they use the Robots.txt file to create a set of instructions for search engine bots. Thanks to this file, bots are instructed on which pages or information to scan. This file can be as detailed as you want.
The benefits and drawbacks of Robots.txt
| Benefits | Drawbacks |
| --- | --- |
| Keep some parts of the site private and prevent search engines from indexing certain files on your site | Search engines can still index a page that the Robots.txt file has blocked (in case other sites link to it) |
| Prevent duplicate content on a website | Each crawler may parse the file in its own way |
| Specify the location of the sitemap | Some search engines may not support commands in the Robots.txt file |
| Prevent internal search results pages from showing up on Search Engine Results Pages (SERPs) | |
| Prevent your server from overloading by using the Crawl-delay command to set a delay | |
Common terms in the Robots.txt file
There are 5 common terms that you need to understand in a Robots.txt file. They include:
- User-agent: This is the name of web crawlers, such as Bingbot, Googlebot, etc. If you want to apply the command for all web crawlers, simply use an asterisk.
- Disallow: Use when you want the specified User-agent not to crawl a URL. Each Disallow line corresponds to one URL path. An empty Disallow means that the bots can visit anywhere they want on your site.
- Allow: Supported by Googlebot (not all crawlers honor it). Use this command when you want Googlebot to access specific pages or subdirectories on your website, even inside a disallowed folder.
- Crawl-delay: Tells web crawlers how long to wait before loading and crawling page content. For instance, the command Crawl-delay: 10 tells search engines to wait 10 seconds between requests to your site. Note that Googlebot ignores this command.
- Sitemap: Use this command to provide the locations of any XML sitemaps associated with this URL.
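Putting the five directives together, a Robots.txt file might look like the sketch below. The domain and sitemap URL reuse the wpexample.com placeholder from this blog and are purely illustrative:

```
# Rules apply to all crawlers (the asterisk is a wildcard)
User-agent: *
# Block the admin area...
Disallow: /wp-admin/
# ...but allow the AJAX endpoint that themes and plugins need
Allow: /wp-admin/admin-ajax.php
# Ask crawlers to wait 10 seconds between requests (ignored by Googlebot)
Crawl-delay: 10
# Point crawlers at the XML sitemap (illustrative URL)
Sitemap: https://wpexample.com/sitemap_index.xml
```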
Where is the robots.txt file located on a website and how to check it?
By default, after you create a WordPress website, a robots.txt file is generated automatically in the root directory of your site. Therefore, you can check whether your website has this Robots.txt file by adding /robots.txt after your website address. For example, if your website is wpexample.com, you just need to add /robots.txt to get wpexample.com/robots.txt.
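A file like the one analyzed in the next paragraph might look as follows (the directory paths are illustrative):

```
User-agent: *
Disallow: /wp-admin/
Disallow: /search/
Disallow: /author/
Disallow: /downloads/
```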
Let’s analyze the example above. The user-agent line with an asterisk means that the rules apply to all types of bots. Next, the Disallow commands tell bots not to access the wp-admin, search, author, and downloads directories.
If you add /robots.txt after your website address and nothing appears, your site may not have a Robots.txt file. In that case, you need to create a new Robots.txt file for your site. We will walk you through that step-by-step in the next blog.
What you need to put in the Robots.txt
- Block only one bot from accessing your WordPress site: For example, if you don’t want the Bing search engine to crawl your site, you can block only Bing by adding the following code to the Robots.txt file:
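A minimal sketch of blocking only Bing’s crawler, which identifies itself as Bingbot, while leaving all other bots unaffected:

```
# Applies only to Bing's crawler
User-agent: Bingbot
# Block the entire site for this bot
Disallow: /
```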
- Block bots from accessing a specific file/folder: For instance, if you want to block bots from crawling the entire wp-admin folder and wp-login.php, simply use the code below in the Robots.txt file:
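A sketch of the rules described above, blocking the wp-admin folder and the wp-login.php file for all crawlers:

```
User-agent: *
# Block the whole admin directory
Disallow: /wp-admin/
# Block the login page
Disallow: /wp-login.php
```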
- Allow bots to access a specific file in a disallowed folder: Now, in case you want to block bots from accessing the entire wp-admin folder except the wp-admin/admin-ajax.php file, you can use the Allow command to do that. So, let’s see the example below:
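This is the common WordPress pattern of combining Disallow with a more specific Allow rule:

```
User-agent: *
# Block the admin directory...
Disallow: /wp-admin/
# ...except the AJAX endpoint, which the front end may rely on
Allow: /wp-admin/admin-ajax.php
```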
- Stop bots from crawling WordPress Search results: If you are facing soft 404 errors, you can solve this issue by using the following code:
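A sketch of blocking WordPress search result URLs, which use the ?s= query parameter:

```
User-agent: *
# Block search result URLs like /?s=keyword
Disallow: /?s=
Disallow: /search/
```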
In WordPress, ?s= is the query parameter for search, so disallowing it stops bots from crawling the search results pages.
FAQs about Robots.txt in WordPress
- What is the maximum size of the Robots.txt file? -> About 500 kilobytes.
- How to edit WordPress Robots.txt? -> You can edit it manually, or use a WordPress SEO plugin to edit Robots.txt from the backend.
- What happens if Robots.txt is not well-formatted? -> If the commands in the Robots.txt file are misconfigured, search engines cannot interpret them. They will simply ignore the invalid commands and crawl your website as usual.
- What happens if you Disallow noindex content in the Robots.txt file? -> Google will never see the noindex directive because the Disallow rule prevents it from crawling the page, so the page may still end up indexed.
- Is it possible to block all data collection by changing the Robots.txt file? -> No. To temporarily suspend all crawling, return an HTTP 503 status code for every URL, including the Robots.txt file itself. The Robots.txt file should not be changed for this purpose.
All in all, we have now covered the fundamentals of the Robots.txt file in WordPress, and we hope that today’s blog has been helpful for you. In the next blogs, we will cover some main tasks related to the Robots.txt file, including:
- Add & Edit the Robots.txt file in WordPress
- Update the WordPress Robots.txt
- Optimize the WordPress Robots.txt
- Fix the WordPress Robots.txt
So, don’t miss them if you want to boost your site’s SEO score with the Robots.txt file. Last but not least, if you plan to change your site’s interface, visit our site to find many responsive Free WordPress Themes.