A robots.txt file is a simple text file that contains the rules for indexing your website. It is the tool for communicating directly with search engine crawlers. One of its most common uses is to hide parts of your website that are incomplete or still in development from search engine crawlers. In short, it tells search engines which parts of your site they may crawl and which parts they should not. WordPress itself generates a virtual robots.txt, so even if you don’t have a physical robots.txt file, search engines will still index your site. However, having your own robots.txt file gives you better control. In this article we will discuss how to create and optimize a WordPress robots.txt for SEO.
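For reference, the virtual robots.txt that a default WordPress install serves (the exact rules may vary with your WordPress version and settings) looks roughly like this:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php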
Create and Optimize WordPress Robots.txt for SEO
Creating a robots.txt file
- Create a plain text file and name it robots.txt.
- Upload it via FTP to your site’s root folder. A good rule of thumb is to keep the robots.txt file in the same place as your index file.
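Once uploaded, the file should be reachable directly under your domain root. For example, assuming your site lives at www.example.com and WordPress is installed in the root folder, opening http://www.example.com/robots.txt in a browser should display the file’s contents.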
Setting rules inside robots.txt
The robots.txt file has its own syntax for defining rules, called “directives”.
Basic robots.txt syntax
Here are a few terms you should be familiar with while writing rules:
- User-agent – Specifies the search engine crawler, such as Googlebot or Bingbot.
- Disallow – Instructs the crawler not to crawl the specified files, pages, or directories.
- An asterisk (*) – Acts as a wildcard, matching all crawlers (or all values) at once.
The robots.txt file generally starts with the name of a user agent, followed by Allow or Disallow instructions on the next lines. If you want to block all search bots from your entire website, you would configure robots.txt in the following way:
User-agent: *
Disallow: /
Similarly, the following rules give only Googlebot full access to your site, while all other bots are not permitted to crawl your website:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
Here are a few more directives:
- Allow – Permits bots to crawl the specified parts of your site
- Sitemap – Tells crawlers where your sitemap resides
Allow is used when you want to give search engines access to certain parts of your website:
User-agent: *
Allow: /wp-includes/my-file.php
Disallow: /wp-includes/
The above code allows search bots to access only the my-file.php file inside the otherwise disallowed wp-includes directory. You can also add sitemap lines to your robots.txt file. The Sitemap directive tells search engines where to find the sitemap of your website:
Sitemap: http://www.mustbeweb.com/sitemap_index.xml
Sitemap: http://www.mustbeweb.com/post-sitemap.xml
Sitemap: http://www.mustbeweb.com/page-sitemap.xml
Sitemap: http://www.mustbeweb.com/category-sitemap.xml
Sitemap: http://www.mustbeweb.com/post_tag-sitemap.xml
Note: The usefulness of linking your XML sitemap from your robots.txt is debatable, so a better approach is to submit your sitemaps manually in Google and Bing Webmaster Tools. We have an entire section of articles about sitemaps that you can read to learn more.
Guidelines for Optimizing Your Robots.txt File for SEO
- It is recommended not to use the robots.txt file to hide low-quality content.
- Do not use the robots.txt file to stop Google from indexing your category, date, and other archive pages.
- It is not necessary to add your WordPress login page, admin directory, or registration page to robots.txt, because WordPress already adds a noindex meta tag to these pages.
- It is recommended that you disallow the readme.html file in your robots.txt file. Disallowing the readme file hides your WordPress version and helps protect you from mass automated attacks (see the example snippet after this list).
- Disallow your WordPress plugin directory to strengthen your site’s security.
- Don’t use comments in your robots.txt file.
- Don’t start any line with a space, and avoid stray blank spaces elsewhere in the file.
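Putting the readme.html and plugin-directory guidelines above into practice, the corresponding rules would look something like this (the paths assume a standard WordPress install in the site root):

User-agent: *
Disallow: /readme.html
Disallow: /wp-content/plugins/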
What Should an Ideal Robots.txt File Look Like?
Here is the robots.txt file that we use on our site:
Sitemap: http://www.mustbeweb.com/sitemap_index.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /archives/
Disallow: *?replytocom
Disallow: /comments/feed/

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /
Setting up a robots.txt file is an important step for SEO. However, disallowing Google from accessing too much of your site can have an adverse effect on your search rankings, and the modern trend is to keep robots.txt minimal. Also make sure that your robots.txt file is configured correctly: if it is misconfigured, search engines may ignore it entirely, or a wrong rule may cause your site to disappear from search results. So your robots.txt file should be well optimized and should not block access to important parts of your blog.
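For example, a stray Disallow: / under the User-agent: * group (the same rule shown earlier for blocking all bots) would tell every crawler to stay away from your entire site, which is one of the most damaging misconfigurations you can make.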
The robots.txt file is a debatable topic; there is no agreed-upon standard for the best way to set it up for SEO. So we encourage you to create your own robots.txt file according to your own requirements.