WordPress Robots.txt file for SEO

Having your WordPress robots.txt properly set up for SEO will improve your rankings in the search results and get you more traffic.

WordPress, straight out of the box, is an excellent platform: feature-rich, easy to use, and highly customizable. But guess what, it comes with no robots.txt file, and its navigation structure is absolutely horrible in the eyes of search engines, because many of the features that make user navigation so easy and intuitive have the exact opposite effect on search engine spiders/bots. The ability to reach the same posts/pages through multiple links of various names can produce a pile of the ever-dreaded duplicate content that search engines such as Google will penalize your site for. Unless you want your pages in the supplemental index, read on to learn how to properly use a WordPress robots.txt file as one more SEO “must-do” to keep your site SERP’ing strong.

By properly utilizing a WordPress robots.txt file, we can tell search engine bots where to look and where not to waste their time.

WordPress robots.txt

The process of making a WordPress robots.txt file is quite easy: you simply instruct the bots, via the robots.txt file, not to look in any places where they won’t find any valuable content, such as the wp-admin and wp-includes folders. The file itself goes in your root directory and, guess what, it should be named robots.txt (I know, too obvious).

Here is a sample robots.txt file that you can use for your WordPress-powered site, because guess what… WordPress doesn’t come with a robots.txt file by default. Yep, that’s right: unless you have added one yourself, there will not be one there at all. So, here you go (this one will work just fine, but you can obviously customize it to suit your specific needs by adding more directories to disallow access to).

[php]
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /
[/php]
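If you want to sanity-check rules like these before uploading the file, Python’s built-in robots.txt parser can simulate how a standards-following bot would read them. This is a minimal sketch; the example.com URLs and the trimmed-down rule set are placeholders, not part of the file above.

```python
# Sanity-check a few robots.txt rules with Python's standard library.
from urllib.robotparser import RobotFileParser

# A trimmed-down version of the rules above (placeholder subset).
rules = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-content/plugins
Allow: /wp-content/uploads
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The admin area is blocked for all user agents.
print(parser.can_fetch("*", "https://example.com/wp-admin/options.php"))  # False
# Ordinary posts are not matched by any Disallow, so they stay crawlable.
print(parser.can_fetch("*", "https://example.com/my-post/"))  # True
# The uploads folder is explicitly allowed.
print(parser.can_fetch("*", "https://example.com/wp-content/uploads/photo.jpg"))  # True
```

Note that `urllib.robotparser` applies rules in file order, so this only approximates engines (like Google) that instead prefer the longest matching rule.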

Having a WordPress robots.txt file is by no means the only step that should be taken to optimize your WordPress site, but it is one good step in the right direction. I personally believe a combination of properly used WordPress SEO meta tags and a robots.txt file is the best overall approach. The robots.txt file should be used primarily as a mechanism to restrict which folders get crawled, but it can also be used to show bots which areas they should specifically look in; I will write more on that topic another time. I hope you find this article useful. Please feel free to comment and share any best practices you have in this area, or any questions you may have about WordPress robots.txt files.
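For reference, the meta-tag half of that combination is a single line in the `<head>` of any page you want kept out of the index. This is a generic sketch of the standard robots meta tag, not a specific plugin’s output; remember that robots.txt controls crawling, while the meta tag controls indexing.

```html
<!-- Keep this page out of the search index, but still follow its links -->
<meta name="robots" content="noindex,follow">
```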

24 thoughts on “WordPress Robots.txt file for SEO”

  1. The structure is solid, but a couple of things I noticed. I am pretty sure that you cannot use an Allow directive in your robots.txt… only Disallow; the absence of a Disallow is what allows.

    Also, the wildcards are not supported by all engines, so it becomes necessary to use the user-agent line to scope wildcard rules to just the engines that do support them… Otherwise one can make quite a mess with MSN =-) as they, in all of their wisdom, do not support wildcards.

  2. Correct, not all search engines support wildcards, but the big G does :-) and
    I have no problem being properly indexed by Bing or Yahoo. While this
    example is not “technically” perfect, it does work for my uses and
    accomplishes what I need done.

    If the big 3 would get together and agree on a set of standards that were
    universal across the board, I would jump on board right away. Until then, we’re
    all stuck just doing what works instead of what is 100% correct.

  3. Admin Daily,
    I have one website that's a WordPress site of only 2 web pages. I used all the disallows you have listed, but Google's keyword list for the site had “disallow”, “user”, and “agent” as the most frequently used keywords for the site. How do I stop Google from using words in the robots.txt file as a source of keywords?

    Thanks

  4. That is very odd. Let me know what your domain is, or use the contact form
    here, and I will take a peek at it and see why this could be
    happening.

  5. Jesse this is great info. Thank you for sharing with the community. I’m putting up a site that will not have any use for google ads or media partners. How do I disallow them?

    Disallow: / or Disallow: * or Disallow: /*

    Also, can you elaborate on the links you have included in the file? Never seen them in a robots text file.

    Thanks again

    • Thanks Phil, I appreciate the feedback..

      To disallow something you just change it to Disallow: / and that will take care of everything from your site root folder down.

      The links you see aren’t supposed to be there; it seems that a plugin I use is messing with the code. Thanks for the heads up, I’m going to sort that out right now :)

  6. Thanks for this valuable information. By the way, do I have to add a sitemap for my subdomain in robots.txt and Webmaster Tools?

    • If you are using a manually created robots.txt file (instead of the virtual robots.txt file many plugins make), you will want to add a sitemap reference for every domain in use to the file, because each subdomain is considered a separate domain by search engines.
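      As a sketch, those sitemap references are just `Sitemap:` lines with absolute URLs; the example.com hosts and sitemap paths below are placeholders:

```
Sitemap: https://example.com/sitemap.xml
Sitemap: https://blog.example.com/sitemap.xml
```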

  7. Hi Jesse, thanks for the article. One question: what are the reasons you used the links in the robots.txt, especially for the feed? Is that to avoid duplicate content?

    • Sorry lol… those were not supposed to be in there. A plugin that I use to automatically add affiliate links based on keywords on the site had added those; this page was supposed to be excluded from that. I have corrected the page and you shouldn’t see those links in there any more :)

