
WordPress Robots.txt File for SEO

April 2nd, 2009 by Jesse

WordPress, straight out of the box, is an excellent platform: feature rich, easy to use and highly customizable. But guess what, its navigation structure is horrible in the eyes of search engines, because many of the features that make navigation so easy and intuitive for users have the exact opposite effect on search engine spiders/bots. The ability to reach the same post or page through multiple links with different names (category, feed, trackback and comment URLs, for example) can produce a pile of the ever-dreaded duplicate content that search engines such as Google will penalize your site for. Unless you want your pages in the supplemental index, read on to learn one more SEO “must-do” to keep your site SERP’ing strong.

By properly using a robots.txt file, we can tell search engine bots where to look and where not to waste their time. The process is quite easy: you simply instruct the bots not to look in places where they won’t find any valuable content, such as the wp-admin and wp-includes folders and most of wp-content. The file itself should go in your site’s root directory and, guess what, it should be named robots.txt (I know, too obvious).
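
To make the “root directory” rule concrete, here is a minimal Python sketch. The helper name robots_url and the example addresses are made up for illustration; nothing here is part of WordPress. It assumes that crawlers request the file from the top level of whatever host they are visiting, so it derives that address from any page URL:

```python
from urllib.parse import urljoin


def robots_url(page_url):
    """Return the address where a crawler would look for robots.txt.

    Hypothetical helper: joining with an absolute path keeps the scheme
    and host of page_url and discards its path and query string.
    """
    return urljoin(page_url, "/robots.txt")


# Placeholder URLs, used only to show the idea.
print(robots_url("http://example.com/2009/04/some-post/"))
print(robots_url("http://subdomain.mydomain.com/blog/?p=12"))
```

If your file does not answer at the address this prints for your own site, bots will most likely not find it.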

Here is a sample robots.txt file that you can use for your WordPress-powered site, because guess what… WordPress doesn’t come with one by default. Yep, that’s right: unless you have added one, there will not be one there at all. So, here you go (this one will work just fine, but you can obviously customize it to suit your specific needs by adding more directories to disallow access to).

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads


# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*


# Google AdSense
User-agent: Mediapartners-Google
Disallow:
Allow: /*


# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /


# digg mirror
User-agent: duggmirror
Disallow: /
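
If you want to sanity-check rules like these before uploading them, one option is Python's standard-library urllib.robotparser. Treat this as a rough sketch under a few assumptions: example.com and the test URLs are placeholders, the helper is_allowed is hypothetical, and only the plain path rules from the sample are included, because as far as I know this parser does simple prefix matching and does not interpret the * wildcard lines the way Google does.

```python
from urllib.robotparser import RobotFileParser

# Subset of the sample file: plain prefix rules only, no wildcard lines.
RULES = """
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Allow: /wp-content/uploads

User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(agent, url):
    """Hypothetical helper: True if `agent` may fetch `url` under RULES."""
    return parser.can_fetch(agent, url)


# Placeholder URLs on a made-up site.
checks = [
    ("Googlebot", "http://example.com/2009/04/some-post/"),
    ("Googlebot", "http://example.com/wp-admin/options.php"),
    ("Googlebot", "http://example.com/wp-content/uploads/photo.jpg"),
    ("ia_archiver", "http://example.com/"),
]
for agent, url in checks:
    verdict = "allowed" if is_allowed(agent, url) else "blocked"
    print(agent, url, verdict)
```

A local check like this only tells you how one parser reads the file. Each search engine applies its own matching rules, so the wildcard lines still need to be verified with the engines' own tools.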

This is by no means the only step that should be taken to optimize your WordPress site, but it is one good step in the right direction. I personally believe that combining proper use of meta tags with a robots.txt file is the best overall approach. The robots.txt file should be used primarily as a mechanism to restrict which folders are crawled, but it can also be used to show bots which areas they should specifically look in. I will write more on that topic another time. I hope you find this article useful; please feel free to comment and share any best practices you have in this area.


  • Hey Jessie,

    Nice post, I went into this today too, you can check out my take on using robots.txt for silos in Wordpress here

    thanks
  • Thanks for stopping by, I'll check out your article as well.
  • Maybe I need to modify my robots.txt again so it becomes more SEO-friendly. Thank you very much.
  • Thanks for stopping by, glad you found this post useful.
  • The structure is solid, but a couple of things I noticed. I am pretty sure that you cannot use an allow command in your robots.txt... only disallow, lack of which is allow.

    Also, wildcards are not supported by all engines, so it becomes necessary to use the user agent to specify rules with wildcards for just those that do... Otherwise one can make quite a mess with MSN =-) as they, in all of their wisdom, do not support wildcards.
  • Correct, not all search engines support wildcards, but the big G does :-) and I have no problem being properly indexed by Bing or Yahoo. While this example is not "technically" perfect, it does work for my uses and accomplishes what I need done.

    If the big 3 would get together and agree on a set of standards that were universal across the board, I would jump on board right away. Until then we're all stuck just doing what works instead of what is 100% correct.
  • Excellent tip! One question: if I have WordPress in a subdomain [e.g. http://subdomain.mydomain.com], how should I configure the robots.txt? Thanks
  • Hi and thanks for stopping by. If you have WordPress in a subdomain you do it exactly the same way. Just make sure the robots.txt file sits in the root folder of that subdomain's site (so it is served at http://subdomain.mydomain.com/robots.txt), because bots request the file from the root of whatever address they use to reach the site.
  • thanks jesse, I'll set up immediately
  • ark
    This is very helpful. Thanks.
  • Informative post. I use a plugin, "Platinum SEO", on my site. I can set no-follow options on all modules. It works very well.
  • Interesting tip. I'll have to look into doing this for my sites.
  • Joe Jones
    Admin Daily,
    I have one website that's a WordPress site of only 2 web pages. I used all the disallows you have listed, but Google's keyword list for the site had "disallow", "user", and "agent" as the most frequently used keywords for the site. How do I stop Google from using words in the robots.txt file as a source of keywords?

    Thanks
  • That is very odd. Let me know what your domain is, either here or through the contact form, and I will take a peek at it and see why this could be happening.