Blog Basics – Robots.txt

So you have your blog set up and you’ve started pouring your heart out. It’s natural that you expect your name to be all over search engines, loads of traffic, money, fame, your face on the cover of Forbes…hold it…not yet…nothing begins if a search engine doesn’t know you exist! No traffic, no money, no fame…
The next natural question is…how does a search engine find my site?
In two ways: through text links from external sites (which lead the SE crawlers/robots to index your site) and through manual search engine submissions requesting that your site be crawled. Web robots are programs that travel across the Web. They’re also known as Web Wanderers, Bots, Crawlers or Spiders. They invariably find their way to sites on the back of embedded text links and check back for updates from time to time.
(The names are misleading and make them sound like independent adventurers moving from site to site and transmitting data home. This is not the case: these bots are couch potatoes, firmly planted home birds that constantly keep pinging new sites…disappointing, huh?)

You need to ensure that your site has a functional robots.txt that allows all search bots to crawl your site. A robots.txt is a simple text file that you upload to your root directory, e.g. http://maediratta.com/robots.txt. Feel free to copy the lines below and upload them after correcting the URL.

# http://www.yoursite.com: robots.txt
#
User-agent: *
Disallow:

What this particular robots.txt does is allow all bots to index the site without banning any. If, however, you do not want xyz bot on your site, simply add a separate record for it: a User-agent line naming the bot, followed by a Disallow: / line to keep it off the whole site.
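If you would like to check what a set of rules actually permits before uploading it, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "xyzbot", the helper can_crawl and the page URL are placeholders made up for illustration; only the allow-all rules come from the file above.

```python
from urllib.robotparser import RobotFileParser

# The allow-all file from this post.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Same file with one crawler banned. "xyzbot" is a placeholder name,
# not a real crawler; substitute the bot you want to keep out.
BLOCK_ONE = """\
User-agent: xyzbot
Disallow: /

User-agent: *
Disallow:
"""


def can_crawl(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.yoursite.com/any-post.html"  # hypothetical page
    print(can_crawl(ALLOW_ALL, "xyzbot", page))     # allowed by the open file
    print(can_crawl(BLOCK_ONE, "xyzbot", page))     # blocked by its own record
    print(can_crawl(BLOCK_ONE, "Googlebot", page))  # still allowed
```

This only tells you how a well-behaved parser reads the file. Bots are free to ignore robots.txt, so treat it as a polite request rather than a lock.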

If you are keen on reading up, you should check http://www.robotstxt.org/wc/faq.html. Advanced users should check RoboGen, a Windows-based program that helps you easily manage robots.txt files for your websites. Instead of the time-consuming process of managing the file by hand, you can edit your robot exclusion files either on your local computer or on a remote FTP site.

Just remember that the index of a search engine is built on the data gathered by these bots, so the more they crawl, the better your chances of featuring on a search result, getting traffic, money, fame, the cover of Forbes and so on… 🙂
