SEO Class

A Robot Friendly Robots.txt File

Posted April 1st, 2007 by SEO Class Admin

A robots.txt file lives in the root directory of a website. Robots, crawlers, bots, and spiders all scour the internet, hungry for fresh content. When they find a website, they immediately look for the robots.txt file, which tells them what they may and may not crawl. Webmasters who look at their server log files will often see 404 (page not found) errors, each one marking a request for a page that does not exist. At the top of that list you will often find robots.txt itself. To fix this and other issues, check your web server log files regularly and use what you learn to improve the website for your users and the search engines.
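As a rough illustration of that log check, here is a minimal Python sketch that counts 404 responses per requested path. It assumes the server writes Common Log Format lines; the sample lines, the `LOG_LINE` pattern, and the `count_404s` name are made up for this example, so adjust them to match your own server's log format.

```python
import re
from collections import Counter

# Pulls the requested path and the status code out of a Common Log Format line, e.g.
# 1.2.3.4 - - [01/Apr/2007:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 404 208
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" (?P<status>\d{3})\b')


def count_404s(lines):
    """Return a Counter mapping each requested path to its number of 404 responses."""
    missing = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            missing[match.group("path")] += 1
    return missing


# Hypothetical sample lines; in practice, pass an open log file instead.
SAMPLE_LOG = [
    '1.2.3.4 - - [01/Apr/2007:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 404 208',
    '1.2.3.4 - - [01/Apr/2007:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '5.6.7.8 - - [01/Apr/2007:10:05:00 +0000] "GET /robots.txt HTTP/1.1" 404 208',
    '5.6.7.8 - - [01/Apr/2007:10:05:02 +0000] "GET /old-page.html HTTP/1.1" 404 208',
]

if __name__ == "__main__":
    # List the missing paths, most requested first.
    for path, hits in count_404s(SAMPLE_LOG).most_common():
        print(hits, path)
```

If robots.txt shows up near the top of a report like this, adding the file (even an empty, allow-everything one) should make those particular 404s go away.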

A robots.txt file is a simple text file that can be created or edited in Notepad or any other text editor. If you use blog software, an e-commerce package, or some other application, you can modify this file to tell robots which parts of the site to crawl and which to skip. Directories like the cgi-bin, or a folder of .js (JavaScript) files containing code you do not want the engines to see, should be included in the exclusion list in the robots.txt file. For example, if you used a JavaScript file to obfuscate an email address (to hide it from harvesters and bots), you would exclude all bots from the folder containing those JavaScript files. Only well-behaved robots honor these rules, so treat the file as a request rather than a lock.

Robots.txt example:

User-agent: *
Disallow: /images/
Disallow: /top-secret-page1.html
Disallow: /private-directory1/
Disallow: /cgi-bin/
Disallow: /secret-javascript-folder/
Disallow: /duplicate-content-folder/

User-agent: EmailHarvestingbot #this tells the EmailHarvestingbot to not come to your site
Disallow: /

User-agent: Googlebot #asks the bot to wait 60 seconds between requests, if you are that cool :)
Crawl-delay: 60 #non-standard directive: Googlebot ignores it, though some other crawlers honor it
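To see how a compliant robot might read rules like these, the sketch below feeds a shortened rule set to Python's standard `urllib.robotparser` module. The `RULES` text, the `SlowBot` and `SomeBot` agent names, and the `allowed` helper are assumptions made for this illustration, and `crawl_delay` needs Python 3.6 or newer. Real crawlers use their own parsers, so take this as an approximation of their behavior, not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# A shortened, hypothetical rule set in the spirit of the example above.
RULES = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /top-secret-page1.html

User-agent: EmailHarvestingbot
Disallow: /

User-agent: SlowBot
Crawl-delay: 60
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent, path):
    """Return True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(allowed("SomeBot", "/index.html"))             # not listed, so crawlable
    print(allowed("SomeBot", "/images/logo.png"))        # under a disallowed folder
    print(allowed("EmailHarvestingbot", "/index.html"))  # "Disallow: /" blocks everything
    print(parser.crawl_delay("SlowBot"))                 # requested delay in seconds
```

Note the difference one character makes: "Disallow: /" shuts a robot out of the whole site, while an empty "Disallow:" shuts it out of nothing.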

Another basic robots.txt example:

User-agent: *
Disallow:

(This example invites all robots in and disallows nothing. You should have at least this, if nothing else.)
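If you would rather script that step, here is a minimal Python sketch that writes the allow-all file only when no robots.txt exists yet. The `ensure_robots_txt` name and the idea of passing in a document root are assumptions for this example; point it at your own web root, and note that the demo at the bottom runs against a throwaway temporary folder instead of a real site.

```python
import tempfile
from pathlib import Path

# The same two lines as the basic example above.
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def ensure_robots_txt(document_root):
    """Create an allow-all robots.txt in document_root unless one already exists.

    Returns True if a file was written, False if an existing file was left alone.
    """
    target = Path(document_root) / "robots.txt"
    if target.exists():
        return False
    target.write_text(ALLOW_ALL, encoding="ascii")
    return True


if __name__ == "__main__":
    # Try it against a temporary directory rather than a real web root.
    with tempfile.TemporaryDirectory() as root:
        print(ensure_robots_txt(root))  # first call writes the file
        print(ensure_robots_txt(root))  # second call leaves it untouched
```

Leaving an existing file alone is deliberate: a site that already has exclusion rules should not have them silently replaced by an allow-all file.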

More information on Robots.txt files