What is robots.txt?

By DigitalPulseHosting.com
2004-09-22 00:31:15
A text file stored in the top level directory of a web site to deny access by robots to certain pages or sub-directories of the site.

Robots.txt is a text file stored in the top-level directory of a web site. It is intended to be read by search engine robots as they enter a site, and tells them "how to behave". Only robots that comply with the Robots Exclusion Standard will read and obey the commands in this file. Robots read the file on each visit, so pages or areas of a site can be made public or private at any time by changing the content of robots.txt before re-submitting to the search engines. If you do not have a robots.txt file in your document root directory, you will most likely see periodic error messages in your server logs for requests for robots.txt. This is not a cause for concern.

If you are interested in the full function and format of the robots.txt file, or in other methods of controlling or disabling search engines (including META tags), many sites document the Robots Exclusion Standard in detail.
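For reference, the META-tag approach mentioned above is placed in the head of an individual HTML page rather than in a site-wide file. A generic sketch (the exact content values a robot honors vary by search engine):

```
<!-- Ask compliant robots not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```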

Example robots.txt rules


The following allows all robots to visit all files, because the wildcard "*" matches all robots:

User-agent: *
Disallow:

This one keeps all robots out:

User-agent: *
Disallow: /

The next one bars all robots from the cgi-bin and images directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

This one bans Roverdog from all files on the server:

User-agent: Roverdog
Disallow: /

This one keeps googlebot from getting at the cheese.htm file (paths should begin with a slash):

User-agent: googlebot
Disallow: /cheese.htm
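As a sanity check, rules like the ones above can be tested with Python's standard urllib.robotparser module. A minimal sketch, feeding the rules in directly instead of fetching them from a site (the robot names and URLs here are made-up examples):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt combining two of the examples above:
# Roverdog is banned entirely; everyone else just stays out of
# /cgi-bin/ and /images/.
rules = """\
User-agent: Roverdog
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("SomeBot", "http://example.com/index.html"))   # True
print(rp.can_fetch("SomeBot", "http://example.com/cgi-bin/x"))    # False
print(rp.can_fetch("Roverdog", "http://example.com/index.html"))  # False
```

A compliant robot performs essentially this check against your live robots.txt before requesting each page.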


