What is robots.txt?
2004-09-22 00:31:15
A text file stored in the top level directory of a web site to deny access by robots to certain pages or sub-directories of the site.
Robots.txt is a text file stored in the top level directory of a web
site that is intended to be read by search engine robots as they enter
a site, and tells them "how to behave". Only robots which comply with
the Robots Exclusion Standard will read and obey the commands in this
file. Compliant robots read this file on each visit, so pages or areas
of a site can be made public or private at any time by changing the
content of robots.txt before re-submitting to the search engines. If
you do not have a "robots.txt" file in your document root directory,
you will most likely see periodic error messages regarding the
"robots.txt" file, because robots request the file and the server
cannot find it. This is not a cause for concern.
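As a rough illustration of "stored in the top level directory", the sketch below shows how a robot might work out where to look for robots.txt, starting from any page address on a site. The function name robots_txt_url and the example.com addresses are made up for this example; it only builds the address and does not fetch anything.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the address of robots.txt for the site that serves page_url.

    Hypothetical helper: the file is always looked up at the top level
    of the site, whatever directory the page itself lives in.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/images/logo.gif?size=big"))
```

If the server answers that request with "not found", a compliant robot normally treats the whole site as open to it, which is why the resulting error messages are harmless.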
If you are interested in the function and format of the "robots.txt" file, and other methods of controlling or disabling search engines (including META tags), visit these useful sites:
Example robots.txt rules
The following allows all robots to visit all files: the wildcard "*" specifies all robots, and the empty Disallow line excludes nothing.
User-agent: *
Disallow:

This one keeps all robots out:
User-agent: *
Disallow: /

The next one bars all robots from the cgi-bin and images directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

This one bans Roverdog from all files on the server:
User-agent: Roverdog
Disallow: /

This one keeps googlebot from getting at the cheese.htm file (note that the path starts with "/"):
User-agent: googlebot
Disallow: /cheese.htm
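
One way to check how a set of rules like these will be read is Python's standard urllib.robotparser module. The sketch below is only an illustration: it combines three of the example rule sets into one file, and the host www.example.com, the robot name SomeOtherBot, and the helpers build_parser and is_allowed are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

# Three of the example rule sets combined into one hypothetical file.
# Each group starts with a User-agent line; paths start with "/".
RULES = """\
User-agent: Roverdog
Disallow: /

User-agent: googlebot
Disallow: /cheese.htm

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""


def build_parser(rules_text):
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def is_allowed(parser, agent, path):
    """Ask whether the named robot may fetch path on the example host."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


parser = build_parser(RULES)
for agent, path in [
    ("Roverdog", "/index.html"),
    ("googlebot", "/cheese.htm"),
    ("googlebot", "/index.html"),
    ("SomeOtherBot", "/images/logo.gif"),
    ("SomeOtherBot", "/index.html"),
]:
    print(agent, path, is_allowed(parser, agent, path))
```

A robot that has its own User-agent group follows that group only, and the "*" group applies to every robot not named elsewhere in the file.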


