While doing a little research on robots text file tips I Googled “robots.txt disallow tips” and what was the first listing I got?
The White House’s robots.txt file.
So then I was interested in who might be linking to the White House’s robots.txt file – I mean, that’s not very interesting web reading. Or is it?
Well, the good old Beeb gave me some information that I liked to hear: the new administration is open, and responsible for removing a whopping 2,374 or so lines from the old file, whittling the new robots.txt file down to a trim 3 lines.
I wanted to see how many sites are linking to the WH’s famous, or infamous, file. So using “allinlink: whitehouse.gov/robots.txt” I learned Google had 9,390 indexed pages linking to the White House’s file.
Although the idiocy of laying out all your ‘secret’ folders for the world, and terrorist hackers, to read was news a few years ago, the older pages I read on the issue certainly made me think about how careless we can be as website owners in divulging information – information you didn’t want anyone [like your competition] to know about.
So you want to know what your rival is working on for their new product launch? Take a peek at their robots.txt file and, if they hire web techs like the old White House did, you might be privy to some juicy, secret information.
Said rivals, hurry – check your robots file now and make sure you aren’t giving away your secrets.
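To see what your own file gives away, here is a minimal Python sketch that pulls every path named on a `Disallow` line. The function name `disallowed_paths` and the sample file contents are made up for illustration; paste in your own robots.txt text instead.

```python
def disallowed_paths(robots_txt):
    """Return every path named on a Disallow line, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow blocks nothing, so skip it
                paths.append(value)
    return paths


# Hypothetical robots.txt contents, not taken from any real site.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /new-product-launch/   # the kind of line a rival would love
Disallow:
"""

for path in disallowed_paths(SAMPLE):
    print(path)
```

Anything this prints is readable by anyone who types your domain followed by /robots.txt, so treat the list as public.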
What we tend to forget is that just because you have a robots.txt file, it doesn’t mean the pages you list won’t be indexed.
Yes, that’s right.
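A rough way to picture this: robots.txt is a polite request that a well-behaved crawler checks before fetching a page. The sketch below uses Python's standard `urllib.robotparser` to run that check; the rules, the `ExampleBot` agent name and the example.com URLs are placeholders I made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, fed in directly instead of downloaded.
RULES = """\
User-agent: *
Disallow: /secret/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def may_crawl(url, agent="ExampleBot"):
    """Ask the parsed rules whether this agent is asked to stay away."""
    return parser.can_fetch(agent, url)


print(may_crawl("https://example.com/secret/plans.html"))
print(may_crawl("https://example.com/index.html"))
```

The answer is only advice. Nothing stops a crawler from ignoring it, and a search engine that finds a link to the page elsewhere may still list the URL without ever fetching it.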
That raises the question – what can you do to stop search engines indexing your top secret pages? Well, the obvious thing to do is to use password-protected directories, or simply never link to a page you want hidden. No linkie, no findie.
I won’t go into all the details here, but will let others, more learned in the old robots methodology, explain it to you:
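For a feel of what password protection does under the hood, here is a small sketch of HTTP Basic authentication using only Python's standard library. The username, password, realm name and `ProtectedHandler` class are all invented for the example; on a real site your web server or hosting control panel would normally handle this for you.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler

# Hypothetical credentials, for illustration only.
USERNAME = "editor"
PASSWORD = "change-me"


def is_authorized(header):
    """Check an Authorization header value against the credentials above."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):]).decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return False
    user, sep, password = decoded.partition(":")
    user_ok = hmac.compare_digest(user.encode(), USERNAME.encode())
    pass_ok = hmac.compare_digest(password.encode(), PASSWORD.encode())
    return bool(sep) and user_ok and pass_ok


class ProtectedHandler(BaseHTTPRequestHandler):
    """Serve the page only to requests carrying the right credentials."""

    def do_GET(self):
        if is_authorized(self.headers.get("Authorization")):
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"Top secret page\n")
        else:
            # 401 plus this header makes browsers prompt for a login.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="Private"')
            self.end_headers()
```

A spider has no password, so it gets the 401 and nothing to index. Basic authentication sends the password in an easily decoded form, so it should only be used over HTTPS.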
An aptly named info site
More on creating a robots.txt file
Managing robots access
Even more help
Now safely make your own robots.txt file to get over any virtual arachnophobia:
Google webmaster tools
Robots.txt generator
And remember – the spiders will be coming soon.