One of our clients recently had a problem: their website’s pages were not being indexed by Google. They should have ranked well for a variety of keyword search terms, but they did not appear in Google’s results for any of them.
The Google “bot” was visiting the domain, so what was the problem?
It turns out the hosting company had a global robots.txt file in their httpd server configuration which told Google and all other search engines not to index ANY website pages. Because there was no ‘robots.txt’ file in the root directory of the domain, the server’s ‘robots.txt’ file was served up every time Google or any other major search engine visited.
To solve the problem, we created a ‘robots.txt’ file and placed it in the website’s root directory. The contents of the file are:
User-agent: *
Disallow:
This overrode the global configuration, which was:
User-agent: *
Disallow: /
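To see how a crawler reads these two files, here is a minimal Python sketch using the standard library’s urllib.robotparser module. The user agent “Googlebot” and the URL http://yourdomain.com/index.html are illustrative placeholders, and real search engines use their own parsers, so treat this as an approximation of their behaviour rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The file we placed in the site's root: an empty Disallow value allows everything.
SITE_ROBOTS = """User-agent: *
Disallow:
"""

# The hosting company's global file: "Disallow: /" blocks every URL on the site.
GLOBAL_ROBOTS = """User-agent: *
Disallow: /
"""


def googlebot_may_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network access
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    page = "http://yourdomain.com/index.html"  # placeholder URL
    print("site file:  ", googlebot_may_fetch(SITE_ROBOTS, page))    # True
    print("global file:", googlebot_may_fetch(GLOBAL_ROBOTS, page))  # False
```

An empty Disallow value means nothing is disallowed, while “Disallow: /” matches every path on the site, because every path begins with a slash.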
How can you tell if your hosting company has a global robots.txt file? Simple. First, check whether you already have a robots.txt file in your website’s root directory. If you don’t, type your domain name followed by /robots.txt into your browser’s address bar and see what comes up, for example:
http://yourdomain.com/robots.txt
If you see some text (and not a “404 Not Found” page), then it is likely your hosting company has configured a global robots.txt file. If it says “Disallow: /” under “User-agent: *”, then your site is not being indexed by the major search engines that respect what is written in the robots.txt file.
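If you would rather run this check from a script than a browser, the following Python sketch fetches a domain’s robots.txt and reports whether it blocks Googlebot from the home page. The helper names check_robots and summarize are hypothetical, the sketch assumes the site answers over plain http://, and it only treats an HTTP 404 as “no file”. A host that returns a custom error page with status 200 would need extra handling.

```python
import sys
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser


def summarize(text: str) -> str:
    """Describe what a robots.txt body tells Googlebot about the home page."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    if not parser.can_fetch("Googlebot", "/"):
        return "robots.txt blocks the whole site for Googlebot (e.g. 'Disallow: /')"
    return "robots.txt is served and allows Googlebot to fetch the home page"


def check_robots(domain: str) -> str:
    """Fetch http://<domain>/robots.txt and summarize it (hypothetical helper)."""
    url = f"http://{domain}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            text = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return "no robots.txt served (404): nothing is blocked by robots.txt"
        raise  # other HTTP errors are reported to the caller unchanged
    return summarize(text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_robots.py yourdomain.com
    print(check_robots(sys.argv[1]))
```

Keeping the parsing in summarize, separate from the network fetch, makes it easy to try the logic on a pasted robots.txt before pointing the script at a live domain.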
To solve the problem, you will need to create your own robots.txt file and upload it to your root directory, following the example of what we did above.
Why would a hosting company have a default global robots.txt file? It’s a sneaky way of saving on bandwidth. Search engine spiders can use up a lot of bandwidth, especially on large websites, where they download all the files and images on a regular basis. Telling search engines not to index any website on a server prevents that bandwidth consumption.
But it doesn’t do you and your business much good.
