robots.txt multiple user-agent lines
-
I’m looking at the robots.txt file for a client’s website and it’s written in a way I haven’t seen before:

User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
Disallow: /
Does the Disallow: / apply to only teoma, or does it apply to all 4 robots?
-
Is that the whole thing?
The Disallow applies to everything, which is … an odd way to go about it.
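If it helps, under the original robots.txt rules, consecutive User-agent lines form a single record and share the rule block that follows them, so that snippet reads the same as:

User-agent: googlebot
Disallow: /

User-agent: slurp
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: teoma
Disallow: /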
Thanks! Yes, odd is the word LOL. I also posted this same question on Aardvark and Yahoo Answers and the opinions were split: 3 out of 5 people who answered (including you) say it applies to all. No, that’s not the whole file. See below:
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: W3C-checklink
User-agent: WDG_SiteValidator
Disallow: /
Disallow: /js/
Disallow: /Web_References/
Disallow: /webresource.axd
Disallow: /scriptresource.axd

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /webresource.axd
Disallow: /scriptresource.axd
Disallow: /js/
Disallow: /Web_References/
It actually looks like they copied it from this article and then added redundancies. The pages on their site do show up in Google, but without snippets, which makes sense: a URL blocked by robots.txt can still get indexed from links pointing to it, but Google can’t crawl the page to pull a snippet. The site has been online since the 90s.
Okay, got out my book o’ robots.txt
The user-agents declared at the top (googlebot, slurp, etc.) obey the Disallow rules below them. In THEORY, having Disallow: / blocks everything, BUT I know some hosts are hose heads.

User-agent: Mediapartners-Google*
Disallow:
That record says to always allow Mediapartners-Google.
The rest says to disallow JUST those sections. It looks like they wanted to be doubly sure, but frankly, you don’t need it in BOTH places.
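To make that concrete: in the first record, Disallow: / already matches every path, so the four extra Disallow lines under it change nothing; that whole record could simply be:

User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: W3C-checklink
User-agent: WDG_SiteValidator
Disallow: /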
I agree, I’m guessing they made a mistake b/c I’m pretty sure they don’t want to blacklist Google or those other engines. Those other files they blocked are 404s. I’m probably going to recommend that they change it to something like:
User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /js/
or simply
User-agent: *
Disallow: /js/
I’m not sure if there’s an advantage to explicitly allowing Mediapartners. It should crawl the site anyway as long as it’s not disallowed. Thanks again!
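Actually, come to think of it, there may be a point to it: as I understand the spec, a robot obeys only the single record that best matches its name and ignores the rest, so without its own record Mediapartners-Google would fall under User-agent: * and inherit those Disallow lines. The explicit empty record exempts it:

# Without a record of its own, Mediapartners-Google would
# obey the catch-all and skip /js/:
User-agent: *
Disallow: /js/

# With this record present, it ignores the catch-all and
# may crawl everything:
User-agent: Mediapartners-Google*
Disallow: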
I would do this:
User-agent: *
Disallow: /js/
Disallow: /Web_References/
Disallow: /webresource.axd
Disallow: /scriptresource.axd

User-agent: Mediapartners-Google
Allow: /

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Googlebot-Mobile
Allow: /

User-agent: Browsershots
Allow: /

User-agent: Dotbot
Allow: /
I find I get better results that way. Also if you’re running WP, which I presume you are, I would add in this:
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /wp-
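Put together, the whole thing would look something like this (just a sketch, assuming the stock WP paths):

User-agent: *
Disallow: /js/
Disallow: /Web_References/
Disallow: /webresource.axd
Disallow: /scriptresource.axd
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /wp-

# ...followed by the per-bot Allow records from above, e.g.:
User-agent: Mediapartners-Google
Allow: /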
They don’t need all that?
Cool, that’s interesting about the better results. Thanks! I’d imagine too that Disallow: would give the same results as Allow: /, and I guess the point is to give explicit instructions for the robots you want. I’m pretty sure that Disallow: /wp- disallows all the wp- folders. Is there a specific reason to disallow /wp-admin/ etc. separately?
-
Basically, there’s no reason a BOT needs to come look at wp-admin! Drops the pings on your site, which reduces traffic, which makes your site happier.
Oh yeah, of course. =) I meant I think Disallow: /wp- disallows /wp-admin/ and /wp-content/ and /wp-includes/ or anything else that starts with /wp-. I guess it doesn’t hurt to list all of them, but it’s redundant, isn’t it? Does it make a difference, you think?
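To show what I mean, my understanding of standard robots.txt matching is that a rule matches any path that simply starts with it:

# If my reading is right, this one rule matches anything
# beginning with /wp-, for example:
#   /wp-admin/
#   /wp-admin/options.php
#   /wp-content/uploads/photo.png
#   /wp-login.php
Disallow: /wp-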
Ah, the folders specifically tell it ‘and nothing IN these locations, either!’ It’s more for the subfiles than the actual folder names.
Hi Guys,
Can either of you tell if I’m blocking Google Analytics from tracking my site with this robots.txt setup:
——
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /feed
Disallow: /*/feed
Disallow: /comments
Disallow: /author
Disallow: /tag
Disallow: /archives
Disallow: /2011/*
Disallow: /20*
Disallow: /iframes
Disallow: /category/*/*
Disallow: */trackback
User-agent: Googlebot
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.xlsx$
Disallow: /*.doc$
Disallow: /*.pdf$
Disallow: /*.zip$
User-agent: *
Allow: /images
Allow: /slides
Sitemap: https://www.meninkilts.com//sitemap_index.xml
——
Cheers Brent
Generally, after a month, it’s best to make a new topic. This one really was resolved (and I’m gonna flag it in a second).
Anyway, under User-agent: Googlebot it looks like you’re blocking all .php files, which may be causing your problems.
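If you still want to block the other file types, one option (just a sketch; keep only the lines you actually need) is to drop the .php rule:

User-agent: Googlebot
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.xlsx$
Disallow: /*.doc$
Disallow: /*.pdf$
Disallow: /*.zip$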
Huge THANKS! What’s your PayPal? Coffee on me. Reach me on Twitter: @meninkilts. Cheers.
@cyberbrent Google Analytics relies on the tracking code script. It’s totally independent of robots.txt. It looks like you got your robots.txt straightened out. Remember you can always see which URLs are indexed by Google by searching for
site:meninkilts.com
Hey Ryanve,
Well, Ipstenu has been awesome straightening out the robots.txt. But alas, Google Analytics is just not being passed anything. Here is what I’ve done to try to get it to work (and it worked for years before this site update to WP):
1. I’ve tried just placing the GA code manually at the bottom of the page (no reading by GA).
2. Have installed Yoast’s WordPress SEO plugin and placed the code at the top in the header (still no reading by GA). Yoast’s plugin is authorized via OAuth and is 100% connected to the GA account.
3. Have reverified my domain with Google (https://www.google.com/accounts/ManageDomains)
4. Have also verified the domain in Google Webmaster Tools in THREE ways:
A. DNS verified
B. Google HTML verification file placed on the server and verified.
C. Verified also with Google Analytics via Google Webmaster Tools (so they are linked, seeing each other as owning the same domain).
5. Have contacted the hosting company for our VPS and their team has looked through and can’t see what could be causing the issue of GA not picking up our hits on the site. (* WordPress Stats is working fine, as is AWStats on the server.)
So why oh why is Google Analytics not picking up the counts?
PayPal for sure if you can sleuth this one out. It has me totally stumped, and it’s been 5 days now of no stats inside GA (just 1 hit per day, from Googlebot, is being registered).
Tweet me @meninkilts. Cheers, Brent.
At this point we should probably split off into a new topic, because it’s not robots.txt anymore.
When I load your page, I can see the GA code in there, so it’s THERE, and that’s all Google should need… Is UA-2109736-2 the right GA ‘code’?
Are you using any GA filters on your site?
The only odd thing I see in your source code is, at the bottom, there’s this:
<script type="text/javascript"> <!-- include google analytics -->
and then a huge section of code I don’t recognize (nor see on my site, and I too am using Yoast). Maybe you have a function or something else that’s calling in the code twice?
You could start going down the list here: https://www.google.com/support/analyticshelp/bin/answer.py?answer=1009683
- The topic ‘robots.txt multiple user-agent lines’ is closed to new replies.