• Resolved ryanve

    (@ryanve)


    I’m looking at the robots.txt file for a client’s website and it’s written in a way I haven’t seen before:

    User-agent: googlebot
    User-agent: slurp
    User-agent: msnbot
    User-agent: teoma
    Disallow: /

    Does the Disallow: / apply to only teoma or does it apply to all 4 robots?

Viewing 15 replies - 1 through 15 (of 15 total)
  • Moderator Ipstenu (Mika Epstein)

    (@ipstenu)

    Advisor and Activist

    Is that the whole thing?

    The Disallow applies to all four user-agents, which is … an odd way to go about it.
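One way to check the grouping behaviour is to feed the snippet to a parser. The sketch below uses Python's standard-library `urllib.robotparser`; the helper name `blocked_agents` and the agent `otherbot` are made up for illustration, and real crawlers may not interpret the file exactly the way this parser does.

```python
from urllib.robotparser import RobotFileParser

# The four-agent group exactly as posted in the question.
ROBOTS_TXT = """\
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
Disallow: /
"""

def blocked_agents(robots_txt, agents, url="/"):
    """Return the agents from `agents` that are not allowed to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]

# "otherbot" is a hypothetical crawler that is not named in the file.
print(blocked_agents(ROBOTS_TXT, ["googlebot", "slurp", "msnbot", "teoma", "otherbot"]))
# → ['googlebot', 'slurp', 'msnbot', 'teoma']
```

Under this parser, consecutive User-agent lines form one group, so the single Disallow covers all four named robots and leaves unnamed ones alone.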

    Thread Starter ryanve

    (@ryanve)

    Thanks! Yes, odd is the word LOL. I also posted this same question on Aardvark and Yahoo Answers and got split opinions: 3 out of 5 people who answered (including you) say it applies to all. No, that’s not the whole file. See below:

    User-agent: googlebot
    User-agent: slurp
    User-agent: msnbot
    User-agent: teoma
    User-agent: W3C-checklink
    User-agent: WDG_SiteValidator
    Disallow: /
    Disallow: /js/
    Disallow: /Web_References/
    Disallow: /webresource.axd
    Disallow: /scriptresource.axd
    
    User-agent: Mediapartners-Google*
    Disallow:
    
    User-agent: *
    Disallow: /webresource.axd
    Disallow: /scriptresource.axd
    Disallow: /js/
    Disallow: /Web_References/

    It actually looks like they copied it from this article and then added redundancies. The pages on their site do show up in Google but they show up without snippets. The site has been online since the 90s.

    Moderator Ipstenu (Mika Epstein)

    (@ipstenu)

    Advisor and Activist

    Okay, got out my book o’ robots.txt

    The user-agents declared at the top (googlebot, slurp, etc.) will all obey the Disallow lines below them. In THEORY, having Disallow: / blocks everything, BUT I know some hosts are hose heads.

    User-agent: Mediapartners-Google*
    Disallow:

    That tells Mediapartners-Google it is always allowed.

    The rest says JUST disallow those sections. It looks like they wanted to be doubly sure, but frankly, you don’t need it in BOTH places.
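To see how the three groups interact, the full file can be run through Python's standard-library `urllib.robotparser`. This is only a sketch: `can_crawl` and the agent `examplebot` are hypothetical names, the paths are made up, and this parser's reading of the file is not guaranteed to match every search engine's.

```python
from urllib.robotparser import RobotFileParser

# The client's robots.txt as posted in the thread.
CLIENT_ROBOTS_TXT = """\
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: W3C-checklink
User-agent: WDG_SiteValidator
Disallow: /
Disallow: /js/
Disallow: /Web_References/
Disallow: /webresource.axd
Disallow: /scriptresource.axd

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /webresource.axd
Disallow: /scriptresource.axd
Disallow: /js/
Disallow: /Web_References/
"""

def can_crawl(agent, path, robots_txt=CLIENT_ROBOTS_TXT):
    """Return True if `agent` may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# A named robot hits "Disallow: /" first, so the later lines in its group never matter.
print(can_crawl("googlebot", "/about.html"))   # → False
# "examplebot" (hypothetical) is not named, so it falls through to the "*" group.
print(can_crawl("examplebot", "/about.html"))  # → True
print(can_crawl("examplebot", "/js/app.js"))   # → False
```

Because Disallow: / already matches every path, the four extra Disallow lines in the first group add nothing for the robots named there.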

    Thread Starter ryanve

    (@ryanve)

    I agree, I’m guessing they made a mistake b/c I’m pretty sure they don’t want to blacklist Google or those other engines. Those other files they blocked are 404s. I’m prob. going to recommend that they change it to something like:

    User-agent: Mediapartners-Google*
    Disallow:
    
    User-agent: *
    Disallow: /js/

    or simply

    User-agent: *
    Disallow: /js/

    I’m not sure if there’s an advantage to explicitly allowing Mediapartners. It should crawl the site anyway as long as it’s not disallowed. Thanks again!

    Moderator Ipstenu (Mika Epstein)

    (@ipstenu)

    Advisor and Activist

    I would do this:

    User-agent: *
    Disallow: /js/
    Disallow: /Web_References/
    Disallow: /webresource.axd
    Disallow: /scriptresource.axd
    
    User-agent: Mediapartners-Google
    Allow: /
    
    User-agent: Adsbot-Google
    Allow: /
    
    User-agent: Googlebot-Image
    Allow: /
    
    User-agent: Googlebot-Mobile
    Allow: /
    
    User-agent: Browsershots
    Allow: /
    
    User-agent: Dotbot
    Allow: /

    I find I get better results that way. Also if you’re running WP, which I presume you are, I would add in this:

    Disallow: /trackback/
    Disallow: /wp-admin/
    Disallow: /wp-content/
    Disallow: /wp-includes/
    Disallow: /xmlrpc.php
    Disallow: /wp-

    They don’t need all that.

    Thread Starter ryanve

    (@ryanve)

    Cool—that’s interesting about the better results—thanks! I’d imagine too that Disallow: would give the same results as Allow: / and I guess the point is to give explicit instructions for the robots you want.
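The equivalence of an empty Disallow: and Allow: / can be checked with a small sketch, again using Python's standard-library `urllib.robotparser`. The helper `allows_everything`, the agent name `anybot`, and the sample paths are assumptions made up for the example; individual crawlers may still treat the two forms differently.

```python
from urllib.robotparser import RobotFileParser

def allows_everything(rule_line, paths=("/", "/js/app.js", "/wp-admin/")):
    """Return True if a catch-all group containing only `rule_line` permits every path in `paths`."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", rule_line])
    # "anybot" is a hypothetical crawler name.
    return all(parser.can_fetch("anybot", path) for path in paths)

print(allows_everything("Disallow:"))    # → True
print(allows_everything("Allow: /"))     # → True
print(allows_everything("Disallow: /"))  # → False
```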

    I’m pretty sure that Disallow: /wp- disallows all the wp- folders. Is there a specific reason to disallow /wp-admin/ etc. separately?

    Moderator Ipstenu (Mika Epstein)

    (@ipstenu)

    Advisor and Activist

    Basically, there’s no reason a BOT needs to come look at wp-admin! It drops the pings on your site, which reduces traffic, which makes your site happier.

    Thread Starter ryanve

    (@ryanve)

    Oh yea of course. =) I meant I think Disallow: /wp- disallows /wp-admin/ and /wp-content/ and /wp-includes/ or anything else that starts with /wp-

    I guess it doesn’t hurt to list all of them but it’s redundant isn’t it? Does it make a difference you think?

    Moderator Ipstenu (Mika Epstein)

    (@ipstenu)

    Advisor and Activist

    Ah, the folders specifically tell it ‘and nothing IN these locations, either!’ It’s more for the subfiles than the actual folder names.
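For what it's worth, robots.txt rules are matched as path prefixes, so a quick check suggests that Disallow: /wp- on its own already covers the folders and the files inside them, which would make the explicit folder lines redundant but harmless. The sketch below uses Python's standard-library `urllib.robotparser`; the helper `blocked_paths`, the agent `anybot`, and the sample paths are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

def blocked_paths(rules, paths, agent="anybot"):
    """Return the paths from `paths` that `agent` may not fetch under a catch-all group of `rules`."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", *rules])
    return [path for path in paths if not parser.can_fetch(agent, path)]

# Hypothetical sample paths on a WordPress site.
SAMPLE_PATHS = [
    "/wp-admin/",
    "/wp-admin/options.php",
    "/wp-content/themes/example/style.css",
    "/wp-includes/js/example.js",
    "/wp-login.php",
    "/about/",
]

print(blocked_paths(["Disallow: /wp-"], SAMPLE_PATHS))
# Every sample path starting with /wp- is reported as blocked; /about/ is not.
```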

    Hi Guys,

    Can either of you tell whether I’m blocking Google Analytics from tracking my site with this robots.txt setup:

    ——
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /feed
    Disallow: /*/feed
    Disallow: /comments
    Disallow: /author
    Disallow: /tag
    Disallow: /archives
    Disallow: /2011/*
    Disallow: /20*
    Disallow: /iframes
    Disallow: /category/*/*
    Disallow: */trackback
    User-agent: Googlebot
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.wmv$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
    Disallow: /*.xlsx $
    Disallow: /*.doc$
    Disallow: /*.pdf$
    Disallow: /*.zip$
    User-agent: *
    Allow: /images
    Allow: /slides
    Sitemap: https://www.meninkilts.com//sitemap_index.xml

    ——-

    Cheers Brent

    Moderator Ipstenu (Mika Epstein)

    (@ipstenu)

    Advisor and Activist

    Generally, after a month, it’s best to make a new topic. This one really was resolved (and I’m gonna flag it in a second).

    Anyway, under User-agent: Googlebot it looks like you’re blocking all .php files, which may be causing your problems.
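Patterns such as /*.php$ rely on the wildcard extensions that Google documents for Googlebot: * stands for any run of characters and a trailing $ anchors the end of the URL. The following is a simplified sketch of that matching, not Google's actual implementation; the function names and sample URLs are assumptions for illustration.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression.

    '*' matches any sequence of characters; a trailing '$' anchors the end of the path.
    Without '$', the pattern is treated as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where the pattern had "*".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def rule_matches(pattern, path):
    """Return True if `path` is covered by the robots.txt `pattern`."""
    return robots_pattern_to_regex(pattern).match(path) is not None

print(rule_matches("/*.php$", "/index.php"))            # → True
print(rule_matches("/*.php$", "/index.php?page=2"))     # → False
print(rule_matches("/wp-admin", "/wp-admin/index.php")) # → True
```

Note also that, per Google's documentation, Googlebot follows only the most specific group that names it, so in the file above the rules under User-agent: * would not apply to Googlebot once a User-agent: Googlebot group exists.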

    Huge THANKS! What’s your PayPal? Coffee on me. Reach me on Twitter: @meninkilts. Cheers.

    Thread Starter ryanve

    (@ryanve)

    @cyberbrent Google Analytics relies on the tracking code script. It’s totally independent from robots.txt. It looks like you got your robots.txt straightened out. Remember you can always see which URLs are indexed by Google by searching for site:meninkilts.com

    Hey Ryanve,

    Well, Ipstenu has been awesome straightening out the robots.txt. But alas, Google Analytics is just not being passed anything. Here is what I’ve done to try to get it to work (and it worked for years before this site update to WP):

    1. I’ve tried just placing the GA code manually at the bottom of the page (nothing read by GA).
    2. Have installed Yoast’s WordPress SEO plugin and placed the code at the top in the header (still nothing read by GA). Yoast’s plugin is authorized via OAuth and is 100% connected to the GA account.
    3. Have reverified my domain with Google (https://www.google.com/accounts/ManageDomains)
    4. Have also verified the domain in Google Webmaster in THREE ways:
    A. DNS verified.
    B. HTML Google doc placed on the server and verified.
    C. Verified also with Google Analytics via Google Webmaster (so they are linked and see each other as owning the same domain).
    5. Have contacted the hosting company for our VPS, and their team has looked through and can’t see what could be causing the issue of GA not picking up our hits on the site.

    * WordPress Stats is working fine as is AWStats on server.

    So why oh why is Google Analytics not picking up the counts?

    PayPal for sure if you can sleuth this one out. It has me totally stumped, and it has been 5 days now of no stats inside GA (just 1 hit per day, from Googlebot, is being registered).

    Tweet me @meninkilts – Cheers Brent

    Moderator Ipstenu (Mika Epstein)

    (@ipstenu)

    Advisor and Activist

    At this point we should probably split off into a new topic, ’cause it’s not robots anymore.

    When I load your page, I can see the GA code in there so it’s THERE and that’s all Google should need… Is UA-2109736-2 your right GA ‘code’?

    Are you using any GA filters on the site?

    The only odd thing I see in your source code is, at the bottom, there’s this:

    <script type="text/javascript">
    <!-- include google analytics -->

    and then a huge section of code I don’t recognize (nor see on my site, and I too am using Yoast). Maybe you have a function or something else that’s calling in the code twice?

    You could start going down the list here: https://www.google.com/support/analyticshelp/bin/answer.py?answer=1009683

  • The topic ‘robots.txt multiple user-agent lines’ is closed to new replies.