• In my robots.txt I have:

    User-agent: *
    Disallow: /images/

    I then use the Teleport Pro webcrawler to “gather” my site, and it is still retrieving the images inside the /images/ folder.

    Is there a way to block ALL bots using .htaccess EXCEPT for the googlebot? I’m showing tons of spiders crawling my site in my logs. I think a lot of them don’t obey the robots.txt

Viewing 5 replies - 1 through 5 (of 5 total)
  • Thread Starter giantman

    (@giantman)

    WTF??? I just put:

    User-agent: *
    Disallow: /

    in robots.txt and Teleport Pro is still finding the images and other files.

    Thread Starter giantman

    (@giantman)

    SetEnvIfNoCase User-Agent "^Teleport\ Pro" bad_bot=1

    <FilesMatch "(.*)">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </FilesMatch>

    Just tried that, and Teleport Pro STILL finds the images and 60 other files. grr.

    Thread Starter giantman

    (@giantman)

    WTF????????????????

    I put:

    Options -Indexes

    into an .htaccess file inside /images/ and Teleport Pro is STILL finding the images. WTF!!!!!!!!!!!!!???????

    Options -Indexes

    That just limits casual browsing of your directories in the absence of an index file.
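
    If the goal is to refuse the files themselves rather than hide the listing, an access rule is needed. A minimal sketch, assuming Apache 2.4 with mod_setenvif and mod_authz_core available and AllowOverride permitting these directives in .htaccess. The bad_bot variable name follows the earlier post, and the pattern only matches a client that actually sends "Teleport Pro" in its User-Agent header:

    ```apache
    # /images/.htaccess (sketch; Apache 2.4 syntax assumed)
    Options -Indexes

    # Flag requests whose User-Agent contains "Teleport Pro" (case-insensitive)
    SetEnvIfNoCase User-Agent "Teleport Pro" bad_bot

    # Serve everyone except flagged requests, which get 403 Forbidden
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
    ```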

    Google might have some re-write tips for you.

    “how to” ban “teleport pro” teleport bot
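
    A rewrite-based block along those lines might look like the following. This is a sketch, assuming mod_rewrite is enabled and the rules sit in the site's root .htaccess, above any existing rewrite block. The two agent strings are the crawlers named in this thread, and a crawler set to send a browser-like User-Agent would not match:

    ```apache
    RewriteEngine On

    # Match either crawler name anywhere in the User-Agent header, ignoring case
    RewriteCond %{HTTP_USER_AGENT} "Teleport Pro" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} BlackWidow [NC]

    # Answer every matching request with 403 Forbidden
    RewriteRule ^ - [F]
    ```

    Checking the access log for the User-Agent the crawler really sends shows whether a rule like this can match at all.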

    This might spark some ideas or lead to more helpful information – An example WordPress robots.txt file:

    Thread Starter giantman

    (@giantman)

    dang…I just downloaded the “BlackWidow” webcrawler from download.com and it’s crawling my whole site, finding almost everything…and I have about 6 security plugins blocking all kinds of stuff, plus a huge .htaccess file that is supposed to be blocking BlackWidow and hundreds of other bots….and it’s still crawling my site….

    This is bad….

  • The topic ‘Bots’ is closed to new replies.