• Resolved Joe Westcott

    (@redredweb)


    Hi Jeff, I’m unable to get Blackhole for Bad Bots to work on my site, even though I’ve disabled page caching for any URL that includes a ?blackhole GET parameter.

    You can see the problem here:
    https://www.childcareaware.org/?blackhole=191d1f5616

    For whatever it’s worth, here’s the robots.txt file:

    https://www.childcareaware.org/robots.txt
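
    For a quick look at the Blackhole entries in that file, something like this works (a minimal check; the exact Disallow rules depend on the plugin’s settings):

    # Show any blackhole-related rules in robots.txt
    curl -s https://www.childcareaware.org/robots.txt | grep -i blackhole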

    And here’s what I see in the page response headers for that URL, using the following curl command:

    curl -I https://www.childcareaware.org/?blackhole=191d1f5616

    HTTP/2 200 
    cache-control: no-cache, must-revalidate, max-age=0
    content-type: text/html; charset=UTF-8
    link: <https://www.childcareaware.org/wp-json/>; rel="https://api.w.org/"
    link: <https://www.childcareaware.org/wp-json/wp/v2/pages/6>; rel="alternate"; type="application/json"
    link: <https://www.childcareaware.org/>; rel=shortlink
    server: nginx
    strict-transport-security: max-age=300
    x-pantheon-styx-hostname: styx-fe1-b-6565f7757d-v8hwj
    x-styx-req-id: f5d07a34-8cb2-11eb-9b04-a2bb4a80cf12
    date: Wed, 24 Mar 2021 15:09:43 GMT
    x-served-by: cache-mdw17381-MDW, cache-ewr18151-EWR
    x-cache: MISS, MISS
    x-cache-hits: 0, 0
    x-timer: S1616598582.092331,VS0,VE1055
    vary: Accept-Encoding, Cookie, Cookie
    age: 0
    accept-ranges: bytes
    via: 1.1 varnish, 1.1 varnish
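
    For reference, the same check can be repeated to see whether a cached copy ever gets served (just a sketch; the relevant headers are the Varnish/Fastly-style ones shown above, and header names can vary by host):

    # Request the same URL twice and compare the edge-cache headers.
    # A second response with "x-cache: HIT" or a non-zero "age" means
    # the page is still being served from a cache.
    curl -sI "https://www.childcareaware.org/?blackhole=191d1f5616" | grep -iE '^(x-cache|x-cache-hits|age):'
    curl -sI "https://www.childcareaware.org/?blackhole=191d1f5616" | grep -iE '^(x-cache|x-cache-hits|age):'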


  • Plugin Author Jeff Starr

    (@specialk)

    Hi @redredweb, when that happens, it’s because of caching. Usually the page cache is the only thing that causes problems, but it looks like you have some other sort of caching that is interfering. The best way to determine what’s happening is to do some basic troubleshooting.

    Plugin Author Jeff Starr

    (@specialk)

    Joe, here are some resources that may help with testing Blackhole:

    You’re probably familiar with troubleshooting WordPress; I mention it here for the sake of anyone else reading who may have the same issue.
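
    Roughly, the basic test amounts to finding the hidden trigger link in your page source and visiting it. A sketch with curl (replace example.com with your own site; the exact markup, and whether your IP or user agent is whitelisted, depend on the plugin version and settings):

    # Find the current Blackhole trigger link in the page source
    curl -s https://example.com/ | grep -o 'blackhole=[^"&]*' | head -n 1

    # Visit the trigger URL using the value found above; on a working setup
    # this visit is logged and later requests from the same IP get blocked
    curl -s "https://example.com/?blackhole=VALUE_FROM_ABOVE"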

    Thread Starter Joe Westcott

    (@redredweb)

    Hi Jeff, thanks for your reply. I tried the “check if the plugin is working” steps and saw the Blackhole URL, but when I visit it, I don’t see an error, and nothing appears in the Blackhole log.

    A few questions to help troubleshoot:

    1. Should the blackhole URL (parameter value) change with every visit, if things are working correctly?
    2. Do the new robots.txt entries do anything other than warn good bots away, or do they serve some other function? For example, during testing, is it necessary to have robots.txt set up properly? I ask because we have a slightly unusual test setup, which makes this a little difficult to troubleshoot.
    3. When you wrote “it looks like you have some other sort of caching that is interfering”, were you referring to anything specific from the headers that I shared? The response headers seem to show that we’re successfully bypassing the cache, but if you’re expecting something different or you require certain headers in order for Blackhole for Bad Bots to work, please let me know.
    4. Does this plugin need anything else excluded from the page cache, apart from visits to ?blackhole URLs?

    Thank you,
    Joe

    Plugin Author Jeff Starr

    (@specialk)

    Glad to help:

    1) No, the nonce is dynamic. It changes periodically but not every time. You can find more details about nonces in the WP docs.

    2) The robots rules are not relevant when it comes to testing.

    3) Nothing specific, just based on the reported issue and commonality of cache-related problems.

    4) Blackhole does not work if *any* sort of page caching is enabled anywhere on the site. Simply excluding specific page(s) does not help.
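
    A quick way to see this (just a sketch; header names vary by host, and example.com/some-page/ is a placeholder) is to request any normal page twice and look for cache hits. If a page comes back with "x-cache: HIT" or a growing "age" value, that page is served from cache, so Blackhole cannot inspect those visits:

    # Repeat the request; a HIT or non-zero "age" on the second response
    # means the page is cached and never reaches the plugin
    curl -sI https://example.com/some-page/ | grep -iE '^(x-cache|age):'
    curl -sI https://example.com/some-page/ | grep -iE '^(x-cache|age):'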

    Let me know if I can provide any further info.

    Thread Starter Joe Westcott

    (@redredweb)

    Thanks very much, Jeff. It won’t be feasible for me to disable page caching beyond the ?blackhole URLs, but I really appreciate the clarity you provided, and your help.

    FWIW, in case anybody searches and finds this support thread, I was trying to set up the plugin on Pantheon hosting.

    Again, thank you, Jeff.

  • The topic ‘Not working despite disabled page cache on ‘blackhole’ URLs?’ is closed to new replies.