Viewing 7 replies - 1 through 7 (of 7 total)
  • Plugin Support Imran – WPMU DEV Support

    (@wpmudev-support9)

    Hello @jetpackrabbit

    I trust you’re doing well!

    The 403 error is a permission error. Broken Link Checker uses your site’s IP to ping the 3rd party site. If a 3rd party site has a security layer that blocks external pings, it will return a 403 error.

    I’ve tested these links and they return a 200 status for me. If your site has multiple links to that site, perhaps your IP address was blacklisted.

    Please contact the owners of that site and ask them to whitelist your site’s IP.

    Kind regards,
    Nastia

    I’m also seeing 403 error codes on “good” links and feel that Link Checker should have some automation to detect known problem conditions. From what I’ve seen, sites using Cloudflare return a 403 error with a web page that includes an embedded captcha. That page auto-redirects when a human browser accesses the site, but fails with an error when automation attempts to go to the site. A prime example of this is pixabay.com (I’ve seen it on multiple other sites as well, so it’s by no means unique to an individual site). When I go to the site with my browser, it works just fine. When I poll the site with automation, I get:

    
    quark ~ $ curl -v https://pixabay.com/
    *   Trying 104.18.21.183:443...
    * Connected to pixabay.com (104.18.21.183) port 443 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    * successfully set certificate verify locations:
    *   CAfile: /etc/ssl/certs/ca-certificates.crt
      CApath: none
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
    * TLSv1.3 (IN), TLS handshake, Certificate (11):
    * TLSv1.3 (IN), TLS handshake, CERT verify (15):
    * TLSv1.3 (IN), TLS handshake, Finished (20):
    * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
    * TLSv1.3 (OUT), TLS handshake, Finished (20):
    * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
    * ALPN, server accepted to use h2
    * Server certificate:
    *  subject: C=US; ST=CA; L=San Francisco; O=Cloudflare, Inc.; CN=pixabay.com
    *  start date: Jun 12 00:00:00 2020 GMT
    *  expire date: Jun 12 12:00:00 2021 GMT
    *  subjectAltName: host "pixabay.com" matched cert's "pixabay.com"
    *  issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3
    *  SSL certificate verify ok.
    * Using HTTP2, server supports multi-use
    * Connection state changed (HTTP/2 confirmed)
    * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
    * Using Stream ID: 1 (easy handle 0x55fdba09b8b0)
    > GET / HTTP/2
    > Host: pixabay.com
    > user-agent: curl/7.70.0
    > accept: */*
    > 
    * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
    * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
    * old SSL session ID is stale, removing
    * Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
    < HTTP/2 403 
    < date: Tue, 23 Jun 2020 16:01:22 GMT
    < content-type: text/html; charset=UTF-8
    < cf-chl-bypass: 1
    < set-cookie: __cfduid=de3e3ee21c59642403dc7bbc79f8356291592928082; expires=Thu, 23-Jul-20 16:01:22 GMT; path=/; domain=.pixabay.com; HttpOnly; SameSite=Lax; Secure
    < cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    < expires: Thu, 01 Jan 1970 00:00:01 GMT
    < x-frame-options: SAMEORIGIN
    < cf-request-id: 038382b12a00000d12f03b7200000001
    < expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
    < server: cloudflare
    < cf-ray: 5a7f6d61ddb10d12-ATL
    < alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400
    < 
    * Connection #0 to host pixabay.com left intact
    <!DOCTYPE html>
    <!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
    <!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
    <!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
    <!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
    <head>
    <title>Attention Required! | Cloudflare</title>
    <meta name="captcha-bypass" id="captcha-bypass" />
    <meta charset="UTF-8" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
    <meta name="robots" content="noindex, nofollow" />
    <meta name="viewport" content="width=device-width,initial-scale=1" />
    <link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />
    <!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
    
    [snip]
                
                <p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>
              </div>
    
              <div class="cf-column">
                <h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>
                
    [snip]
    
    </body>
    </html>
    

    Notably, it’s a captcha to detect humans. I think going to all the 3rd party sites who use Cloudflare and asking for specific whitelisting of blog sites linking to their webservers is unrealistic. Clearly, the site is alive and well, which is what the plugin is intended to check. The fact that the plugin bases its decision on the HTTP error code doesn’t mean that the plugin is behaving correctly and giving the right answer in this case. It’s admittedly behaving as designed, but the design needs to evolve to actually solve the business case the plugin attempts to solve. I need a tool to check links to broken sites, not a tool to check HTTP error codes. That’s a mechanism that has served the plugin well up to now, but is increasingly no longer sufficient. I’d love to have the plugin extended to also check the body text on error code pages so that it doesn’t (necessarily) flag sites that include text like “captcha-bypass”. This should be pretty easy. Ideally, a more comprehensive solution using something like an automated Chrome driver (or perhaps a Selenium crawl) could be used. But as it is, the plugin is reporting a huge number of false positives, which are no less false positives just because the plugin is seeing HTTP 403 replies.
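To make the body-text idea concrete, here is a minimal sketch in Python (not the plugin's actual code, which is PHP). It assumes the checker already has the status code, response headers, and body in hand. The function name `classify_link` and the three result labels are hypothetical; the marker strings and the `server: cloudflare` / `cf-ray` headers are taken from the curl output above.

```python
from typing import Mapping

# Marker strings copied from the Cloudflare challenge page shown above.
CHALLENGE_MARKERS = ("captcha-bypass", "Attention Required! | Cloudflare")


def classify_link(status: int, headers: Mapping[str, str], body: str) -> str:
    """Return 'ok', 'bot_challenge', or 'broken' for one HTTP response.

    Hypothetical helper: the labels are illustrative, not plugin states.
    """
    if 200 <= status < 400:
        return "ok"

    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    from_cloudflare = lowered.get("server") == "cloudflare" or "cf-ray" in lowered
    has_marker = any(marker in body for marker in CHALLENGE_MARKERS)

    # A 403 that carries a challenge page says "automation was blocked",
    # not "the link is dead", so report it separately.
    if status == 403 and from_cloudflare and has_marker:
        return "bot_challenge"
    return "broken"
```

With a classification like this, a checker could list `bot_challenge` results as "could not verify" instead of "broken". The marker list would need maintenance, since challenge pages change over time.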

    • This reply was modified 4 years, 9 months ago by wamcvey.
    Thread Starter jetpackrabbit

    (@jetpackrabbit)

    @wpmudev-support9 I have asked the support team for my host to investigate. They have requested the following:

    To check this further, could you confirm the user-agent or IP address which would be checking your site for broken links and I’ll check for the 403 responses.

    Please let me know and I’ll pass this on.

    Kind Regards

    Kevin

    Plugin Support Williams – WPMU DEV Support

    (@wpmudev-support8)

    Hello @wamcvey

    You are right, these tests usually fail when some sort of “automation” is detected by the target – CloudFlare, Amazon, Pixabay being prime examples. The problem here is that we can’t really fully override it “just like that”. There are only two ways of doing that:

    – one being some sort of “hacking”, which is never a good idea and which we don’t want to do either (as in the long run it would cause more trouble for us, as plugin developers, and for you as a potential “attacker”/“spammer” – as your site could then be considered as such)

    – and the other one being to attempt to “integrate” with such services as much as possible; examples being e.g. Amazon, which provides an API that could possibly be used for such checks (and we are already looking into this for the future), or YouTube.

    Our developers are mainly looking into possible integrations with the main services that are causing such detection issues, so we hope to get most of those solved over time, but they are also exploring other possible ways of “safely” and “in a non-spamming/hacking way” lowering the rate of such 403 responses.

    @jetpackrabbit

    The IP would be the IP of your site/server as Broken Link Checker doesn’t use any external engine – all the connections are made directly from your site/your WordPress installation.

    The user-agent string would in most cases be “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36”.

    Best regards,
    Adam

    Plugin Support Imran – WPMU DEV Support

    (@wpmudev-support9)

    Hello @jetpackrabbit

    I hope you are doing well!

    We haven’t heard back from you for a while now so we’ve marked this ticket as resolved. If you do have any followup questions or require further assistance feel free to reopen it and let us know here.

    Kind regards,
    Nastia

    Thread Starter jetpackrabbit

    (@jetpackrabbit)

    Hi Nastia,

    Sorry for not getting back. It’s now working after intervention by my hosting provider, but see their response in case it’s worth consideration:

    I’ve found the cause here – The loopback connections made by the website to verify the links are using the Chrome user agent they listed, which is essentially user agent spoofing and was being caught by one of our abuse rules as it’s not possible for a Chrome browser to be running inside our web server network.

    You could suggest they have an option to use an identifiable user agent by default (i.e BrokenLinkChecker) rather than pretending to be Chrome, but otherwise I wouldn’t blame them here.
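As an illustration of the host's suggestion, here is a minimal Python sketch (again, not the plugin's PHP code) that builds a link-check request with an identifiable user agent. The name `BrokenLinkChecker` comes from the host's message; the version number, the info URL, and the helper name `build_check_request` are assumptions made up for this example.

```python
from urllib.request import Request

# Assumed identifier: the version and contact URL are placeholders.
USER_AGENT = "BrokenLinkChecker/1.0 (+https://example.com/about-link-checks)"


def build_check_request(url: str) -> Request:
    """Build a GET request that identifies itself instead of posing as Chrome."""
    return Request(url, headers={"User-Agent": USER_AGENT})


# Building the request does not touch the network; pass it to
# urllib.request.urlopen() to actually perform the check.
request = build_check_request("https://pixabay.com/")
print(request.get_header("User-agent"))
```

An honest identifier lets hosts and target sites write allow rules for the checker, though sites that block all unknown automation may still answer with a 403.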

    Hope this is useful

    Plugin Support Nithin – WPMU DEV Support

    (@wpmudevsupport11)

    Hi @jetpackrabbit,

    Thanks for getting back, we really appreciate your input. I have brought this to our developers’ attention and we’ll look further into what could be improved on the plugin side to avoid such cases.

    Have a nice day ahead.

    Regards,
    Nithin

  • The topic ‘403 on good links’ is closed to new replies.