• Resolved mwarbinek

    (@mwarbinek)


    This has been posted as an issue a couple of times, but I wanted to add to it to gain some clarification.

    Problem
    Google crawls the website to index pages and URLs. WordFence Live Traffic records the URLs that visitors request, including fake URLs that do not exist on the site and therefore result in WordPress showing a 404 page.

    Example:
    (website domain name)/pond/your-mamas-apron

    Live Traffic records this URL as an attempt to access the site and registers this fake URL in the WordPress database.

    Google crawls the website and somehow Googlebot locates this fake URL. It identifies it as a crawl error because it resulted in a 404 page, then registers it in the Search Console as a crawl error.

    An occasional one may not be a big issue, but someone could really hurt a site's SEO by running a spam script that requests hundreds of these fake URLs. The URLs get recorded in the Live Traffic stats, and Google then lists hundreds of them as crawl errors. Working through all those crawl errors can become a huge problem.

    Live Traffic Stats
    This is a good way to watch hackers attempting to break into the site, and it allows live monitoring to boot. Shutting down Live Traffic should not be an option.

    Possible Solution
    In some of the other posts, a WordFence rep suggested using Search Console’s Parameters settings, in particular “wordfence_lh” (without quotes).

    By default, Search Console sets this parameter to “Let Google Decide”. We can edit this and choose one of two options:

    1. No: Doesn’t affect page content (ex: tracks usage)
    2. Yes: Changes, reorders, or narrows page content

    Option 2, if selected, provides more sub-selections.

    Which one will stop Google from crawling any of the URLs created by the Live Traffic Stats, so that any URLs collected by WordFence from visitors are blocked from Google’s crawl bots?

    (In fact everything about WordFence should never be crawled and registered by Google indexing).

Viewing 6 replies - 1 through 6 (of 6 total)
  • (In fact everything about WordFence should never be crawled and registered by Google indexing).

    I agree. My GSC has dozens of URLs that start with https://www.example.com?wordfence_lh=1&hid=LARGE_RANDOM_NUMBER that are marked as 404s.
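
For anyone sorting through a list of crawl-error URLs exported from Search Console, here is a minimal Python sketch that separates the `wordfence_lh` URLs described above from everything else. It is not part of Wordfence or of Google's tooling; the function names are hypothetical, and it only assumes that the Live Traffic URLs can be recognised by a `wordfence_lh` query parameter, as in the example above.

```python
from urllib.parse import parse_qs, urlsplit


def is_wordfence_live_traffic_url(url: str) -> bool:
    """Return True if the URL's query string carries a wordfence_lh parameter."""
    query = urlsplit(url).query
    # keep_blank_values so that "?wordfence_lh=" with no value is still detected.
    return "wordfence_lh" in parse_qs(query, keep_blank_values=True)


def split_crawl_errors(urls):
    """Split URLs into (wordfence_lh URLs, all other URLs), preserving order."""
    wordfence, other = [], []
    for url in urls:
        if is_wordfence_live_traffic_url(url):
            wordfence.append(url)
        else:
            other.append(url)
    return wordfence, other
```

Passing in the crawl-error list returns two lists, so the `wordfence_lh` entries can be reviewed separately from genuinely missing pages.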

    From what I could see, this is more a Google issue than a Wordfence issue. Wordfence adds a script that at one point creates these URLs for Wordfence’s own use (Live Traffic etc). Google should know this URL is pointing nowhere, but as an insatiable “know it all” spider bot, it tries to fetch the non-existing page.

    I just wonder, couldn’t Wordfence change its code to replace the “//www.example.com/” part of these URLs with some website ID, then pass that ID on to Live Traffic engine? Then Google would probably stop considering these as valid URLs and perhaps stop bothering.

    Thread Starter mwarbinek

    (@mwarbinek)

    Here is the alternate point about whose issue it is.

    Google certainly cannot accommodate everything, and Google is not responsible for how WordFence is coded. Since Google does not own WordFence, what WordFence creates is their own doing, and they are responsible for it and all that it does.

    That said, WordFence does have a responsibility to code their software so as not to cause conflicts or issues, even issues with Google's crawl bots, because that has a negative impact on the user’s website and the site’s SEO.

    The “ID” idea sounds good. Maybe WordFence can expand on that and even add something to those WordFence-generated URLs that tells bots to stay away, such as the link attribute rel="nofollow", which tells the bot to ignore the link.

    (Google Support page on rel="nofollow" > https://support.google.com/webmasters/answer/96569?hl=en# )

    I believe a WordFence rep needs to weigh in on this thread, as I believe it is important that they do.

    @mwarbinek,

    Regarding this example:

    (website domain name)/pond/your-mamas-apron

    URLs that do not exist on the website but are still indexed by Google would be indexed because your WordPress theme does not serve 404 response headers correctly. I don’t understand how Wordfence is involved here. Google does not index blocked pages, and the only pages Wordfence serves are block pages. Wordfence does not serve 404 pages; that is done by your theme. Wordfence also does not “index” pages in the database. It records visits to the site in the wfHits table, but this list of visits isn’t shared with Google or anyone else. You are the only person who can see it.
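
One way to check whether a theme sends a proper 404 status for a non-existent URL is to request the URL and look at the status code rather than at the page shown. The following is a rough Python sketch using only the standard library; the function names and the User-Agent string are made up for illustration, and the wording of the verdicts is my own, not Google's.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for url (redirects are followed)."""
    request = urllib.request.Request(url, headers={"User-Agent": "status-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is still the value we want.
        return error.code


def describe_missing_page(status: int) -> str:
    """Interpret the status code returned for a URL that should not exist."""
    if status in (404, 410):
        return "ok: the server reports the page as missing"
    if 200 <= status < 300:
        return "soft 404: a missing page is answered with a success status"
    return f"unexpected status {status}"
```

For example, `describe_missing_page(fetch_status("https://www.example.com/pond/your-mamas-apron"))` would be run against your own domain in place of `www.example.com`. Because redirects are followed, a theme that redirects unknown URLs to the front page would show up here as a success status.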

    @cbrandt

    The domain appears in the wordfence_lh request because your domain is actually being requested. It’s not possible to make a request to a site without using its actual domain name.

    If you are seeing wordfence_lh getting indexed, that would likely be because the wordfence_lh request is returning an unexpected response. If you visit the wordfence_lh URL in a browser, are you seeing something other than a blank page? Sometimes we’ve seen WordPress themes that cause all requests with query strings to redirect to the site’s front page, so the wordfence_lh requests end up serving the full website instead of the blank page. Another known issue is that the WPX web host blocks all wordfence_lh requests. If you are on WPX hosting, you have to either disable Live Traffic or reach out to WPX hosting to have the request whitelisted.
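
The three outcomes described above (a blank page, the full website, or a blocked request) can be told apart from the status code and body of the response. This is only an illustrative Python sketch with a hypothetical function name. It assumes that "blank" means an empty or whitespace-only body and that a host-level block shows up as a 401 or 403 status; neither detail is confirmed in this thread.

```python
def classify_live_traffic_response(status: int, body: str) -> str:
    """Classify the response to a wordfence_lh request.

    Assumptions (not confirmed by Wordfence): a healthy response has an
    empty or whitespace-only body, and a host-level block returns 401/403.
    """
    if status in (401, 403):
        return "blocked"
    if 200 <= status < 300:
        if body.strip() == "":
            return "blank"
        # A non-empty success response, e.g. the theme served the front page.
        return "unexpected content"
    return f"unexpected status {status}"
```

A result of "unexpected content" would match the theme-redirect case, and "blocked" would match the hosting case; anything else needs a closer look by hand.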

    Thread Starter mwarbinek

    (@mwarbinek)

    I’ll have to take a look with that info, see what I can do.
    Thanks

    Hi @mwarbinek,

    Were you able to fix the issue with your live traffic?

    Please feel free to open another thread to update us if you’re still having issues with Wordfence.

    Thanks!

    Thread Starter mwarbinek

    (@mwarbinek)

    What I did was use a scraper to search Google’s index for all URLs and keywords relating to my website. None of the annoying words in URLs are showing up, so all seems fine for now.

    I have had no occurrences of “wordfence_lh” that I know of, just the URL types I posted initially above.

    My theme does produce the 404s and that seems to be working fine.

    So if WordFence does not allow Google to index its wfHits table, then that is good.

    Thanks

  • The topic ‘WordFence’s Live Traffic URL’s – Creates Google Search Crawl URL Errors’ is closed to new replies.