• Resolved marcusdeman

    (@marcusdeman)


    Our site has been hit by WordPress search spam, where the internal search engine is abused to generate spam with Chinese character strings.

    To combat this, in the advanced settings we have turned on “Filter search terms”, “Filter searches with emojis and other special characters”, and “Filter searches with common spam patterns”. We also turned on “Prevent crawling of internal site search URLs”. However, this option adds Disallow rules to the robots.txt.
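
    For context, on a default WordPress install the internal search uses the ?s= query parameter, and the added rules look roughly like the sketch below (the exact rules Yoast writes may differ by version):

        # Hypothetical robots.txt excerpt; the actual Yoast output may vary
        User-agent: *
        Disallow: /?s=
        Disallow: /page/*/?s=
        Disallow: /search/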

    But when we looked at GSC, we saw many of these pages reported as “Indexed, though blocked by robots.txt”. A Google search pointed us to this page: https://support.google.com/webmasters/answer/7440203#indexed_though_blocked_by_robots_txt

    The page mentions the following: “Google always respects robots.txt, but this doesn’t necessarily prevent indexing if someone else links to your page. Google won’t request and crawl the page, but we can still index it, using the information from the page that links to your blocked page.”

    Looking further in GSC, we also see many spam URLs linking to these internal spam search result pages. So even though these internal URLs are not indexed, the spammers are still able to abuse Google, because it still uses the information from the pages that link to them. The tactic the spammers are utilising therefore remains in active use, even though the search pages are set to noindex.

    What can we do to stop this behavior altogether?

  • Plugin Support Mushrit Shabnam

    (@611shabnam)

    Hi @marcusdeman

    Yoast SEO automatically applies a noindex directive to your search results page. This keeps these URLs out of Google.
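
    On a search results page, that directive appears as a standard robots meta tag in the page head, along these lines (the exact content attribute may vary by configuration):

        <!-- robots meta tag on internal search result pages; exact attributes may vary -->
        <meta name="robots" content="noindex, follow" />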

    You mentioned those internal search URLs are not indexed, and Google Search Console also reports the URLs as 404s. Although you still see related data in Google Search Console, it’s not affecting your SEO.

    Thread Starter marcusdeman

    (@marcusdeman)

    Hi Mushrit,

    Thank you for your response. It’s only because of the robots.txt setting that Google did not crawl these links and therefore never saw the noindex being applied to them. I have removed the Disallow statement from the robots.txt to see if Google picks up the noindex tag.
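
    To verify the change, I am checking whether a search URL now serves a visible noindex signal; a minimal sketch, where example.com/?s=test stands in for one of our real search URLs:

        # Python sketch: check one search URL for noindex signals.
        # The URL is a placeholder; substitute a real internal search URL.
        import urllib.request

        url = "https://example.com/?s=test"
        req = urllib.request.Request(url, headers={"User-Agent": "noindex-check/1.0"})
        with urllib.request.urlopen(req) as resp:
            x_robots = resp.headers.get("X-Robots-Tag", "")  # header variant
            body = resp.read().decode("utf-8", errors="replace")

        print("X-Robots-Tag:", x_robots or "(not set)")
        # Crude substring test for the meta tag variant in the HTML head.
        print("meta noindex:", "noindex" in body and 'name="robots"' in body)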

    In GSC, I now see these spam results as “Indexed, though blocked by robots.txt”.

    How can this situation be resolved?

    Plugin Support Maybellyne

    (@maybellyne)

    First, what you have in a robots.txt file are crawl directives, not indexation directives. And as you have discovered, Google doesn’t always honor that. Also, those URLs were likely already indexed before you turned on the settings in the Yoast SEO plugin. I don’t think removing the Disallow statement is the way to go.

    If you visit the currently indexed spam URLs, do they return a 404? If they do, Google will drop them from the index over time.
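
    A quick way to test that in bulk is to request each indexed spam URL and look at the status code; a minimal sketch with placeholder URLs:

        # Python sketch: batch-check status codes for spam URLs exported from GSC.
        # The URLs below are placeholders; paste in the real export.
        import urllib.error
        import urllib.request

        urls = [
            "https://example.com/?s=spam-term-1",
            "https://example.com/?s=spam-term-2",
        ]

        for url in urls:
            try:
                req = urllib.request.Request(
                    url, method="HEAD", headers={"User-Agent": "status-check/1.0"}
                )
                status = urllib.request.urlopen(req).status
            except urllib.error.HTTPError as e:
                status = e.code  # a 404 raises HTTPError; capture its code
            print(status, url)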

    Thread Starter marcusdeman

    (@marcusdeman)

    I can visit the indexed spam URLs, and they display an actual page on the website, so no 404 is shown.

    If the Disallow is in the robots.txt, how does Google determine that the noindex tag is set for these pages, since Google is not allowed to crawl them?

    Google’s pages state that the best way to keep these pages out of the index is the noindex tag, not disallowing them in robots.txt.

    https://support.google.com/webmasters/thread/197853493/indexed-though-blocked-by-robots-txt?hl=en
    https://support.google.com/webmasters/thread/197435259/how-do-i-get-rid-of-bogus-urls-search-console-found-that-don-t-even-exist-on-my-site?hl=en

    In this article, Google also states that using robots.txt is not the way to remove these spam searches:
    https://support.google.com/webmasters/answer/9689846
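
    That last article also mentions the X-Robots-Tag response header as another way to apply noindex without relying on the HTML. As a rough sketch for Apache (assuming mod_headers is enabled and ?s= is the search parameter), something like the following could send the header on search URLs, though it still cannot help while robots.txt keeps Google from requesting the pages at all:

        # Hypothetical Apache (.htaccess) sketch; requires Apache 2.4+ with mod_headers.
        # Sends a noindex header on any URL whose query string contains s=.
        <If "%{QUERY_STRING} =~ /(^|&)s=/">
            Header set X-Robots-Tag "noindex, follow"
        </If>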

    Plugin Support Maybellyne

    (@maybellyne)

    If the Disallow is in the robots.txt, how does Google determine that the noindex tag is set for these pages, since Google is not allowed to crawl them?

    The Disallow rule disallows crawling, not indexing. Internal site search pages are set to noindex by default by the Yoast SEO plugin. You shouldn’t need to add noindex directives in robots.txt.

    In this article, Google also states that using robots.txt is not the way to remove these spam searches:

    The internal site search cleanup options in our crawl optimization settings prevent future spamming; they don’t resolve already indexed URLs. For URLs that are already indexed, you may consider using the URL removal tool in Google Search Console.

    Thread Starter marcusdeman

    (@marcusdeman)

    We did not add noindex directives to the robots.txt. Yoast adds Disallow directives to the robots.txt, but will Google still be able to see that these search results pages are set to noindex?

    The removal tool only removes these indexed entries temporarily. We are looking for a permanent solution.

    Plugin Support Maybellyne

    (@maybellyne)

    If you require further assistance, you may consider reaching out to Google in the Google Search Central Help Community.

    Thread Starter marcusdeman

    (@marcusdeman)

    Thank you for your help, Maybellyne.

  • The topic ‘Spam search’ is closed to new replies.