Spam search
-
Our site has been hit by the WordPress search spam attack, where the internal search engine is abused to generate spam URLs containing Chinese character strings.
To combat this, in the advanced settings we have turned on “Filter search terms”, “Filter searches with emojis and other special characters” and “Filter searches with common spam patterns.” We also turned on “Prevent crawling of internal site search URLs”. However, this last option works by adding Disallow rules to robots.txt.
But when we looked at GSC, we saw many of these pages flagged as “Indexed, though blocked by robots.txt”. Searching for that status pointed us to this page: https://support.google.com/webmasters/answer/7440203#indexed_though_blocked_by_robots_txt
The page states the following: “Google always respects robots.txt, but this doesn’t necessarily prevent indexing if someone else links to your page. Google won’t request and crawl the page, but we can still index it, using the information from the page that links to your blocked page.”
Looking further in GSC, we also see many spam URLs linking to these internal spam search result pages. So even though these internal URLs are never crawled, the spammers can still abuse Google, because it indexes them using the information from the pages the spammers link from. The spammers’ tactic therefore remains active, even though the search pages are set to noindex (which Google cannot see, since crawling is blocked).
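The distinction above can be checked locally: robots.txt only governs crawling, not indexing, so a rule test that reports “blocked” confirms exactly the situation Google describes. A minimal sketch using Python’s standard-library robots.txt parser, assuming Disallow patterns typical for WordPress internal-search URLs (the rules the plugin actually writes may differ):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, similar in shape to what the plugin adds;
# check your real robots.txt for the exact patterns.
robots_txt = """\
User-agent: *
Disallow: /?s=
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Internal search-result URLs: crawling is blocked, so Google never
# fetches the page and never sees its noindex directive.
print(parser.can_fetch("Googlebot", "https://example.com/?s=spam-term"))
print(parser.can_fetch("Googlebot", "https://example.com/search/spam-term"))

# Normal pages remain crawlable.
print(parser.can_fetch("Googlebot", "https://example.com/about/"))
```

Note that a `False` here only means the URL will not be crawled; as the Google documentation quoted above explains, it can still be indexed from the anchor text of external links.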
What can we do to stop this behavior altogether?
- The topic ‘Spam search’ is closed to new replies.