• Not sure how many here are familiar with TopBuzz, but it’s essentially a Reddit-esque app/website that people use for sharing content. It refers tons of traffic to publishers via the nativeapp.touitao.com domain.

    Over the past several months, I’ve noticed an absolutely massive surge in traffic from TopBuzz. It turns out that someone posing as my site was sharing links to all of my content on the platform. The process seemed to be automated, as if they were just syndicating an RSS feed (I’m honestly not sure HOW to do this in TopBuzz, but it’s the only explanation I have).

    Suddenly, the traffic stopped. I noticed that instead of posting links to my site, the user is now just posting entire article content (right down to image captions) natively to the TopBuzz platform. They seem to be monetizing this content … without permission from me.

    In reviewing the posts, I’m more convinced than ever that the scraping is automated, which means I should theoretically be able to prevent it. I’m just not sure how. I tried switching the RSS Home Feed to “excerpt” instead of “full article content,” but that didn’t work.

    Are there individual RSS files for the articles that I could potentially modify?

    TopBuzz isn’t being very cooperative, so I’m going to be pursuing legal action. But I wanted to at least block the scraping in the interim.

    • This topic was modified 5 years, 7 months ago by thedevplayer.
Viewing 2 replies - 1 through 2 (of 2 total)
  • Leave your RSS feed set to ‘summary’ as that will help some.
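
    If you want to confirm the ‘summary’ setting actually took effect, here is a rough Python sketch that checks whether each feed item still carries a full-text `content:encoded` element. The function name `audit_feed` and the sample feed are made up for illustration, and it assumes a standard RSS 2.0 feed like the one WordPress generates.

```python
import xml.etree.ElementTree as ET

# Namespaced tag for the full-text <content:encoded> element in RSS 2.0 feeds.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def audit_feed(xml_text):
    """Return one (title, has_full_content, description_length) tuple per item."""
    root = ET.fromstring(xml_text)
    report = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        full = item.find(CONTENT_ENCODED)
        has_full = full is not None and bool((full.text or "").strip())
        description = (item.findtext("description") or "").strip()
        report.append((title, has_full, len(description)))
    return report


# Hypothetical two-item feed, used only to demonstrate the check.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Site</title>
    <item>
      <title>Summary only</title>
      <description>Short teaser.</description>
    </item>
    <item>
      <title>Full text exposed</title>
      <description>Short teaser.</description>
      <content:encoded><![CDATA[<p>The entire article body.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for title, has_full, desc_len in audit_feed(SAMPLE_FEED):
        status = "FULL CONTENT" if has_full else "summary only"
        print(f"{title}: {status} (description is {desc_len} characters)")
```

    To run it against a real site, pass the text of your own feed to `audit_feed` instead of the sample. If every item comes back as summary only and full articles are still being copied, the scraper is probably reading the article pages themselves rather than the feed.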

    If they are actually scraping the full articles, you can also block whatever app is doing it by its IP address.
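
    Before you can block an IP address you have to find it. As a minimal sketch, assuming your host gives you an access log in the common/combined log format, something like this counts requests per client for a given path. The name `top_clients`, the sample lines, and the addresses in them are hypothetical.

```python
import re
from collections import Counter

# Start of a common/combined log format line:
# client IP, two ignored fields, [timestamp], then "METHOD PATH PROTOCOL".
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')


def top_clients(lines, path_prefix="/", limit=5):
    """Count requests per client IP for paths that start with path_prefix."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(3).startswith(path_prefix):
            hits[match.group(1)] += 1
    return hits.most_common(limit)


# Hypothetical log lines; the IPs come from ranges reserved for documentation.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2019:13:55:36 +0000] "GET /feed/ HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '203.0.113.7 - - [10/Oct/2019:14:55:36 +0000] "GET /feed/ HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '198.51.100.23 - - [10/Oct/2019:15:01:02 +0000] "GET /about/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2019:15:55:36 +0000] "GET /feed/ HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '198.51.100.23 - - [10/Oct/2019:16:10:45 +0000] "GET /feed/ HTTP/1.1" 304 0 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for ip, count in top_clients(SAMPLE_LOG, path_prefix="/feed"):
        print(f"{ip}: {count} feed requests")
```

    A client that fetches the feed or every new article on a fixed schedule is the likely candidate. The blocking itself happens in your server or firewall configuration, which this sketch does not touch, and a determined scraper can always switch addresses.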

    If someone is pulling and syndicating just your RSS feed, I’d let that ride as long as they are publishing just the summary. That will bring you more traffic when those readers hit your ‘Read More’ link.

    It’s also possible TopBuzz has an app for easily reposting articles, the way WordPress does with the ‘Press This’ bookmarklet or the ‘Press Forward’ curation bookmarklet that is an integral part of the Press Forward system.

    I’d do what I could to encourage the well-behaved syndication of my posts. That brings in visitors you may have never reached otherwise.

    Stealing your content via scraping methods and badly behaved systems should be discouraged at the source. IP blocking is one way to discourage those activities.

    One of my dreams is a day when the future blogosphere will consist of original authors and content aggregators who share snippets of complete articles out to each other for the benefit of the readers and each website.

    The tools are here!

    Thread Starter thedevplayer

    (@thedevplayer)

    Just to clarify, the user was originally just syndicating links to my site. I loved that – the traffic boost was welcome. The user is now actually scraping the whole articles. There is a “view original article” in tiny print at the bottom, but it’s completely worthless from a referral standpoint (I’m not even sure there’s an SEO benefit since that link only shows up within the TopBuzz app). And it’s unacceptable for them to be monetizing my content without permission.

    I honestly cannot figure out how to do this kind of content syncing in TopBuzz, which makes me think that this person somehow convinced them he was a legitimate representative of my site and received heightened syndication privileges.

    The point is, I want to see if there’s anything I could edit in WordPress that would make it impossible for them to scrape more than the excerpt.

  • The topic ‘Preventing People From Scraping My Article Content (ie TopBuzz)’ is closed to new replies.