• Resolved nzt1012

    (@nzt1012)


    I am trying to scrape copyright-free, public-domain content from a large network of old websites (circa 2003) into a single new website on my localhost. The network has a little over 40,000 posts in total, spread across 12 domains. The posts contain only text and images, with no videos at all.

    When I first started scraping, the posts were scraped quickly, at around 100 posts per 10 minutes. But as the total number of posts grew, scraping got slower.

    After 10,000 posts, the speed dropped to 200 posts per hour, which was still workable. But now that I have added 25,000 posts, the scraping speed is down to 200 posts per day!

    I still have about 20,000 posts left to add, and I cannot continue at such a slow pace.

    I created a fresh second site on my localhost and tried scraping into it to see whether my WAMP setup was what had slowed things down. It was not: new sites scrape just as fast as the main site did at the beginning.

    As an alternative, I tried creating the posts on a second site and importing its XML export into my main site, but the XML import is just as slow and often breaks the featured images. Otherwise, importing XML into any site other than the main one takes me only a few minutes. So I am fairly sure it is only the main site that has slowed down so drastically.

    I tried using WP-Optimize to optimize the main site's database tables, but that did not help at all. I also tried clearing the browser cache with Ctrl + F5, but that did not help either.

    Yesterday I also tried deferring term counting and comment counting in wp-includes, but that did not show any improvement either. On a forum moderator's recommendation, I also created a new post manually from the backend, and it was added within seconds without any delay or problems.

    I did expect the site to get slower as I added more content, but not this slow. It seems that only adding new posts, whether through scraping or importing, is affected. The dashboard, the front end, navigation, and everything else work fine, even if a bit slow. If someone could help me figure out how to make adding new content through scraping as fast as it was at the beginning, or even half as fast, I would be very thankful.

Viewing 3 replies - 1 through 3 (of 3 total)
  • Thread Starter nzt1012

    (@nzt1012)

    Hello. I respectfully disagree with you.

    I have tried most of what someone who can't read code can try. I switched the MySQL tables between InnoDB and MyISAM in both directions, changed values in php.ini and the MySQL settings, and cleaned, repaired, and optimized all the tables.

    I think it is a plugin problem: either the plugin is calling some function or query that takes too long because of the number of posts, or it is running into a cache/buffer issue. If it were a database or media library problem, the rest of the site would not be working properly. I also tried scraping into fresh sites, and it works perfectly there. So I am fairly sure the slowdown on my main site has something to do with how the plugin interacts with the database.

    Thread Starter nzt1012

    (@nzt1012)

    Hi. I figured out the problem. A couple of plugins on my end were slowing things down. As soon as I disabled all the resource-hogging plugins, the speed was back to normal.

    Plugin Author Rico Macchi

    (@rico-macchi)

    I am happy to hear you figured it out.

  • The topic ‘Adding posts via scraping getting slower and slower each day’ is closed to new replies.