• Resolved raymond621

    (@raymond621)


    I don’t know what triggers this problem, but I have already disabled guest list. Crawling 4629 pages takes more than a day to complete. A complete cycle usually takes roughly 3 hours. When the next cycle comes up, it recrawls all the missing pages. This process repeats for many hours, almost a day, until all pages are crawled. I have already reduced server load to 1 and use 3 threads for the default values. The timeout is increased to 60 seconds. Why does the crawler need to take several cycles to complete crawling the entire site?

    Another problem is that I cannot enable manual crawling. After clicking it, it says “Start Watching” under “Watch crawler status”. But when the page refreshes after clicking Manually Run, it stops running again.

Viewing 6 replies - 1 through 6 (of 6 total)
  • Why does the crawler need to take several cycles to complete crawling the entire site?

    This is not a cycle. Depending on your settings (Guest Mode? WebP? Mobile View?), the crawler must crawl each URL once for every crawler that is defined. That means, if you have Guest Mode, WebP and Mobile View enabled, each URL must be crawled 8 times, which gives 4629 × 8 = 37032 URLs to crawl and not only 4629.
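The arithmetic in this reply can be sketched in a few lines of Python. This is only an illustration and assumes that each enabled option doubles the number of crawlers, which matches the figure of 8 quoted above. The option names and functions are hypothetical and are not part of the plugin's API.

```python
from itertools import product

def crawler_variations(options):
    """Return every on/off combination of the enabled options.

    Assumption: each enabled option doubles the number of crawlers.
    """
    enabled = [name for name, is_on in options.items() if is_on]
    return [dict(zip(enabled, combo))
            for combo in product([False, True], repeat=len(enabled))]

def total_crawl_requests(url_count, options):
    """Number of requests needed to crawl every URL with every crawler."""
    return url_count * len(crawler_variations(options))

# Hypothetical option names, used only for this sketch.
all_on = {"guest_mode": True, "webp": True, "mobile_view": True}
all_off = {"guest_mode": False, "webp": False, "mobile_view": False}

print(total_crawl_requests(4629, all_on))   # 37032
print(total_crawl_requests(4629, all_off))  # 4629
```

With every option disabled there is a single crawler, so the total falls back to the 4629 URLs in the sitemap.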

    Another problem is that I cannot enable manual crawling. After clicking it, it says “Start Watching” under “Watch crawler status”. But when the page refreshes after clicking Manually Run, it stops running again.

    Press the button twice.

    Thread Starter raymond621

    (@raymond621)

    re: Why does the crawler need to take several cycles to complete crawling the entire site?

    I do not have WebP and Guest Mode enabled, so there is only one crawl job available. After going through one cycle, it runs again in an hour and repeats itself. Some of the URLs are missed during the first cycle, and this repeats for almost a day until the site is completely crawled.

    Take a look at the crawler documentation. There you will find a description of the “Run Duration” setting. This setting can cause what you describe.

    https://docs.litespeedtech.com/lscache/lscwp/crawler/
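As a rough illustration of how a run-duration limit could produce this pattern, the sketch below models a crawler that works for a fixed number of seconds, pauses, and then resumes where it stopped. This behaviour is an assumption made for the example, so check the linked documentation for what the setting really does. All numbers and parameter names are placeholders, not the poster's actual settings.

```python
import math

def estimate_full_pass(total_urls, seconds_per_url, threads,
                       run_duration, interval_between_runs):
    """Estimate how many runs, and how much wall time, one full pass needs.

    Simplified model (assumption): every thread fetches one URL at a time,
    each fetch takes seconds_per_url, a run stops after run_duration
    seconds, and the next run resumes after interval_between_runs seconds.
    """
    urls_per_run = max(1, int(run_duration // seconds_per_url) * threads)
    runs_needed = math.ceil(total_urls / urls_per_run)
    wall_seconds = (runs_needed * run_duration
                    + (runs_needed - 1) * interval_between_runs)
    return runs_needed, wall_seconds

# Placeholder values chosen only to show the effect.
runs, seconds = estimate_full_pass(
    total_urls=4629, seconds_per_url=10, threads=3,
    run_duration=400, interval_between_runs=600)

print(runs)                      # 39
print(round(seconds / 3600, 1))  # 10.7
```

Under these assumed values a single pass is split into dozens of short runs, which from the outside can look like the crawler repeating itself for many hours.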

    Thread Starter raymond621

    (@raymond621)

    Are you talking about the crawl interval?

    I have already set it to 1800 seconds to run every half hour. As far as I understand, the crawler will not start a new cycle every 1800 seconds if it knows the current crawl is still in progress. The timeout is now increased to 100 seconds, which is more than enough. I checked my pages using Chrome's debugging tools, and it takes around 10 to 15 seconds to open a page without cache.

    So why are there so many cache misses that require multiple crawler cycles to complete the entire site?

    What you say about your crawler settings fuels the suspicion that these are not only wrong, but that your server is also overwhelmed by them. If you say that it would take about 3 hours for all URLs to be crawled, but you set an interval of 1800 seconds (0.5h), then it doesn’t take a calculator to realize that the interval is set way too short.

    What is much more dramatic, however, is that it takes up to 15 seconds for an uncached page to load. This means you do not have a crawler problem. The low performance of your server is your fundamental problem. With so many URLs, you should consider upgrading your server, because you cannot choose sensible crawler settings on your current hosting when the server is completely overloaded by the amount of data.
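The numbers mentioned in this thread can be put side by side with a back-of-the-envelope calculation. The sketch assumes that the 3 threads work fully in parallel and that every page is uncached and takes the reported 10 to 15 seconds, so it is a rough estimate and not a measurement. The function name is hypothetical.

```python
def uncached_pass_seconds(total_urls, seconds_per_url, threads):
    """Rough time for one full pass if every URL is a cache miss.

    Assumption: the threads run fully in parallel with no other overhead.
    """
    return total_urls * seconds_per_url / threads

CRAWL_INTERVAL = 1800  # seconds, the interval reported in the thread

for seconds_per_url in (10, 15):
    estimate = uncached_pass_seconds(4629, seconds_per_url, threads=3)
    hours = round(estimate / 3600, 2)
    intervals = round(estimate / CRAWL_INTERVAL, 1)
    print(seconds_per_url, hours, intervals)
```

Even at 10 seconds per page, one pass under these assumptions takes several hours, many times longer than the 1800-second interval.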

    Plugin Support qtwrk

    (@qtwrk)

    So why are there so many cache misses that require multiple crawler cycles to complete the entire site?

    Do you mean that some URLs in the middle of the list are still a cache miss after the crawler has run?

    Did you enable any options like UCSS/CCSS/LQIP/VPI?

  • The topic ‘Cannot manually crawl’ is closed to new replies.