• Resolved pako69

    (@pako69)


    Hi
    (please excuse typos, I’m not English)

    I have several questions about this new and great feature:

    #1 Is it always activated? I have not seen any ON/OFF button.
    #2 How does your plugin find my sitemap?
    #2 Delay: If I understand correctly, this is the crawl delay between each URL?
    #3 Threads: Setting it to 3 means that you are running 3 crawls at the same time?
    #4 Run Duration: What is a “crawl interval”? By default it is set to 200s.
    #5 Server Load Limit: It is set to 1 by default, but what does “1” mean?

    I’m asking all these questions because, like a lot of people, I’m on a shared server that doesn’t like to be overloaded. Before switching to your plugin I was using WP Rocket, and it caused a lot of server crashes because it was pulling too hard on my shared server.

    Thanks

    Edit: I’m using the “WP-Cron Events” plugin to manage WP Cron, but I do not see your cron task.

  • Plugin Support Hai Zheng

    (@hailite)

    Hi Pako69,

    Glad you are excited too :).

    #1 By default it’s off. You can turn it on in “LiteSpeed Cache”->”Crawler”. Under “Activation” there is a beautiful switch button ^_^.

    #2 It will generate the sitemap based on the posts and pages in your database, following your permalink settings. If you have any other specific URLs, you can use the litespeed_crawler_sitemap filter to append them to the sitemap.

    #3 That is true, with one clarification: the delay is not applied after each URL, but after each batch of threads. E.g. if you currently have 3 threads, it will crawl 3 URLs simultaneously, and then sleep for the Delay in microseconds. If there is only 1 thread, your understanding is exactly right.
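
    Roughly, the batching behaves like this (a simplified sketch only, not the plugin’s real code; the URLs are placeholders and the requests are shown sequentially although the crawler issues them concurrently):

        <?php
        $urls    = array( 'https://example.com/', 'https://example.com/about/' );
        $threads = 3;   // "Threads" setting
        $delay   = 500; // "Delay" setting, in microseconds

        foreach ( array_chunk( $urls, $threads ) as $batch ) {
            foreach ( $batch as $url ) {
                wp_remote_get( $url ); // the real crawler fetches the whole batch at once
            }
            usleep( $delay ); // one pause per batch of $threads URLs, not per URL
        }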

    #4 That means each time cron triggers it, the crawler will only run for up to 200s, and then exit until the next cron run.

    #5 That server load is based on the Linux server load average. A completely idle computer has a load average of 0. Each running process either using or waiting for CPU resources adds 1 to the load average.

    We designed this server load limit and dynamic threads setting exactly for the overload issue. When the crawler is running, if the server load goes higher than the setting, it will automatically stop. So it won’t cause the server crash issue you saw with the other plugin you mentioned.

    E.g. assume the settings are as follows:
    Server Load Limit = 5
    Threads = 4
    Actual server load = 2 when the crawler starts running.

    Here is the process:
    The crawler crawls 4 URLs at a time. When it finds the server load >= 5, it reduces the threads to 3 and keeps crawling. If the server is overloaded again, it reduces the threads to 2, and so on. If only 1 thread is left and the server is still overloaded, it exits; otherwise it increases the threads by one at a time and crawls again. The maximum it will be raised back to is 4 threads.
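
    Here is a rough, self-contained sketch of that back-off logic (illustration only, not the plugin’s actual code; the URL list and the wp_remote_get() calls are placeholders):

        <?php
        $load_limit  = 5;   // "Server Load Limit" setting
        $max_threads = 4;   // "Threads" setting
        $threads     = $max_threads;
        $queue       = array( 'https://example.com/', 'https://example.com/about/' );

        while ( ! empty( $queue ) ) {
            $load = sys_getloadavg(); // 1, 5 and 15 minute load averages

            if ( $load[0] >= $load_limit ) {
                if ( $threads <= 1 ) {
                    break;      // still overloaded with a single thread: stop for now
                }
                $threads--;     // back off one thread and keep crawling
            } elseif ( $threads < $max_threads ) {
                $threads++;     // load recovered: raise concurrency again, up to the setting
            }

            foreach ( array_splice( $queue, 0, $threads ) as $url ) {
                wp_remote_get( $url ); // the real crawler fetches these concurrently
            }
        }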

    We are always here to answer your questions. It’s our pleasure.

    Cheers

    BTW, you didn’t see the cron task because the crawler is deactivated, as mentioned in #1.

    Thread Starter pako69

    (@pako69)

    Hi @hailite, thanks for all those explanations :)

    #1 My bad!!! I didn’t see this tab :)
    However, I’m a little bit lost on how to set it up correctly. Imagine I want the cache to be regenerated every 72 hours. Where do I set up those 72 hours?
    Because I see a “Crawl Interval” set to 604800 and an “Interval Between Runs” set to 28800, and it seems that the crawler uses the 28800 value to run.

    #2 If I understand correctly, your sitemap is a “fake” sitemap: it is generated and used only for your own crawl, and Google never sees it? Maybe (in a next release) you could add support for crawling the sitemap index generated by Yoast SEO? That one contains only what I want Google to crawl, which means I do not need anything else to be cached (but it’s only my point of view…)

    #5 What settings do you recommend for a shared server? I know they are not all the same, but just to give an idea…

    Thanks for this wonderful plugin! (Used with the Autoptimize plugin, it’s a great replacement for all the other plugins and premium plugins :) )

    EDIT: I think I have not set it up correctly… > https://s26.postimg.org/yrqleyqux/Capture_d_e_cran_2017-06-07_a_16.09.04.jpg

    Thread Starter pako69

    (@pako69)

    re…
    When I switched On the beautiful switch button ^_^ and then went to look at the cron tasks, I did see your cron task but also an error message: https://s26.postimg.org/y3hqw0s55/cron.jpg
    So I switched it Off, returned to the cron list, and the error has disappeared.

    Plugin Support Hai Zheng

    (@hailite)

    #1 Well, the current settings for the crawl interval may be a bit misleading. We will change the default values in the next hotfix release. If you want the whole sitemap to be crawled every 72 hours, yes, you need to set that Crawl Interval to 72 hours (in seconds).

    #2 Can’t agree more. The feature that allows users to customize the sitemap is already in our development schedule. We will teach the crawler to read Google-friendly sitemaps.

    #5 As mentioned in #1, these defaults will be changed. However, as you have already saved the settings once, the new default values won’t be applied automatically. The new default settings are:
    Run Duration => 400 seconds
    Interval Between Runs => 600 seconds
    Crawl Interval => 302400 seconds (3.5 days); your 72 hours is 259200 seconds (72 × 3600), which should be even better.

    Yay, the crawler is nice!

    Just one question: I noticed that my portfolio custom post type is not being added to the crawler sitemap. The CPT is detected by LS Cache (it is visible in the “Available Custom Post Types” list on Settings > Crawler), but it’s not in the crawlermap.data file, and it generates LiteSpeed Cache misses.

    Could you give some more info about “you can use filter litespeed_crawler_sitemap to append them to sitemap”?

    Thanks, Phil

    Thread Starter pako69

    (@pako69)

    well the current settings for crawl interval may be a bit misleading
    I agree with you; maybe you should keep it as simple as possible if you want non-tech people to use it.
    > For example, just one field:
    Cache purge frequency: xxx (hours/days/months)

    But not all those settings for the Crawler, and for the cache (cache TTL, etc.).

    Just my 2 cents… :)

    Hello and thank you for adding the crawler to your plugin.

    One question please: when manually “ordering” the crawl, do I have to stay on the crawler page during the crawl?

    Could you also please put together a guide on the best settings for a website on shared hosting?

    Thanking you in advance
    Georgios
    https://www.ango.gr

    Plugin Support Hai Zheng

    (@hailite)

    @speango

    No, you don’t have to stay on that page, even if you click Manually Run.

    For shared hosting, you can try these:
    Run Duration => 400 seconds
    Interval Between Runs => 600 seconds
    Crawl Interval => 302400 seconds

    Keep all the other settings at their defaults.

    Plugin Support Hai Zheng

    (@hailite)

    @philbee

    https://github.com/litespeedtech/lscache_wp/blob/v1_0_x/litespeed-cache/includes/class-litespeed-cache-crawler-sitemap.php#L136

    When the sitemap is being generated, before saving it, the plugin calls the litespeed_crawler_sitemap filter. So if any other plugin author or user wants to add a certain URL list for the crawler, they can use add_filter('litespeed_crawler_sitemap', 'append_your_list_sample_function') to enroll new URLs.
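
    For example, something along these lines (the callback body and the URLs below are placeholders, and whether the entries should be full URLs or relative paths may depend on the plugin version):

        <?php
        // Example only: append extra URLs to the crawler sitemap.
        function append_your_list_sample_function( $urls ) {
            $urls[] = home_url( '/portfolio/' );
            $urls[] = home_url( '/portfolio/some-project/' );
            return $urls;
        }
        add_filter( 'litespeed_crawler_sitemap', 'append_your_list_sample_function' );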

    About your portfolio custom post type, we may need more debug info. Please wait for the next hotfix release.

    Thread Starter pako69

    (@pako69)

    @hailite
    I set it up like you said for a shared server and did a manual run:

    — LiteSpeed Cache Crawler
    The last sitemap crawl began at 06/07/2017 14:03:08
    The next sitemap crawl will start at 06/11/2017 02:03:08

    Ended reason: Stopped due to exceeding defined Maximum Run Time
    Last crawled: 40 item(s)

    So, I understand that the crawl did not finish its job, and the next time it is scheduled to try will be in 4 days?! Ouch…

    thank you

    Plugin Support Hai Zheng

    (@hailite)

    @pako69 That is fine. The 4 days is for a whole new crawling process. The current unfinished process will continue based on the Run Frequency column.

    @speango You are most welcome.

    Plugin Support LiteSpeed Lisa

    (@lclarke)

    Hi, @pako69 and all!

    Our wiki has been updated to describe all of the crawler settings. Hopefully it can shed some light on any questions you may still have!

    And, as always, if you still find yourself puzzled after taking a look at the wiki, we’d be happy to help you right here :)

    Lisa @ LiteSpeed

  • The topic ‘The great new crawler I was waiting for :)’ is closed to new replies.