• Resolved Sergio Alfaro

    (@rafasshop)


    Hello!

I generated a report for you with this code: ZZRJVBNX

All the URLs the crawler scans are added to the blacklist on the first pass, so it stops crawling the website.

From what I can read in the description, this list is built from all URLs that have the no-cache tag.

I haven't set up anything related to that, and I can't find anything regarding it.

Where can I configure which URLs should or shouldn't have the "no-cache" tag?
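For reference, the blacklisting described above can be sketched as a header check: a URL whose response carries a no-cache directive is skipped. This is a minimal illustration only — the header names are standard HTTP plus LiteSpeed's own response header, but the plugin's actual internals are an assumption here:

```python
def is_blacklisted(headers):
    """Return True if a response's cache headers mark it non-cacheable.

    Sketch of the crawler's blacklist rule (an assumption, not the
    plugin's real code): a URL is blacklisted when its response has a
    no-cache directive in Cache-Control or X-LiteSpeed-Cache-Control.
    """
    # Normalize header names, since real responses vary in case.
    headers = {k.lower(): v for k, v in headers.items()}
    for name in ("x-litespeed-cache-control", "cache-control"):
        if "no-cache" in headers.get(name, "").lower():
            return True
    return False

# A publicly cacheable page is kept; a no-cache page is blacklisted.
print(is_blacklisted({"X-LiteSpeed-Cache-Control": "public,max-age=1800"}))  # False
print(is_blacklisted({"Cache-Control": "no-cache, must-revalidate"}))        # True
```

You can check what a given page sends with `curl -I https://example.com/page` and look at those two headers.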

    Thank you!

Viewing 5 replies - 1 through 5 (of 5 total)
  • Hello,

    From LiteSpeed Cache -> Settings -> Crawler, check whether "HTTP/2 Crawl" is enabled; if so, please disable it and re-run the crawler.

    Thread Starter Sergio Alfaro

    (@rafasshop)

    Hi Tishu! Thank you so much for your very fast reply!

    It seems to be working now; I think it is no longer adding URLs to the blacklist.

    I say "seems" because, for some reason, the crawl now stalls:

    Start watching…
    18 Apr 2018 11:04:47 Size: 419 Crawler: #1 Position: 370 Threads: 3 Status: crawling, updated position
    …….

    It looks like it is running, but it stays stuck at that position the whole time.

    We have a 32-core server and a LiteSpeed license for 4 cores, so I think 3 threads should be fine.

    Checking server resources in "top" while it is crawling, CPU usage moves between 3 and 6 of the 32 available cores, i.e. from 5% to 10% of the total CPU.

    It is not a RAM problem either, since we have 192 GB.

    Maybe it is due to some limitation in my settings?

    One thing to take into account: I added 3 additional user-role IDs to the crawler, but it stays on the first one, at URL 370 of 419.

    Thanks!

    Hello again,

    You can check the options in LiteSpeed Cache -> Settings -> Crawler and adjust them depending on the server load.
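As a rough rule of thumb for choosing those values, compare the load average to the core count before raising the thread count. This is only a sketch: the 0.75 load-per-core threshold below is my own illustrative assumption, not a plugin default — tune it against what you see in `top` while the crawler runs:

```python
import os

def suggested_threads(configured_threads, cores=None, load=None):
    """Suggest a crawler thread count from current server load.

    The 0.75 load-per-core threshold is an illustrative assumption;
    adjust it to match what you observe on your own server.
    """
    cores = cores if cores is not None else os.cpu_count()
    load = load if load is not None else os.getloadavg()[0]
    if load / cores > 0.75:                   # server already busy: back off
        return max(1, configured_threads // 2)
    return configured_threads                 # headroom available: keep setting

# With a 32-core box at load 6 (as in the log above), 3 threads is fine.
print(suggested_threads(3, cores=32, load=6.0))  # 3
```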

    Thread Starter Sergio Alfaro

    (@rafasshop)

    Thanks!

    Yes, I did. But I am not sure what good values to put there.

    I will play with all the options.

    Thanks again!

    Plugin Support Hai Zheng

    (@hailite)

    Following this issue, we have changed the default for HTTP/2 Crawl from ON to OFF for new installations.

  • The topic ‘Crawler is adding all urls to the blacklist’ is closed to new replies.