I can’t get the crawler to work. I’ve tried lowering the delay, etc., but it seems to be stuck. I’ve tried a few of the other suggestions I’ve read in the forum, but with no success. In the crawler status it just says “start watching”. The service code is: AFDGWKTI
Thanks
]]>We added a new custom rule in your Cloudflare WAF to allow https://admdirect.co.uk/sitemap_index.xml. We then tested the sitemap XML file using the LSCache crawler script, and it’s no longer returning 403. It can connect to the sitemap; however, the crawler is still unable to see any URLs in it for some reason.
I have also tried turning the Drop Domain setting off.
I also found code on another ticket to check whether the sitemap is accessible, and it is working: https://admdirect.co.uk/test.php
This is all to get the crawler to work.
Report number: SSASVTPE
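Since the sitemap is now reachable but the crawler still sees no URLs, a quick sanity check is to count how many <loc> entries the file actually contains (in practice you would fetch the real file first, e.g. curl -s https://admdirect.co.uk/sitemap_index.xml -o sitemap.xml; the file written below is a hypothetical stand-in). Note that a sitemap *index* lists child sitemaps, not pages, so its <loc> entries must be followed one level further.

```shell
# Minimal sketch: count <loc> entries in a sitemap file.
# The content below is a made-up stand-in for the real sitemap.
cat > sitemap.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://admdirect.co.uk/</loc></url>
  <url><loc>https://admdirect.co.uk/shop/</loc></url>
</urlset>
EOF
# Count lines containing a <loc> tag
grep -c '<loc>' sitemap.xml
```

If this count is non-zero for the live file but the crawler still reports no URLs, the problem is likely in how the plugin parses or filters the entries (e.g. domain mismatch) rather than in reaching the file.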
# Enable crawlers at 8:00 PM NZDT (7:00 AM UTC) with logging
0 7 * * * wp litespeed-crawler list --path=/var/www/html | grep -oE '^[0-9]+' | xargs -I {} wp litespeed-crawler enable {} --path=/var/www/html && wp litespeed-crawler r --path=/var/www/html >> /var/www/html/wp-content/lscronlog.txt 2>&1
# Disable crawlers at 5:00 AM NZDT (4:00 PM UTC) with logging
0 16 * * * wp litespeed-crawler list --path=/var/www/html | grep -oE '^[0-9]+' | xargs -I {} wp litespeed-crawler disable {} --path=/var/www/html >> /var/www/html/wp-content/lscronlog.txt 2>&1
The cron jobs are enabling and disabling the crawlers as scheduled, so that part works fine. But when it comes to actually running the crawler, whether with litespeed-crawler r or litespeed-crawler run, all I get is:
Success: Start crawling. Current crawler #1 [position] 0 [total] 1884
I need to go back into the website and manually click the “Run Crawler” button; only then does it actually start processing as usual. Report number: DSRERPTK
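One hedged guess about this behavior: litespeed-crawler run may only mark the crawler as started (hence the “Start crawling… [position] 0” message), with the actual page fetches driven by WP-Cron. If so, firing any due cron events immediately after the run command might substitute for the manual button click. A crontab sketch under that assumption, reusing the paths from the crontab above:

```shell
# Crontab sketch (assumption: the crawler's page fetches are driven by WP-Cron):
# after "run", immediately execute due cron events so crawling starts unattended.
0 7 * * * wp litespeed-crawler r --path=/var/www/html && wp cron event run --due-now --path=/var/www/html >> /var/www/html/wp-content/lscronlog.txt 2>&1
```

wp cron event run --due-now is a standard WP-CLI command that executes all cron events whose schedule has come due; whether the crawler's work is actually queued there is the assumption to verify.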
]]>I am experiencing an issue with LiteSpeed Cache Crawler not parsing my sitemap correctly. When I try to Refresh Crawler Map, I receive the error:
“No valid sitemap parsed for crawler.”
In addition, the Sitemap List displays a Sitemap Total: 0.
Details of the Issue:
<loc>
Attempts to Resolve:
.htaccess
Request: Could you please assist with diagnosing the root cause of this issue? Any specific guidance on how to configure the crawler to parse the sitemap correctly would be very helpful.
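A “No valid sitemap parsed” error usually means the crawler’s XML parser rejected what it downloaded, e.g. an HTML error page served in place of the sitemap, a byte-order mark before the XML declaration, or a compressed body. A quick local check (the file written below is a deliberately broken stand-in for what a server might return):

```shell
# Sketch: verify the downloaded sitemap actually starts with an XML declaration.
# Here we test a hypothetical HTML error page saved as downloaded.xml.
printf '<!DOCTYPE html><html><body>403 Forbidden</body></html>' > downloaded.xml
if head -c 100 downloaded.xml | grep -q '<?xml'; then
  echo "looks like XML"
else
  echo "not XML (maybe an HTML error page or a BOM)"
fi
```

Running the same check on the real file (fetched with curl, using the same user agent the crawler uses) quickly separates “server problem” from “plugin problem”.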
]]>While crawling product pages would be okay, they are crawling product filter pages and overloading the database with these requests. A sample request page is: .com/?filtering=1&filter_product_cat=740,3615,718,2493 etc.
How can I limit the request volume without blocking the FB bots outright?
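One way to do this without blocking the bots entirely is to answer 429 (Too Many Requests) only for the filter permutations, while leaving plain product pages crawlable. A hypothetical .htaccess sketch (the query-string pattern matches the ?filtering=1 URLs shown above; facebookexternalhit is Facebook’s crawler user agent; adjust both to your setup):

```apacheconf
<IfModule mod_rewrite.c>
RewriteEngine On
# Match Facebook's crawler...
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC]
# ...but only on filter URLs, not plain product pages
RewriteCond %{QUERY_STRING} (^|&)filtering=1 [NC]
# Answer 429 instead of running the expensive filtered query
RewriteRule .* - [R=429,L]
</IfModule>
```

With mod_rewrite, an R flag outside the 300–399 range stops rewriting and returns that status directly, so the bot gets a polite throttle signal rather than a block.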
]]>But when ESI is enabled, even with a freshly purged cache (0 page loads; the site is private), all of the crawler page statuses are Hit (green = already cached). Long load times plus the cache timestamp in the page source confirm that the pages were not cached by the crawler beforehand.
When ESI is disabled, the crawler works normally, with pages going to Miss (blue = successfully crawled).
Report number: ETKGDAGP
PS: I do see something in the logs.
Server/WordPress logs:
First time when visiting page manually: ESI buffer empty /path/
Crawler logs (https://justpaste.it/fhffz) : 09/18/24 09:53:21.132 [213.227.132.36:41126 1 KYP] Redis encountered a fatal error: OOM command not allowed when used memory > 'maxmemory'. (code: 0)
Some of the crawler log entries that contain [hit]:
09/18/24 09:53:20.633 [213.227.132.36:41126 1 KYP] [Optm] _parse_js bypassed due to js files excluded [hit] jquery.min.js
09/18/24 09:53:20.633 [213.227.132.36:41126 1 KYP] [Util] external
09/18/24 09:53:20.633 [213.227.132.36:41126 1 KYP] [Optm] _parse_js bypassed due to js files excluded [hit] stats.wp.com
09/18/24 09:53:20.633 [213.227.132.36:41126 1 KYP] [Util] external
09/18/24 09:53:20.634 [213.227.132.36:41126 1 KYP] [Optm] _parse_js bypassed due to js files excluded [hit] gtag
09/18/24 09:53:20.637 [213.227.132.36:41126 1 KYP] [Optm] _parse_js bypassed due to js excluded [hit] gtag
09/18/24 09:53:20.638 [213.227.132.36:41126 1 KYP] [Util] external
09/18/24 09:53:20.638 [213.227.132.36:41126 1 KYP] [Util] external
09/18/24 09:53:20.638 [213.227.132.36:41126 1 KYP] [Optm] _parse_js bypassed due to js files excluded [hit] stats.wp.com
09/18/24 09:53:20.638 [213.227.132.36:41126 1 KYP] [Optm] inline js defer excluded [setting] _stq
09/18/24 09:53:20.638 [213.227.132.36:41126 1 KYP] [Optm] _parse_js bypassed due to js excluded [hit] _stq
Debug log – https://justpaste.it/eebw6
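The Redis line in the crawler log above (“OOM command not allowed when used memory > 'maxmemory'”) is worth ruling out first: if the object cache is rejecting writes while ESI is on, pages may never actually be stored even though the crawler reports them. A sketch of the relevant redis-cli configuration commands (assumption: Redis is the configured object cache and this OOM error is related to the Hit/Miss anomaly):

```shell
# Inspect the memory cap and eviction policy on the Redis host
redis-cli CONFIG GET maxmemory
redis-cli CONFIG GET maxmemory-policy
# Either raise maxmemory, or let Redis evict old keys instead of rejecting writes
redis-cli CONFIG SET maxmemory-policy allkeys-lru
```

With the default noeviction policy, Redis refuses writes once maxmemory is reached, which produces exactly the OOM message in the log; allkeys-lru trades old cache entries for continued writes.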
]]>I am getting the following a lot, either from the same IP address or from different ones:
“87.247.158.120
/wp-admin/admin-ajax.php?action=async_litespeed&litespeed_type=crawler
United Arab Emirates
Backend”
And this is being blocked by my iQ Block plugin. Should it be allowed, or is it another phishing attempt?
Thank you
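For context: the LiteSpeed crawler typically triggers its background runs through that admin-ajax endpoint, so legitimate hits should originate from your own server. A hedged way to check is to compare the logged IP against your server’s own addresses (commands assume shell access on the server; yourdomain.com is a placeholder):

```shell
# Sketch: does the logged IP belong to this server?
dig +short yourdomain.com   # IP(s) your domain resolves to
hostname -I                 # this server's own addresses
# If 87.247.158.120 appears in neither list, the request did not come from
# your own crawler, and blocking it is reasonable.
```

If the IP does match your server, whitelisting it in the blocking plugin should let the crawler run again.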
]]>Thank you very much.
]]>I can’t get LiteSpeed Cache to serve cached content for all my pages.
After setting up the LiteSpeed Cache crawlers, they run normally, but in the “Sitemap List” section I see the crawler status for each URL in green, indicating a 201 response code. I understand that a 201 Created code indicates that a new resource has been created as a result of the request, and such a response is generally not directly cacheable. However, the response can contain information about the created resource, and if that resource is later accessed with a GET request, the response to that new request could be cached.
When I load any URL on the website for the first time, I see Status Code: 200 OK and x-litespeed-cache: miss.
When I reload that URL, I then see x-litespeed-cache: hit.
Interestingly, when I load the URLs with the Brave browser, this doesn’t happen, and for all URLs, I can see x-litespeed-cache: hit.
This is a WordPress / WooCommerce installation on a server with elastic hosting and 6 CPU cores. In the .htaccess file, I have:
<IfModule LiteSpeed>
CacheEngine on esi crawler
</IfModule>
But I’m not sure whether something else is needed or whether this setup is really correct.
Has anyone encountered a similar problem?
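The miss-then-hit pattern on first load is itself normal (the first request populates the cache), so the real question is why Brave behaves differently. One way to take browsers out of the equation is to request the page from the command line and vary the request headers, since differing cookies or a cache vary can route clients to different cache copies. A sketch with a placeholder URL (woocommerce_items_in_cart is a cookie WooCommerce sets, which commonly splits the cache):

```shell
# Sketch: compare cache status with and without a cart cookie (example.com is a placeholder)
curl -sI https://example.com/ | grep -i x-litespeed-cache
curl -sI -H 'Cookie: woocommerce_items_in_cart=1' https://example.com/ | grep -i x-litespeed-cache
```

If the two commands return different x-litespeed-cache values, the browsers are simply landing in different cache varies, not hitting a configuration bug.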
]]>