crawler always shows blue (missing)
-
report number is: WTOUJIQV
No matter how many times I tried to start the crawlers manually, they always finished in a very short time and showed blue icons (missing)
I also have a debug log file, but I don’t know how to attach it.
-
I see 2 IPs there, xx.xxx.146.143 and xxx.xxx.44.106. Which one is the origin server?
Make sure General -> Server IP is set to the correct IP.
Thank you for your response.
xxx.xxx.44.106 is the origin server IP. I have seen it in the report.
LSCache Plugin Options
_version = 6.0.0.1
hash = JrfxxxxxxxxxxxxxxxxxxxYlDs3bQtI
auto_upgrade = true
api_key = B819xxxxxxxxxxxxxxxxDE8
server_ip = xxx.xxx.44.106
guest = false
guest_optm = false
news = true
guest_uas = array ( 0 => 'Lighthouse', 1 => 'GTmetrix', 2 => 'Google', 3 => 'Pingdom', 4 => 'bot', 5 => 'PTST', 6 => 'HeadlessChrome', )
xxx.xxx.146.143 is wrong (in fact, it seems to be the IP address I used two years ago), and I haven’t found it in the report. Could you let me know where you noticed this IP?
- This reply was modified 9 months, 3 weeks ago by billzt.
Try setting the crawler interval to 600, then go to the sitemap settings, set “drop domain” to OFF, and refresh the sitemap.
Run the crawler twice in a row: after the first run finishes, wait 10 minutes and run it again, then see what it shows.
My sitemap contains 381 URLs in total.
In this test, I found that every time, the first 57 URLs are green (hit), while all the others are blue (missing).
However, even for URLs that are shown in blue, the result of curl -I showed x-litespeed-cache: hit
It should be noted that my website is using the Cloudflare CDN, and this issue began when I set up a new server and copied over all the old WordPress files (of course, I didn’t copy the directory where the cache data is located). On the old server, the crawlers worked well without setting “drop domain”.
curl -I won’t behave like the crawler. You need to send full Chrome desktop or mobile request headers, including accept-encoding, user-agent and accept, to mimic what the crawler does.
But did it work with the drop domain setting?
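For anyone who wants to check the cache header with browser-like request headers instead of a bare curl -I, here is a minimal sketch in Python. The header values are an assumption (a generic desktop Chrome profile); the thread does not list the exact values the crawler sends, and the names build_request and cache_status are made up for this example.

```python
import urllib.request
from typing import Optional

# Assumed desktop-Chrome-like headers. The exact values used by the
# crawler are not given in this thread, so treat these as placeholders.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
}


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="GET")


def cache_status(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the x-litespeed-cache response header, or None if absent."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.headers.get("x-litespeed-cache")


if __name__ == "__main__":
    # URL taken from this thread; replace it with a page from your own sitemap.
    print(cache_status("https://springwood.me/hello-world/"))
```

This sends a real GET rather than a HEAD request, so it is closer to what a browser does than curl -I, but it is still only an approximation of the crawler.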
I’m not sure. But currently my drop domain is set to off, and the domain is included in the URLs.
I tried to initiate the crawlers manually, but they finished quickly within 1 second.
Start watching...
06 Jan 2024 10:17:45 Size: 381 Crawler: #1 Position: 1 Threads: 1 Status: crawling, prepare running
06 Jan 2024 10:18:28 Size: 381 Crawler: #1 Position: 1 Threads: 1 Status: end
..
No matter how many times I manually start the crawlers, the first 63 URLs are always green and the remaining 318 URLs are blue. Why?
Try setting drop domain to ON, and make sure General -> Server IP is the correct one. Then go to Toolbox -> Debug Settings and enable the debug log, run the crawler, and in the “log view” tab click “crawler log” to see what the crawler received.
If I set drop domain to ON, then all the URLs are blue (missing).
If I set drop domain to OFF, then 57 URLs are green and the others are blue.
My domain is using the Cloudflare CDN. Is this the reason why it behaves so strangely?
It could be, but when drop domain is ON, the crawler should bypass CF and connect directly to the origin IP.
That’s why I asked you to check the crawler log, to see what the crawler received in response.
Well, the log is like this. All missing.
I replaced the real IP address with xxx.xxx
However, it should be noted that my origin server doesn’t have any SSL certificate installed. I just rely on CF. This means that if I turned CF off, users could not visit my URLs such as https://springwood.me/hello-world/ . Is this the reason, and should I set up an SSL certificate on my origin server?
01/06/24 09:38:47.341 [xxx.xxx.44.106:28120 1 DbU] [Router] parsed type: crawler_force
01/06/24 09:38:47.341 [xxx.xxx.44.106:28120 1 DbU] ? type=crawler_force
01/06/24 09:38:47.341 [xxx.xxx.44.106:28120 1 DbU] ??? ------------async-------------start_async_handler
01/06/24 09:38:47.341 [xxx.xxx.44.106:28120 1 DbU] ??? ......crawler manually ran......
01/06/24 09:38:47.348 [xxx.xxx.44.106:28120 1 DbU] ??? Init w/ CPU cores=2
01/06/24 09:38:47.348 [xxx.xxx.44.106:28120 1 DbU] ??? ......crawler started......
01/06/24 09:38:47.354 [xxx.xxx.44.106:28120 1 DbU] ??? Server load: 0.69970703125
01/06/24 09:38:47.362 [xxx.xxx.44.106:28120 1 DbU] ??? ini_get max_execution_time=30
01/06/24 09:38:47.362 [xxx.xxx.44.106:28120 1 DbU] ??? ini_set max_execution_time=600
01/06/24 09:38:47.363 [xxx.xxx.44.106:28120 1 DbU] ??? final max_execution_time=600
01/06/24 09:38:47.425 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/hello-world/ [ori] /hello-world/
01/06/24 09:38:47.426 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /hello-world/
01/06/24 09:38:47.426 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/got-yongqi/ [ori] /got-yongqi/
01/06/24 09:38:47.427 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /got-yongqi/
01/06/24 09:38:47.428 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/2002-2002/ [ori] /2002-2002/
01/06/24 09:38:47.428 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /2002-2002/
01/06/24 09:38:47.435 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/qq-360/ [ori] /qq-360/
01/06/24 09:38:47.436 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /qq-360/
01/06/24 09:38:47.436 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/nanjing-tree/ [ori] /nanjing-tree/
01/06/24 09:38:47.437 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /nanjing-tree/
01/06/24 09:38:47.438 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/blog-100days/ [ori] /blog-100days/
01/06/24 09:38:47.438 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /blog-100days/
01/06/24 09:38:47.439 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/mouse-middle-button/ [ori] /mouse-middle-button/
01/06/24 09:38:47.439 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /mouse-middle-button/
01/06/24 09:38:47.440 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/update-ubuntu-after-spring-festival/ [ori] /update-ubuntu-after-spring-festival/
01/06/24 09:38:47.440 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /update-ubuntu-after-spring-festival/
01/06/24 09:38:47.441 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/mobile-j108i/ [ori] /mobile-j108i/
01/06/24 09:38:47.442 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /mobile-j108i/
01/06/24 09:38:47.442 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/parallel-sentences/ [ori] /parallel-sentences/
01/06/24 09:38:47.444 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /parallel-sentences/
01/06/24 09:38:47.444 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/ching-ming/ [ori] /ching-ming/
01/06/24 09:38:47.445 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /ching-ming/
01/06/24 09:38:47.446 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/tancheng-luzhou-earthquake-line/ [ori] /tancheng-luzhou-earthquake-line/
01/06/24 09:38:47.446 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /tancheng-luzhou-earthquake-line/
01/06/24 09:38:47.447 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/univeristy-entrance-exam-2006/ [ori] /univeristy-entrance-exam-2006/
01/06/24 09:38:47.447 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /univeristy-entrance-exam-2006/
01/06/24 09:38:47.448 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/biopython-dbfetch/ [ori] /biopython-dbfetch/
01/06/24 09:38:47.448 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /biopython-dbfetch/
01/06/24 09:38:47.449 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/qq-webqq/ [ori] /qq-webqq/
01/06/24 09:38:47.449 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /qq-webqq/
01/06/24 09:38:47.450 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/jian-guo-yun/ [ori] /jian-guo-yun/
01/06/24 09:38:47.450 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /jian-guo-yun/
01/06/24 09:38:47.451 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/blog-suffer-hack/ [ori] /blog-suffer-hack/
01/06/24 09:38:47.451 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /blog-suffer-hack/
01/06/24 09:38:47.452 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/cpan/ [ori] /cpan/
01/06/24 09:38:47.453 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /cpan/
01/06/24 09:38:47.453 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/tutor/ [ori] /tutor/
01/06/24 09:38:47.454 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /tutor/
01/06/24 09:38:47.454 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/human-phylo/ [ori] /human-phylo/
01/06/24 09:38:47.455 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /human-phylo/
01/06/24 09:38:47.455 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/trinityigv-blat/ [ori] /trinityigv-blat/
01/06/24 09:38:47.456 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /trinityigv-blat/
01/06/24 09:38:47.456 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/install-r-packages/ [ori] /install-r-packages/
01/06/24 09:38:47.457 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /install-r-packages/
01/06/24 09:38:47.457 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/blog-suffer-ddos/ [ori] /blog-suffer-ddos/
01/06/24 09:38:47.458 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /blog-suffer-ddos/
01/06/24 09:38:47.459 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/rss-reader-choice/ [ori] /rss-reader-choice/
01/06/24 09:38:47.459 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /rss-reader-choice/
01/06/24 09:38:47.460 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/codon-fold/ [ori] /codon-fold/
01/06/24 09:38:47.460 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /codon-fold/
01/06/24 09:38:47.461 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/hotmail-breakdown/ [ori] /hotmail-breakdown/
01/06/24 09:38:47.461 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /hotmail-breakdown/
01/06/24 09:38:47.462 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/bioinformatics-windows-pc/ [ori] /bioinformatics-windows-pc/
01/06/24 09:38:47.462 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /bioinformatics-windows-pc/
01/06/24 09:38:47.463 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/sun-zhencai-1/ [ori] /sun-zhencai-1/
01/06/24 09:38:47.463 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /sun-zhencai-1/
01/06/24 09:38:47.464 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/sun-zhencai/ [ori] /sun-zhencai/
01/06/24 09:38:47.465 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /sun-zhencai/
01/06/24 09:38:47.465 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/shenzhen-visa/ [ori] /shenzhen-visa/
01/06/24 09:38:47.466 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /shenzhen-visa/
01/06/24 09:38:47.466 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/office-tab-plugin/ [ori] /office-tab-plugin/
01/06/24 09:38:47.467 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /office-tab-plugin/
01/06/24 09:38:47.468 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/textbook-math/ [ori] /textbook-math/
01/06/24 09:38:47.468 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /textbook-math/
01/06/24 09:38:47.469 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/typhoon-shanzu/ [ori] /typhoon-shanzu/
01/06/24 09:38:47.469 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /typhoon-shanzu/
01/06/24 09:38:47.470 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/2018-final/ [ori] /2018-final/
01/06/24 09:38:47.470 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /2018-final/
01/06/24 09:38:47.471 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/hongkong-qman-enquiry/ [ori] /hongkong-qman-enquiry/
01/06/24 09:38:47.471 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /hongkong-qman-enquiry/
01/06/24 09:38:47.472 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/bing-break-down/ [ori] /bing-break-down/
01/06/24 09:38:47.479 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /bing-break-down/
01/06/24 09:38:47.480 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/cash-withdraw-overseas/ [ori] /cash-withdraw-overseas/
01/06/24 09:38:47.481 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /cash-withdraw-overseas/
01/06/24 09:38:47.482 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/wikipedia-breakdown/ [ori] /wikipedia-breakdown/
01/06/24 09:38:47.482 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /wikipedia-breakdown/
01/06/24 09:38:47.483 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/oligodb/ [ori] /oligodb/
01/06/24 09:38:47.483 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /oligodb/
01/06/24 09:38:47.484 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/centos-7-udisk-install/ [ori] /centos-7-udisk-install/
01/06/24 09:38:47.484 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /centos-7-udisk-install/
01/06/24 09:38:47.485 [xxx.xxx.44.106:28120 1 DbU] ??? Crawling [url] https://springwood.me/buy-house-shenzhen/ [ori] /buy-house-shenzhen/
01/06/24 09:38:47.485 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /buy-house-shenzhen/
In the logs, all records are shown as missing.
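A long crawler log like the one above is easier to read as a summary. Below is a rough Python sketch that tallies the "[status]" entries and lists the affected paths. The pattern is based only on the "Miss" lines pasted in this thread; that a cached page would be logged with a label such as "Hit" in the same position is an assumption, and the function names are made up for this example.

```python
import re
from collections import Counter
from typing import List

# Matches entries such as "[status] ?? Miss [url] /hello-world/".
# The optional token before the label covers the "??" marker seen in
# the pasted log; other labels than "Miss" are assumed to use the same layout.
STATUS_RE = re.compile(r"\[status\]\s+(?:\S+\s+)?(\w+)\s+\[url\]\s+(\S+)")


def tally_statuses(log_text: str) -> Counter:
    """Count how many crawled URLs were logged with each status label."""
    return Counter(m.group(1) for m in STATUS_RE.finditer(log_text))


def urls_with_status(log_text: str, status: str) -> List[str]:
    """List the URL paths that were logged with the given status label."""
    return [m.group(2) for m in STATUS_RE.finditer(log_text) if m.group(1) == status]


if __name__ == "__main__":
    # Two entries copied from the log in this thread.
    sample = (
        "01/06/24 09:38:47.426 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /hello-world/ "
        "01/06/24 09:38:47.427 [xxx.xxx.44.106:28120 1 DbU] ??? [status] ?? Miss [url] /got-yongqi/"
    )
    print(tally_statuses(sample))
    print(urls_with_status(sample, "Miss"))
```

Because it scans with finditer over the whole text, it works whether the log has one entry per line or is pasted as a single run of text.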
Oh, now that makes sense. Yes, you will need a valid cert on your origin.
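One way to check whether the origin itself can complete an HTTPS handshake when Cloudflare is bypassed is to connect straight to the origin IP while presenting the site's hostname. The sketch below does that with Python's standard library. It relies on Python's default certificate verification, which is an assumption here: the thread does not say exactly how the crawler validates the certificate. The function name origin_tls_check is made up for this example.

```python
import socket
import ssl


def origin_tls_check(origin_ip: str, hostname: str, port: int = 443,
                     timeout: float = 10.0) -> str:
    """Open a TLS connection to origin_ip, asking for hostname via SNI.

    Returns "ok" if the handshake and certificate verification succeed,
    otherwise a short description of what went wrong.
    """
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((origin_ip, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=hostname):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        return f"certificate rejected: {exc.verify_message}"
    except (ssl.SSLError, OSError) as exc:
        # Covers refused connections, timeouts and other handshake failures.
        return f"connection failed: {exc}"


if __name__ == "__main__":
    # Placeholder IP from the documentation range; use your real origin IP
    # (the value set in General -> Server IP) and your own domain.
    print(origin_tls_check("203.0.113.10", "springwood.me"))
```

If this reports a refused connection or a rejected certificate while the site loads fine through Cloudflare, the origin is probably in the situation described in this thread.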
Hi qtwrk, now it works. Thank you.
So I recommend updating the documentation to add a note that an SSL certificate on the origin server is necessary even when using CF.
Glad to know it works now. I will advise our doc team about it.