No valid sitemap parsed
-
We added a new custom rule in your Cloudflare WAF to allow https://admdirect.co.uk/sitemap_index.xml. We then tested the sitemap XML file using the LSCache crawler script, and it’s no longer returning 403. It’s able to connect to the sitemap; however, the crawler is unable to see any URLs in the sitemap for some reason.
I have tried turning the Drop Domain setting off.
I also found code on another ticket that checks whether the sitemap can be accessed, and it is working: https://admdirect.co.uk/test.php
This is all to get the crawler to work.
Report number: SSASVTPE
-
You can insert a line like the one below:
error_log(print_r($response, true), 3, '/path/to/some/logfile.log');
This should log exactly what WordPress retrieved.
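As a rough, standalone illustration of that logging call: the `$response` array and the log file name below are made-up stand-ins (not the real crawler response), and the file is written to the system temp directory only so the sketch runs anywhere. Message type 3 tells error_log() to append to the given file, and it does not add a newline on its own.

```php
<?php
// Hypothetical stand-in for the array WordPress returns from an HTTP request.
$response = [
    'response' => ['code' => 200, 'message' => 'OK'],
    'body'     => '<sitemapindex/>',
];

// Assumed log location for this sketch; use any writable path.
$log_file = sys_get_temp_dir() . '/sitemap-debug.log';

// Type 3 appends the message to $log_file; add the newline manually.
error_log(print_r($response, true) . PHP_EOL, 3, $log_file);
```

Opening the log file afterwards shows the same nested "Array ( ... )" layout as the dump posted later in this thread.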
Hi qtwrk,
I have done this and it logs this:
text/plain debug.log ( exported SGML document, ASCII text, with CRLF, LF line terminators )
Array
(
[headers] => WpOrg\Requests\Utility\CaseInsensitiveDictionary Object
(
[data:protected] => Array
(
[date] => Fri, 15 Nov 2024 15:04:21 GMT
[content-type] => text/xml; charset=UTF-8
[x-powered-by] => PHP/8.2.24
[x-dns-prefetch-control] => on
[set-cookie] => mailchimp_landing_site=https%3A%2F%2Fadmdirect.co.uk%2Fsitemap_index.xml; expires=Fri, 13 Dec 2024 15:04:20 GMT; Max-Age=2419200; path=/; secure; SameSite=Strict
[x-robots-tag] => noindex, follow
[x-litespeed-cache-control] => no-cache
[cache-control] => no-cache, no-store, must-revalidate, max-age=0
[vary] => Accept-Encoding
[alt-svc] => h3=":443"; ma=86400
[cf-cache-status] => DYNAMIC
[report-to] => {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=1sZReS7fP6xHR00tII0o2bXMOkYLb7jIJFlDoKtCW24FG7Os2Adq3A4d9WsCXG3S3dpqEYnzWPGDPTlHggvFiZaene4OBfPI3CtQjvcGBOUK%2Bjc9AEzXRs%2FBYZjiFGsQag%3D%3D"}],"group":"cf-nel","max_age":604800}
[nel] => {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
[speculation-rules] => "/cdn-cgi/speculation"
[server] => cloudflare
[cf-ray] => 8e30221448a1cd5c-LHR
[content-encoding] => gzip
[server-timing] => cfL4;desc="?proto=TCP&rtt=1282&sent=5&recv=7&lost=0&retrans=0&sent_bytes=2849&recv_bytes=795&delivery_rate=2871369&cwnd=253&unsent_bytes=0&cid=b2489258c2b581e6&ts=1417&x=0"
)) [body] => <?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" ?>
https://admdirect.co.uk/post-sitemap.xml 2024-08-23T13:18:32+00:00
https://admdirect.co.uk/page-sitemap.xml 2024-11-12T16:45:35+00:00
https://admdirect.co.uk/product-sitemap.xml 2024-11-15T10:28:04+00:00
https://admdirect.co.uk/product-sitemap2.xml 2023-11-27T23:24:35+00:00
https://admdirect.co.uk/product-sitemap3.xml 2024-07-22T14:07:24+00:00
https://admdirect.co.uk/product-sitemap4.xml 2024-11-15T10:28:04+00:00
https://admdirect.co.uk/category-sitemap.xml 2024-11-15T15:01:33+00:00
https://admdirect.co.uk/product_cat-sitemap.xml 2024-11-15T10:28:04+00:00
https://admdirect.co.uk/pagination-sitemap.xml 2021-12-08
[response] => Array ( [code] => 200 [message] => OK ) [cookies] => Array ( [0] => WP_Http_Cookie Object ( [name] => mailchimp_landing_site [value] => https://admdirect.co.uk/sitemap_index.xml [expires] => 1734102260 [path] => / [domain] => admdirect.co.uk [port] => [host_only] => 1 ) ) [filename] => [http_response] => WP_HTTP_Requests_Response Object ( [data] => [headers] => [status] => [response:protected] => WpOrg\Requests\Response Object ( [body] => <?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" ?>
https://admdirect.co.uk/post-sitemap.xml 2024-08-23T13:18:32+00:00
https://admdirect.co.uk/page-sitemap.xml 2024-11-12T16:45:35+00:00
https://admdirect.co.uk/product-sitemap.xml 2024-11-15T10:28:04+00:00
https://admdirect.co.uk/product-sitemap2.xml 2023-11-27T23:24:35+00:00
https://admdirect.co.uk/product-sitemap3.xml 2024-07-22T14:07:24+00:00
https://admdirect.co.uk/product-sitemap4.xml 2024-11-15T10:28:04+00:00
https://admdirect.co.uk/category-sitemap.xml 2024-11-15T15:01:33+00:00
https://admdirect.co.uk/product_cat-sitemap.xml 2024-11-15T10:28:04+00:00
https://admdirect.co.uk/pagination-sitemap.xml 2021-12-08
[raw] => HTTP/1.1 200 OK
Date: Fri, 15 Nov 2024 15:04:21 GMT
Content-Type: text/xml; charset=UTF-8
Transfer-Encoding: chunked
Connection: close
x-powered-by: PHP/8.2.24
x-dns-prefetch-control: on
Set-Cookie: mailchimp_landing_site=https%3A%2F%2Fadmdirect.co.uk%2Fsitemap_index.xml; expires=Fri, 13 Dec 2024 15:04:20 GMT; Max-Age=2419200; path=/; secure; SameSite=Strict
x-robots-tag: noindex, follow
x-litespeed-cache-control: no-cache
Cache-Control: no-cache, no-store, must-revalidate, max-age=0
vary: Accept-Encoding
alt-svc: h3=":443"; ma=86400
cf-cache-status: DYNAMIC
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=1sZReS7fP6xHR00tII0o2bXMOkYLb7jIJFlDoKtCW24FG7Os2Adq3A4d9WsCXG3S3dpqEYnzWPGDPTlHggvFiZaene4OBfPI3CtQjvcGBOUK%2Bjc9AEzXRs%2FBYZjiFGsQag%3D%3D"}],"group":"cf-nel","max_age":604800}
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Speculation-Rules: "/cdn-cgi/speculation"
Server: cloudflare
CF-RAY: 8e30221448a1cd5c-LHR
Content-Encoding: gzip
server-timing: cfL4;desc="?proto=TCP&rtt=1282&sent=5&recv=7&lost=0&retrans=0&sent_bytes=2849&recv_bytes=795&delivery_rate=2871369&cwnd=253&unsent_bytes=0&cid=b2489258c2b581e6&ts=1417&x=0"
https://admdirect.co.uk/post-sitemap.xml 2024-08-23T13:18:32+00:00
https://admdirect.co.uk/page-sitemap.xml 2024-11-12T16:45:35+00:00
https://admdirect.co.uk/product-sitemap.xml 2024-11-15T10:28:04+00:00
https://admdirect.co.uk/product-sitemap2.xml 2023-11-27T23:24:35+00:00
https://admdirect.co.uk/product-sitemap3.xml 2024-07-22T14:07:24+00:00
https://admdirect.co.uk/product-sitemap4.xml 2024-11-15T10:28:04+00:00
https://admdirect.co.uk/category-sitemap.xml 2024-11-15T15:01:33+00:00
https://admdirect.co.uk/product_cat-sitemap.xml 2024-11-15T10:28:04+00:00
https://admdirect.co.uk/pagination-sitemap.xml 2021-12-08
[headers] => WpOrg\Requests\Response\Headers Object ( [data:protected] => Array ( [date] => Array ( [0] => Fri, 15 Nov 2024 15:04:21 GMT ) [content-type] => Array ( [0] => text/xml; charset=UTF-8 ) [x-powered-by] => Array ( [0] => PHP/8.2.24 ) [x-dns-prefetch-control] => Array ( [0] => on ) [set-cookie] => Array ( [0] => mailchimp_landing_site=https%3A%2F%2Fadmdirect.co.uk%2Fsitemap_index.xml; expires=Fri, 13 Dec 2024 15:04:20 GMT; Max-Age=2419200; path=/; secure; SameSite=Strict ) [x-robots-tag] => Array ( [0] => noindex, follow ) [x-litespeed-cache-control] => Array ( [0] => no-cache ) [cache-control] => Array ( [0] => no-cache, no-store, must-revalidate, max-age=0 ) [vary] => Array ( [0] => Accept-Encoding ) [alt-svc] => Array ( [0] => h3=":443"; ma=86400 ) [cf-cache-status] => Array ( [0] => DYNAMIC ) [report-to] => Array ( [0] => {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=1sZReS7fP6xHR00tII0o2bXMOkYLb7jIJFlDoKtCW24FG7Os2Adq3A4d9WsCXG3S3dpqEYnzWPGDPTlHggvFiZaene4OBfPI3CtQjvcGBOUK%2Bjc9AEzXRs%2FBYZjiFGsQag%3D%3D"}],"group":"cf-nel","max_age":604800} ) [nel] => Array ( [0] => {"success_fraction":0,"report_to":"cf-nel","max_age":604800} ) [speculation-rules] => Array ( [0] => "/cdn-cgi/speculation" ) [server] => Array ( [0] => cloudflare ) [cf-ray] => Array ( [0] => 8e30221448a1cd5c-LHR ) [content-encoding] => Array ( [0] => gzip ) [server-timing] => Array ( [0] => cfL4;desc="?proto=TCP&rtt=1282&sent=5&recv=7&lost=0&retrans=0&sent_bytes=2849&recv_bytes=795&delivery_rate=2871369&cwnd=253&unsent_bytes=0&cid=b2489258c2b581e6&ts=1417&x=0" ) ) ) [status_code] => 200 [protocol_version] => 1.1 [success] => 1 [redirects] => 0 [url] => https://admdirect.co.uk/sitemap_index.xml [history] => Array ( ) [cookies] => WpOrg\Requests\Cookie\Jar Object ( [cookies:protected] => Array ( [mailchimp_landing_site] => WpOrg\Requests\Cookie Object ( [name] => mailchimp_landing_site [value] => https%3A%2F%2Fadmdirect.co.uk%2Fsitemap_index.xml [attributes] => 
WpOrg\Requests\Utility\CaseInsensitiveDictionary Object ( [data:protected] => Array ( [expires] => 1734102260 [max-age] => 1734102261 [path] => / [secure] => 1 [samesite] => Strict [domain] => admdirect.co.uk ) ) [flags] => Array ( [creation] => 1731683061 [last-access] => 1731683061 [persistent] => [host-only] => 1 ) [reference_time] => 1731683061 ) ) ) ) [filename:protected] => )
)
So it is finding the different sitemaps.
Thanks
Now please try loading that with
$xml_object = simplexml_load_string($response['body'], null, LIBXML_NOCDATA);
and see what it loads and parses into.
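For anyone following along, here is a minimal self-contained sketch of what that call does with a sitemap index. The sample XML, the example.com URLs, and the helper name `list_child_sitemaps` are hypothetical illustrations and are not taken from the LiteSpeed Cache plugin.

```php
<?php
// Hypothetical sample: a trimmed sitemap index in the sitemaps.org format.
$body = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/post-sitemap.xml</loc>
    <lastmod>2024-08-23T13:18:32+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/page-sitemap.xml</loc>
    <lastmod>2024-11-12T16:45:35+00:00</lastmod>
  </sitemap>
</sitemapindex>
XML;

// Parse a sitemap index and return the <loc> value of every <sitemap> entry.
function list_child_sitemaps(string $xml): array
{
    $xml_object = simplexml_load_string($xml, null, LIBXML_NOCDATA);
    if ($xml_object === false) {
        return []; // Not well-formed XML.
    }
    $locs = [];
    foreach ($xml_object->sitemap as $entry) {
        $locs[] = trim((string) $entry->loc);
    }
    return $locs;
}

$children = list_child_sitemaps($body);
print_r($children);
```

If the returned list is empty even though the body looks like XML, the problem is in parsing; if it lists the child sitemaps, the problem is more likely further down, when each child is fetched.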
Hi,
This is what was output:
object(SimpleXMLElement)#8449 (1) {
  ["sitemap"]=>
  array(9) {
    [0]=> object(SimpleXMLElement)#8465 (2) { ["loc"]=> string(40) "https://admdirect.co.uk/post-sitemap.xml" ["lastmod"]=> string(25) "2024-08-23T13:18:32+00:00" }
    [1]=> object(SimpleXMLElement)#8464 (2) { ["loc"]=> string(40) "https://admdirect.co.uk/page-sitemap.xml" ["lastmod"]=> string(25) "2024-11-12T16:45:35+00:00" }
    [2]=> object(SimpleXMLElement)#8463 (2) { ["loc"]=> string(43) "https://admdirect.co.uk/product-sitemap.xml" ["lastmod"]=> string(25) "2024-11-15T10:28:04+00:00" }
    [3]=> object(SimpleXMLElement)#8462 (2) { ["loc"]=> string(44) "https://admdirect.co.uk/product-sitemap2.xml" ["lastmod"]=> string(25) "2023-11-27T23:24:35+00:00" }
    [4]=> object(SimpleXMLElement)#8461 (2) { ["loc"]=> string(44) "https://admdirect.co.uk/product-sitemap3.xml" ["lastmod"]=> string(25) "2024-07-22T14:07:24+00:00" }
    [5]=> object(SimpleXMLElement)#8459 (2) { ["loc"]=> string(44) "https://admdirect.co.uk/product-sitemap4.xml" ["lastmod"]=> string(25) "2024-11-15T10:28:04+00:00" }
    [6]=> object(SimpleXMLElement)#8456 (2) { ["loc"]=> string(44) "https://admdirect.co.uk/category-sitemap.xml" ["lastmod"]=> string(25) "2024-11-15T15:19:38+00:00" }
    [7]=> object(SimpleXMLElement)#8455 (2) { ["loc"]=> string(47) "https://admdirect.co.uk/product_cat-sitemap.xml" ["lastmod"]=> string(25) "2024-11-15T10:28:04+00:00" }
    [8]=> object(SimpleXMLElement)#8447 (2) { ["loc"]=> string(46) "https://admdirect.co.uk/pagination-sitemap.xml" ["lastmod"]=> string(10) "2021-12-08" }
  }
}
Thanks
So far it looks all right. You may need to keep logging the variable at each step, up to the part
return count($this->_urls);
and see which one it fails to parse.
Yeah, it returns all 9 sitemaps.
"Total sitemaps found: 9" is what is output.
Please try logging this count and see what it gives; I assume it will be 0 or something.
If so, trace back up from there; for example, log the value of
$this->_urls
around line 460.
Hi,
After doing all those logs, it seems that the Cloudflare rule the host added only covered the top-level sitemap index.
So I extended the Cloudflare rule to allow every individual sitemap URL listed inside the main one, and then the parse worked.
Thanks
All good now?
Yes, all good
Thank you for the help
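For anyone hitting the same symptom (the index loads, but no URLs are parsed), a quick check is to request every child sitemap listed in the index and look for non-200 answers such as a WAF 403. The sketch below is only an illustration under assumptions: `find_blocked_sitemaps` and the `$fetch_status` callable are hypothetical names, and the HTTP request is replaced by a stub array so the example runs without network access.

```php
<?php
/**
 * Return the child sitemap URLs that do not answer with HTTP 200,
 * keyed by URL with the status code as value.
 * $fetch_status is a hypothetical callable: URL in, status code out.
 */
function find_blocked_sitemaps(string $index_xml, callable $fetch_status): array
{
    $index = simplexml_load_string($index_xml);
    if ($index === false) {
        return []; // The index itself is not well-formed XML.
    }
    $blocked = [];
    foreach ($index->sitemap as $entry) {
        $url = trim((string) $entry->loc);
        $status = $fetch_status($url);
        if ($status !== 200) {
            $blocked[$url] = $status;
        }
    }
    return $blocked;
}

// Stub standing in for real HTTP requests; the 403 imitates a WAF block.
$fake_statuses = [
    'https://example.com/post-sitemap.xml' => 200,
    'https://example.com/page-sitemap.xml' => 403,
];

$index_xml = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>'
    . '<sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap>'
    . '</sitemapindex>';

$blocked = find_blocked_sitemaps(
    $index_xml,
    fn(string $url): int => $fake_statuses[$url] ?? 0
);
print_r($blocked);
```

Inside WordPress, the stub could be swapped for a callable built on wp_remote_get() and wp_remote_retrieve_response_code(); any URL reported here would need its own allow rule in the WAF.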