• Resolved danny96

    (@danny96)


    No valid sitemap parsed

    We added a new custom rule in your Cloudflare WAF to allow https://admdirect.co.uk/sitemap_index.xml. We then tested the sitemap XML file using the LSCache crawler script, and it no longer returns 403. The script can connect to the sitemap; however, the crawler is unable to see any URLs in the sitemap for some reason.

    I have tried turning the "Drop Domain from Sitemap" setting off.

    I also found code in another ticket that tests whether the server can access the sitemap, and it works: https://admdirect.co.uk/test.php

    This is all to get the crawler to work.

    Report number: SSASVTPE

Viewing 10 replies - 1 through 10 (of 10 total)
  • Plugin Support qtwrk

    (@qtwrk)

    https://github.com/litespeedtech/lscache_wp/blob/7c707469b3c88b4f45d9955593b92f9aeaed54c3/src/crawler-map.cls.php#L519

    You can insert a line like the one below:

    error_log(print_r($response,true), 3 , '/path/to/some/logfile.log');

    This should log exactly what WordPress retrieved.
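
    If the full dump is too noisy, a shorter summary may be easier to read. The following is only a sketch: it assumes $response has the array shape that wp_remote_get() returns, and summarize_response is a hypothetical helper name, not part of the plugin.

```php
<?php
// Hypothetical helper: reduce a wp_remote_get()-style response array to the
// few fields that matter when a sitemap fails to parse.
function summarize_response(array $response): string
{
    $code = $response['response']['code'] ?? 'n/a';
    $type = $response['headers']['content-type'] ?? 'n/a';
    $body = (string) ($response['body'] ?? '');

    return sprintf(
        'status=%s content-type=%s body-bytes=%d body-start=%s',
        $code,
        $type,
        strlen($body),
        substr($body, 0, 80)
    );
}

// Hand-built array shaped like the WordPress response, for illustration only.
$sample = [
    'headers'  => ['content-type' => 'text/xml; charset=UTF-8'],
    'body'     => '<?xml version="1.0" encoding="UTF-8"?><sitemapindex></sitemapindex>',
    'response' => ['code' => 200, 'message' => 'OK'],
];

// In the plugin you would append the summary to a file instead of echoing it
// (message type 3 appends to the given file; the path is a placeholder):
// error_log(summarize_response($response) . "\n", 3, '/path/to/some/logfile.log');
echo summarize_response($sample), "\n";
```

    A 403 status or an HTML body start in this summary would point to a firewall block rather than a parsing problem.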

    Thread Starter danny96

    (@danny96)

    Hi qtwrk,

    I have done this and it logs this:

    text/plain debug.log ( exported SGML document, ASCII text, with CRLF, LF line terminators )
    Array
    (
    [headers] => WpOrg\Requests\Utility\CaseInsensitiveDictionary Object
        (
            [data:protected] => Array
                (
                    [date] => Fri, 15 Nov 2024 15:04:21 GMT
                    [content-type] => text/xml; charset=UTF-8
                    [x-powered-by] => PHP/8.2.24
                    [x-dns-prefetch-control] => on
                    [set-cookie] => mailchimp_landing_site=https%3A%2F%2Fadmdirect.co.uk%2Fsitemap_index.xml; expires=Fri, 13 Dec 2024 15:04:20 GMT; Max-Age=2419200; path=/; secure; SameSite=Strict
                    [x-robots-tag] => noindex, follow
                    [x-litespeed-cache-control] => no-cache
                    [cache-control] => no-cache, no-store, must-revalidate, max-age=0
                    [vary] => Accept-Encoding
                    [alt-svc] => h3=":443"; ma=86400
                    [cf-cache-status] => DYNAMIC
                    [report-to] => {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=1sZReS7fP6xHR00tII0o2bXMOkYLb7jIJFlDoKtCW24FG7Os2Adq3A4d9WsCXG3S3dpqEYnzWPGDPTlHggvFiZaene4OBfPI3CtQjvcGBOUK%2Bjc9AEzXRs%2FBYZjiFGsQag%3D%3D"}],"group":"cf-nel","max_age":604800}
                    [nel] => {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
                    [speculation-rules] => "/cdn-cgi/speculation"
                    [server] => cloudflare
                    [cf-ray] => 8e30221448a1cd5c-LHR
                    [content-encoding] => gzip
                    [server-timing] => cfL4;desc="?proto=TCP&rtt=1282&sent=5&recv=7&lost=0&retrans=0&sent_bytes=2849&recv_bytes=795&delivery_rate=2871369&cwnd=253&unsent_bytes=0&cid=b2489258c2b581e6&ts=1417&x=0"
                )

        )
    
    [body] => <?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" ?>


    https://admdirect.co.uk/post-sitemap.xml 2024-08-23T13:18:32+00:00
    https://admdirect.co.uk/page-sitemap.xml 2024-11-12T16:45:35+00:00
    https://admdirect.co.uk/product-sitemap.xml 2024-11-15T10:28:04+00:00
    https://admdirect.co.uk/product-sitemap2.xml 2023-11-27T23:24:35+00:00
    https://admdirect.co.uk/product-sitemap3.xml 2024-07-22T14:07:24+00:00
    https://admdirect.co.uk/product-sitemap4.xml 2024-11-15T10:28:04+00:00
    https://admdirect.co.uk/category-sitemap.xml 2024-11-15T15:01:33+00:00
    https://admdirect.co.uk/product_cat-sitemap.xml 2024-11-15T10:28:04+00:00

    https://admdirect.co.uk/pagination-sitemap.xml 2021-12-08

    [response] => Array
        (
            [code] => 200
            [message] => OK
        )
    
    [cookies] => Array
        (
            [0] => WP_Http_Cookie Object
                (
                    [name] => mailchimp_landing_site
                    [value] => https://admdirect.co.uk/sitemap_index.xml
                    [expires] => 1734102260
                    [path] => /
                    [domain] => admdirect.co.uk
                    [port] => 
                    [host_only] => 1
                )
    
        )
    
    [filename] => 
    [http_response] => WP_HTTP_Requests_Response Object
        (
            [data] => 
            [headers] => 
            [status] => 
            [response:protected] => WpOrg\Requests\Response Object
                (
                    [body] => <?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" ?>


    https://admdirect.co.uk/post-sitemap.xml 2024-08-23T13:18:32+00:00
    https://admdirect.co.uk/page-sitemap.xml 2024-11-12T16:45:35+00:00
    https://admdirect.co.uk/product-sitemap.xml 2024-11-15T10:28:04+00:00
    https://admdirect.co.uk/product-sitemap2.xml 2023-11-27T23:24:35+00:00
    https://admdirect.co.uk/product-sitemap3.xml 2024-07-22T14:07:24+00:00
    https://admdirect.co.uk/product-sitemap4.xml 2024-11-15T10:28:04+00:00
    https://admdirect.co.uk/category-sitemap.xml 2024-11-15T15:01:33+00:00
    https://admdirect.co.uk/product_cat-sitemap.xml 2024-11-15T10:28:04+00:00

    https://admdirect.co.uk/pagination-sitemap.xml 2021-12-08

                    [raw] => HTTP/1.1 200 OK

    Date: Fri, 15 Nov 2024 15:04:21 GMT
    Content-Type: text/xml; charset=UTF-8
    Transfer-Encoding: chunked
    Connection: close
    x-powered-by: PHP/8.2.24
    x-dns-prefetch-control: on
    Set-Cookie: mailchimp_landing_site=https%3A%2F%2Fadmdirect.co.uk%2Fsitemap_index.xml; expires=Fri, 13 Dec 2024 15:04:20 GMT; Max-Age=2419200; path=/; secure; SameSite=Strict
    x-robots-tag: noindex, follow
    x-litespeed-cache-control: no-cache
    Cache-Control: no-cache, no-store, must-revalidate, max-age=0
    vary: Accept-Encoding
    alt-svc: h3=":443"; ma=86400
    cf-cache-status: DYNAMIC
    Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=1sZReS7fP6xHR00tII0o2bXMOkYLb7jIJFlDoKtCW24FG7Os2Adq3A4d9WsCXG3S3dpqEYnzWPGDPTlHggvFiZaene4OBfPI3CtQjvcGBOUK%2Bjc9AEzXRs%2FBYZjiFGsQag%3D%3D"}],"group":"cf-nel","max_age":604800}
    NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
    Speculation-Rules: "/cdn-cgi/speculation"
    Server: cloudflare
    CF-RAY: 8e30221448a1cd5c-LHR
    Content-Encoding: gzip
    server-timing: cfL4;desc="?proto=TCP&rtt=1282&sent=5&recv=7&lost=0&retrans=0&sent_bytes=2849&recv_bytes=795&delivery_rate=2871369&cwnd=253&unsent_bytes=0&cid=b2489258c2b581e6&ts=1417&x=0"



    https://admdirect.co.uk/post-sitemap.xml 2024-08-23T13:18:32+00:00
    https://admdirect.co.uk/page-sitemap.xml 2024-11-12T16:45:35+00:00
    https://admdirect.co.uk/product-sitemap.xml 2024-11-15T10:28:04+00:00
    https://admdirect.co.uk/product-sitemap2.xml 2023-11-27T23:24:35+00:00
    https://admdirect.co.uk/product-sitemap3.xml 2024-07-22T14:07:24+00:00
    https://admdirect.co.uk/product-sitemap4.xml 2024-11-15T10:28:04+00:00
    https://admdirect.co.uk/category-sitemap.xml 2024-11-15T15:01:33+00:00
    https://admdirect.co.uk/product_cat-sitemap.xml 2024-11-15T10:28:04+00:00

    https://admdirect.co.uk/pagination-sitemap.xml 2021-12-08

                    [headers] => WpOrg\Requests\Response\Headers Object
                        (
                            [data:protected] => Array
                                (
                                    [date] => Array
                                        (
                                            [0] => Fri, 15 Nov 2024 15:04:21 GMT
                                        )
    
                                    [content-type] => Array
                                        (
                                            [0] => text/xml; charset=UTF-8
                                        )
    
                                    [x-powered-by] => Array
                                        (
                                            [0] => PHP/8.2.24
                                        )
    
                                    [x-dns-prefetch-control] => Array
                                        (
                                            [0] => on
                                        )
    
                                    [set-cookie] => Array
                                        (
                                            [0] => mailchimp_landing_site=https%3A%2F%2Fadmdirect.co.uk%2Fsitemap_index.xml; expires=Fri, 13 Dec 2024 15:04:20 GMT; Max-Age=2419200; path=/; secure; SameSite=Strict
                                        )
    
                                    [x-robots-tag] => Array
                                        (
                                            [0] => noindex, follow
                                        )
    
                                    [x-litespeed-cache-control] => Array
                                        (
                                            [0] => no-cache
                                        )
    
                                    [cache-control] => Array
                                        (
                                            [0] => no-cache, no-store, must-revalidate, max-age=0
                                        )
    
                                    [vary] => Array
                                        (
                                            [0] => Accept-Encoding
                                        )
    
                                    [alt-svc] => Array
                                        (
                                            [0] => h3=":443"; ma=86400
                                        )
    
                                    [cf-cache-status] => Array
                                        (
                                            [0] => DYNAMIC
                                        )
    
                                    [report-to] => Array
                                        (
                                            [0] => {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=1sZReS7fP6xHR00tII0o2bXMOkYLb7jIJFlDoKtCW24FG7Os2Adq3A4d9WsCXG3S3dpqEYnzWPGDPTlHggvFiZaene4OBfPI3CtQjvcGBOUK%2Bjc9AEzXRs%2FBYZjiFGsQag%3D%3D"}],"group":"cf-nel","max_age":604800}
                                        )
    
                                    [nel] => Array
                                        (
                                            [0] => {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
                                        )
    
                                    [speculation-rules] => Array
                                        (
                                            [0] => "/cdn-cgi/speculation"
                                        )
    
                                    [server] => Array
                                        (
                                            [0] => cloudflare
                                        )
    
                                    [cf-ray] => Array
                                        (
                                            [0] => 8e30221448a1cd5c-LHR
                                        )
    
                                    [content-encoding] => Array
                                        (
                                            [0] => gzip
                                        )
    
                                    [server-timing] => Array
                                        (
                                            [0] => cfL4;desc="?proto=TCP&rtt=1282&sent=5&recv=7&lost=0&retrans=0&sent_bytes=2849&recv_bytes=795&delivery_rate=2871369&cwnd=253&unsent_bytes=0&cid=b2489258c2b581e6&ts=1417&x=0"
                                        )
    
                                )
    
                        )
    
                    [status_code] => 200
                    [protocol_version] => 1.1
                    [success] => 1
                    [redirects] => 0
                    [url] => https://admdirect.co.uk/sitemap_index.xml
                    [history] => Array
                        (
                        )
    
                    [cookies] => WpOrg\Requests\Cookie\Jar Object
                        (
                            [cookies:protected] => Array
                                (
                                    [mailchimp_landing_site] => WpOrg\Requests\Cookie Object
                                        (
                                            [name] => mailchimp_landing_site
                                            [value] => https%3A%2F%2Fadmdirect.co.uk%2Fsitemap_index.xml
                                            [attributes] => WpOrg\Requests\Utility\CaseInsensitiveDictionary Object
                                                (
                                                    [data:protected] => Array
                                                        (
                                                            [expires] => 1734102260
                                                            [max-age] => 1734102261
                                                            [path] => /
                                                            [secure] => 1
                                                            [samesite] => Strict
                                                            [domain] => admdirect.co.uk
                                                        )
    
                                                )
    
                                            [flags] => Array
                                                (
                                                    [creation] => 1731683061
                                                    [last-access] => 1731683061
                                                    [persistent] => 
                                                    [host-only] => 1
                                                )
    
                                            [reference_time] => 1731683061
                                        )
    
                                )
    
                        )
    
                )
    
            [filename:protected] => 
        )

    )

    So it is finding the different sitemaps.

    Thanks

    Plugin Support qtwrk

    (@qtwrk)

    Now please try dumping the result of this call:

    $xml_object = simplexml_load_string($response['body'], null, LIBXML_NOCDATA);

    and see what it loads and parses into.
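
    As a rough illustration of that step, the sketch below parses a sitemap index and lists the child sitemap URLs. The XML is a shortened sample built from two of the entries in the log above; the sitemapindex root and its namespace are assumed from the standard sitemaps.org format, and child_sitemaps is a hypothetical helper name, not the plugin's own code.

```php
<?php
// Sample sitemap index, shortened to two of the entries from the log above.
// The <sitemapindex> wrapper and namespace are assumptions (standard format).
$body = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://admdirect.co.uk/post-sitemap.xml</loc>
    <lastmod>2024-08-23T13:18:32+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://admdirect.co.uk/page-sitemap.xml</loc>
    <lastmod>2024-11-12T16:45:35+00:00</lastmod>
  </sitemap>
</sitemapindex>
XML;

// Hypothetical helper: return the <loc> of every child sitemap in an index.
function child_sitemaps(string $body): array
{
    $xml = simplexml_load_string($body, null, LIBXML_NOCDATA);
    if ($xml === false) {
        // Not well-formed XML, for example an HTML block page from a firewall.
        return [];
    }

    $locs = [];
    foreach ($xml->sitemap as $sitemap) {
        $locs[] = (string) $sitemap->loc;
    }
    return $locs;
}

var_dump(child_sitemaps($body));
```

    If this returns the expected child URLs, the index itself parses fine and the problem is more likely in fetching or parsing the child sitemaps.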

    Thread Starter danny96

    (@danny96)

    Hi,

    This is what was output

    object(SimpleXMLElement)#8449 (1) {
      ["sitemap"]=>
      array(9) {
        [0]=>
        object(SimpleXMLElement)#8465 (2) {
          ["loc"]=>
          string(40) "https://admdirect.co.uk/post-sitemap.xml"
          ["lastmod"]=>
          string(25) "2024-08-23T13:18:32+00:00"
        }
        [1]=>
        object(SimpleXMLElement)#8464 (2) {
          ["loc"]=>
          string(40) "https://admdirect.co.uk/page-sitemap.xml"
          ["lastmod"]=>
          string(25) "2024-11-12T16:45:35+00:00"
        }
        [2]=>
        object(SimpleXMLElement)#8463 (2) {
          ["loc"]=>
          string(43) "https://admdirect.co.uk/product-sitemap.xml"
          ["lastmod"]=>
          string(25) "2024-11-15T10:28:04+00:00"
        }
        [3]=>
        object(SimpleXMLElement)#8462 (2) {
          ["loc"]=>
          string(44) "https://admdirect.co.uk/product-sitemap2.xml"
          ["lastmod"]=>
          string(25) "2023-11-27T23:24:35+00:00"
        }
        [4]=>
        object(SimpleXMLElement)#8461 (2) {
          ["loc"]=>
          string(44) "https://admdirect.co.uk/product-sitemap3.xml"
          ["lastmod"]=>
          string(25) "2024-07-22T14:07:24+00:00"
        }
        [5]=>
        object(SimpleXMLElement)#8459 (2) {
          ["loc"]=>
          string(44) "https://admdirect.co.uk/product-sitemap4.xml"
          ["lastmod"]=>
          string(25) "2024-11-15T10:28:04+00:00"
        }
        [6]=>
        object(SimpleXMLElement)#8456 (2) {
          ["loc"]=>
          string(44) "https://admdirect.co.uk/category-sitemap.xml"
          ["lastmod"]=>
          string(25) "2024-11-15T15:19:38+00:00"
        }
        [7]=>
        object(SimpleXMLElement)#8455 (2) {
          ["loc"]=>
          string(47) "https://admdirect.co.uk/product_cat-sitemap.xml"
          ["lastmod"]=>
          string(25) "2024-11-15T10:28:04+00:00"
        }
        [8]=>
        object(SimpleXMLElement)#8447 (2) {
          ["loc"]=>
          string(46) "https://admdirect.co.uk/pagination-sitemap.xml"
          ["lastmod"]=>
          string(10) "2021-12-08"
        }
      }
    }

    Thanks

    Plugin Support qtwrk

    (@qtwrk)

    So far it looks all right. You may need to keep logging the variables all the way down to the line return count($this->_urls); to see which step fails to parse.

    Thread Starter danny96

    (@danny96)

    Yes, it returns all 9 sitemaps.

    The output is: Total sitemaps found: 9

    Plugin Support qtwrk

    (@qtwrk)

    https://github.com/litespeedtech/lscache_wp/blob/7c707469b3c88b4f45d9955593b92f9aeaed54c3/src/crawler-map.cls.php#L481

    Please try logging this count and see what it gives; I assume it will be 0 or something similar.

    If so, trace back up from there; for example, log the value of $this->_urls around line 460.

    Thread Starter danny96

    (@danny96)

    Hi,

    After adding all those logs, it turns out that the Cloudflare rule the host added only covered the top-level sitemap index.

    So I extended the Cloudflare rule to allow every individual sitemap URL listed inside the main one, and then the parse worked.

    Thanks
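
    For anyone hitting the same symptom, a check along these lines may help spot child sitemaps that are still blocked. It is a sketch only: find_blocked_sitemaps is a hypothetical helper, the status lookup is passed in as a callable, and the 403 responses in the stub are invented to mimic a rule that allows only the index URL.

```php
<?php
// Hypothetical helper: return every sitemap URL whose HTTP status is not 200.
// $fetchStatus is any callable that takes a URL and returns an integer status,
// so the check can run against a real HTTP client or a stub.
function find_blocked_sitemaps(array $urls, callable $fetchStatus): array
{
    $blocked = [];
    foreach ($urls as $url) {
        $status = $fetchStatus($url);
        if ($status !== 200) {
            $blocked[$url] = $status;
        }
    }
    return $blocked;
}

// Stub standing in for a real request. The 403 values are invented for
// illustration and mimic a firewall rule that only allows the index URL.
$stub = function (string $url): int {
    return $url === 'https://admdirect.co.uk/sitemap_index.xml' ? 200 : 403;
};

$urls = [
    'https://admdirect.co.uk/sitemap_index.xml',
    'https://admdirect.co.uk/post-sitemap.xml',
    'https://admdirect.co.uk/page-sitemap.xml',
];

print_r(find_blocked_sitemaps($urls, $stub));
```

    Inside WordPress, the callable could wrap wp_remote_get() and wp_remote_retrieve_response_code(), run on the server itself so the request takes the same route as the crawler.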

    Plugin Support qtwrk

    (@qtwrk)

    All good now?

    Thread Starter danny96

    (@danny96)

    Yes, all good

    Thank you for the help
