• Resolved Sergio Alfaro

    (@rafasshop)


    Hello.

    Report number: NDMIBAPI

    My store has fewer than 700 real URLs according to its sitemap. This includes products, categories, pages, posts, etc.

    When I use my SEO plugin's sitemap index in the crawler settings and click the button to generate the crawler map, it is generated properly.

    But some time later, when the file regenerates by itself, it no longer includes the products, only the categories. Maybe there is a timeout while reading the product sitemap, since my plugin generates it dynamically on request so it is always up to date. The strange thing is that this only happens when the file updates by itself.

    Since I can't fix this, I am trying to use the automatic crawler generation without my sitemap, but the issue is that it now generates more than 11,800 URLs.

    I checked and found that it is adding all product variations (a rough way to confirm this is sketched after the URL examples below).

    So if the main product has this URL:
    /camiseta-aida-6656-roly-mujer-tirantes/

    it is also adding every variation of it, like this:
    /camiseta-aida-6656-roly-mujer-tirantes/?attribute_pa_color=turquesa&attribute_pa_talla=2xl
    /camiseta-aida-6656-roly-mujer-tirantes/?attribute_pa_color=royal&attribute_pa_talla=2xl
    /camiseta-aida-6656-roly-mujer-tirantes/?attribute_pa_color=roseton&attribute_pa_talla=2xl
    /camiseta-aida-6656-roly-mujer-tirantes/?attribute_pa_color=rojo&attribute_pa_talla=2xl
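
    A rough way to confirm that the extra ~11,000 URLs really come from variations (just a sketch; it assumes WooCommerce is active and something like WP-CLI to run it, e.g. wp eval-file count-variations.php):

    <?php
    // Sketch: count published products vs. product variations to see whether
    // the variations roughly account for the extra URLs in the crawler map.
    // (The match may not be one-to-one, since the crawler builds query-string
    // URLs, but it shows the scale.)
    $products = new WP_Query( array(
        'post_type'      => 'product',
        'post_status'    => 'publish',
        'fields'         => 'ids',
        'posts_per_page' => -1,
    ) );
    $variations = new WP_Query( array(
        'post_type'      => 'product_variation',
        'post_status'    => 'publish',
        'fields'         => 'ids',
        'posts_per_page' => -1,
    ) );
    printf(
        "Published products: %d, published variations: %d\n",
        $products->found_posts,
        $variations->found_posts
    );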

    On the other hand, outside that file, in the blacklist I found a lot of strange URLs, like these:
    /?post_type=scheduled-action&p=25657
    /?post_type=acf-field&p=24413

    and others that seem to be product IDs:
    /20017/
    /20018/
    /20020/
    /20021/
    /20022/
    /20023/
    /20024/
    /20025/

    I tried using the content exclusion field in the crawler settings so that custom content is not included in the sitemap, and I added everything that appears in the right-hand list except “products”. “product_variation” is included in the excluded options, but this exclusion list doesn’t seem to work for me.
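
    To see exactly which post types that exclusion list is dealing with, a quick listing like this helps (just a sketch; run it with WP-CLI or a temporary mu-plugin):

    <?php
    // Sketch: list registered post types and their visibility flags, to compare
    // against what the crawler's exclusion list shows.
    foreach ( get_post_types( array(), 'objects' ) as $name => $type ) {
        printf(
            "%-30s public=%d publicly_queryable=%d\n",
            $name,
            (int) $type->public,
            (int) $type->publicly_queryable
        );
    }

    Note that the variation URLs above are query-string permutations of the parent product URL rather than product_variation permalinks, which may be why excluding that post type alone doesn’t remove them.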

    Any suggestion?

    Thanks!

  • Plugin Support qtwrk

    (@qtwrk)

    Hi,

    Yes, the crawler tries to fetch the sitemap with a timeout of 15 seconds. If your sitemap is dynamically generated, it could hit that timeout and end up with an incomplete list of URIs.

    In /litespeed-cache/inc/crawler/crawler.class.php, line 191, change 15 to a higher number.
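
    For illustration only (this is not the actual plugin code; it just shows the kind of change meant, assuming the sitemap is fetched with wp_remote_get()):

    <?php
    // Sketch only -- NOT the real code from crawler.class.php. The idea is to
    // raise the timeout on the request that fetches the (dynamic) sitemap.
    $response = wp_remote_get( $sitemap_url, array( // $sitemap_url: whatever the crawler is configured to read
        'timeout' => 120, // previously 15 seconds
    ) );
    if ( ! is_wp_error( $response ) ) {
        $sitemap_xml = wp_remote_retrieve_body( $response );
    }

    Keep in mind that a direct edit to a plugin file will be lost on the next plugin update.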

    But a more appropriate approach might be to ask the sitemap plugin author to save it as a static file.
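
    If the plugin can’t do that, a stopgap on your own side could be a small scheduled task that snapshots the dynamic sitemap to a static file, and then pointing the crawler at the static copy. A minimal sketch (the hook name, schedule, sitemap URL and output filename are all placeholders):

    <?php
    // Sketch of a workaround: periodically copy the dynamic sitemap to a static
    // file. Hook name, schedule, URL and filename below are placeholders.
    add_action( 'init', function () {
        if ( ! wp_next_scheduled( 'my_snapshot_sitemap' ) ) {
            wp_schedule_event( time(), 'hourly', 'my_snapshot_sitemap' );
        }
    } );
    add_action( 'my_snapshot_sitemap', function () {
        $response = wp_remote_get( home_url( '/sitemap_index.xml' ), array( 'timeout' => 120 ) );
        if ( is_wp_error( $response ) || 200 !== wp_remote_retrieve_response_code( $response ) ) {
            return; // keep the previous snapshot if the fetch fails
        }
        // If you use a sitemap index, the child sitemaps may need the same treatment.
        file_put_contents( ABSPATH . 'sitemap-static.xml', wp_remote_retrieve_body( $response ) );
    } );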

    As for the second part of your issue:

    Sometimes these URIs are “blocks” that form pages. Please try accessing these URIs and see what page you get; it could be a 404 Not Found or a redirect to the homepage.
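
    A quick way to check several of them at once (just a sketch, e.g. run with wp eval-file check-blacklist.php; the list below is taken from the examples in this thread):

    <?php
    // Sketch: request each blacklisted URI and print the HTTP status code it
    // actually returns (404, 301, 200, ...).
    $uris = array(
        '/?post_type=scheduled-action&p=25657',
        '/?post_type=acf-field&p=24413',
        '/20017/',
        '/20018/',
    );
    foreach ( $uris as $uri ) {
        $response = wp_remote_head( home_url( $uri ), array( 'redirection' => 0 ) );
        $code     = is_wp_error( $response )
            ? $response->get_error_message()
            : wp_remote_retrieve_response_code( $response );
        printf( "%-45s -> %s\n", $uri, $code );
    }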

    Best regards,

    Thread Starter Sergio Alfaro

    (@rafasshop)

    Thank you @qtwrk

    About all those URLs, there are two situations:

    1. 404 errors; these don’t exist, like:
    /?post_type=scheduled-action&p=25657
    /?post_type=acf-field&p=24413
    /20017/
    /20018/

    2. Variable products: if I open the URL, it exists because it is a product variation, but it shouldn’t appear in the catalogue or generate this type of URL; variations don’t appear as products on the website.

    I will check everything regarding the sitemap. Thanks again.

    Regards.

    Plugin Support qtwrk

    (@qtwrk)

    Hi,

    1. Yes, 404 and other error pages will be added to the blacklist, as there is no point in crawling them.

    2. If it doesn’t appear as a product on the site, then it probably doesn’t get listed in the crawler data either.

    The built-in crawler map generator needs improvement, so a 3rd-party sitemap is recommended for now.

    Best regards,

  • The topic ‘Crawler file generation’ is closed to new replies.