• Resolved jessejones

    (@jessejones)


    I was thinking that a useful feature enhancement might be the ability to declare your sitemap URL as the source of pages to generate.

    A common concern raised before in this forum is the time it takes to generate a site. In theory, using the sitemap should reduce unnecessary crawling when generating the static site and lower the total crawl time, since the URLs/files to be fetched would be known in advance.
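To make the suggestion concrete, here is a minimal sketch of reading the page URLs out of a sitemap up front. It is not Simply Static code: the function name `urls_from_sitemap` and the sample sitemap (with `example.com` as a placeholder) are made up for illustration, and it only assumes the standard sitemaps.org XML format.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset>/<url>/<loc>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> values of a sitemap, in document order, without duplicates."""
    root = ET.fromstring(xml_text)
    seen = set()
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical sitemap content; a real one would be downloaded from the site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
  <url><loc>https://example.com/blog/hello-world/</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in urls_from_sitemap(SAMPLE):
        print(url)
```

A sitemap index file (one that points to other sitemaps) would need a second pass over each listed sitemap, which this sketch leaves out.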

Viewing 2 replies - 1 through 2 (of 2 total)
  • Plugin Support Igor Benic

    (@ibenic)

    Interesting idea. It could be used to find pages that the crawl missed.

    As for crawling, the system goes from the homepage and then follows links that are found there. Then, for each link, we check if it’s a page or asset and crawl it if needed.

    Links are:

    • any URL pointing to a page/post etc on the site
    • any assets that have been found (images, CSS, JS and such)

    The number of pages/files might look big, mainly because there can be many image, JS and CSS files.

    This way we make it more bulletproof: everything that needs to be on a page is downloaded and follows the same structure as on the WordPress site.

    But as I said, your suggestion might be an interesting way to see if some pages are missed so we can include them.
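The crawl described in the reply above (start at the homepage, follow the links found there, treat each link as a page or an asset) can be sketched roughly as follows. This is only an illustration, not the plugin's actual implementation: the `crawl` and `is_asset` names are invented, deciding "asset" by file extension is an assumption, and the `SITE` dictionary stands in for fetching and parsing real pages.

```python
from collections import deque
from urllib.parse import urlparse

# Assumption: a URL is treated as an asset if its path ends in one of these.
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def is_asset(url):
    """Guess whether a URL points to a static asset rather than a page."""
    return urlparse(url).path.lower().endswith(ASSET_EXTENSIONS)


def crawl(start_url, links_on_page):
    """Breadth-first crawl from start_url.

    links_on_page maps a page URL to the URLs found on it; it replaces
    the real "download the page and extract its links" step.
    """
    queue = deque([start_url])
    seen = {start_url}
    pages, assets = [], []
    while queue:
        url = queue.popleft()
        if is_asset(url):
            assets.append(url)  # assets are saved but not scanned for links
            continue
        pages.append(url)
        for link in links_on_page.get(url, []):
            if link not in seen:  # each URL is queued at most once
                seen.add(link)
                queue.append(link)
    return pages, assets


# Hypothetical three-page site.
SITE = {
    "https://example.com/": [
        "https://example.com/about/",
        "https://example.com/style.css",
        "https://example.com/blog/",
    ],
    "https://example.com/about/": [
        "https://example.com/",
        "https://example.com/style.css",
        "https://example.com/team.jpg",
    ],
    "https://example.com/blog/": [
        "https://example.com/style.css",
        "https://example.com/app.js",
    ],
}

if __name__ == "__main__":
    pages, assets = crawl("https://example.com/", SITE)
    print("pages:", pages)
    print("assets:", assets)
```

A page that nothing links to would never enter the queue here, which is where seeding the queue from a sitemap could help.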

    Thread Starter jessejones

    (@jessejones)

    @ibenic Thanks for the response and the explanation of how it works!

    Yeah, in theory the sitemap should list the majority of pages, and those pages should link out to the majority of the site. The only exception would be pages that a user does not want indexed, since they're not part of the sitemap, but the user could add these pages to the “additional files” section.

    For the files, does Simply Static keep some kind of running list of which file URLs/assets (JS, CSS, images) have already been found?

    For most sites, it seems like the website pattern would be:
    – Major CSS and JS files don’t change much between pages
    – Image usage can vary greatly between pages, e.g. blogs typically have different images in every post.
    – Occasionally there’s new JS/CSS if the page is somehow unique from the rest of the site, e.g. special layout, sale, new design, etc.

    In the case of images/media, it might be possible to access the media section of the site. The majority of images a website contains will probably live there, with some hosted externally, e.g. on a CDN.
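As an illustration of the "running list" idea and the usage pattern above, here is a small sketch in which shared CSS/JS is fetched once and only page-specific files add new work. Whether Simply Static actually tracks assets this way is not confirmed in this thread; `plan_fetches` and the sample paths are hypothetical.

```python
def plan_fetches(assets_by_page):
    """Return (assets to fetch, each listed once; count of repeat references skipped)."""
    seen = set()  # the running list of assets already found
    to_fetch = []
    skipped = 0
    for assets in assets_by_page.values():
        for asset in assets:
            if asset in seen:
                skipped += 1  # already queued from an earlier page
            else:
                seen.add(asset)
                to_fetch.append(asset)
    return to_fetch, skipped


# Hypothetical site: shared theme files, a different image per post,
# and one page with its own extra stylesheet.
ASSETS_BY_PAGE = {
    "/": ["/theme.css", "/theme.js", "/hero.jpg"],
    "/blog/post-1/": ["/theme.css", "/theme.js", "/post-1.jpg"],
    "/blog/post-2/": ["/theme.css", "/theme.js", "/post-2.jpg"],
    "/sale/": ["/theme.css", "/theme.js", "/sale.css", "/sale.jpg"],
}

if __name__ == "__main__":
    to_fetch, skipped = plan_fetches(ASSETS_BY_PAGE)
    print(len(to_fetch), "assets to fetch,", skipped, "repeat references skipped")
```

In this made-up example, 13 asset references across four pages collapse to 7 actual fetches.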

    • This reply was modified 1 year, 6 months ago by jessejones.
  • The topic ‘Idea: Use Sitemap files to determine pages to generate’ is closed to new replies.