• Resolved jkburges

    (@jkburges)


    Our WordPress installation uses the EWWW Image Optimizer plugin to generate and configure different versions of uploaded images. It uses the srcset attribute to let the client choose which version to display based on screen width. The different versions end up as sibling files of the original in the uploads directory.

    I am using Staatic Premium v1.4.2.

    I have Staatic > Settings > Build configured with the Additional Path /var/www/html/wp-content/uploads/ (the default).

    However, the images aren’t making it to the S3 deploy location. I see a bunch of info logs such as:

    Deletion of stale file wp-content/uploads/2022/11/PerformanceUpdate-Ls-1536x864.jpeg was successful

    I’m not sure what the expected behaviour is; is everything under Additional Paths supposed to be synced, or only if the crawler reaches it?

    I have been manually syncing the uploads dir to S3, but just wondering if there’s a way to disable the deletion?

Viewing 13 replies - 1 through 13 (of 13 total)
  • Plugin Author Team Staatic

    (@staatic)

    Hello @jkburges,

    We’re sorry to hear that you are running into an issue causing some assets to be missing from the static version of your site.

    Under normal circumstances, Staatic should detect the different versions automatically. The crawler does recognize the srcset attribute (including alternative srcset attributes like “data-srcset” and “data-wpfc-original-srcset” used by some optimization/caching plugins).
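
    To illustrate the kind of detection described above, here is a minimal Python sketch that pulls URLs out of srcset-style attributes. It is not Staatic's actual code: the attribute names come from this reply, while the class name, function name, sample file names, and the simplified parsing (URLs are assumed to contain no commas) are assumptions made for the example.

```python
from html.parser import HTMLParser

# Attribute names mentioned in the reply; everything else here is illustrative.
SRCSET_ATTRS = {"srcset", "data-srcset", "data-wpfc-original-srcset"}


class SrcsetCollector(HTMLParser):
    """Collects every URL listed in a srcset-style attribute."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in SRCSET_ATTRS or not value:
                continue
            # Each candidate looks like "URL 480w" or "URL 2x"; keep the URL part.
            # Assumption: URLs contain no commas (a full srcset parser is stricter).
            for candidate in value.split(","):
                parts = candidate.split()
                if parts:
                    self.urls.append(parts[0])


def collect_srcset_urls(html):
    """Return the srcset URLs found in an HTML fragment, in document order."""
    collector = SrcsetCollector()
    collector.feed(html)
    return collector.urls


# Hypothetical markup, similar to what an image optimizer might emit.
sample = (
    '<img src="/wp-content/uploads/hero.jpg" '
    'srcset="/wp-content/uploads/hero-480x270.jpg 480w, '
    '/wp-content/uploads/hero-1536x864.jpg 1536w">'
)
print(collect_srcset_urls(sample))
```

    A crawler working this way only finds image versions that appear in the HTML it actually fetches, so versions that no page references are never discovered through srcset.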

    In cases where the assets are not automatically detected, having the uploads directory set as an additional path in Build Settings (with at least the “Save” checkbox checked) should normally also do the trick.

    When deploying your site to S3, the generated static site is fully synchronized with the configured S3 target. This includes deleting files from S3 which are not (or no longer) part of the generated static site. Since for some reason the alternative version assets appear to be missing from the build, these are marked as “stale files” and deleted from S3 during the synchronization process.
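
    As a rough illustration of that synchronization rule, the stale files are simply the ones present on the target but absent from the build. The sketch below is an assumption about the logic, not the plugin's implementation; the function name and the file names are hypothetical.

```python
def find_stale_keys(build_files, remote_keys):
    """Return keys that exist on the deploy target but not in the current build."""
    return sorted(set(remote_keys) - set(build_files))


# Hypothetical build output and bucket contents.
build_files = {
    "index.html",
    "wp-content/uploads/2022/11/hero.jpeg",
}
remote_keys = {
    "index.html",
    "wp-content/uploads/2022/11/hero.jpeg",
    "wp-content/uploads/2022/11/hero-1536x864.jpeg",  # uploaded manually, not in the build
}

for key in find_stale_keys(build_files, remote_keys):
    print(f"Deletion of stale file {key}")
```

    Under this rule, anything copied to the bucket by hand is removed on the next deploy unless it is also part of the build.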

    In order to find out what is causing these assets to be missing from the build we need to have a look at the publication details.

    1. Can you check whether the missing assets are listed on the Resources overview by going to the publication details and then clicking the Resources tab?
    2. When publishing with “Extensive Logging” enabled from Staatic > Settings > Advanced > Logging, can you see a reference to any of the missing assets in the publication logs, by going to the publication details and then clicking the Logs tab?

    Looking forward to your feedback.

    Thread Starter jkburges

    (@jkburges)

    Hi, thanks for those things to check. I think I have figured out what’s going on – there are a couple of things.

    The EWWW Image Optimizer generates quite a few versions of each image, but not all of them are actually used in the resulting pages. That explains why the unused ones get deleted, and that is fine.

    Some others are still being deleted (after I manually sync to S3), though. This is because certain resources are only visible on the mobile version of the site. I guess this is why the crawler treats them as stale: it never sees them. So, to answer your questions: for 1), I don’t see any resources for these, but for 2), I see logs such as: Deletion of stale file wp-content/uploads/2022/10/Menu-C.svg was successful

    Any idea of a workaround for this?

    Thread Starter jkburges

    (@jkburges)

    > Any idea of a workaround for this?

    I guess I can do it with “Additional URLs” config but I wonder if there’s a more automatic way.

    Thread Starter jkburges

    (@jkburges)

    I think another class of things not getting crawled/synced are video poster=... objects

    Plugin Author Team Staatic

    (@staatic)

    > The EWWW Image Optimizer generates quite a few versions of each image, but not all of them are actually used in the resulting pages. That explains why the unused ones get deleted, and that is fine.

    This does explain why the crawler did not detect these versions from the HTML code. However, it still does not explain why the additional versions did not get included due to the “Additional Path” set for the uploads directory.

    When the publication starts, one of the first tasks is “Initializing crawler”, which includes enqueuing the files detected from the configured “Additional Paths”. The total number of initial URLs is logged as: “Finished initializing crawler (N enqueued)”. What is the number in your case?

    Also, in what kind of environment are you running WordPress? Could it be the case that the uploads directory is somehow symlinked or a mounted volume in the case of Docker?

    > I think another class of things not getting crawled/synced are video poster=... objects

    The poster= attribute should be handled by the crawler. We just tested this successfully on our test environment. You should be able to verify this as well by adding a test page with a video tag and see if the poster= URL is included in the build.

    Thread Starter jkburges

    (@jkburges)

    > When the publication starts, one of the first tasks is “Initializing crawler”, which includes enqueuing the files detected from the configured “Additional Paths”. The total number of initial URLs is logged as: “Finished initializing crawler (N enqueued)”. What is the number in your case?

    Here are my logs for this (a bit janky since I cut and pasted):

    Initializing crawler | info | 2023/04/12 at 4:03 am
    Initializing crawler | notice | 2023/04/12 at 4:03 am
    Enqueueing from Staatic\Crawler\CrawlUrlProvider\EntryCrawlUrlProvider provider | debug | 2023/04/12 at 4:03 am
    Staatic\Crawler\CrawlUrlProvider\EntryCrawlUrlProvider: 1 enqueued | debug | 2023/04/12 at 4:03 am
    Enqueueing from Staatic\Crawler\CrawlUrlProvider\PageNotFoundCrawlUrlProvider provider | debug | 2023/04/12 at 4:03 am
    Staatic\Crawler\CrawlUrlProvider\PageNotFoundCrawlUrlProvider: 2 enqueued | debug | 2023/04/12 at 4:03 am
    Enqueueing from Staatic\Crawler\CrawlUrlProvider\AdditionalUrlCrawlUrlProvider provider | debug | 2023/04/12 at 4:03 am
    Staatic\Crawler\CrawlUrlProvider\AdditionalUrlCrawlUrlProvider: 5 enqueued | debug | 2023/04/12 at 4:03 am
    Enqueueing from Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider provider | debug | 2023/04/12 at 4:03 am
    Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 6 enqueued | debug | 2023/04/12 at 4:03 am
    Enqueueing from Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider provider | debug | 2023/04/12 at 4:03 am
    Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 7 enqueued | debug | 2023/04/12 at 4:03 am
    Enqueueing from Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider provider | debug | 2023/04/12 at 4:03 am
    Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 8 enqueued | debug | 2023/04/12 at 4:03 am
    Enqueueing from Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider provider | debug | 2023/04/12 at 4:03 am
    Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 431 enqueued | debug | 2023/04/12 at 4:03 am
    Enqueueing from Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider provider | debug | 2023/04/12 at 4:03 am
    Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 686 enqueued | debug | 2023/04/12 at 4:03 am
    Enqueueing from Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider provider | debug | 2023/04/12 at 4:03 am
    Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 686 enqueued | debug | 2023/04/12 at 4:03 am
    Finished initializing crawler (686 enqueued) | notice | 2023/04/12 at 4:03 am

    > Also, in what kind of environment are you running WordPress? Could it be the case that the uploads directory is somehow symlinked or a mounted volume in the case of Docker?

    Yes, WordPress is running in a Docker container with a mounted volume from the host. The uploads directory on the host is an AWS EFS volume. Do you think this could affect the crawler? From inside the container, it looks just like any other directory AFAICT.

    > You should be able to verify this as well by adding a test page with a video tag and see if the poster= URL is included in the build.

    Ok, I will give this a try.

    Thanks very much for your support.

    Plugin Author Team Staatic

    (@staatic)

    Thanks for providing us with the logs. Looking at these, it appears that you have configured six additional paths. This results in the following number of enqueued URLs each:

    1. 1 URL
    2. 1 URL
    3. 1 URL
    4. 423 URLs
    5. 255 URLs
    6. 0 URLs
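
    For reference, these per-path numbers follow from treating the logged “N enqueued” values as a running total and subtracting each value from the next. The Python sketch below reproduces that arithmetic from the log lines; the function name is made up for the example, and the running-total interpretation is an assumption based on the log rather than documented behaviour.

```python
import re

# Lines copied from the log above; only the "<Provider>: <N> enqueued" lines matter.
LOG_LINES = [
    r"Staatic\Crawler\CrawlUrlProvider\EntryCrawlUrlProvider: 1 enqueued",
    r"Staatic\Crawler\CrawlUrlProvider\PageNotFoundCrawlUrlProvider: 2 enqueued",
    r"Staatic\Crawler\CrawlUrlProvider\AdditionalUrlCrawlUrlProvider: 5 enqueued",
    r"Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 6 enqueued",
    r"Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 7 enqueued",
    r"Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 8 enqueued",
    r"Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 431 enqueued",
    r"Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 686 enqueued",
    r"Staatic\Crawler\CrawlUrlProvider\AdditionalPathCrawlUrlProvider: 686 enqueued",
    "Finished initializing crawler (686 enqueued)",  # summary line, skipped below
]

PATTERN = re.compile(r"(\w+): (\d+) enqueued")


def per_provider_counts(lines):
    """Turn the running totals in the log into a count per provider run."""
    counts = []
    previous_total = 0
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not a "<Provider>: <N> enqueued" line
        provider, running_total = match.group(1), int(match.group(2))
        counts.append((provider, running_total - previous_total))
        previous_total = running_total
    return counts


additional_paths = [
    count
    for provider, count in per_provider_counts(LOG_LINES)
    if provider == "AdditionalPathCrawlUrlProvider"
]
print(additional_paths)
```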

    At which position is the /wp-content/uploads path? And how is the path defined exactly?

    > Yes, WordPress is running in a Docker container with a mounted volume from the host. The uploads directory on the host is an AWS EFS volume. Do you think this could affect the crawler? From inside the container, it looks just like any other directory AFAICT.

    We’re not completely sure if this is relevant, but we’ll need to do some further testing to find out.

    > Ok, I will give this a try.

    Perfect, thanks.

    Thread Starter jkburges

    (@jkburges)

    I only have the one “Additional Path”, the default /var/www/html/wp-content/uploads; follow and save are both checked.

    FWIW, I have three “Additional URLs”: /wp-sitemap.xml, /robots.txt and /favicon.ico – I think they are all defaults and save/follow all checked again.

    Where my config does differ from the defaults is “Additional Redirects”: I have about 200 entries here (all Yoast SEO redirects; I haven’t figured out another way to get the crawler to find the Yoast redirects yet). Having said that, I’m not sure if this is causing any problem.

    Plugin Author Team Staatic

    (@staatic)

    Got it. In that case the additional paths are automatically configured by the plugin for compatibility with other active plugins (e.g. Contact Form 7 and Elementor).

    One final question: how many files are in your uploads directory? From within your Docker container, this can be determined by running: find /var/www/html/wp-content/uploads -type f | wc -l

    We will do some further testing and get back to you as soon as possible. Thanks for your patience.

    Thread Starter jkburges

    (@jkburges)

    > One final question: how many files are in your uploads directory?

    ~56000

    Plugin Author Team Staatic

    (@staatic)

    @jkburges, after further testing, we were unfortunately not able to reproduce the issue in our various testing environments. Would it be possible to provide us with temporary access to your WordPress instance (or preferably a simple testing/staging environment that shows the same issue)? If so, can you send access instructions to [email protected]? This would be extremely helpful in resolving the issue in a timely manner.

    Thread Starter jkburges

    (@jkburges)

    Hello,

    I tried removing the uploads directory from “Additional Paths” (so now there are none configured), and lo and behold, things now seem to work as desired. No timeouts, no missing posters or other assets in S3. Perhaps it’s something to do with the number of files in there?

    FWIW, I am not against giving some level of access to a test env, but it would take a little work on my side to remove any private things and check what level of access the env has to other resources in our AWS account. So given that this problem seems solved for the moment, I’ll hold off.

    Thanks a lot for your help anyway, it got me on the right track.

    Plugin Author Team Staatic

    (@staatic)

    Great to hear that removing the uploads directory from “Additional Paths” has resolved the issue for you.

    > Perhaps it’s something to do with the number of files in there?

    I think this may very well be relevant, and we will do further testing with larger uploads directories to see if we can reproduce the issue this way, and resolve it in a future update of the plugin.

    > So given that this problem seems solved for the moment, I’ll hold off.

    We fully understand your position. Thanks for your patience and if you have any other issues, feel free to open another topic on the forums or get in touch by email.

  • The topic ‘Disable deletion of stale content’ is closed to new replies.