• Resolved epcur

    (@epcur)


    We’re very happy with W3 Total Cache so far and use it for Object Caching, Browser Caching and HTML/JS/CSS minification. We’d also like to use the plugin for Page Caching but have some questions regarding our particular use case.

    We’d like to use the enhanced page caching method which stores cached files to disk, i.e., in wp-content/cache/page_enhanced/.

    For each page, W3TC creates a directory, a _index_slash_ssl.html file and a _index_slash_ssl.html_gzip file.

    We have a large website and a limited inode count on the Linux server we’re using. We’d like to minimize the per-page inode and disk space footprint so that we can page-cache as many pages as possible (we cannot page-cache all of them).
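
    (For context, we monitor inode headroom on the cache volume with a small check along these lines; the path is illustrative and everything here is just a sketch:)

        import os

        def inode_usage(path="/var/www/html/wp-content/cache"):
            st = os.statvfs(path)
            used = st.f_files - st.f_ffree  # total inodes minus free inodes
            return used, st.f_files

        used, total = inode_usage()
        print(f"inodes used: {used}/{total} ({100 * used / total:.1f}%)")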

    1) Is it possible to only create .html_gzip files and no .html files? Whether via an existing or future plugin setting, or by editing the plugin .php or .htaccess files directly (if so, where)?

    Now, as you know, the structure of the page cache is as follows:

    page-type-dir/
        slug-dir/
            _index_slash_ssl.html
            _index_slash_ssl.html_gzip

    This requires 3 inodes per page (one for the directory and one for each file). Is there a particular reason for this structure (with the same filenames used for all pages)? For a large site like ours, a structure like this would be ideal:

    page-type-dir/
        slug.html
        slug.html_gzip

    Or even better, since most browsers support gzip compression:

    page-type-dir/
        slug.html_gzip

    2) Any chance you might (provide an option to) change this structure? Or could we easily modify the plugin .php files ourselves to this end?

    3) The page cache cannot be configured so that cached pages are deleted (and not recreated until there is a new page request) after N minutes/hours/days, is that correct? We wrote a custom script that runs continuously on our server to enforce a cache lifetime of this sort (by deleting pages that are older than some specified time period), but we are wondering whether this is possible with the plugin settings alone. (A sketch of our script’s approach follows after 4) below.)

    4) The general goal: ideally, we would page-cache only up to N of the most recently requested pages at any given time, since we simply cannot cache all the pages we would currently like to cache. So being able to set a specific page/inode limit (and/or disk space limit) would be great. Setting a maximum cache lifetime per page (ideally measured from the last time it was requested, not from when the cache files were created) would also be great.
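
    For reference, here is a minimal sketch of what our pruning script does, covering both 3) and 4). The cache path, the limits, and the reliance on access times are all assumptions on our part; in particular, last-request-based expiry only works if the filesystem records atime (i.e., no noatime mount option):

        import os
        import time

        CACHE_DIR = "/var/www/html/wp-content/cache/page_enhanced"  # illustrative
        MAX_AGE = 24 * 3600   # drop files not requested within the last 24 hours
        MAX_FILES = 100_000   # hard cap on cached files, i.e., our inode budget

        def cached_files():
            for root, _dirs, names in os.walk(CACHE_DIR):
                for name in names:
                    yield os.path.join(root, name)

        now = time.time()
        # oldest (least recently requested) first
        files = sorted((os.stat(p).st_atime, p) for p in cached_files())

        for i, (atime, path) in enumerate(files):
            too_old = now - atime > MAX_AGE        # 3) lifetime since last request
            over_cap = len(files) - i > MAX_FILES  # 4) keep only the newest MAX_FILES
            if too_old or over_cap:
                os.remove(path)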

    …so we’re wondering what your thoughts are on how to use W3TC in our particular use case. Thanks for reading!

    EDIT: After posting this and doing some testing, we noticed that the suffix _old is appended to .html and .html_gzip files. Does this happen after the time set for the Garbage Collection Interval (“If caching to disk, specify how frequently expired cache data is removed. For busy sites, a lower value is best.”) has elapsed? Are these cache files/directories removed completely or recreated?

  • Plugin Contributor Marko Vasiljevic

    (@vmarko)

    Hello @epcur

    Thank you for reaching out and I am happy to assist you with this.
    In short, there is no way to achieve this using the Disk: Enhanced caching method.
    When a page is cached, you end up with two versions of it (plain and gzip). Two files are created because a visitor, or the server itself, may not be using W3TC’s gzip, or gzip at all, or any other form of compression. This is why the plain version must always exist; the gzip version is created only when gzip is active.
    _old files are created once the cache is purged. This ensures that the new version of the page is presented to visitors; the _old files are then removed from disk at the Garbage Collection Interval, which is 3600 s by default.
    Please note that when using Disk: Enhanced, the filename is derived from the URL, and this is why you are seeing the structure you mentioned in your post.
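
    For illustration, immediately after a purge, a cached page’s directory looks roughly like this until garbage collection runs (assuming the default structure you described):

        slug-dir/
            _index_slash_ssl.html_old
            _index_slash_ssl.html_gzip_old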

    In W3 Total Cache, a page is only cached when visited, unless Preload is active, which caches the entire website based on the sitemap URL provided.

    If a page is not visited, it is not cached. Pages remain cached and are served from the cache until the cache is purged. W3TC does not purge the cache automatically unless the settings under Page Cache > Purge Policy are set to purge the cache once content is updated/posted.
    Then, and only then, is the cache purged by W3 Total Cache: _old files are created and the pages are cached again after the next visit. This being said, only visited pages are cached, not the entire website.

    For your use case, it may be best to use the Disk: Basic caching method, or better yet a memory-based caching method like Redis or Memcached (if the server allows this, of course), as there are no easy modifications that can be made in the plugin to achieve this.

    Thanks!


    Thread Starter epcur

    (@epcur)

    Thank you for the detailed explanation, much appreciated. With a larger SSD, we can now page-cache many more pages and use W3TC the way it is intended to be used, with one exception: could you please provide an option to disable the generation of .html files altogether? For our very large page cache, the gzipped files alone are sufficient, and having the .html files present as well dramatically increases our disk space requirements. Also, gzip is supported by all commonly used desktop and mobile browsers today, correct? If there are some users/bots which cannot be served gzip files, it’s perfectly fine with us if they are not provided with a page-cached version of a given URL. For those presumably rare cases, the object cache and the additional optimizations we enabled/implemented are sufficient.

    If you cannot provide an option to disable .html altogether, is there a somewhat straightforward way to disable it by editing the plugin .php files directly (which, yes, we would have to redo for each updated version of W3TC)? We haven’t looked at the plugin .php files in detail yet; currently, we’re using a Python script which runs 24/7 on our server to delete all the .html files generated by W3TC, which is a bit of an awkward solution but works fine for now.
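
    Roughly, the script does something like the following (the cache path is the default Disk: Enhanced location and, like the rest of this sketch, an assumption on our part):

        import os

        CACHE_DIR = "/var/www/html/wp-content/cache/page_enhanced"  # illustrative

        for root, _dirs, names in os.walk(CACHE_DIR):
            for name in names:
                # W3TC names both variants identically, so the extension alone
                # separates the plain file (.html) from the gzip one (.html_gzip).
                if name.endswith(".html"):
                    os.remove(os.path.join(root, name))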

    Also, is there any particular reason you are using gzip compression rather than Brotli, or not providing the option to use either gzip, Brotli, or both? Ideally, it would be nice if uncompressed HTML, gzip, and/or Brotli could each be enabled or disabled by the user.

    Plugin Contributor Marko Vasiljevic

    (@vmarko)

    Hello @epcur

    There is no option for what you are asking, and no easy method of editing the files for this use case.
    The best solution for you is to use a memory-based caching method like Redis or Memcached.

    Gzip or Brotli depends on the technologies you have on your server. You can enable/disable gzip under Performance > Browser Cache, and on most servers mod_deflate is installed/enabled by default (Apache). However, in order to use Brotli, also in the Browser Cache settings, you need to install the PHP Brotli extension, because W3 Total Cache checks whether that PHP extension is installed.

    Thanks!

    Thread Starter epcur

    (@epcur)

    Thanks again for answering our questions. Much appreciated.

    Apparently, we missed the Brotli option or did not check the Browser Cache section carefully enough. Thanks! We do have the brotli extension installed on our server and, with W3TC supporting it, would only use .html_br, if possible (no plain .html and no .html_gzip). It is perfectly fine for our purposes if the vast majority of users/bots are presented with brotli-compressed data and those who do not support it are served no page-cached files at all.

    You mentioned there is no easy way to disable the plain .html files. We need to ask you about this again in detail.

    1) In your implementation, are the plain .html files always being created first and only then compressed to gzip and/or brotli?

    2) If yes, is there a comparatively straightforward way to edit the plugin .php files in order to delete the plain .html files once the .html_br files have been created?

    3) If not, is there really no fairly clear way to disable the creation of plain .html files by editing the PHP source?

    We’re developers, though not specialized in web development; we are familiar with PHP but not with rewrite rules/directives in particular. We searched the plugin .php files for occurrences of “.html”, “.html_gz”, “.html_br”, “_index.html”, “_index_ssl.html” and the like, but didn’t quite know what to edit. It’s perfectly fine if we have to repeat the same or similar edits with every plugin update. So if disabling the creation of plain .html files is not too involved, we’d like to do it.

    Our page cache .html files are about 120 kB per file, the .html_br files only 18 kB at the current compression level, and only 14 kB for brotli compression level 11. Since we need to page-cache a lot of pages, the size of the plain .html files really is a problem here. But apart from this problem, W3TC does everything we’d like it to do, so we’d love to stick with it.
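
    (For reference, we measured those sizes roughly like this; the file name is illustrative, and the snippet assumes the third-party brotli package from PyPI:)

        import gzip
        import brotli  # pip install brotli

        html = open("_index_slash_ssl.html", "rb").read()

        print("plain :", len(html), "bytes")
        print("gzip  :", len(gzip.compress(html)), "bytes")
        print("br q11:", len(brotli.compress(html, quality=11)), "bytes")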

  • The topic ‘Page Caching: html_gzip only? limit cache size, lifetime, inode count?’ is closed to new replies.