• Summary: I had two ideas that seem like they might significantly improve the performance of WP Super Cache under heavy comment posting load, minimizing or eliminating the need for lockdown mode. Feedback would be appreciated.

    Details: One of our customers has a fairly busy WordPress / WP Super Cache site that has an extremely large number of comment posts. It’s not unusual to see posts garner 500 comments within three hours, for example (a new comment every 20 seconds), and posts with more than 1000 comments aren’t unheard of.

    The site uses a great many WordPress plugins. An uncached page load can take several seconds, even on an otherwise lightly loaded (90% idle CPU) 3 GHz dual Xeon server using FastCGI.

    With this kind of comment posting, even WP Super Cache can’t really keep up unless it’s in lockdown mode. The customer is unhappy with that because of the delay in comment posting.

    So I analyzed this. Consider what happens when a comment is posted if lockdown mode is not enabled: As a first step, the supercached file is removed. The blog then regenerates the supercached file the next time it’s viewed.

    However, if the page is getting several views per second, this causes a problem. The instant the supercached file is removed, the load on the server shoots up, because several concurrent requests for the uncached page arrive and start running. Within a couple of seconds, there might be half a dozen copies of the script running, all competing with each other for resources to display the page (and generate the cached file as a side effect). And there are more requests arriving every second, making things worse and worse. Sometimes, the resource contention can cause the first WordPress copy to take ten seconds or more to regenerate the cached page, and even after the cached page is first generated, the server is almost certain to be “pegged” for twenty seconds or more finishing up the requests that arrived.

    Then, 20 seconds later, the whole thing starts again with a new comment post. The end result is that more than half the time, there’s no cached page available, and the server will pretty much be 100% busy running WordPress scripts despite the caching, leading to errors as it runs out of CPU time.

    Lockdown is the traditional solution to this. But what’s interesting is that the performance problem is not caused by the *first* post-comment visitor requesting a new page and regenerating the cache. The server could handle regenerating the new cached page once every 20 seconds as a new comment arrives, no problem at all. Rather, it’s caused by the fact that there might be *dozens* of new uncached requests for that page before the supercached copy is regenerated.

    If WordPress could be made to run the code to regenerate the cached page only once per comment post, this shouldn’t be a problem (and almost nobody would need lockdown mode).

    I had two separate ideas about how to make this happen. First of all, it might be possible for WP Super Cache to realize that a page is in the process of being regenerated by another concurrent process, and simply wait a few seconds for the result of that page to appear in the cache directory, and send that back instead of running another full WordPress instance. This could be implemented as follows:

    When a request for a “to be cached” page starts, WP Super Cache would put a special temporary file with a funny name in the appropriate supercache directory. At the beginning of each request, WP Super Cache would check for the existence of this file. If the file is present and is less than, say, 15 seconds old, the code would go into a loop where it sleeps for, say, a tenth of a second and then looks to see whether the cached file is now in the directory. If so, use it. If more than 15 seconds have passed since the temp file was created, assume that something is wrong, break out of the loop, and run normally.
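
    A minimal sketch of this check-and-sleep loop, written in Python rather than the plugin's PHP. The function name, the marker file name, and the use of the marker's modification time as its age are assumptions made for illustration; only the 15-second limit and the tenth-of-a-second sleep come from the post.

```python
import os
import time

MAX_WAIT = 15.0      # seconds; the "say, 15 seconds" from the post
POLL_INTERVAL = 0.1  # seconds; the "tenth of a second" sleep


def wait_for_cached_page(cache_file, marker_file,
                         max_wait=MAX_WAIT, poll=POLL_INTERVAL):
    """Return the cached page if another process produces it in time.

    Returns None when the caller should run normally and build the page.
    Both file names are hypothetical; the plugin's real layout may differ.
    """
    while True:
        try:
            # Assumption: the marker's mtime is the time regeneration began.
            marker_age = time.time() - os.path.getmtime(marker_file)
        except FileNotFoundError:
            marker_age = None  # nobody has announced a regeneration

        try:
            with open(cache_file, "rb") as fh:
                return fh.read()  # the cached page has appeared: use it
        except FileNotFoundError:
            pass

        if marker_age is None or marker_age > max_wait:
            # No marker, or it is too old: assume something went wrong.
            return None
        time.sleep(poll)
```

    The caller that gets None back would create the marker itself and then generate the page. Note that every waiting request still occupies a server process while it sleeps.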

    A second idea is to not delete the “obsolete” supercached file when a new comment is posted. Instead, rename it to a special name — say, append “.obsolete” to it. The .htaccess rules won’t find the file under this special name, so the next request will start regenerating the supercached file. However, that request could check to see if there is a “.obsolete” file present in the directory, and whether that file is less than, say, 30 seconds old. If so, rename it back (remove the “.obsolete”) so that subsequent requests that arrive for the page before our process finishes will still use the original cached file via the .htaccess rule. Again, our process will be the only one regenerating the new file, running no more than once every 30 seconds.
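
    A rough sketch of the rename-and-restore idea, again in Python rather than the plugin's PHP. The function names are hypothetical, and the post does not say which timestamp "30 seconds old" refers to; this sketch assumes the file's modification time, which a rename leaves unchanged.

```python
import os
import time

OBSOLETE_SUFFIX = ".obsolete"
MAX_STALE_AGE = 30.0  # seconds; the "say, 30 seconds" from the post


def invalidate(cache_file):
    """On a new comment: set the cached page aside instead of deleting it."""
    try:
        os.replace(cache_file, cache_file + OBSOLETE_SUFFIX)
    except FileNotFoundError:
        pass  # nothing was cached yet


def begin_regeneration(cache_file, max_age=MAX_STALE_AGE):
    """Called by the first uncached request before it builds the page.

    Returns True if the stale copy was put back for other visitors.
    """
    obsolete = cache_file + OBSOLETE_SUFFIX
    try:
        # Assumption: age is measured from the file's mtime.
        age = time.time() - os.path.getmtime(obsolete)
        if age > max_age:
            return False  # too old to keep serving
        os.rename(obsolete, cache_file)
    except FileNotFoundError:
        # No stale copy, or another request restored it first.
        return False
    return True
```

    A request that gets True back is then the one that regenerates the page, while later arrivals are served the restored file.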

    This second idea isn’t perfect, because it delays the appearance of the most recent comments (like lockdown mode) if two people do visit the page at once. However, the average delay is the same as the average time between requests for the page, and the maximum delay would be 30 seconds (or whatever), which is much less than lockdown mode.

    So those are my ideas. Any thoughts? I suspect I might be missing something obvious. If either of these sounds like it would work, I can come up with a patch and test it.

Viewing 15 replies - 1 through 15 (of 18 total)
  • Moderator Samuel Wood (Otto)

    (@otto42)

    www.remarpro.com Admin

    The .obsolete trick is probably the better approach, because having all those concurrent processes basically waiting is going to rapidly kill the database as it runs out of connections.

    Thread Starter tigertech

    (@tigertech)

    While I was envisioning it doing the “check and sleep” thing before it makes a database connection, I guess that would be hard to guarantee.

    More generally, I agree that the second idea does seem better, because it avoids intentionally delaying anything. Intentional delays would probably just cause something else horrible to happen (even if the database didn’t fill up, the apache process table could easily fill up while waiting).

    I like the second idea too, but locking is always a problem on high traffic sites or pages. If you’re willing to do a patch I’d love to see it as it would be useful!

    I’m 75% there, but I can’t settle on a form of rewrite rules that minimizes the performance hit.

    Here’s the sequence.
    phase2 wp_cache_ob_callback
    1. rename .obsolete to .wip (.wip is served by rewrite) so no other anon clients will load php
    2. create .tmp
    3. write file
    4. rename .tmp -> .html
    5. delete .wip

    If a file has been expired, there will always be a file for rewrite to serve, and I think the .tmp -> .html sequence also addresses the possibility of partially written files in this thread.
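
    The .tmp -> .html part of that sequence (steps 2 to 4) can be sketched as follows. This is Python, not the plugin's PHP, and the function name is made up for illustration. It relies on Python's os.replace, which overwrites an existing destination.

```python
import os


def write_cache_file(html_path, content):
    """Write a cache file so that readers never see a partial page."""
    tmp_path = html_path + ".tmp"  # step 2: create .tmp
    with open(tmp_path, "wb") as fh:
        fh.write(content)          # step 3: write the whole page
        fh.flush()
        os.fsync(fh.fileno())      # make sure the bytes reach the disk
    # Step 4: the finished file appears under its final name in one step.
    os.replace(tmp_path, html_path)
```

    Anything that looks only for the .html name sees either the previous complete file or the new complete file, never a half-written one.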

    .obsolete files are generated in prune_super_cache and wp_cache_post_change by renaming instead of unlinking and aren’t served by rewrite rules

    The sequence and logic seem sound. I’ve got this working on my mutant version of supercache just fine, but the most important and trickiest bit is to get Step 1 moved out of ob_callback to before ob_start so that only the first anon client (unless we’re very unlucky) will load php.

    Stay tuned – It might take a couple of days

    I was about to hit ‘Post’, but I got thinking… It may not be that bad. I can probably rename the .obsolete files on fairly cheap pre-conditions in phase1 and then undo it in ob_callback if it turns out to have been inappropriate to do it.

    Thread Starter tigertech

    (@tigertech)

    While thinking about this, I had a third idea.

    The ideal solution would be to make the full WordPress code run just once after a comment is posted, but to do so as soon as possible after the comment (so that the comments aren’t outdated for other visitors).

    Perhaps it could be engineered to forcibly make exactly that happen. Something like:

    Don’t delete (or touch) the supercached file at all when a new comment is posted. Instead, fire off a separate “fake” standard (non-cookie, non-comment) page load using cURL or something.

    That fake page load would use a magic cookie value that mod_rewrite notices to bypass the cache, but WP Super Cache would ignore it and treat the request as cacheable, thus generating a brand new copy of the supercache file.

    The overhead is one extra fake page request, but in return, there never needs to be a moment in which the supercached file is missing, so there’s no huge load spike for a few seconds after the comment is posted. Additionally, the delay before the new comment appears for other people is simply the length of a single standard page load — much better than lockdown mode.
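
    As a sketch only, the "fake" page load might be built like this in Python. The cookie name and value are invented for illustration, and the rewrite rule that would recognise the cookie is not shown. Nothing here is an existing WP Super Cache feature.

```python
import urllib.request

# Hypothetical cookie that mod_rewrite would treat as "bypass the cache".
REFRESH_COOKIE = "supercache_refresh=1"


def build_refresh_request(page_url):
    """Build an ordinary GET for the page, carrying only the magic cookie."""
    request = urllib.request.Request(page_url)
    request.add_header("Cookie", REFRESH_COOKIE)
    return request


def refresh_cached_page(page_url, timeout=30):
    """Fire the fake page load and return the HTTP status code."""
    request = build_refresh_request(page_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

    The comment handler would call refresh_cached_page for the post's URL after saving the comment, ideally without making the commenter wait for it.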

    tigertech, I was thinking along similar lines but decided against it because of this:
    <!--nextpage-->
    Comments usually appear at the bottom of every page of a paged post. Taking the human trigger out of that process means that you may actually increase the server load by having to automatically regenerate all the pages of a paged post when everyone is busy posting comments on just the first or the last.

    I’ve got a working version already that pretty much guarantees only one load of the php engine for each supercache page by doing this:

    1. Don’t delete expired pages; rename them to .expired
    2. Page generation renames .expired to .wip (short for ‘write in progress’) which is served by rewrite rules to parallel requests during page generation.
    3. Write .tmp rather than .html so that partial .html isn’t served (the rewrite rules prefer .html to .wip)
    4. Rename .tmp -> .html
    5. Delete .wip
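
    To make the serving order in these steps concrete, here is a small Python model of what a parallel anonymous request would receive at each stage. It only mimics the preference described in step 3 (.html before .wip, with .expired and .tmp never served). The function name and the base-name-plus-suffix layout are assumptions, not the actual rewrite rules.

```python
import os


def file_to_serve(base_path):
    """Return the file a parallel request would be served, or None.

    None means no servable file exists, so the request falls through
    to PHP. Only .html and .wip are ever served, in that order.
    """
    for suffix in (".html", ".wip"):
        candidate = base_path + suffix
        if os.path.exists(candidate):
            return candidate
    return None
```

    Walking through the five steps with this model: nothing is served while only .expired exists, .wip is served from step 2 until step 4, and .html is served from step 4 onward, so deleting .wip in step 5 changes nothing for visitors.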

    Unfortunately, I’m having problems making a patch against supercache since the version I’m running is more ‘different’ than TortoiseMerge can cope with now. :\

    Thread Starter tigertech

    (@tigertech)

    Murmatron 2 wrote:

    Page generation renames .expired to .wip (short for ‘write in progress’) which is served by rewrite rules to parallel requests during page generation.

    Just to make sure I understand: why is an extra file extension necessary? Can’t you just rename it back to .html and keep the RewriteRules as-is? Then you can skip step 5, too.

    why is an extra file extension necessary?

    Because rename() in PHP fails on Windows servers if the destination file exists. To my mind that would have left a (very small) window of time where no file existed to serve since to be cross-platform compatible we would have to delete the .html before renaming the .tmp -> .html
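
    The window described here can be illustrated in Python, whose os.rename likewise refuses to overwrite an existing file on Windows (it raises FileExistsError). This is an analogy rather than the plugin's PHP, and the function name is hypothetical.

```python
import os


def publish(tmp_path, final_path):
    """Move a finished temp file into place, with a Windows fallback."""
    try:
        # On POSIX systems this overwrites final_path in a single step.
        os.rename(tmp_path, final_path)
    except FileExistsError:
        # On Windows the destination must be removed first ...
        os.remove(final_path)
        # ... so for a brief moment here no final_path exists at all.
        os.rename(tmp_path, final_path)
```

    That brief moment in the fallback branch is the gap the separate .wip file is meant to cover.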

    Working properly on Windows is important to me.

    That’s the only reason.

    OK, I’ve been busy hammering a subset of my dev-site with multiple wget sessions at a rate of about 10 requests/sec, with very short cache expiry times, whilst browsing, commenting, removing posts, and so on. (That rate is much more than the PHP engine can serve and much less than Apache can serve statically, so it seemed to be a fair balance.) I didn’t get any junk or lag, so I’ll post a link to a patch for anyone else brave enough to try it.

    Thread Starter tigertech

    (@tigertech)

    Because rename() in PHP fails on Windows servers if the destination file exists. To my mind that would have left a (very small) window of time where no file existed to serve since to be cross-platform compatible we would have to delete the .html before renaming the .tmp -> .html

    Hmmm, okay. That’s true, but this seems like a lot of extra effort to go to (and a lot of complicated extra rewrite rules) just to avoid very occasionally running an extra copy if a new request arrives before the unlink/rename.

    I actually came up with a simpler patch, at https://www.tigertech.net/patches/wp-supercache-mini-lockdown.patch. I want to test it on a high-volume site that normally requires lockdown mode, though, for a while, before suggesting that this approach is a useful general solution. I’ll report back what happens.

    tigertech – sort of applied your patch to trunk and it’s working. You had forgotten to check if the user regenerating the supercache was an anon one. Try it out here: https://svn.wp-plugins.org/wp-super-cache/trunk

    Thread Starter tigertech

    (@tigertech)

    tigertech – sort of applied your patch to trunk and it’s working.

    Great! Yours is much cleaner (and correct-er); thanks.

    I tried it, and it works as intended, but I did notice a bug: all the supercache files in the cache directory are getting renamed to “.needs-rebuild”, instead of just the ones affected by an edit/comment.

    This is happening because it now uses prune_super_cache to delete/rename files that are associated with meta files (instead of just unlinking them). When it finds that the home page cached item needs to be pruned, it calls “prune_super_cache (‘/$cachepath/supercache/$hostname/’, true, true)” to prune that page — but prune_super_cache is recursive, so it actually affects every file below that directory (i.e., everything).

    One way to fix it is to add a flag that can make prune_super_cache non-recursive in that special case, like this:

    https://www.tigertech.net/patches/wp-supercache-20080925111227.patch
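
    The patch itself is not reproduced here. As a rough Python illustration of the idea only, a prune routine with a flag to switch off recursion might look like the following. The function name and signature are hypothetical and do not match the real prune_super_cache.

```python
import os

REBUILD_SUFFIX = ".needs-rebuild"


def prune(directory, recursive=True):
    """Rename cache files to *.needs-rebuild and return the original paths.

    With recursive=False only files directly inside `directory` are
    touched, which is what pruning just the home page requires.
    """
    renamed = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.isdir(path):
            if recursive:
                renamed.extend(prune(path, recursive=True))
        elif not entry.endswith(REBUILD_SUFFIX):
            os.rename(path, path + REBUILD_SUFFIX)
            renamed.append(path)
    return renamed
```

    Calling prune on the host's top-level cache directory with recursive=False leaves the cached pages of individual posts in their subdirectories untouched.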

    With that small patch applied, it seems to work as I’d expect. I’m testing it on the busy site I mentioned, and will report back in a few days as to whether it eliminated their need for lockdown mode during busy comment posting periods.

    Thanks again for all the effort — without WP Super Cache, some of the sites we host simply wouldn’t work, even if we threw clusters of powerful machines at them.

    Tigertech – Instead of modifying that function, I modified the post_change function so it renames the 2 index.html files directly. The prune_super_cache() function doesn’t need the recursive parameter when called elsewhere so it doesn’t seem necessary to modify it.

    The needs-rebuild code is in the latest release but it’s disabled by default. Check out the sample config file for the right option.

    Thread Starter tigertech

    (@tigertech)

    I wanted to follow up with the results of testing this.

    It works extremely well. Here are some graphs showing the CPU usage and load average of the busy WordPress site that started all this. (This is on a 3 GHz dual Xeon server with 4 GB of RAM, and the site receives around 100,000 – 150,000 page views a day.)

    Just so it’s clear, the site was always using WP Super Cache. The problem was that the site sometimes gets hundreds of comments per hour in response to a single post, and when that happened, the supercache file was not available a large portion of the time (because it was being rebuilt). Every comment posted led to dozens of simultaneous slow non-cached runs of the full WordPress script until one finished and the cached file was recreated.

    The only solution for these load spikes was for the site owner to enable “lockdown” mode when he noticed a comment storm happening. That fixed the load spike, but as far as he was concerned, broke the functionality of his site — new visitors couldn’t see any comments that had been posted, and the comments were the main reason many were visiting.

    The new code solves this. I first used my custom patched version, and when WP Super Cache 0.8.2 came out, used that version unmodified and enabled the experimental feature in the config file. The CPU usage and load dropped remarkably, as you can see from the graphs. There have been no more 503 errors, and the site owner says that neither he nor his visitors have noticed the occasional few-second delay before comments appear in the cached version of the files. He reports no other problems at all, and he has not needed to enable lockdown since the change.

    While not everyone needs this code, people who do currently need lockdown mode would probably benefit tremendously from using this method instead.

    I hope this is useful — I’d be glad to answer any questions anyone has about my experience with it. Donncha, thanks for being so willing to listen and to spend time reworking the code to include this idea.

  • The topic ‘WP Super Cache performance with heavy comments: ideas’ is closed to new replies.