• Hello,

    This is really not a ‘support question’, but rather a request for some information.

    As far as I can see, WordFence will write to the database each time an access is made (for the real-time activity display). Although there is a limit to the number of entries stored in the database, and on a very busy site it means that the table for the hits does not grow beyond that limit, the truth is that there will be a database write with every hit!

    Because the main server I use has plenty of power for the sites it runs, I wasn’t even aware of all those database writes. But then I started doing a MySQL replication on a backup server which works as a cold stand-by in case the main one fails. Its specs are much, much lower than the main server — the point is that it ought just to keep the sites barely alive while I work on the main one, if it goes down for some reason — but, for a while, I didn’t really notice anything being wrong. Then, one day, a couple of months ago, I suddenly realised that the replication was getting hopelessly out of sync! Researching on the logs to see what was wrong, I noticed gazillions of entries on the WordFence tables for recording hits, and these were completely swamping over the poor backup server — way too many disk writes for its underpowered hardware.

    So the solution seemed to be to turn off all real-time display activity on all websites I run. This slightly improved the performance of the backup server (while it also improved the main server’s performance, of course, but it was not overwhelmingly noticeable, since, as said, the main one has plenty of CPU cycles and I/O, as well as lots of free RAM), but I still got a lot of writes for the hits table.

    Now this might just be a coincidence, but it seems to me that WordFence, no matter if real-time display activity is turned on or off, will nevertheless store all hits in the database — which actually makes some sort of sense, since it’s supposed to be processed for statistics at some point (or sent to WordFence’s central computers to investigate for new kinds of attacks). It seems logical that this information has to be stored somewhere and not simply discarded!

    However, storing it on the database is a big no-no! The goal of any WordPress admin (well, any CMS admin, to be honest) is to have next to zero database accesses on any website, and let everything be statically served to clients, all from the disk (or, better, from memory). WordPress with nginx as the server can even sidestep PHP in several occasions, depending on how it is configured. That’s how you can achieve huge performances — only when the site really changes (i.e. when someone adds/updates an article or page) will there be a real need for the database to be accessed, and a new static page can be generated that way.

    WordFence, somehow, subverts this concept, by ‘forcing’ all accesses to go through it. Note that this is just my assumption; the code is complex enough for me to be sure about what I’m saying! I can just tell that it appears to work that way.

    Now, I understand that there is a dilemma here. On one hand, we all want to minimise database accesses, especially writes — these should be reduced to ‘next to zero’ (the only reason for not being zero is when a new article/page is posted/updated). On the other hand, we need WordFence to check via its firewall if all accesses are genuine, and somehow log them for processing them further at some point, elaborating statistics, or even sending them to a central storage where potential attacks can be identified and quickly stopped. This means that this information must be stored somewhere.

    Again, not being familiar with the code, I have no idea if all the above is really true or not — just because it seems to be true (and it’s even logical that it works somehow in that fashion), it doesn’t mean that WordFence really works like that. But if it does, I would really wish for the ability to shut down all database writes — losing some statistics and such, but greatly improving overall performance, especially at the database level!

    An alternative, of course, is writing the logs not to the database, but to an external file, which is kept open for this purpose. Because the writing can be buffered (even at the OS level), it might not be so tough in terms of performance. Of course, it means extra work when actually producing statistics, but the trade-off would be fine: statistics are not generated every fraction of a second (but perhaps just once per day… or per hour), while database writes from incoming site visitors can definitely pop up several times per second.

    Another alternative would be to get access to the server logs. This, naturally enough, is only available to those WP admins who have total control over their hosting setup; the logs would have to be written on a directory that WordFence could ‘see’. In many cases, however, this is rather the norm than the exception (again, I’m talking only about personal experience; I have worked with just a handful of hosting providers, and they might all have been exceptional in giving public access to the server’s logs for our websites). The WordFence programmers would only need to adapt one of the many PHP libraries that extract information from an Apache/nginx access log, and produce all statistics from there instead. Note that it’s unlikely that a sysadmin will stop the access logs for their web server; so it means this kind of work would be done by the CPUs anyway, meaning that it would be pointless to duplicate the effort by writing one’s own logs. In some cases, however, the hosting provider will never give access to those logs, and that would mean that WordFence would have no option but to write on a disk file or add rows to the database.

    Finally, if there are no other options, would it make sense to use the ARCHIVE engine for the table of hits? It is supposed to have much better performance for pure logging purposes. It would still require replication with lots of write operations, though…

    Thank you for any input on this!

Viewing 3 replies - 1 through 3 (of 3 total)
  • Good comments @gwynethllewelyn
    I found 15,000 “hits” on my database with WordFence turned on in the wf-hits table mixed in with some other links. I turned off the live reporting in WF and it stopped of course. Would be nice if it could write to an xml file under some archive folder by the week or month.

    Plugin Author Wordfence Security

    (@mmaunder)

    Hi Gwyneth. Thanks for taking the time share your thoughts with us. I want you to know that a few folks in Wordfence have read your post, including one of our senior devs.

    It’s quite a long post and there is a lot there, but I’d like to share a few comments. This might not go into as much detail as you’d like but hopefully it’ll give you a sense of how we think about this problem.

    I understand that you are focused on DB performance and I also understand the specific reason: mysql replication. Believe me I’ve had enough experience watching slave replication fall behind to know how frustrating this might be. [Random war story: Back at etoys.com in 1999/2000 mysql replication didn’t exist so we created our own using the mysql query log. It was messy but worked surprisingly well and was how we had hot standby’s for a multi-billion dollar business!]

    We worked quite closely with some of the largest hosting providers in the world. Most Wordfence and WordPress installations are on those environments. Their configurations and load concerns vary depending on how they’ve configured disk and the database. For example we had feedback from one major hosting provider that their disk was generally very slow and they really wanted to work hard to minimize disk reads and writes. So we changed the way we do some of our malware scanning to specifically accommodate them and other hosts with the same configuration.

    So when we look at Wordfence performance, we have to minimize load on both the database and the disk – we can’t give one preference over the other.

    Regarding turning off live traffic: This should stop logging of each request to the DB. Once live traffic is disabled you should only see attacks and things like logins logged to the database and in most cases that is a fraction of your total traffic.

    We can’t use the server logs because they are in a different format to what we require.

    We can’t use a specific DB engine because many platforms only have one specific mysql engine installed.

    We do need to log some things to the database. We can’t be completely silent as Wordfence operates.

    In general we’ve worked to make Wordfence provide an excellent level of security that minimizes both disk and DB IO as much as possible. This is of course an ongoing effort and as I mentioned some of our senior folks have read your post and we very much appreciate your feedback.

    Kind regards,

    Mark Maunder – Wordfence founder/ceo.

    Thread Starter Gwyneth Llewelyn

    (@gwynethllewelyn)

    Mark,

    Thank you so much for your very thorough explanation. I understand now better why a disk-only logging option is not possible; it’s all about trade-offs. I will have to give this some more thought — aye, of course something must get logged once in a while, but it is certainly frustrating from my perspective to see that the only reason for some sluggishness on the ‘main’ server is… WordFence (and the built-in firewall) happily writing entries to the database! (I hardly catch any other activity on the MySQL server these days) I’m considering moving most of the WF tables to the MEMORY engine, but of course these would continue to get replicated…

    Theoretically, I believe that I could prevent those tables to be replicated at all, thus solving the problem. As said, while the ‘main’ server is not exactly a top-of-the-line brand new shining powerhouse, it’s got respectable performance, and I’ve yet to see it reaching 100% load on all 4 CPUs (except when it boots!), even with all the websites running WordFence and happily writing to the database all the time. I could also theoretically forfeit MySQL replication and just use unison to do the same at the file level — after all, this is not a hot standby server, it will always require manual intervention of some sort.

    Maybe the easiest way is to somehow buffer the requests and have a setting somewhere to do that — say, writing to the database every five minutes or so, as opposed to real time. I haven’t checked your code, as said, and very likely you’re using prepared statements, which in turn ought to be optimized by MySQL anyway, but, alas, the problem will not go away with the replication server.

    I’m curious about the difference between the format you require for your own logs, as opposed to those that webservers usually write. Both Apache and nginx are rather flexible at the kind of log format they can write. Obviously, not everybody has the ability to change the log format — and it would also mean that most web statistics tools that work at the Apache/nginx level would break — but I still wonder what makes reading the server logs impossible. Is it because you can get data from WordPress itself which is not present at the web server level? (As said, I’m just curious!)

    In any case, I’m very thankful to your senior folks taking their precious time to have a look at this issue. Even if there will never be a ‘solution’ to this particular issue, it gives me a lot of confidence in your plugin because you’re always on top of everything! Thanks so much for doing this.

    (P.S. I love your security updates and explanations on your website; I hope you can keep doing those!)

Viewing 3 replies - 1 through 3 (of 3 total)
  • The topic ‘WordFence creates a lot of database writes?’ is closed to new replies.