• Resolved jazbek

    (@jazbek)


    I am a developer for two sites. On both of these servers I am having problems with the exact same symptoms:

    At various times throughout the week, the cpu load will suddenly start shooting up exponentially until apache is unable to serve pages. This clears up when apache is restarted. Neither server is having an issue with not enough RAM/too much swap being used.

    Site 1:
    dedicated server with 2GB ram
    ~410k http requests per day
    ~1700 visits/day
    not using multisite
    w3 total cache (using memcached)

    Site 2:
    Rackspace cloud server instance with 4GB of ram
    Database is on another instance also with 4GB of ram
    uses multisite (5 blogs via subdomain)
    ~560k http requests per day
    ~5000 visits per day
    WP Super Cache

    Both sites have WP 3.0.1 installed. One of them is a new site (built in 3.0.1). The other was running 2.7 for over a year just fine, and this problem didn’t start happening until immediately after I upgraded to 3.0.1 in September. The two sites have no plugins in common. Any ideas where I should look to see where the problem might be coming from?

    I have been pulling my hair out over this for almost two months, and am hoping for some insight.

Viewing 15 replies - 1 through 15 (of 27 total)
  • Thread Starter jazbek

    (@jazbek)

    Can a moderator move this to the WP-Advanced forum? Thanks.

    mrmist

    (@mrmist)

    Go on then.

    maxk

    (@maxk)

    Have you checked your Apache error logs, MySQL error logs, and set up slow query logging?

    Sounds like a table locking / data processing issue.

    Thread Starter jazbek

    (@jazbek)

    Nothing in the apache error logs. I will check the MySQL logs.

    lochinvar

    (@lochinvar)

    We were running into the same issue for a number of months. Web logs pointed to requests that were constantly being redirected and would eventually max out the redirect tries. At some point this would bring down our server, usually just after we published a set of posts. Well, four posts, but it was always at 9pm PST and we would see the server go down about 25 minutes after that.

    We turned on caching and the problem disappeared. I know that I should go back and figure out why the redirects were maxing out but now that the server has stopped crashing it is low priority.
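For anyone wanting to check their own access logs for the kind of repeated redirects described above, here is a minimal sketch. It assumes the Apache access log uses the Common or Combined Log Format; the function names and the `min_hits` cutoff are made up for illustration. It only counts 301/302 responses per client and path, so a loop that bounces between several URLs would show up as several moderately high counts rather than one large one.

```python
import re
from collections import Counter

# Matches the start of a Common/Combined Log Format line:
# ip ident user [timestamp] "METHOD path protocol" status size ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def count_redirects(lines):
    """Count 301/302 responses per (client IP, requested path)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("status") in {"301", "302"}:
            counts[(match.group("ip"), match.group("path"))] += 1
    return counts


def suspicious(counts, min_hits=10):
    """Return (ip, path) pairs redirected at least min_hits times, busiest first.

    min_hits is an arbitrary illustrative cutoff, not a recommended value.
    """
    return [(key, hits) for key, hits in counts.most_common() if hits >= min_hits]
```

Feeding it an open log file, for example `suspicious(count_redirects(open(path)))` with `path` pointing at the access log, would list the client and URL pairs worth a closer look.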

    lochinvar

    (@lochinvar)

    Sorry I should have mentioned that we saw this issue with both 2.x and 3.x installed.

    And I should learn to read the original post fully before replying. Ignore the bit about caching

    Could be some sort of Apache or PHP memory leak? Try setting the max number of requests per Apache child to a lower number so the processes are recycled faster.

    Thread Starter jazbek

    (@jazbek)

    @lochinvar – I too had the exact same problem on a 2.9 MU site this past January. When this started happening, I thought it was the same thing, but I’ve been scanning the apache logs like crazy and can’t find any redirect loops.

    @donncha – Yes, it does seem sort of like a memory leak, but aren’t memory leaks kind of slow? I have seen the server load go from .5 to 140 in less than 5 minutes.

    EDIT: good idea about recycling processes though. I will change that setting and cross my fingers!

    Thread Starter jazbek

    (@jazbek)

    I just checked and MaxRequestsPerChild is already @ 1000 on both sites — that already seems low enough, no? Considering the site is getting hundreds of thousands of http requests per day, that would mean it’s recycling hundreds of times a day already.
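The "hundreds of times a day" estimate can be checked with a quick calculation. This is only a rough sketch using the request counts from the first post. It assumes every HTTP request counts toward the limit; with KeepAlive enabled, Apache counts only the first request on each connection, so the real number of recycles would be lower.

```python
# Rough estimate of how many child-process recycles happen per day,
# summed across all Apache children.
MAX_REQUESTS_PER_CHILD = 1000  # value reported in the post

# Daily request counts from the first post in the thread.
DAILY_REQUESTS = {"Site 1": 410_000, "Site 2": 560_000}


def recycles_per_day(requests_per_day, max_requests_per_child):
    """Upper-bound estimate: assumes each request counts toward the limit."""
    return requests_per_day / max_requests_per_child


for site, requests in DAILY_REQUESTS.items():
    estimate = recycles_per_day(requests, MAX_REQUESTS_PER_CHILD)
    print(f"{site}: ~{estimate:.0f} recycles/day")
```

Under that assumption the two sites come out at roughly 410 and 560 recycles per day.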

    maxk

    (@maxk)

    Again — it sounds like it could be a locking issue on the tables. Set up your slow query log to look for any abnormally lengthy MySQL queries, and check your crontabs to make sure there isn’t some maintenance process doing analytical crunching on the database, or interrupting the server.

    Thread Starter jazbek

    (@jazbek)

    I did set up slow query logging today, and since then, we’ve experienced two load spikes. The most recent was at around 01:53 (server time).

    Here is the slow query log from around that time:

    # Time: 101117  1:40:22
    # User@Host: x_] @ localhost []
    # Query_time: 2  Lock_time: 0  Rows_sent: 0  Rows_examined: 0
    SELECT tt.term_id, tt.term_taxonomy_id FROM wp_terms AS t INNER JOIN wp_term_taxonomy as tt ON tt.term_id = t.term_id WHERE t.term_id = 1532 AND tt.taxonomy = 'link_category';
    # Time: 101117  1:55:12
    # User@Host: x_] @ localhost []
    # Query_time: 2  Lock_time: 0  Rows_sent: 1  Rows_examined: 0
    SELECT t.*, tt.* FROM wp_terms AS t INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('category')  AND t.slug = 'book-reviews' ORDER BY t.name ASC;
    # Time: 101117  1:58:11
    # User@Host: x_] @ localhost []
    # Query_time: 8  Lock_time: 0  Rows_sent: 0  Rows_examined: 0
    SELECT tt.term_id, tt.term_taxonomy_id FROM wp_terms AS t INNER JOIN wp_term_taxonomy as tt ON tt.term_id = t.term_id WHERE t.term_id = 1532 AND tt.taxonomy = 'link_category';
    # User@Host: x_] @ localhost []
    # Query_time: 8  Lock_time: 0  Rows_sent: 5  Rows_examined: 3646
    SELECT *, ((0.1800 * (MATCH (`title`) AGAINST ( "alternative medicine's flawed reasoning one_ true cause all_ disease " ))) + (2.4429 * (MATCH (`content`) AGAINST ( " medicine disease alternative treat science pain energy true causation infection claims genetic evidence bacteria practitioners strep simple life underlying treatment" )))  ) as score FROM `wp_similar_posts` LEFT JOIN `wp_posts` ON `pID` = `ID` WHERE (MATCH (`title`) AGAINST ( "alternative medicine's flawed reasoning one_ true cause all_ disease " ) OR MATCH (`content`) AGAINST ( " medicine disease alternative treat science pain energy true causation infection claims genetic evidence bacteria practitioners strep simple life underlying treatment" ))  AND post_status IN ('publish') AND post_type='post' AND ID != 13757 AND post_password ='' ORDER BY score DESC LIMIT 0, 5;

    The longest query here took 8 seconds, and it completed 5 minutes after the load spike. This is my first time using the slow query log, but it doesn’t seem to me like this is the issue. Please correct me if I’m wrong.
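Reading a log like this by eye gets tedious once it grows, so here is a minimal sketch of a parser for it. It targets only the older slow query log layout shown in this thread (a `# Time:` line that may be omitted when the timestamp repeats, then `# User@Host:`, `# Query_time:` and the SQL). Newer MySQL versions write a different header, and the class and function names here are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

TIME_RE = re.compile(r"# Time: (?P<stamp>\d{6}\s+\d{1,2}:\d{2}:\d{2})")
STATS_RE = re.compile(
    r"# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)\s+"
    r"Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)


@dataclass
class SlowQuery:
    stamp: str          # raw "# Time:" value, carried over when the line is omitted
    query_time: float   # seconds
    lock_time: float    # seconds
    rows_sent: int
    rows_examined: int
    sql: str


def parse_slow_log(text: str) -> List[SlowQuery]:
    """Parse slow query log text in the layout quoted in this thread."""
    entries: List[SlowQuery] = []
    stamp = ""
    stats = None
    sql_lines: List[str] = []

    def flush():
        # Store the entry collected so far, if it is complete.
        if stats is not None and sql_lines:
            entries.append(SlowQuery(
                stamp, float(stats["query_time"]), float(stats["lock_time"]),
                int(stats["rows_sent"]), int(stats["rows_examined"]),
                " ".join(sql_lines),
            ))

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        time_match = TIME_RE.match(line)
        if time_match or line.startswith("# User@Host:"):
            flush()  # a new header closes the previous entry
            stats, sql_lines = None, []
            if time_match:
                stamp = time_match.group("stamp")
            continue
        stats_match = STATS_RE.match(line)
        if stats_match:
            stats = stats_match.groupdict()
        elif not line.startswith("#"):
            sql_lines.append(line)
    flush()
    return entries


def slowest(entries: List[SlowQuery], limit: int = 5) -> List[SlowQuery]:
    """Return the entries with the largest Query_time first."""
    return sorted(entries, key=lambda e: e.query_time, reverse=True)[:limit]
```

Comparing each entry's `stamp` against the time of a load spike makes it easier to see whether slow queries come before the spike or only show up as a side effect of it.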

    Edit: took out the server name to protect my client’s anonymity.

    maxk

    (@maxk)

    Well the next step is to set up a crontab to run uptime periodically and record the results of top when the load average spikes…
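One way to do this is a small script that cron runs every minute. The sketch below reads the load average and appends a `top` snapshot to a log file only when the 1-minute load crosses a threshold. The threshold, the log path and the function names are hypothetical, and `top -b -n 1` assumes the procps version of `top` found on most Linux servers.

```python
import os
import subprocess
import time

LOAD_THRESHOLD = 5.0                   # hypothetical; tune to the server's core count
LOG_PATH = "/var/log/load_spikes.log"  # hypothetical path


def is_spiking(load_1min, threshold=LOAD_THRESHOLD):
    """True when the 1-minute load average has reached the threshold."""
    return load_1min >= threshold


def snapshot_top():
    """Return the output of one batch-mode iteration of top."""
    result = subprocess.run(
        ["top", "-b", "-n", "1"], capture_output=True, text=True, check=False
    )
    return result.stdout


def record_if_spiking(log_path=LOG_PATH, threshold=LOAD_THRESHOLD,
                      get_load=os.getloadavg, take_snapshot=snapshot_top):
    """Append a timestamped top snapshot to log_path if the load is high.

    get_load and take_snapshot are injectable so the logic can be tested
    without a loaded server. Returns True when a snapshot was written.
    """
    load_1min, load_5min, load_15min = get_load()
    if not is_spiking(load_1min, threshold):
        return False
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"=== {stamp} load {load_1min:.2f} {load_5min:.2f} {load_15min:.2f} ===\n")
        log.write(take_snapshot())
        log.write("\n")
    return True
```

To use it, add a call to `record_if_spiking()` at the bottom of the script and schedule the script from crontab once a minute. The snapshots taken during a spike should show which processes (Apache children, MySQL, a cron job) are eating the CPU.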

    ghas

    (@ghas)

    Luckily I haven’t run across this issue yet, but thanks for the info. In the beginning I did set up my slow query log to look for abnormally lengthy queries.

    @jazbek, I’d recommend some different settings for W3TC, contact me for tips.

    I’m facing the same sort of problems
    Just migrated to VPS.net
    Might contact you as well

  • The topic ‘Load spikes – any ideas?’ is closed to new replies.