• Resolved Zaphod

    (@zaphod)


    Just looking for some advice here. Working on (practically complete) a car comparison site built using WP.

    I’ve written my own search code such that all data stored in meta_data is easily searchable. The thing is, my methods, while working 100% are not at all efficient I think and I’m looking for some pointers / suggestions on how to improve.

    Currently there are over 30,000 records (cars) on system. My approach is to :

    1. Run the full query with all parameters to get a full result count. Use this number to setup paging of results etc.
    2. Run the same query again but with limit x,10 where x is the page no multiplied by ten

    That’s two fairly hefty SQL queries running to basically do one thing.

    Here’s a sample of the SQL for an indication. As I’m querying meta_data there’s a load of joins which obviously doesn’t help.

    SELECT wposts.* FROM cc_posts wposts
    LEFT JOIN cc_postmeta cc_make ON (wposts.ID = cc_make.post_id)
    
    LEFT JOIN cc_postmeta cc_model ON (wposts.ID = cc_model.post_id)
    
    LEFT JOIN cc_postmeta cc_year ON (wposts.ID = cc_year.post_id)
    
    LEFT JOIN cc_postmeta cc_price ON (wposts.ID = cc_price.post_id)
    
    WHERE 1=1 AND (cc_make.meta_key = 'make' AND cc_make.meta_value = 'Ford')
    AND (cc_model.meta_key = 'model' AND cc_model.meta_value ='Focus')
    AND (cc_year.meta_key = 'year' AND cc_year.meta_value >= 2004)
    AND (cc_year.meta_key = 'year' AND cc_year.meta_value <= 2008)
    AND (cc_price.meta_key = 'price' AND cc_price.meta_value >= 4000)
    AND (cc_price.meta_key = 'price' AND cc_price.meta_value <= 8000)
    AND wposts.post_status = 'publish' AND wposts.post_type = 'post' AND wposts.post_date < NOW()
    order by (cc_price.meta_value+0) asc

    The second query is exactly the same but with the limit of e.g.

    limit 130,10

    (if page is 14: page 1 gets 0 -10, page 2 11 – 20 etc).

Viewing 15 replies - 1 through 15 (of 23 total)
  • Thread Starter Zaphod

    (@zaphod)

    [Sorry meant to add]

    Any help / suggestions gratefully recieved!

    Why are you selecting all the data just to count?

    SELECT wposts.* FROM cc_posts wposts

    Query just one column, and count it..

    SELECT COUNT(wposts.ID) post_count FROM cc_posts wposts

    If you only want the count from the result, you don’t need to select all the data just to do the joins… you can still query across tables and define your where clauses without selecting “that” data..

    Try that on your first query and see if the speed increases..

    vtxyzzy

    (@vtxyzzy)

    You might also see some marginal improvement by simplifying the query a little:

    AND (cc_year.meta_key = 'year' AND cc_year.meta_value BETWEEN 2004 AND 2008)
    AND (cc_price.meta_key = 'price' AND cc_price.meta_value BETWEEN 4000 AND 8000)
    Thread Starter Zaphod

    (@zaphod)

    t31os_,

    I’m selecting all the data because I need to know exactly how many results match (for example) Nissan [make] Almeras [model] between [year] 2004 and 2006 and cost between [price] 6000eu and 8000eu etc. All of that data is in the cc_postmeta table – there isn’t really provision to put it into the post table itself, hence the joins.

    I only join tables that I need to – but I do need to join price ANYWAY because the default view sorts by price.

    Thread Starter Zaphod

    (@zaphod)

    vtxyzzy,

    I changed the query as you suggested. My own version (not using BETWEEN) takes on average 1.93 seconds. Your version takes on average 1.91 seconds.

    It’s a start – so thanks for that. I’m wondering if I’m going to run into serious problems once the car numbers start heading towards 100k….

    Should I look at getting a dedicated server? Currently on shared hosting.
    Just wondering if being on it’s own box with nothing else would make these pretty complicated queries over vast amounts of data zip along?

    Thread Starter Zaphod

    (@zaphod)

    t310s_,

    I’ve re-read your post. I think I see what you mean now – instead of getting all the data, just get one field. I had thought (for some daft reason) you meant remove the joins too (which would obviously break the whole query).

    Thanks so much. Trying this now.

    Thread Starter Zaphod

    (@zaphod)

    OK,

    Result (I think).

    Here’s the query (or a sample of) that I’ve been running to count records that match the search criteria.

    SELECT wposts .  *
    FROM cc_posts wposts
    LEFT JOIN cc_postmeta cc_make ON ( wposts.ID = cc_make.post_id )
    LEFT JOIN cc_postmeta cc_model ON ( wposts.ID = cc_model.post_id )
    LEFT JOIN cc_postmeta cc_year ON ( wposts.ID = cc_year.post_id )
    LEFT JOIN cc_postmeta cc_price ON ( wposts.ID = cc_price.post_id )
    WHERE 1 =1
    AND (cc_make.meta_key = 'make' AND cc_make.meta_value = 'Ford')
    AND (cc_model.meta_key = 'model' AND cc_model.meta_value = 'Focus')
    AND (cc_year.meta_key = 'year' AND cc_year.meta_value BETWEEN 2004 AND 2008 )
    AND (cc_price.meta_key = 'price' AND cc_price.meta_value BETWEEN 4000 AND 8000 )
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wposts.post_date < NOW( )
    ORDER BY (
    cc_price.meta_value +0
    ) ASC

    Following t31os_’s suggestion I changed the select to

    select count(wposts.id)

    and left the rest of the query the same.

    First query was taking just under 2 seconds as reported by phpMyAdmin.

    Second query ran much quicker – phpMyAdmin didn’t show me the (for example)

    Showing rows 0 - 29 (300 total, Query took 1.9258 sec)

    just shows

    Your SQL query has been executed successfully

    So I think that’s a good (to possibly great) improvement. Am just applying to the code on site to see. Will report back.

    Thanks a million yet again guys!

    Thread Starter Zaphod

    (@zaphod)

    Actually made a small difference. Page returning in about 8 seconds rather than 9.

    HOWEVER you may have sparked a eureka moment. Why am I sorting the search query where I’m just getting a count? This is surely where the performance impact is being created…… I’ll remove that and we’ll see.

    Thread Starter Zaphod

    (@zaphod)

    Ok,

    This
    select wposts.* FROM cc_posts wposts LEFT JOIN cc_postmeta cc_year ON (wposts.ID = cc_year.post_id) LEFT JOIN cc_postmeta cc_price ON (wposts.ID = cc_price.post_id) WHERE 1=1 AND (cc_year.meta_key = 'year' and cc_year.meta_value > 1949) AND (cc_price.meta_key = 'price' and cc_price.meta_value > 0) AND wposts.post_status = 'publish' AND wposts.post_type = 'post' AND wposts.post_date < NOW() order by cc_price.meta_value

    takes 3.2 seconds

    This

    SELECT wposts . *
    FROM cc_posts wposts
    LEFT JOIN cc_postmeta cc_year ON ( wposts.ID = cc_year.post_id )
    LEFT JOIN cc_postmeta cc_price ON ( wposts.ID = cc_price.post_id )
    WHERE 1 =1
    AND (
    cc_year.meta_key = 'year'
    AND cc_year.meta_value >1949
    )
    AND (
    cc_price.meta_key = 'price'
    AND cc_price.meta_value >0
    )
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wposts.post_date < NOW( )
    LIMIT 0 , 30'

    took 0.0044 seconds

    and this

    SELECT count(wposts.id)
    FROM cc_posts wposts
    LEFT JOIN cc_postmeta cc_year ON ( wposts.ID = cc_year.post_id )
    LEFT JOIN cc_postmeta cc_price ON ( wposts.ID = cc_price.post_id )
    WHERE 1 =1
    AND (
    cc_year.meta_key = 'year'
    AND cc_year.meta_value >1949
    )
    AND (
    cc_price.meta_key = 'price'
    AND cc_price.meta_value >0
    )
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wposts.post_date < NOW( )
    LIMIT 0 , 30

    ..doesn’t even show me a time.

    Thanks folks!!!

    Apologies for posting the queries here again but I think this might be helpful to others in the same boat at some point.

    The count query doesnt need a limit, that’s just your lookup query to see how many results you have..

    Query one – count results so you know the total
    Query two – similar query, no count, get all data and apply limit/offset as appropriate

    So, do query one, (assume for a moment), returns a count of 30..

    If you wanted to show 10 results per page, that first query has allowed you determine how many pages you’ll need ..

    Total / Per page(assuming 10) = 3 (3 pages @ 10 posts per page = 30 total)

    The first is simply a lookahead query to find out how many results you have as a total, so you can then use that figure to work out how many pages you’ll need.

    So offset should be ( “Posts per page” x “page -1” ).. first page shouldn’t have an offset, only each thereafter, so you take the current page (1 if it’s the first), and subtract the value by 1(and multiply it by per page figure), so page 1, offset = 0(0x10), page 2, offset = 10(1×10), and so on… (10 is an assumed value following my example).

    Does that make sense?

    Thread Starter Zaphod

    (@zaphod)

    Sorry – the query I use to count doesn’t have the limit. Maybe I posted the wrong query up there.

    It’s just basically

    SELECT count(wposts.id)
    from tables, joins etc WHERE x = y as needed

    Second one is

    SELECT wposts . *
    FROM cc_posts wposts
    LEFT JOIN

    etc etc

    ORDER BY (cc_price.meta_value +0) ASC
    limit 0,10

    Ok, but what’s your question? Or is there not a question? ??

    NOTE:
    EXPLAIN can be placed infront of SELECT queries for information, incase you were not aware.

    Quote from MYSQL docs:

    When you precede a SELECT statement with the keyword EXPLAIN, MySQL displays information from the optimizer about the query execution plan. That is, MySQL explains how it would process the SELECT, including information about how tables are joined and in which order. EXPLAIN EXTENDED can be used to provide additional information.

    So, same query, just explain preceding the query..
    EXPLAIN SELECT a FROM b etc...

    Thread Starter Zaphod

    (@zaphod)

    No question ?? Just relating what I was doing.

    Actually, cleaning up the SQL statements is making a difference, but not to the point I was hoping for. Still, it’s a bit faster.

    Perhaps dedicated server route would be best to take a page that’s taking 10 plus seconds to load due to SQL / Data processing down to something closer to 1 or 2 seconds….

    Any thoughts?

    I’m sure it would help, but i don’t deal with these kinds of setups myself to be able to comment accurately about whether it would be worth the move.

    Certainly if it’s a high traffic site and you’ve got alot of data to work with, then yeah sure a dedicated / higher spec’ed server is going to help, how much it would help i don’t know..

    Are you sure it’s these queries in particular causing a problem? Do load times increase immediately upon their removal? How many other queries are running on the page?

    NOTE: It might be worth jumping in the WordPress IRC channel to see if anyone there can advise, there’s usually a few devs and sys administrators that hang out there. Not always, but usually if someone can answer your question and they see it, they will.. (but don’t expect immediate responses, people aren’t always at their PC when a question is asked).

    Just a suggestion.. ??

    vtxyzzy

    (@vtxyzzy)

    I wonder if you need the count query at all? If you use $wpdb->get_results(), and let WP handle the paging, you can use get_query_var(‘found_posts’) to get the count. This will have the added benefit that the query results will be cached so paging should be very fast.

    On another note, I don’t have any experience with the MySQL optimizer, but for Sybase, any query with more than three tables presented a problem that was often not optimized well. It sometimes helped to reorganize queries into subqueries of three tables or less, something like this:

    SELECT wposts.*
    FROM cc_posts wposts
    LEFT JOIN cc_postmeta cc_price ON ( wposts.ID = cc_price.post_id )
    WHERE cc_price.meta_kcy = 'price' AND cc_price.meta_value BETWEEN 4000 AND 8000
    AND cc_price.post_id IN
       (SELECT cc_make.post_id FROM cc_postmeta cc_make
        LEFT JOIN cc_postmeta cc_model ON ( cc_model.post_id = cc_make.post_id )
        LEFT JOIN cc_postmeta cc_year ON ( cc_make.post_id = cc_year.post_id )
    WHERE
       (cc_make.meta_key = 'make' AND cc_make.meta_value = 'Ford')
       AND (cc_model.meta_key = 'model' AND cc_model.meta_value = 'Focus')
       AND (cc_year.meta_key = 'year' AND cc_year.meta_value BETWEEN 2004 AND 2008 ))
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wposts.post_date < NOW( )
    ORDER BY (
    cc_price.meta_value +0
    ) ASC
Viewing 15 replies - 1 through 15 (of 23 total)
  • The topic ‘My inefficient SQL for custom searches..’ is closed to new replies.