• Resolved Blutarsky

    (@blutarsky)


    I have a tool that queries the whole DB to check duplicate content issues. post_title and post_content are checked.
    Quite slow.
    I was wondering if it makes any sense to add some secondary indexes on those columns… could it seepd up the game?
    Those indexes may get huge, after a while..

    What do you think?

Viewing 6 replies - 1 through 6 (of 6 total)
  • I assume your tool has to process the contents of each post_content field in each record in the table. If the tool is reading every row in the table in the first place, an index is not going to speed that up.

    an index speeds things up when you are looking for a small number of records in a large table, not when reading every record in a table.

    Thread Starter Blutarsky

    (@blutarsky)

    I’m looking for duplicates in this way:
    ********************************************
    (pseudocode)
    select posts //query
    where post_title = findtitle and
    post_content = post_content

    if found
    dosomething
    ************************************
    In such a scenario would it help to index title and content?
    What space usage should I expect?

    you could try creating indexes while you are running the query & see if it speeds it up. have to use full text index on post_content field

    you might consider (pseudocode)
    select post_title, post_content, count(*) from wp_posts
    group by 1, 2 having count(*) > 1

    that will give you a list of posts where more than one post has identical title and content, and a count of how many there are, on each line. it won’t show ID but you can search for that afterwards.

    you have to allow for post revisions which sometimes have identical content – that would add to the where clause
    where post_type != ‘revision’ (I’m not positive of the field name or wording of revision on that – double check before running)

    Thread Starter Blutarsky

    (@blutarsky)

    What you mean ” to create an index while running”?
    I thought index creation could occur only behind the scene, using phpmyadmin…
    could you clarify?

    my point was no harm is done in creating them. create indexes before you run your procedure, run your procedure. if you are only running this once they are not needed after you are finished so delete them later.

    If you will run from time to time, you know how long it took to run without indexes. so create the indexes. if they speed it up, keep them. if not, delete them.

    Thread Starter Blutarsky

    (@blutarsky)

    Ok, thanks!

Viewing 6 replies - 1 through 6 (of 6 total)
  • The topic ‘Indexing other WordPress DB columns…’ is closed to new replies.