• Resolved ianatkins

    (@ianatkins)


    Hello.

    Thanks for the plugin – was working really well – but now seems to be very slow to index.

    Initially the first index was very fast, pushing 90k records in about 30 minutes. Since then I’ve changed the shared attributes via the algolia_post_attachment_shared_attributes filter and ‘cleared’ the index via the Algolia dashboard.

    Upon re-indexing via the ‘autocomplete’ page in the wp-admin area – I can see hundreds of individual ‘deleteObject’ API requests logged. These seem to be hitting the batch endpoint, but only deleting one record at a time, which is slow.

    This seems totally redundant – as the index could be cleared to remove all records?

    Tracking through the plugin – it appears to be the update_post_records() function in class-algolia-posts-index.php that checks if records are to be updated then deletes them.

    Is there a way to bypass this – or to run a faster resync, like when I originally installed the plugin.

    Thanks,

    Ian.

Viewing 7 replies - 1 through 7 (of 7 total)
  • Thread Starter ianatkins

    (@ianatkins)

    So I have managed to get it batching quickly by bulk deleting the post meta records whose keys match algolia_% – these are set by set_post_records_count.

    Not sure if this is just me – but perhaps there should be a ‘drop index’ option that does this and re-indexes from a clean slate, or the individual delete statements could be optimised with a clear at the start of the process.
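For anyone trying the same workaround, here is a minimal sketch of that bulk delete, run against an in-memory SQLite table rather than a real WordPress database. The table layout and the meta key names are hypothetical stand-ins; only the algolia_% pattern comes from this thread. One detail worth noting is that `_` is a single-character wildcard in SQL LIKE, so the sketch escapes it to match a literal underscore. Back up the database before trying anything similar on a live site.

```python
import sqlite3

# In-memory stand-in for a postmeta-style table (hypothetical layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT)")
conn.executemany(
    "INSERT INTO postmeta VALUES (?, ?, ?)",
    [
        # Hypothetical key names; the thread only says they match algolia_%.
        (808, "algolia_searchable_posts_records_count", "1"),
        (808, "_thumbnail_id", "12"),
        (809, "algolia_searchable_posts_records_count", "2"),
        (809, "algoliaXother", "keep me"),
    ],
)

# "_" matches any single character in LIKE, so escape it to mean a literal underscore.
deleted = conn.execute(
    "DELETE FROM postmeta WHERE meta_key LIKE ? ESCAPE '\\'",
    ("algolia\\_%",),
).rowcount

remaining = [row[0] for row in conn.execute("SELECT meta_key FROM postmeta ORDER BY meta_key")]
print(deleted, remaining)
```

Without the escape, the key `algoliaXother` would also have been removed, which is why the pattern is written with an explicit escape character here.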

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    hmm, processing and absorbing what we can here.

    Is there a way to bypass this – or to run a faster resync, like when I originally installed the plugin.

    Not completely sure what to say here as there haven’t been all that many changes to the process, if any at all lately, so why there’s a difference is hard to pinpoint. Note that this isn’t us trying to deflect anything, just brainstorming of what type of changes could contribute to speed difference.

    The biggest changes that I can think of would be what’s in the site itself. For example content volume, or changes to content that was once fast to index but is now slower. There’s also always just server resources like memory etc., that may be getting more readily used compared to at first.

    I assume you’re not making use of the algolia_should_wait_on_delete_item filter, which adds a brief delay in processing and makes things a bit more synchronous than asynchronous.

    I believe the records count gets stored to help make sure that all the appropriate parts are getting deleted. For example if I have a really long post that ends up being 4 records in Algolia, that count variable will help make sure when deleting, we get all 4 records.

    Thread Starter ianatkins

    (@ianatkins)

    Hey Michael.

    Thanks for the reply.

    To clarify, I’ve only just installed the plugin – so always running version 2.8.1

    On the initial install and first index the process is fast, and in the Algolia logs I can just see batches of updateObject requests (100 records at a time).

    When running the re-index again, looks like the check for ‘records’ is always true in class-algolia-searchable-posts-index.php – so on the Algolia logs I just see several deleteObject requests with a single ID at a time ( example below ). This takes a very long time to process with 90K records.

    Line where ‘records’ is checked:
    https://github.com/WebDevStudios/wp-search-with-algolia/blob/fe6fdbec5fe254c3f4e278729bb7921e0247290b/includes/indices/class-algolia-searchable-posts-index.php#L373

    {
      "requests": [
        {
          "action": "deleteObject",
          "body": {
            "objectID": "808-0"
          }
        }
      ]
    }

    If I delete the meta fields (which makes that check in class-algolia-searchable-posts-index.php return false), then the indexing batches again and updates 100 records at a time.
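To make the difference concrete, here is a minimal Python sketch (not plugin code) that groups many deletes into batch payloads of the same shape as the logged request above. The function name, the batch size of 100, and the sample IDs are assumptions for illustration only.

```python
from typing import Dict, Iterator, List


def delete_batches(object_ids: List[str], batch_size: int = 100) -> Iterator[Dict]:
    """Yield batch payloads that each delete up to batch_size objects."""
    for start in range(0, len(object_ids), batch_size):
        chunk = object_ids[start:start + batch_size]
        yield {
            "requests": [
                {"action": "deleteObject", "body": {"objectID": oid}}
                for oid in chunk
            ]
        }


# Hypothetical IDs for 250 posts that each produced a single record.
ids = [f"{post_id}-0" for post_id in range(1, 251)]
payloads = list(delete_batches(ids))
print(len(payloads))  # 3 batch requests instead of 250
```

Sending one payload per post, as seen in the logs, means one HTTP round trip per record, and that is where the time goes with 90K records.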

    Yeah, I’m not using the algolia_should_wait_on_delete_item filter.

    Thanks,

    Ian.

    • This reply was modified 7 months, 2 weeks ago by ianatkins.
    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Not sure what to say here, to be honest, outside of making sure it’s understood that 1 WP post doesn’t automatically equal 1 object in Algolia.

    {
      "requests": [
        {
          "action": "deleteObject",
          "body": {
            "objectID": "808-0"
          }
        }
      ]
    }

    This would be, at minimum, the first object/record, but if you had a really long post you’d also find an 808-1 objectID, and perhaps 808-2, all of which would be objects associated with the post ID 808.
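As a rough illustration of why the stored count matters, the sketch below derives every objectID for a post from its record count. It assumes the ID format is the post ID, a hyphen, and a zero-based part number, as the 808-0 example suggests; the function name is hypothetical and this is not the plugin's actual code.

```python
from typing import List


def object_ids_for_post(post_id: int, records_count: int) -> List[str]:
    """Build the objectIDs for a post that was split into records_count parts."""
    return [f"{post_id}-{part}" for part in range(records_count)]


print(object_ids_for_post(808, 3))  # ['808-0', '808-1', '808-2']
```

If the count is missing, there is no way to know from the post ID alone how many parts need deleting.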

    Looking at the code history for the class-algolia-searchable-posts-index.php file, this is how it was done when we originally forked the plugin from Algolia themselves. I am also pretty certain, based on my previous reply, that you’d want to keep the meta data, because that will help make sure, during re-indexing, that all of the objects/records get appropriately deleted, before indexing the new version of the WP post.

    Thread Starter ianatkins

    (@ianatkins)

    Hey Michael.

    Sorry let me try and clarify. When triggering a ‘re-indexing’ from the ‘autocomplete’ page, after an index has previously been created, the plugin individually deletes every record one by one. This is very slow compared to the first initial indexing when the plugin is freshly installed.

    I don’t understand why the plugin doesn’t just call the ‘clear’ method to delete the entire index – and then just push the records in batches.

    Perhaps the ‘reindex’ function ( and deleting an individual record ) is used ( and makes sense ) for the content syncing, but it doesn’t really make sense when manually re-indexing, where I think it’s safe to assume the whole index can be dropped as it is being remade.

    It looks like this is how the re-indexing works via CLI – so just not sure why it’s not the same when triggered via WP-Admin:
    https://github.com/WebDevStudios/wp-search-with-algolia/blob/fe6fdbec5fe254c3f4e278729bb7921e0247290b/includes/class-algolia-cli.php#L118

    Hope that’s clearer!

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    I definitely understand and see where you’re coming from.

    At least from our perspective, it’s coming down to a case of this being how Algolia themselves originally designed things, and because for the most part (current situation aside) it “just works” well enough, we’re not wanting to change too much without some decently heavy testing and planning.

    Why they originally chose to do things certain ways, I couldn’t say. Looking back through their archived Github repo, it may have also been a performance reason to do things this way.

    Original ticket: https://github.com/algolia/algoliasearch-wordpress/issues/705

    Related commit: https://github.com/algolia/algoliasearch-wordpress/commit/32599ddfb1a1916e5ce313f5fe2884a8cd8299e7

    However, that also looks like it got rolled back as per https://github.com/algolia/algoliasearch-wordpress/issues/708

    with this commit. https://github.com/algolia/algoliasearch-wordpress/commit/6296fabc184ebfda04e35b570926e65e22566517

    Ultimately, what I’m likely trying to say is that if there’s room for us to improve everything, awesome, we can begin the process of looking into it. However, we have to trust best intent originally, as well as keep in mind that a change here could have wide effects for all of our users, so we can’t be hasty either. I know we’ve gone into things with some reported issues in the past and accidentally turned asynchronous actions into synchronous ones, which affected everyone’s performance negatively.

    That said, I have to believe the UI processing is a bit more of a tender, caring re-index, possibly with some parts getting run that I don’t even realize/know about (darn corners of the code), while WP-CLI may be more of a swift hammer strike.

    Also just as a quick aside, if you’re working with the autocomplete indexes, the PHP class you’d be using is in the includes/indices/class-algolia-posts-index.php file. This is because “Searchable posts” get their own index, which collects post data from the post types that are registered as searchable. Meanwhile, everything in the autocomplete settings, except “All posts”, will create its own index.

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    So I’m still poking through the code a bit, because this has me thinking.

    The delete_item method, for example, should be collecting every part of a given post, and batch deleting in one request. So if a post has 4 record parts, all 4 should be getting removed at the same time.

    Now, this still doesn’t differentiate between deleting EVERYTHING all at once, vs working with one at a time. I believe all this same code gets run when saving an individual item. So we can’t have it run clear() by default. This would also explain why that is used with WP-CLI.

    So the question is how we can pass in instructions to potentially use clear() for the UI bulk re-index.
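One way to picture that, purely as a sketch and not a proposal for the actual plugin code: a flag on the re-index routine that clears the index once up front and then skips the per-post deletes. Everything below (the FakeIndex class, the reindex function, the clear_first parameter) is hypothetical and runs against an in-memory stand-in that only counts calls.

```python
from typing import Dict, List


class FakeIndex:
    """In-memory stand-in for a remote index that records each API call."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}
        self.calls: List[str] = []

    def clear(self) -> None:
        self.calls.append("clear")
        self.records.clear()

    def delete_objects(self, object_ids: List[str]) -> None:
        self.calls.append("delete")
        for oid in object_ids:
            self.records.pop(oid, None)

    def save_objects(self, records: List[dict]) -> None:
        self.calls.append("save")
        for record in records:
            self.records[record["objectID"]] = record


def reindex(index: FakeIndex, posts: List[dict],
            clear_first: bool = False, batch_size: int = 100) -> None:
    """Re-push every post; skip per-post deletes when the index is cleared up front."""
    if clear_first:
        index.clear()
    for start in range(0, len(posts), batch_size):
        batch = posts[start:start + batch_size]
        if not clear_first:
            # Per-post path: one delete call for each post before saving.
            for post in batch:
                index.delete_objects([f"{post['id']}-0"])
        index.save_objects(
            [{"objectID": f"{p['id']}-0", "title": p["title"]} for p in batch]
        )


posts = [{"id": i, "title": f"Post {i}"} for i in range(1, 251)]

slow = FakeIndex()
reindex(slow, posts)                     # 250 deletes + 3 saves
fast = FakeIndex()
reindex(fast, posts, clear_first=True)   # 1 clear + 3 saves
print(len(slow.calls), len(fast.calls))  # 253 4
```

Saving an individual post would keep calling the routine without the flag, so the single-item path stays as it is today and only the bulk re-index opts in.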

  • The topic ‘Slow re-indexing due to individual deleteObject’ is closed to new replies.