Viewing 15 replies - 1 through 15 (of 52 total)
  • Plugin Author Epsiloncool

    (@epsiloncool)

    Hi @rfischmann

    Thank you for this report.

    In fact, we changed a lot in the algorithm in the new version 1.65, so despite deep testing, shortcomings are not excluded.

    I would advise you to start by opening the publication with ID 984018 for editing via Edit Post. And just save it. This forces the update of the publication index. After that, try the search again (in Sandbox you can).

    After that, write to me to see if the publication has appeared in the index. This step is for verification purposes, of course, I don’t want you to search for all non-indexed publications and manually index them. I need this to understand what’s going on.

    I would also ask you to indicate your PHP version, the amount of script memory from php.ini and the current allowed time_limit. You can view this information in Tools/Website Health.

    Thank you!

    Thread Starter Rafael Fischmann

    (@rfischmann)

    Hi, @epsiloncool!

    I’ve edited post ID 984018 and yes, now it shows up on top: https://d.pr/i/20lJe0

    I have no idea at all how many other posts would be affected, of course.

    We’re running under PHP 8.2.18, 512MB memory limit, 300 seconds time limit.

    Cheers!

    Thread Starter Rafael Fischmann

    (@rfischmann)

    Every Sunday, I save a weekly backup of our MySQL database and it’s always growing up by a few MBs. This time, it fell from 928MB to 679MB, and I’m sure it’s related to WP Fast Total Search’s latest update.

    Its index database is now either very optimized or smaller, or it really didn’t recreate a full index of our posts.

    Plugin Author Epsiloncool

    (@epsiloncool)

    Hi @rfischmann

    Actually, the new indexing rule removed “revisions” from index. Yes, it makes the index smaller. You can check how much “revision” records were removed from the index on the “Indexing Engine Settings / Indexing Rules”, there is a message like “We also found?N?records that are not subject to any rule (they will have empty data in the index).” You can compare how much records were removed from the index and compare with backup size change.

    According to missed posts I would like to ask you to find some more posts that are not searchable (but have to be searchable). If you find them, I would ask you to check some database records then (or ask you to make an Anydesk session for me so I can check some things). It’s really important, because your case can be either single case or repeatable problem.

    Yesterday I made some more checks for indexing and still was unable to repeat your problem.

    Thanks.

    Thread Starter Rafael Fischmann

    (@rfischmann)

    This, @epsiloncool? https://d.pr/i/FRfv0f

    It does make total sense not to include revisions from the index, good idea!

    That post that wasn’t correctly indexed was identified by us simply because we were looking for it specifically. I haven’t noticed other examples up until now, but I’ll run some tests and see if I can identify any others.

    As for the “new” search results’ order, look at this example query: https://d.pr/i/OLWydu

    Before the update, it would show “Os 5 artigos mais lidos no MacMagazine: de 12 a 19 de maio” first, then “Os 5 artigos mais lidos no MacMagazine: de 5 a 12 de maio” second, which both have the exact query in their title AND are the two latest/most recent posts. The first result there, “Retrospectiva 2009: os dez posts mais acessados no MacMagazine durante o ano”, doesn’t even have “mais lidos” in its title — which seems very weird to me.

    Thread Starter Rafael Fischmann

    (@rfischmann)

    Another example that I don’t understand the search results’ order at all. Here is our podcast’s tag: https://macmagazine.com.br/categoria/podcast/

    If you run a search for “MacMagazine no Ar”, it will bring results in what seems to be a total random order. I’d expect at least the latest podcasts to be on top of the results.

    Thread Starter Rafael Fischmann

    (@rfischmann)

    It’s really hard to find a specific post that might have not been indexed out of nowhere.

    One idea: where can I see how many posts has WP Fast Total Search indexed? Then, I can compare that number to the number of published posts we have.

    Plugin Author Epsiloncool

    (@epsiloncool)

    The total posts (wp_posts records) processed and indexed is displayed here

    But to say which posts has any index data and which are not, we need to make some DB queries I think.

    Let me compose one and I will send you so we can make fast check

    Thread Starter Rafael Fischmann

    (@rfischmann)

    That number (307815 right now, in our case) doesn’t make much sense to me. We have 82,677 published posts…

    Plugin Author Epsiloncool

    (@epsiloncool)

    Hi @rfischmann

    Actually, WP and plugins store almost everything in the wp_posts table. Not only pages and posts but a lot of different records too. For example menu items, service information, custom post types etc etc

    WPFTS tries to index all that info, fortunately most of those additional records are short and does not give any sensible index size increment.

    Okay

    Well, could we try to make a fast check of your database? I have constructed the MySQL query for this, and would ask you to run it in your phpMyAdmin to detect those posts that were indexed wrongly or not indexed.

    select * from
    (select 
    	p.ID post_id,
    	wi.id windex_id,
    	p.post_title,
    	p.post_type,
    	p.post_status,
    	wlog.status,
    	
    	wd_t.n wdt_n,
    	dd_t.n_total wdt_total,
    	dd_t.n_nw wdt_nw,
    	length(p.post_title) title_len,
    	(length(p.post_title) / wd_t.n) av_t,
    	
    	wd_c.n wdc_n,
    	dd_c.n_total wdc_total,
    	dd_c.n_nw wdc_nw,
    	length(p.post_content) content_len,
    	(length(p.post_content) / wd_c.n) av_c,
    	
    	(if(isnull(wi.id),128,0) +
    	if(wlog.status != 3,64,0) +
    	if((dd_t.n_nw != dd_t.n_total) or (dd_t.n_total != wd_t.n) or (dd_c.n_nw != dd_c.n_total) or (dd_c.n_total != wd_c.n), 32, 0) +
    	if(((length(p.post_title) > 0) and (wd_t.n = 0)) or ((length(p.post_content) > 0) and (wd_c.n = 0)), 16, 0) +
    	if(((isnull(wd_t.n) or (wd_t.n < 1)) and (isnull(wd_c.n) or (wd_c.n < 1)) and (p.post_type in ("post", "page"))), 8, 0)) warn_level
    	
    from wp_posts p
    left join wpftsi_index wi
    	on p.ID = wi.tid and wi.tsrc = "wp_posts"
    left join wpftsi_docs wd_t
    	on wd_t.token = "post_title" and wd_t.index_id = wi.id
    left join wpftsi_docs wd_c
    	on wd_c.token = "post_content" and wd_c.index_id = wi.id
    left join wpftsi_ilog wlog
    	on wlog.index_id = wi.id
    left join (
    	select
    		v.did,
    		sum(if(isnull(w.word),0,1)) n_nw,
    		count(*) n_total
    	from wpftsi_vectors v
    	left join wpftsi_words w
    		on w.id = v.wid
    	group by v.did
    ) dd_t
    	on dd_t.did = wd_t.id
    	
    left join (
    	select
    		v.did,
    		sum(if(isnull(w.word),0,1)) n_nw,
    		count(*) n_total
    	from wpftsi_vectors v
    	left join wpftsi_words w
    		on w.id = v.wid
    	group by v.did
    ) dd_c
    	on dd_c.did = wd_c.id
    ) tt	
    where
    	tt.warn_level > 0
    order by tt.warn_level desc
    	
    limit 1000

    This MySQL looks scary, but it does simple things: it finds some key data values in WPFTS index and in case something suspicious, it will show the post record in the result with extended data.

    The main column to check is “warn_level”. The value there means how worse the data we found.

    128 means “post does not have any index record at all”

    64 means “indexing was broken because of internal error, status not equal to 3”

    32 means “calculated number of words in the document does not equal to actually stored number in index”

    16 means “post title or post content has some text data, but number of indexed words is 0”

    We mainly interested for records with warn_level greater than 16. Sometimes (often) wp_posts records may have warn_level = 16 because title has only special characters (those are not indexed normally) or content have only images (those not indexed too).

    If you found some posts with warn_level = 16, you can check them by post_id in WP Admin to see whether they have indexable text info or not.

    Ideally if you can send me CSV or screenshot of this MySQL query result.

    P.S. You may need to change “wp_posts” value in the query to actual name of your wp_posts DB table (sometimes wp_ prefix may be different).

    Thanks.

    Thread Starter Rafael Fischmann

    (@rfischmann)

    Understood about the wp_posts table, @epsiloncool, thanks!

    The MySQL query took almost 2 minutes to run and it seems to have generated up to the 1,000 results’ limit. I think there are lots of “oembed_cache” lines that are useless, I’m not sure.

    Here’s the CSV file: https://d.pr/f/oNV1lK

    Plugin Author Epsiloncool

    (@epsiloncool)

    Great, I see nothing too bad, but we can check what happen with some posts.

    For example in the row 9 we can see post_id = 56355. According to CSV data, this post has big content size (4301 characters), but NO indexed words.

    Could you open any Post in Edit Post mode, please, and then replace post_id in the URL to 56355 so you can see Edit Post page for this post. (“Criado para explorar o coprocessador M7 de iPhones 5s….”)

    What can you say about it’s post content? Why it’s not indexed? In case it contains only images this it OK. But in case it has some text, it’s a problem.

    Let me know, thanks!

    Thread Starter Rafael Fischmann

    (@rfischmann)

    Got it.

    That’s really weird, @epsiloncool. That post_id is actually an image, it seems? https://d.pr/i/7NuAv1

    The “Criado para explorar o coprocessador M7 de iPhones 5s, Nike+ Move já está na App Store” post’s ID is actually 419985, as far as I checked directly in its page: view-source:https://macmagazine.com.br/post/2013/11/10/criado-para-explorar-o-coprocessador-m7-de-iphones-5s-nike-move-ja-esta-na-app-store/

    I’m a bit confused.

    Plugin Author Epsiloncool

    (@epsiloncool)

    Yes, sorry. I said you windex_id, not post_id.

    Actual post_id was 419985.

    So as I can see on the post page it has a lot of texts, but no words in the index. That’s bad.

    Do you see post_content in Sandbox tester for ID = 419985 ?

    Thread Starter Rafael Fischmann

    (@rfischmann)

    No content at all: https://d.pr/i/ZKzSQG

    I opened that post in the editor and what I’ve noticed is that, as it’s from 2013, it still uses WordPress’ old/classic editor: https://d.pr/i/RWUjLJ

    I made a small edit, first without converting to blocks, and that seems to be enough to index it: https://d.pr/i/QMLz6Y

    I then converted it to blocks, didn’t seem to make much difference to its indexation: https://d.pr/i/sGoxMC

Viewing 15 replies - 1 through 15 (of 52 total)
  • The topic ‘Post missing, search results’ order’ is closed to new replies.