• Resolved Manuel Sailer

    (@msailer)


    Hi,

    while trying to customize search results I found the following problem:
    Not all occurrences of keywords in search results are highlighted correctly.
    You can find a screenshot of 4 search results that show what I mean here.

    The search keyword is Stiftskeller with nothing else before or after.
    It is correctly highlighted in search results 1 and 4, but not in 2 and 3.
    As you can see, I managed to enable highlighting in the title by replacing the_title by relevanssi_the_title in the template. Changing the snippet length also works.
    All search results are custom post types while search results 1, 2 and 4 are one type and search result 3 is another.
    In search results 1 and 4 the correctly highlighted keyword is in the post excerpt. The not correctly highlighted keyword in search result 2 is in a paragraph block in the post content and in search result 3 in the excerpt of a related post linked by a display-posts shortcode.
    I also checked some more search results (not visible in the screenshot) where the keyword is correctly highlighted in the post title, a paragraph generated by a lazy block and the title of a related post also linked by a display-posts shortcode.

    As you can see, it is really confusing and I cannot see the rule behind it.
    But what I can say: searching again and rebuilding the index does not change anything.

    Any ideas what I can do or even try to get correctly highlighted keywords in the search results?

Viewing 11 replies - 1 through 11 (of 11 total)
  • Plugin Author Mikko Saari

    (@msaari)

    I don’t know – I’d need to see more context. Seeing just one word of the excerpt, highlighted or not, isn’t helping much. Do you get highlights on other words on those problem posts, for example? Can you get Relevanssi to highlight the next word, or the previous one? I’d need to see at least the full excerpt, in order to figure out why the highlight isn’t picking up there.

    There’s little to do about this. You can try toggling the “Expand highlights to cover full words” setting, that affects how the highlighting works and may work better the other way around, or not.

    Thread Starter Manuel Sailer

    (@msailer)

    Hi Mikko,

    thank you for your quick reply. Sorry that I did not provide enough information but I did not expect that the context could be relevant for highlighting a keyword.

    I prepared some more screenshots without pixelization now. I did separate searches for keywords in the surrounding context of Stiftskeller and focussed on two search results by removing all other ones.

    You can find the screenshots here (keyword Stiftskeller), here (keyword historischen) and here (keyword Beutelsbach).

    I also tried enabling “Expand highlights to cover full words” but it makes no difference. Highlighting stays exactly the same as in the screenshots provided.

    Plugin Author Mikko Saari

    (@msaari)

    There’s nothing in that excerpt in itself that makes Relevanssi not work – I copied the text to my test site and the word “Stiftskeller” is highlighted without issues.

    What does the Waiblinger Kreiszeitung post content look like in the wp_posts database table?

    Thread Starter Manuel Sailer

    (@msailer)

    The word “Stiftskeller” in the “Waiblinger Kreiszeitung” search result is not part of this post’s content. It is in the excerpt of the “April – Mai 2016” post (first search result), which is a related article.

    This is what is in the post_excerpt field of that post’s database entry:
    Bereits zum zweiten Mal hat der KUNSTRAUM WEINSTADT die Möglichkeit, Werke seiner Künsterinnen und Künstler im historischen Stiftskeller in Weinstadt-Beutelsbach auszustellen. Das Motto dieses Mal: „Farbe leben“. Und bunt wird sie sein, diese Ausstellung, nicht nur an Farben, sondern auch an dadurch transportierten Emotionen und Stimmungen. Werke in Aquarell, Acryl, Öl und Mischtechnik hängen an rohen Mauern oder Stellwänden, Encaustic, auch in Verbindung mit Schellack, und Fotodruck sind ausgestellt.

    And this is the source code of the Relevanssi excerpt:
    <p><span class="excerpt_part">© ZVW – Waib-lin-ger Kreis-zei-tung Aus-stel-lung Mehr Infor-ma-tio-nen zu und Impres-sio-nen von der im Pres-se-be-richt beschrie-be-nen Aus-stel-lung fin-den Sie&nbsp;hier:&nbsp; März – April 2016: „Far-be leben“ im Stifts-kel-ler Beu-tels-bach Bereits zum zwei-ten Mal hat der KUNSTRAUM WEINSTADT die Mög-lich-keit, Wer-ke sei-ner Küns-ter-in-nen und Künst-ler im his-to-ri-schen Stifts-kel-ler in Wein---stadt-Beu---tel-s---bach aus-zu-stel-len. Das Mot-to die-ses Mal: „Far-be leben“. Und bunt wird&nbsp;sie…</span></p>

    There is absolutely nothing special in the database or the Relevanssi excerpt that might break a
    $content = str_replace('Stiftskeller', '<span style="color: #ff0000">Stiftskeller</span>', $content);
    or whatever you might be doing to highlight keywords.

    For a short moment I thought it might be a problem whether the keyword is in the title or the excerpt of a related article. But this is not true.

    I proved that with another example here. The search phrase in this example is erleben and only the last occurrence is highlighted correctly. The first occurrence (first row of the Relevanssi excerpt) is in the title of a related article and the second occurrence (last row of the Relevanssi excerpt) is in the excerpt of the related article – exactly as the correctly highlighted keyword at the end of the Relevanssi excerpt.

    This is the source code of the Relevanssi excerpt:
    <p><span class="excerpt_part">…zu und Impres-sio-nen von der im Pres-se-be-richt beschrie-be-nen Aus-stel-lung fin-den Sie&nbsp;hier:&nbsp; Febru-ar – März 2019: „Grün erle-ben“ im Rat-haus Beu-tels-bach Die-se Aus-stel-lung ist unse-re Auf-takt-ver-an-stal-tung im Jahr der Rem-s---tal-Gar---ten---schau und unse-re drit-te Aus-stel-lung im Rat-haus in Wein---stadt-Beu---tel-s---bach. Die herr-lich grü-ne Land-schaft des Rems-tals und das Gar---ten---schau-Mot---to „unend-lich erle-ben” neh-men wir auf und ver-bin-den die-se zum The-ma unse-rer Aus-stel-lung „Grün <span style="color: #ff0000">erle-ben</span>”.…</span></p>

    Plugin Author Mikko Saari

    (@msaari)

    Oh, yes there is! The text is chock full of soft hyphens, invisible to your eyes. To Relevanssi, it looks like this:

    &copy; ZVW &ndash; Waib&shy;lin&shy;ger Kreis&shy;zei&shy;tung Aus&shy;stel&shy;lung Mehr Infor&shy;ma&shy;tio&shy;nen zu und Impres&shy;sio&shy;nen von der im Pres&shy;se&shy;be&shy;richt beschrie&shy;be&shy;nen Aus&shy;stel&shy;lung fin&shy;den Sie&nbsp;hier:&nbsp; M&auml;rz &ndash; April 2016: &bdquo;Far&shy;be leben&ldquo; im Stifts&shy;kel&shy;ler Beu&shy;tels&shy;bach Bereits zum zwei&shy;ten Mal hat der KUNSTRAUM WEINSTADT die M&ouml;g&shy;lich&shy;keit, Wer&shy;ke sei&shy;ner K&uuml;ns&shy;ter&shy;i

    There’s no ‘stiftskeller’ there, just Stifts&shy;kel&shy;ler.

    Of course, Relevanssi should pay no attention to those soft hyphens. There’s some code in Relevanssi to remove soft hyphens, but it doesn’t catch all. I’ll have to fix this in the next version. If you want this fixed now, you can modify /lib/excerpts-highlights.php and find this line:

    $content = str_replace( '&shy;', '', $content );

    Change that to:

    $content = strtr( $content, array( "\xC2\xAD" => '' ) );
    $content = str_replace( '&shy;', '', $content );

    That should get rid of the soft hyphens and let Relevanssi highlight correctly.
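A minimal sketch of why the cleanup matters, assuming plain PHP outside WordPress. The helper names `strip_soft_hyphens()` and `highlight_term()` are hypothetical and are not Relevanssi functions; the numeric entity forms are an extra assumption beyond the two replacements quoted above.

```php
<?php
// Hypothetical helper, not Relevanssi's actual code: remove soft hyphens in
// the forms they commonly take in HTML (raw UTF-8 bytes and entities).
function strip_soft_hyphens( string $content ): string {
    return str_replace(
        array( "\xC2\xAD", '&shy;', '&#173;', '&#xAD;', '&#xad;' ),
        '',
        $content
    );
}

// Hypothetical highlighter: wraps case-insensitive matches of $term in a span.
function highlight_term( string $content, string $term ): string {
    $pattern = '/' . preg_quote( $term, '/' ) . '/iu';
    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace( $pattern, '<span style="color: #ff0000">$0</span>', $content ) ?? $content;
}

// The excerpt as the highlighter sees it: soft hyphens inside every word.
$excerpt = "im his\xC2\xADto\xC2\xADri\xC2\xADschen Stifts&shy;kel\xC2\xADler";

// Without cleanup the search term never matches, so nothing is highlighted.
$raw = highlight_term( $excerpt, 'Stiftskeller' );

// With cleanup the term is found and wrapped.
$clean = highlight_term( strip_soft_hyphens( $excerpt ), 'Stiftskeller' );
```

Here `$raw` comes back unchanged, while `$clean` contains the highlighted word. The cost is that the hyphenation hints are gone from the cleaned text unless something adds them again later.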

    Thread Starter Manuel Sailer

    (@msailer)

    Hi Mikko,

    thank you very much for investigating this for me!

    What you write sounds absolutely logical. I just wonder, why I cannot see these soft hyphens even in source code view in Firefox, Chromium and Vivaldi.
    But I found the reason for them, it’s the wp-Typography plugin I mainly use to auto correct English quotation marks (“…”) and dashes (-) entered when typing to the correct alternatives in German „…“ and —, but there is also auto-hyphenation that leads to all these soft hyphens. Will see whether I’ll test CSS hyphenation instead.

    I tried to apply the changes you suggested in excerpts-highlights.php but could not find the line to change – even with &shy; as the first parameter that is only visible (to me) when inspecting the source code of your answer above. The only &shy; I found in the Relevanssi code is in common.php where a $replacement_array is defined.
    But while searching for anything regarding hyphens I found your note in the changelog of version 2.4 telling that soft hyphens “still confuse the highlighting” and no later note telling that this was solved.

    So for the moment I will disable highlighting in search results and wait for the next version of Relevanssi. Of course I will try highlighting again and tell you when I still experience problems with that functionality.

    Thanks again and I hope solving this problem could help improving Relevanssi a little.

    Edit:
    Disabled hyphenation in wp-Typography and highlighting in search results works for all posts and pages without changing or saving them again.
    That means hyphenation takes place when the HTML output is generated, not while saving the post. Might it be possible to reorder the processing steps and execute Relevanssi’s highlighting before wp-Typography’s hyphenation?

    • This reply was modified 4 years ago by Manuel Sailer. Reason: Further testing after disabling hyphenation in wp-Typography
    Plugin Author Mikko Saari

    (@msaari)

    Well, being invisible is the key characteristic of the soft hyphen.
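Since browsers do not render the character even in source view, one way to check a string for soft hyphens is to replace them with a visible marker or to count them. This is a small debugging sketch in plain PHP; the function names and the `[SHY]` marker are made up for illustration.

```php
<?php
// Hypothetical debugging helper: swap soft hyphens (raw UTF-8 bytes or the
// &shy; entity) for a visible marker so they show up in logs or output.
function reveal_soft_hyphens( string $text, string $marker = '[SHY]' ): string {
    return str_replace( array( "\xC2\xAD", '&shy;' ), $marker, $text );
}

// Hypothetical helper: count soft hyphens in either form.
function count_soft_hyphens( string $text ): int {
    return substr_count( $text, "\xC2\xAD" ) + substr_count( $text, '&shy;' );
}

$word = "Stifts\xC2\xADkel\xC2\xADler";

echo reveal_soft_hyphens( $word ), "\n"; // Stifts[SHY]kel[SHY]ler
echo count_soft_hyphens( $word ), "\n";  // 2
echo strlen( $word ), "\n";              // 16 bytes, not the 12 you can see
```

The byte length is the giveaway: the twelve visible letters are followed around by two two-byte soft hyphens.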

    Anyway, the next version is out now and should fix this for good.

    It would be better if wp-Typography runs after Relevanssi highlights the terms, but I don’t know how wp-Typography works and it was complicated enough that I couldn’t figure it out quickly, so I don’t know, maybe, maybe not.

    pepe

    (@pputzer)

    @msaari wp-Typography developer here: Does Relevanssi do the highlighting in PHP or in JavaScript? If the former, I can look whether there’s a way to do it with priority juggling on the filter hooks.

    Thread Starter Manuel Sailer

    (@msailer)

    @msaari
    Great thing! Highlighting is working now – even with hyphenation enabled in wp-Typography. Thanks for the quick fix and new release of Relevanssi.

    @pputzer
    In Relevanssi highlighting is done in PHP.
    I absolutely agree with Mikko that wp-Typography’s hyphenation should be done after Relevanssi’s highlighting. From my point of view it would be best if hyphenation could be the last step before the generated HTML is sent to the browser. This would reduce the impact not just on Relevanssi but on any plugin that needs the content in its original state, and it would behave much as if hyphenation were done on the browser side using CSS.
    But as you wrote in our Disable hyphenation for tag or class topic, hyphenation is done even before the content is filled into the template. So maybe “priority juggling” might help.

    pepe

    (@pputzer)

    @msailer: It’s done when the WordPress the_content and the_excerpt filter hooks run. So it’s run at the time the templates are rendered, but the elements in the templates themselves are not visible to WordPress plugins. There is no filter hook for the complete output of a WordPress page. It would be possible to add such a filter (or something close to it) using PHP output buffering, and I have been toying with the idea for some years now, but that would mean that all HTML would have to strictly conform to the standard used by our parser library, which is potentially different from what browser vendors actually support.

    wp-Typography does use a reasonably high priority (9999 by default, it can be adjusted via typo_filter_priority), but in general there needs to be “space” to have plugins with higher priority filters. NextGen Gallery uses PHP_INT_MAX - 1 as its priority, which can still be worked around, but only barely. Actually using PHP_INT_MAX would increase the likelihood of breaking things for someone to 1.
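To illustrate the ordering argument, here is a toy model of priority-ordered filters in plain PHP. It is not WordPress's implementation and the `ToyFilters` class is hypothetical; it only mirrors the rule that callbacks with lower priority numbers run earlier. The priorities 9999 and `PHP_INT_MAX - 1` are the ones mentioned above, and 10 stands in for an ordinary plugin.

```php
<?php
// Toy model of priority-ordered filters; NOT WordPress's implementation.
final class ToyFilters {
    /** @var array<int, callable[]> callbacks grouped by priority */
    private array $callbacks = array();

    public function add( int $priority, callable $callback ): void {
        $this->callbacks[ $priority ][] = $callback;
    }

    public function apply( string $value ): string {
        ksort( $this->callbacks ); // lower priority numbers run first
        foreach ( $this->callbacks as $group ) {
            foreach ( $group as $callback ) {
                $value = $callback( $value );
            }
        }
        return $value;
    }
}

$filters = new ToyFilters();
// Registration order does not matter, only the priority number does.
$filters->add( PHP_INT_MAX - 1, fn( string $s ): string => $s . ' > gallery' );
$filters->add( 9999, fn( string $s ): string => $s . ' > typography' );
$filters->add( 10, fn( string $s ): string => $s . ' > ordinary' );

$result = $filters->apply( 'content' );
echo $result, "\n"; // content > ordinary > typography > gallery
```

Once a callback sits at `PHP_INT_MAX`, there is no integer left for anything that needs to run after it, which is the "space" problem described above.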

    Plugin Author Mikko Saari

    (@msaari)

    Relevanssi passes the excerpt content through the_content before the highlighting is added in the excerpt-making process, so adjusting the priority won’t help a bit. Removing the wp-Typography filter before Relevanssi applies the_content would help, but now that Relevanssi removes the soft hyphens, it does not make much practical difference. wp-Typography will still fire on the_excerpt to apply the hyphens to the excerpt Relevanssi has created, so everything should just work.
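The sequence described here can be sketched in plain PHP. Everything in this block is a stand-in: `toy_hyphenate()` imitates a hyphenating filter with a one-word dictionary, and `toy_build_excerpt()` imitates the order of steps, not Relevanssi's real excerpt code.

```php
<?php
const SHY = "\xC2\xAD"; // UTF-8 bytes of the soft hyphen (U+00AD)

// Stand-in for a hyphenating filter: knows exactly one word.
function toy_hyphenate( string $text ): string {
    return str_replace( 'Stiftskeller', 'Stifts' . SHY . 'kel' . SHY . 'ler', $text );
}

// Stand-in for the excerpt-building order described above.
function toy_build_excerpt( string $post_content, string $term ): string {
    // 1. Content filters run first, so the text arrives already hyphenated.
    $excerpt = toy_hyphenate( $post_content );
    // 2. Soft hyphens are stripped so the search term can match again.
    $excerpt = str_replace( SHY, '', $excerpt );
    // 3. The term is highlighted in the cleaned text.
    $excerpt = str_replace( $term, '<strong>' . $term . '</strong>', $excerpt );
    // 4. Excerpt filters run last and hyphenate the finished excerpt.
    return toy_hyphenate( $excerpt );
}

$result = toy_build_excerpt( 'im Stiftskeller in Beutelsbach', 'Stiftskeller' );
```

In this sketch the final excerpt is both highlighted and hyphenated, because the second hyphenation pass works on the word inside the highlight tags.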

  • The topic ‘Highlighting works only in some search results’ is closed to new replies.