• Resolved yoav.aner

    (@yoavaner)


    I was playing around with the plugin on an RTL-language website. If I use RTL text in the glossary term, it messes up the page quite badly. I figured it relates to the length of the glossary term, and from there to the gt_get_len method. I saw that you check whether the text is RTL and then apply mb_strlen without an encoding argument. I’m not a PHP or Unicode expert, but I think strlen should do the job without having to check for RTL? e.g. see https://stackoverflow.com/a/12046233

Viewing 15 replies - 1 through 15 (of 15 total)
  • Plugin Author Daniele Scasciafratte

    (@mte90)

    Hi, strlen doesn’t do the job.
    When mb_strlen is called without an encoding parameter, it uses the internal encoding configured on the server: https://www.php.net/manual/en/function.mb-strlen.php

    The difference between the two is:

    * strlen counts bytes, so it works perfectly only with single-byte encodings such as ASCII or Latin
    * mb_strlen counts characters, so it works with emoji, Hebrew, kanji, Arabic and other languages (also mixed), where a single printed symbol is in reality built from several “hidden” bytes

    Because RTL languages always require multibyte handling, we enforce it in the code to avoid issues.
    To help you debug your issue I need a page where I can investigate what is happening, since we already have automated tests for the languages mentioned above, for example.
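
    A minimal sketch of the byte-versus-character difference described above, assuming UTF-8 strings and the mbstring extension. The sample strings are illustrative assumptions and are not taken from the plugin.

```php
<?php
// Illustrative strings only; neither comes from the plugin or the reported site.
$english = 'glossary';
// Four Hebrew letters written as Unicode escapes (PHP 7+), so the result
// does not depend on how this source file is saved. Each is 2 bytes in UTF-8.
$hebrew = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}";

// strlen() counts bytes.
$englishBytes = strlen($english);
$hebrewBytes  = strlen($hebrew);

// mb_strlen() counts characters (code points) in the given encoding.
// Passing 'UTF-8' explicitly avoids relying on the server's internal encoding.
$englishChars = mb_strlen($english, 'UTF-8');
$hebrewChars  = mb_strlen($hebrew, 'UTF-8');

echo "english: $englishBytes bytes, $englishChars characters\n";
echo "hebrew:  $hebrewBytes bytes, $hebrewChars characters\n";
```

    For ASCII text the two counts agree, which is why a byte-based length can appear to work until multibyte text is involved. Either function can be correct, as long as the length is used with offsets measured in the same unit.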

    Thread Starter yoav.aner

    (@yoavaner)

    I tested with Hebrew and it’s broken. Changing gt_get_len to simply return strlen($stringtomatch) seems to fix it and works with English glossary terms or Hebrew.

    The site I’m working on isn’t live, but I tested with a glossary term containing Hebrew, e.g. ?????????????????????? in the term, and then having two terms on a page, the first one pointing to the Hebrew one and the second to another term (Hebrew/English), and the HTML gets broken as a result.

    Plugin Author Daniele Scasciafratte

    (@mte90)

    We have other customers using the plugin with the code as it is with Hebrew and we don’t have the issue you are reporting.
    I need a page with the issue to investigate, because there can be various reasons why it isn’t working, from plugin conflicts to a server configured differently.

    Thread Starter yoav.aner

    (@yoavaner)

    I can share the HTML output for now. Once you look at the HTML, it’s quite clear that the bug is related to the length calculation: the HTML gets garbled based on the length of the term excerpt. Maybe it’s something on my system, but I’m not sure what it is. Changing to use strlen seems to work on my system with different languages. I can also check with other Unicode characters.
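
    A hypothetical sketch of how a length mismatch can garble output. This is not the plugin's code; the content string and markup are made up. It assumes UTF-8 input and shows what happens when a character offset is passed to a byte-based function.

```php
<?php
// Hypothetical content: four Hebrew letters (2 bytes each in UTF-8), a space, then a term.
$content = "\u{05E9}\u{05DC}\u{05D5}\u{05DD} BTerm about facilities";
$term    = 'BTerm';
$tooltip = '<span>BTerm</span>';

// strpos() returns a byte offset: 8 bytes of Hebrew plus 1 space.
$byteOffset = strpos($content, $term);
// mb_strpos() returns a character offset: 4 letters plus 1 space.
$charOffset = mb_strpos($content, $term, 0, 'UTF-8');

// Consistent units: a byte offset and a byte length with byte-based substr_replace().
$good = substr_replace($content, $tooltip, $byteOffset, strlen($term));

// Mixed units: a character offset passed to byte-based substr_replace().
// The replacement lands too early and cuts a Hebrew letter in half.
$bad = substr_replace($content, $tooltip, $charOffset, strlen($term));

var_dump($byteOffset, $charOffset);
var_dump(mb_check_encoding($good, 'UTF-8')); // the consistent version stays valid UTF-8
var_dump(mb_check_encoding($bad, 'UTF-8'));  // the mixed version is no longer valid UTF-8
```

    The gap between the two offsets grows with the number of multibyte characters that precede the match, so a longer Hebrew excerpt shifts later replacements further.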

    In general, I’m curious why the replace_with_utf_8 function needs length at all. Can this be simplified by using a template instead?

    Plugin Author Daniele Scasciafratte

    (@mte90)

    There are various reasons why it needs the length. For example, the calculation is different with mixed encodings, with broken characters, with complex HTML, and in many other cases.

    I can try with the content (I also need a list of all the words found), but that isn’t enough to check the issue.

    Thread Starter yoav.aner

    (@yoavaner)

    As I said, I’m not that familiar with PHP, but I imagine text substitution is more-or-less a solved problem. I imagine using some kind of templating?
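
    A minimal sketch of substitution without manual length bookkeeping, assuming plain UTF-8 text and a hypothetical $terms map. The names and markup here are illustrative and are not the plugin's data model or its replace_with_utf_8 function.

```php
<?php
// Hypothetical glossary: term => tooltip text.
$terms = [
    'BTerm' => 'This is Term B',
    "\u{05E9}\u{05DC}\u{05D5}\u{05DD}" => 'A Hebrew term',
];

// Build one alternation pattern. preg_quote() escapes regex metacharacters,
// and the /u modifier makes PCRE treat pattern and subject as UTF-8.
$quoted = array_map(
    static function ($term) {
        return preg_quote($term, '/');
    },
    array_keys($terms)
);
$pattern = '/' . implode('|', $quoted) . '/u';

$content = "Check with the BTerm or \u{05E9}\u{05DC}\u{05D5}\u{05DD} about facilities";

// The callback receives each match, so no offsets or lengths are tracked by hand.
$result = preg_replace_callback($pattern, static function (array $m) use ($terms) {
    $term = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
    $text = htmlspecialchars($terms[$m[0]], ENT_QUOTES, 'UTF-8');
    return '<span class="glossary-tooltip" title="' . $text . '">' . $term . '</span>';
}, $content);

echo $result, "\n";
```

    This only covers plain text. Real post content is HTML, where terms inside tags, attributes, or existing links need to be skipped, which is the kind of complexity a simple pattern does not handle.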

    If I provide access to a page that is broken, what more can you test than by looking at the HTML? Or would you need admin/backend access?

    Thread Starter yoav.aner

    (@yoavaner)

    I just tested with Kanji and it seems to work ok, but not with Hebrew. The website is set to Hebrew/RTL.

    Plugin Author Daniele Scasciafratte

    (@mte90)

    I would probably need access, but that kind of support is offered only to premium customers, so I need you to share a link here.

    Thread Starter yoav.aner

    (@yoavaner)

    Not a Premium customer, but it looks like a bug to me, and I have a feeling the implementation can be simplified. Thanks so much for responding quickly, Daniele. Happy to share some HTML snippets if it helps.

    Plugin Author Daniele Scasciafratte

    (@mte90)

    I hope it can be simplified, but after years this is the only solution that has worked for all the languages we have encountered.
    You can share the HTML snippet anyway and I can run some tests.

    Thread Starter yoav.aner

    (@yoavaner)

    <div class="entry-content wp-block-post-content has-global-padding is-layout-constrained wp-block-post-content-is-layout-constrained"><p>Check if you can do <span class="glossary-tooltip glossary-term-299" tabindex="0"><span class="glossary-link"><a  target="_blank" class="glossary-only-link">ATerm</a></span><span class="hidden glossary-tooltip-content clearfix"><span class="glossary-tooltip-text">?????????????????????? <a >More</a></span></span></span> with the apartment</p>
    
    
    
    <<span class="glossary-tooltip glossary-term-301" tabindex="0"><span class="glossary-link"><a  target="_blank" class="glossary-only-link">BTerm</a></span><span class="hidden glossary-tooltip-content clearfix"><span class="glossary-tooltip-text">This is Term B <a >More</a></span></span></span>o check with the BTerm about facilities</p></div>
    
    <div class="entry-content wp-block-post-content has-global-padding is-layout-constrained wp-block-post-content-is-layout-constrained"><p>Check if you can do <span class="glossary-tooltip glossary-term-299" tabindex="0"><span class="glossary-link"><a  target="_blank" class="glossary-only-link">ATerm</a></span><span class="hidden glossary-tooltip-content clearfix"><span class="glossary-tooltip-text">This is A term <a >More</a></span></span></span> with the apartment</p>
    
    
    
    <p>Also check with the <span class="glossary-tooltip glossary-term-301" tabindex="0"><span class="glossary-link"><a  target="_blank" class="glossary-only-link">BTerm</a></span><span class="hidden glossary-tooltip-content clearfix"><span class="glossary-tooltip-text">This is Term B <a >More</a></span></span></span> about facilities</p></div>
    Plugin Author Daniele Scasciafratte

    (@mte90)

    So in this case we have the Gutenberg editor with blocks of mixed content, but HTML like this is not helpful to me.
    I need the clean text and the words that mismatch so I can test them in our test suite.

    Thread Starter yoav.aner

    (@yoavaner)

    Can you point me to the test suite? If it’s easy enough to set up a test environment and run it, I can try to see if I can reproduce it on there.

    Plugin Author Daniele Scasciafratte

    (@mte90)

    We don’t share the tests, both to simplify the plugin deployment and to avoid people asking for support with them (also because they include material for the pro version).

    Closing after 2 weeks.

  • The topic ‘strlen bug with RTL languages?’ is closed to new replies.