• Hello!

    Is there any way to ignore Arabic diacritics when highlighting?
    I’m not sure if it will help, but I’ve used this code to remove them for some other things. I’m just not sure how to use it in this case:

    
    function arabic_remap( $a ) {
        $remap = array(
            'آ' => 'ا',
            'أ' => 'ا',
            'إ' => 'ا',
            'ࢭ' => 'ا',
            'ئ' => 'ى',
            'ة' => 'ه',
            '?' => '',
            'ؤ' => 'و',
        );
    
        $diacritics = array(
            '~[\x{0600}-\x{061F}]~u',
            '~[\x{063B}-\x{063F}]~u',
            '~[\x{064B}-\x{065E}]~u',
            '~[\x{066A}-\x{06FF}]~u',
        );
    
        $a = preg_replace( $diacritics, '', $a );
        $a = str_replace( array_keys( $remap ), array_values( $remap ), $a );
    
        return $a;
    }

    Is there any way you could fix this?

    Thanks

Viewing 5 replies - 1 through 5 (of 5 total)
  • Hi @khakan, the plugin does its “remapping” in the JavaScript file hlst-extend.js. The map starts at line 134: var charToAccentedCharClassMap = {...}

    Please note: if you edit this file nothing will change because the minified version hlst-extend.min.js is the one used on the front end. So if you wish to test any modifications yourself, copy the complete content from hlst-extend.js over to the hlst-extend.min.js file first. Or you can enable the WP_DEBUG flag, which will make the plugin use the unminified script file.

    Anyway, I’m not sure how to integrate these Arabic cases though…

    I tried adding this on line 161 (before the “Maps for Russian..” part):

    
    /*
     * Arabic. Thanks @khakan
     */
    ''       : '[\u0600-\u061f\u063b-\u063f\u064b-\u065e\u066a-\u06ff]', // remove these diacritics, normalize the rest.
    '\u0627' : '[\u0627\u0622\u0623\u0625\u08ad]',
    '\u0649' : '[\u0649\u0626]', // what about https://unicode-table.com/en/064A/ and https://unicode-table.com/en/06E6/ ?
    '\u0647' : '[\u0647\u0629]',
    '\u0648' : '[\u0648\u0624]', // what about https://unicode-table.com/en/06E5/ ?
    // no idea what to do with '?' => '' ...
    

    but it does not seem to be working. Or rather: I have a hard time testing this because I do not read or write Arabic.

    I used https://unicode-table.com/en/#arabic to find the Unicode notations for these characters, but I’m not sure they are correct. If you have any ideas, please do share them.
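One way to test a map like this without reading Arabic is to work purely with code points. The sketch below assumes the map is applied by looking up each character of the search term and substituting its character class. The plugin's actual lookup code is not shown in this thread, so `expandTerm` is a hypothetical stand-in. Under that assumption, an entry keyed by the empty string is never consulted, because no character of the term equals `''`.

```javascript
// Subset of the proposed Arabic entries (taken from the snippet above).
const arabicMap = {
  '\u0627': '[\u0627\u0622\u0623\u0625\u08ad]', // alef and its variants
  '\u0649': '[\u0649\u0626]',
  '\u0647': '[\u0647\u0629]',
  '\u0648': '[\u0648\u0624]', // waw and waw with hamza
};

// Hypothetical helper (not the plugin's code): replace each character of the
// term with its class from the map, escaping anything that is not mapped.
function expandTerm(term, map) {
  return Array.from(term, (ch) => {
    if (Object.prototype.hasOwnProperty.call(map, ch)) {
      return map[ch];
    }
    return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  }).join('');
}

const term = '\u0627\u0648'; // alef + waw
const pattern = new RegExp(expandTerm(term, arabicMap));

console.log(pattern.test('\u0623\u0624')); // alef with hamza + waw with hamza: true
console.log(pattern.test('\u0628\u0628')); // two unrelated letters (beh): false

// With per-character lookup, an empty-string key changes nothing.
const withEmptyKey = { ...arabicMap, '': '[\u064B-\u0652]' };
console.log(expandTerm(term, withEmptyKey) === expandTerm(term, arabicMap)); // true
```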

    Thread Starter khakan

    (@khakan)

    Thank you for your quick reply!

    So I tried this code out and the last 4 lines are working perfectly, but the first one is still not working. Most of the characters in those ranges are extra characters that are rarely used anyway. The main ones I need remapped are \u064B to \u0652. These are the accent characters in Arabic that tell you how a letter should be pronounced, and they are separate Unicode characters. I tried using only them instead of the ranges in the first line of the code, but it doesn’t seem to work.
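Because these marks (\u064B to \u0652) are separate code points that follow a letter, one possible approach is to leave the page text untouched and let the search pattern accept any number of marks after each letter. This is only a sketch of that idea, not how the plugin works: `diacriticInsensitivePattern` and `escapeRegExp` are hypothetical helper names.

```javascript
// Zero or more Arabic marks from U+064B (fathatan) through U+0652 (sukun).
const HARAKAT = '[\u064B-\u0652]*';

// Escape regular expression metacharacters in a single character.
function escapeRegExp(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: strip the marks from the search term, then allow
// optional marks after every remaining character of the term.
function diacriticInsensitivePattern(term) {
  const bare = term.replace(/[\u064B-\u0652]/g, '');
  const body = Array.from(bare, (ch) => escapeRegExp(ch) + HARAKAT).join('');
  return new RegExp(body, 'g');
}

// kaf + fatha, teh + fatha, beh + fatha
const vocalized = '\u0643\u064E\u062A\u064E\u0628\u064E';
// Search term without any marks: kaf, teh, beh
const re = diacriticInsensitivePattern('\u0643\u062A\u0628');

const match = vocalized.match(re);
console.log(match !== null && match[0] === vocalized); // true
```

Matching against the original text, rather than stripping the marks from it first, keeps the match offsets valid for wrapping the highlighted span.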

    As for the characters “https://unicode-table.com/en/064A/”, “https://unicode-table.com/en/06E6/” and “https://unicode-table.com/en/06E5/”, they don’t need to be remapped because they are just normal letters.

    I hope you can help me out with this problem.

    • This reply was modified 3 years, 5 months ago by khakan.

    I just realized that some \uXXXX\uXXXX pairs may represent combination characters, which would break the regular expression, or at least make it unpredictable.

    Could you try separating them with the OR symbol like this:

    
    ''       : '[\u0600-\u061f|\u063b-\u063f|\u064b-\u065e|\u066a-\u06ff]', // remove these diacritics, normalize the rest.
    '\u0627' : '[\u0627|\u0622|\u0623|\u0625|\u08ad]',
    '\u0649' : '[\u0649|\u0626]',
    '\u0647' : '[\u0647|\u0629]',
    '\u0648' : '[\u0648|\u0624]',
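A note on the syntax, with a small check: in JavaScript regular expressions, `|` inside square brackets is an ordinary character rather than alternation, so a class written this way also matches a literal pipe. The characters in these classes all lie in the Basic Multilingual Plane, so each \uXXXX escape is already a separate class member without any separator. The variable names below are illustrative only.

```javascript
// Class with "|" separators: the pipe becomes a member of the class.
const withPipes = /[\u0649|\u0626]/;
// Class without separators: only the two letters are members.
const withoutPipes = /[\u0649\u0626]/;

console.log(withPipes.test('|'));         // true: the pipe is matched literally
console.log(withoutPipes.test('|'));      // false
console.log(withoutPipes.test('\u0626')); // true: each escape is its own member
console.log(withPipes.test('\u0626'));    // true: the letters still match too
```

If true alternation is needed, it has to be written outside a class, for example as a group such as `(?:\u0649|\u0626)`.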
    

    Hi @khakan, the code has been integrated into version 1.6 even though it may only partially work. Let me know if you find any issues or new code suggestions to improve Arabic compatibility.

    Thread Starter khakan

    (@khakan)

    Hello RavanH, I’m sorry for the late reply. Thank you for adding it to your plugin! I will try it out and see if everything is working now.

  • The topic ‘Ignoring Arabic diacritics’ is closed to new replies.