• Resolved aldiaa

    (@aldiaa)


    Hi,

    I am new to PHP and WordPress world.

    I am trying to customize my search box in Twenty Fourteen Theme so that it ignores Arabic diacritics/accents. This means if users search for “???????” it should be considered as “????”.

    The default search query looks something like the below snippet:

    SELECT wp_posts.ID
    FROM wp_posts
    WHERE 1=1
      AND (((wp_posts.post_title LIKE '%keyword%')
            OR (wp_posts.post_content LIKE '%keyword%')))
      AND wp_posts.post_type = 'post'
      AND ((wp_posts.post_status = 'publish'))
    ORDER BY wp_posts.post_date DESC LIMIT 0,5

    I need to skip accents in both fields (post_title and post_content) in addition to “keyword”. I tried to add the below snippet to functions.php in my child theme, but for some reason I am getting a fatal error.

    function get_text_clear_of_diacritics($original_text){
    	return preg_replace("~[\x{064B}-\x{065B}]~u", "", $original_text);
    }
    
    function searchfilter($query) {
    if ($query->is_search) {
        $query->set('s', get_text_clear_of_diacritics($query['s']));
    }
    
    return $query;
    }
    
    add_filter('pre_get_posts','searchfilter');

    Your support would be appreciated.

    • This topic was modified 1 year, 10 months ago by aldiaa.
Viewing 5 replies - 1 through 5 (of 5 total)
  • Moderator bcworkz

    (@bcworkz)

    Correct the error this way:
    $query->set('s', get_text_clear_of_diacritics($query->get('s')));

    Doing so does mean the search will no longer match content that does have diacritics.

    ETA: Oh, “pre_get_posts” is an action, not a filter. The same process is used either way. It’s a semantic difference.

    • This reply was modified 1 year, 10 months ago by bcworkz.
    Thread Starter aldiaa

    (@aldiaa)

    Thank you for your reply dear @bcworkz, I thought I can access class fields directly by name without using “get” similar to other languages. Also, I was looking at some WP tutorials and found out the difference between filters and actions.

    However, my logic is not correct. What I am trying to do is to make the search insensitive in terms of diacritics. Users usually do not type diacritics when searching but some contents in my site contain these diacritics. So, I need to make MySQL considers “??????”, “??????”, “???” as one word.

    I found a good article in this topic, but I am not sure how to access Index.xml file in phpMyAdmin to add the required custom collation. As per the official MySQL documentations, the configuration files should be located in character_sets_dir (/usr/share/percona-server/charsets/), but I do not know how to navigate to these directories.

    Arabic Case Insensitive In Database Systems: How To Solve Alef With and Without Hamza Problem | by Ahmed Essam | Medium

    MySQL :: MySQL 5.7 Reference Manual :: 10.14.3 Adding a Simple Collation to an 8-Bit Character Set

    • This reply was modified 1 year, 10 months ago by aldiaa.
    Thread Starter aldiaa

    (@aldiaa)

    I am trying now to normalize both sides (MySQL and the search term written in the search textbox) but with no luck. This is my code in functions.php.

    add_action('pre_get_posts','searchfilter');
    
    function searchfilter($query) {
    
    if ($query->is_search) {
    	get_custom_search_query($query);
    }
    
    return $query;
    }
    
    function get_custom_search_query($query){
    	$search_term = get_text_clear_of_diacritics($query->get('s'));
    	
    	$sql = "SELECT * FROM _DDV_posts WHERE 1=1 ";
    	$sql .= "AND (((" . get_normalized_field(_DDV_posts.post_title);
    	$sql .= " LIKE '%" . $search_term . "%') ";	
    	$sql .= "OR (" . get_normalized_field(_DDV_posts.post_content);
    	$sql .= " LIKE '%" . $search_term . "%'))) ";
    	$sql .= "AND _DDV_posts.post_type = 'post' ";
    	$sql .= "AND ((_DDV_posts.post_status = 'publish')) ";
    	$sql .= "ORDER BY _DDV_posts.post_date DESC ";
    	
    	$query->get_results($sql);
    	//echo  'Last Query: ' . $query->request;
    }
    
    function get_text_clear_of_diacritics($original_text){
    	return preg_replace("~[\x{064B}-\x{065B}]~u", "", $original_text);
    }
    
    function get_normalized_field($column_name){
    	return 'REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(HEX(REPLACE('
    	. $column_name .
    		', "-", "")), "D98E", ""), "D98B", ""), "D98F", ""), "D98C",""),"D991",""),"D992",""),"D990",""),"D98D","")';
    }

    The result query is for some reason still showing diacritics.

    SELECT SQL_CALC_FOUND_ROWS _DDV_posts.ID
    FROM _DDV_posts
    WHERE 1=1
    AND (((_DDV_posts.post_title LIKE '%??????%')
    OR (_DDV_posts.post_excerpt LIKE '%??????%')
    OR (_DDV_posts.post_content LIKE '%??????%')))
    AND ((_DDV_posts.post_type = 'post'
    AND (_DDV_posts.post_status = 'publish'
    OR _DDV_posts.post_status = 'private'))
    OR (_DDV_posts.post_type = 'page'
    AND (_DDV_posts.post_status = 'publish'
    OR _DDV_posts.post_status = 'private'))
    OR (_DDV_posts.post_type = 'attachment'
    AND (_DDV_posts.post_status = 'publish'
    OR _DDV_posts.post_status = 'private'))
    OR (_DDV_posts.post_type = 'e-landing-page'
    AND (_DDV_posts.post_status = 'publish'
    OR _DDV_posts.post_status = 'private'))
    OR (_DDV_posts.post_type = 'sr_playlist'
    AND (_DDV_posts.post_status = 'publish'
    OR _DDV_posts.post_status = 'private')))
    ORDER BY _DDV_posts.post_title LIKE '%??????%' DESC, _DDV_posts.post_date DESC
    LIMIT 0, 20

    It seems that my custom query is not being executed at all, it is being overwritten by the default search query. Why?

    Edit: if I try to use the global $wpdb, it is NULL.

    • This reply was modified 1 year, 10 months ago by aldiaa.
    • This reply was modified 1 year, 10 months ago by aldiaa.
    Moderator bcworkz

    (@bcworkz)

    I don’t know much about DB configuration. I’m not sure how much is safe to change. WP expects a certain configuration for certain aspects. Other aspects it doesn’t care about and you can do what ever you like. It may not be necessary to alter collation. You can use SQL wildcard characters to allow SQL to match any number of characters for one. There is more to SQL wildcards than the % we always see.

    You could compose a search term similar to %?[?????]?%. I don’t know Arabic, I may have butchered that, but hopefully you get the concept anyway. It’ll take some effort to compose a data array of possible variations for each specific character, but it only needs to be done once. You then need code to replace each applicable character with a wildcard set of allowable substitutions.

    I’m not sure why global $wpdb is null. It may be moot since you can manage the above all in “pre_get_posts”. Even if you need a custom SQL query, you still don’t need a valid $wpdb object. You can force WP_Query to utilize any SQL you want through the “posts_request” filter.

    For the sake of more information, if all else failed, you could create your own wpdb class object, which would make a new connection to the DB for you to use. Or rely upon PHP mysqli*() functions to make DB queries.

    Thread Starter aldiaa

    (@aldiaa)

    Thank you @bcworkz, “posts_request” filter was the answer! I just had to fix some syntax issues in my SQL query (missing quotes) and use this magical filter ??

    Note: I do not think using SQL wildcards would be a good idea. Each Arabic character has nine forms, so replacing each one by nine characters would result to a long ugly query. For example, “???” is a short word but it will be converted to “[?????????????????][?????????????????][?????????????????]”. This might be easy to write dynamically through PHP, but I am not sure whether the performance would be affected.

    Below is my complete code, just for reference.

    add_filter("posts_request", "custom_search_filter", 10, 2);
    
    function custom_search_filter($sql, $query) {
    
    if ($query->is_main_query() && is_search()) {
       $sql = get_custom_search_query($query);
    }
    
    return $sql;
    }
    
    function get_custom_search_query($query){
    	$search_term = get_text_clear_of_diacritics($query->get('s'));
    	
    	$sql = "SELECT * FROM _DDV_posts WHERE 1=1 ";
    	$sql .= "AND (((" . get_normalized_field('_DDV_posts.post_title');
    	$sql .= " LIKE '%" . $search_term . "%') ";	
    	$sql .= "OR (" . get_normalized_field('_DDV_posts.post_content');
    	$sql .= " LIKE '%" . $search_term . "%'))) ";
    	$sql .= "AND _DDV_posts.post_type = 'post' ";
    	$sql .= "AND ((_DDV_posts.post_status = 'publish')) ";
    	$sql .= "ORDER BY _DDV_posts.post_date DESC ";
    	
    	return $sql;
    }
    
    function get_text_clear_of_diacritics($original_text){
    	return preg_replace("~[\x{064B}-\x{065B}]~u", "", $original_text);
    }
    
    function get_normalized_field($column_name){
    	return 'REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE('
    	. $column_name .
    		', "D98E", ""), "D98B", ""), "D98F", ""), "D98C",""),"D991",""),"D992",""),"D990",""),"D98D",""),"?","")';
    }
    • This reply was modified 1 year, 10 months ago by aldiaa.
Viewing 5 replies - 1 through 5 (of 5 total)
  • The topic ‘Custom Search Function – Ignore Diacritics’ is closed to new replies.