• I wish to transliterate all the input, in all forms on my website, globally and without exception, from Unicode into ANSI using the Roman alphabet.

    Imagine this:

    - User types something in a Unicode alphabet inside the form field.
    - Clicks “SUBMIT”.
    - Invisibly, in the back, a JavaScript fires and transliterates the form field, converting the Unicode letters into their corresponding Latin transliterations. I have a dictionary for the conversion.

    Does such a thing exist, and if it doesn’t, could you make such a thing?

Viewing 13 replies - 1 through 13 (of 13 total)
  • This question will be answered best by the developer of the forms plugin you’re using.

    Thread Starter apaunovski

    (@apaunovski)

    Thank you.

    I am looking for a universal solution… I wish I had a WordPress plugin that could attach my transliteration JavaScript to any <TEXTAREA> fields on my website.

    So that when users submit a form containing a <TEXTAREA>, the script transliterates the contents of the <TEXTAREA>, and only after this is the form submitted; the romanized characters then enter the database in place of the user’s original input made in Unicode alphabet symbols.

    There may not be a universal solution. Some forms plugins use the wp_posts table while others use their own table structure!

    Thread Starter apaunovski

    (@apaunovski)

    document.forms.all.onsubmit = function () {
        // Transliteration takes place
    };

    Will this work?

    Moderator bcworkz

    (@bcworkz)

    You could not universally apply the same script to any form and expect it to work. You could adapt your generic script to work with a specific form on a specific site. In general, add a listener to the submit button, transform the form data as desired, then have JS submit the form.
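    A minimal sketch of that pattern, assuming a form with a hypothetical id `my-form` and a placeholder dictionary (the real transliteration map would be much larger): listen for the submit event, rewrite every textarea, then let the submission proceed with the transformed values.

    ```javascript
    // Hypothetical two-entry dictionary; the real one covers a whole alphabet.
    var MAP = { "д": "d", "а": "a" };

    // Replace each mapped character; pass unmapped characters through unchanged.
    function transliterate(word) {
        var out = "";
        for (var i = 0; i < word.length; ++i) {
            out += MAP[word[i]] === undefined ? word[i] : MAP[word[i]];
        }
        return out;
    }

    // Browser-only wiring: rewrite every <textarea> just before the form submits.
    if (typeof document !== "undefined") {
        var form = document.getElementById("my-form"); // assumed form id
        form.addEventListener("submit", function () {
            var areas = form.getElementsByTagName("textarea");
            for (var i = 0; i < areas.length; ++i) {
                areas[i].value = transliterate(areas[i].value);
            }
            // No preventDefault(): the form submits with the rewritten values.
        });
    }
    ```

    Because the handler runs before the browser serializes the form, the server only ever sees the transliterated text.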

    Incidentally, WP has a similar PHP function: remove_accents(). It’s used to compose post slugs, so it’s possible to do this server-side in PHP.

    You’re likely aware, but it’s worth pointing out that the saved data is still UTF-8 encoded even if it is transformed to basic Latin chars. Actually changing the encoding to ANSI or whatever is inadvisable.
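    A quick way to see the byte counts involved, assuming Node or any modern browser with the standard TextEncoder API: transliteration does not always even shrink the UTF-8 data.

    ```javascript
    var enc = new TextEncoder();

    // Cyrillic "щ" is 2 bytes in UTF-8...
    console.log(enc.encode("щ").length);
    // ...while its transliteration "sch" is 3 ASCII bytes.
    console.log(enc.encode("sch").length);
    ```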

    Depends on the code you put in the function. Without knowing which forms plugins you want your solution to work with, we won’t be able to help, because we don’t know where the data is stored in the database.

    Thread Starter apaunovski

    (@apaunovski)

    Actually changing the encoding to ANSI or whatever is inadvisable.

    Yes, I know that an ANSI database will have only 256 characters to work with…

    That’s the point.

    I learned that ANSI characters probably take 0.5–1 byte each; Unicode, on the other hand, is a bit more complex.

    Therefore, it makes sense to me to do the following:

    1. Let the user enter Unicode on the Front-End.
    2. onSubmit, a JavaScript transforms the User’s input to Romanized characters.
    3. The input from above enters the database as Roman letters and symbols.

    When a user opens the website

    1. WordPress pulls from the database the Romanized characters
    2. JavaScript runs in the browser (I have it) and translates the Front-End of the site back to the native Unicode Alphabet script of my target users.

    My users know nothing about this. To them, the site functions normally, in their native tongue.

    To me, the ANSI database is way faster, smaller, and less resource-intensive.

    I basically transfer the Unicode overhead to the users’ browsers through the use of transliteration and JavaScript.

    What do you think?

    Thread Starter apaunovski

    (@apaunovski)

    Depends on the code you put in the function. Without knowing which forms plugins you want your solution to work with, we won’t be able to help, because we don’t know where the data is stored in the database.

    Why would you ever work with the DB itself?

    The point is that the JavaScript, which runs in the user’s browser, will intercept the <input> of the user BEFORE it enters the database. So, we shouldn’t care where the input goes once it has been transliterated by the browser prior to entering the DB.

    The point is, through the use of JavaScript, to transfer the overhead to the user. Otherwise, it doesn’t make sense.

    I’m not suggesting you’d work with the data after it’s saved, but you do need to know where the data is saved so that you can then save it to the right tables, don’t you?

    Thread Starter apaunovski

    (@apaunovski)

    Ah, you are talking about a PHP function.

    I am talking about pure JavaScript running in the user’s browser, replacing characters.

    This is the schema: [image: ANSI database conversion schema]

    function transliterate(word){

        var answer = "";
        var a = {};

        // Transliteration dictionary: Cyrillic → Latin.
        a["Ё"]="YO";a["Й"]="I";a["Ц"]="TS";a["У"]="U";a["К"]="K";a["Е"]="E";a["Н"]="N";a["Г"]="G";a["Ш"]="SH";a["Щ"]="SCH";a["З"]="Z";a["Х"]="H";a["Ъ"]="'";
        a["ё"]="yo";a["й"]="i";a["ц"]="ts";a["у"]="u";a["к"]="k";a["е"]="e";a["н"]="n";a["г"]="g";a["ш"]="sh";a["щ"]="sch";a["з"]="z";a["х"]="h";a["ъ"]="'";
        a["Ф"]="F";a["Ы"]="I";a["В"]="V";a["А"]="A";a["П"]="P";a["Р"]="R";a["О"]="O";a["Л"]="L";a["Д"]="D";a["Ж"]="ZH";a["Э"]="E";
        a["ф"]="f";a["ы"]="i";a["в"]="v";a["а"]="a";a["п"]="p";a["р"]="r";a["о"]="o";a["л"]="l";a["д"]="d";a["ж"]="zh";a["э"]="e";
        a["Я"]="Ya";a["Ч"]="CH";a["С"]="S";a["М"]="M";a["И"]="I";a["Т"]="T";a["Ь"]="'";a["Б"]="B";a["Ю"]="YU";
        a["я"]="ya";a["ч"]="ch";a["с"]="s";a["м"]="m";a["и"]="i";a["т"]="t";a["ь"]="'";a["б"]="b";a["ю"]="yu";

        // Replace each mapped character; pass unmapped ones through unchanged.
        for (var i = 0; i < word.length; ++i){
            answer += a[word[i]] === undefined ? word[i] : a[word[i]];
        }
        return answer;
    }

    Pure character replacement taking place entirely in the user’s browser.

    Ah ha. I misunderstood; you just wanted something to run on the front end before data is saved. My bad.

    Dion

    (@diondesigns)

    When a user opens the website

    1. WordPress pulls from the database the Romanized characters
    2. JavaScript runs in the browser (I have it) and translates the Front-End of the site back to the native Unicode Alphabet script of my target users.

    Looking solely at your Cyrillic translation array, let’s say the letter “I” is pulled from the database. Should that be translated as “Й”, “И”, “Ы”, or something else? To get the correct translation, you must also store some sort of pointer. But…at that point you may as well have stored the character as UTF-8 since you’re only saving one byte, and the performance hit of storing/retrieving a pointer will likely be greater than any perceived performance increase of using a Latin/ASCII character set.

    I say “perceived” because databases use collations that minimize the performance issues you are attempting to avoid. And don’t forget that by messing with the stored characters, performing searches on the text becomes a nightmare.
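    The collision can be shown with three entries taken straight from the transliteration array above: three different Cyrillic letters all map to “i”, so no reverse dictionary can recover the original text.

    ```javascript
    // Three entries from the transliteration array: all collapse to "i".
    var forward = { "й": "i", "ы": "i", "и": "i" };

    // Building a reverse map silently keeps only one Cyrillic letter per
    // Latin key; the other two originals are unrecoverable.
    var reverse = {};
    for (var key in forward) {
        reverse[forward[key]] = key;
    }

    console.log(Object.keys(reverse)); // one key, three candidate originals
    ```

    This is why the round trip only works if an extra pointer is stored alongside each character, which cancels out the supposed savings.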

    Moderator bcworkz

    (@bcworkz)

    Relatively speaking, textual information just doesn’t take up that much space, even at 2+ bytes per char. You could write all week and not equal the data volume of a single modest image. You should spend your time identifying the true bottlenecks of your site and optimizing those, rather than worrying about implementing an outdated encoding scheme. I’ve no doubt that the resources spent storing and transmitting textual data are not what needs to be optimized on your site to improve performance. IMO you’re better off ensuring images are well optimized than fussing with alternative encoding schemes.

  • The topic ‘Conversion of Form Input into ANSI in the background before SQL entry’ is closed to new replies.