Diacritic Character removal. Those strange non text characters...

    Has anyone ever come up with a universal way to remove all control and special characters from strings in Miva Script?

    Removing characters below char 32 and above char 127 does not work because some, but not all, of the codes are two bytes long.
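
To illustrate the variable byte lengths mentioned above, here is a small sketch in JavaScript (not Miva Script). It assumes the text is encoded as UTF-8 and uses the standard `TextEncoder`; `utf8Length` and `byteLengths` are hypothetical helper names made up for this illustration.

```javascript
// Count how many UTF-8 bytes a single character occupies.
// utf8Length and byteLengths are hypothetical names used only for this sketch.
function utf8Length(ch) {
    return new TextEncoder().encode(ch).length;
}

// for...of walks whole code points, not UTF-16 code units,
// so an emoji is reported as one character.
function byteLengths(str) {
    const result = [];
    for (const ch of str) {
        result.push([ch, utf8Length(ch)]);
    }
    return result;
}

// é takes 2 bytes, € takes 3, 😀 takes 4, and plain "a" takes 1.
console.log(byteLengths("\u00E9\u20AC\u{1F600}a"));
```

So a filter that assumes every special character is exactly one or exactly two bytes wide will leave stray bytes behind for some input.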

    Every example I find uses Regex expressions or functions built into the language.

    Code:
    // JavaScript examples:
    
    const str = "Crème Brulée";
    str.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
    // > "Creme Brulee"
    
    // Or the more modern version, using a Unicode property escape:
    str.normalize("NFD").replace(/\p{Diacritic}/gu, "");
    Last edited by RayYates; 07-27-21, 06:21 AM.
    Ray Yates
    "If I have seen further, it is by standing on the shoulders of giants."
    --- Sir Isaac Newton

    #2
    Just out of curiosity, why do you want to remove them? Any business that wants to support a language other than English may need them. Are they causing any specific problems?
    Kent Multer
    Magic Metal Productions
    http://TheMagicM.com
    * Web developer/designer
    * E-commerce and Miva
    * Author, The Official Miva Web Scripting Book -- available on-line:
    http://www.amazon.com/exec/obidos/IS...icmetalproducA


      #3
      You could iterate over the string, use the "isprint" builtin, and remove non-printable characters that way. However, that will not replace the characters with their closest "ascii" equivalents. If the text is UTF-8, you should be able to detect a multi-byte sequence and know how many bytes to remove, too.
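
The multi-byte detection described above can be sketched roughly as follows. This is JavaScript rather than Miva Script, it assumes the input is a UTF-8 byte array, and `sequenceLength` and `stripNonAscii` are hypothetical names, not Miva builtins. The lead byte of each sequence says how many bytes belong to it, so the whole sequence can be skipped as a unit.

```javascript
// Number of bytes in the UTF-8 sequence that starts with this lead byte.
function sequenceLength(lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) === 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) === 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) === 0xF0) return 4; // 11110xxx
    return 1;                             // stray continuation or invalid byte: skip it alone
}

// Keep printable ASCII (32..126); drop control bytes and whole multi-byte sequences.
function stripNonAscii(bytes) {
    let out = "";
    let i = 0;
    while (i < bytes.length) {
        const len = sequenceLength(bytes[i]);
        if (len === 1 && bytes[i] >= 32 && bytes[i] <= 126) {
            out += String.fromCharCode(bytes[i]);
        }
        i += len;
    }
    return out;
}

const sample = new TextEncoder().encode("Cr\u00E8me Brul\u00E9e\t\u00A9");
console.log(stripNonAscii(sample)); // "Crme Brule"
```

As noted, this only removes characters; the accented letters disappear instead of becoming their unaccented equivalents.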
      David Carver
      Miva, Inc. | Software Developer


        #4
        For posterity, I've written a JavaScript function that converts Unicode and diacritic characters to readable content.

        Code:
        function utoh(text, min = 127, max = 255) {
            /*     Ray Yates 3/2022
        
            utoh() = Unicode to Html
                Translate diacritic and unicode characters to printable characters.
                Diacritic like 'áàâäãéèëêíìïîóòöôõúùüûñçăşţ' etc.
                and Unicode like ©
        
            Given a string that contains special characters,
            converts those characters to plain text or HTML entities (hex version).
            Examples:
                á becomes a, ç becomes c
                \u0092 becomes &#x92; (displays the curled single quote ’)
                \u00A9 becomes &#xA9; (displays the copyright symbol ©)
        
            Usage:
                1.    let domObject = document.querySelector("#tab-Products");
                    domObject.innerHTML = utoh("&mvt:product:descrip;");
        
                2.    $("#tab-Products").html( utoh(data.description) );
        
                3.    $("#tab-Products").html( utoh(data.description, 150, 160) );
                    Limit unicode characters to replace.
        
            Parameters:
                text: the string to clean up.
                min, max: optional range of Unicode code points to replace.
                If omitted, they default to 127 and 255.
        
            See: https://www.htmlsymbols.xyz/unicode for unicode character set.
            */
        
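            // NFKD splits an accented letter into a base letter plus combining marks.
            let norm_text = text.normalize("NFKD")
                .replace(/[\u0300-\u036f]/g, ""); // drop the combining marks so á becomes a
            // Replace each remaining code point in [min, max] with its hex HTML entity.
            for (let index = min; index <= max; index++) {
                norm_text = norm_text.replaceAll( String.fromCodePoint(index), `&#x${index.toString(16).toUpperCase()};` );
            }
            return norm_text;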
        }
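
For comparison, here is a minimal sketch of a variant of the same idea. It is not the function above, and `toAsciiHtml` is a hypothetical name. It assumes the goal is to decompose with NFKD, drop the combining marks in U+0300 to U+036F, and then turn every remaining non-ASCII code point into a hex HTML entity in a single regex pass, with no min/max range.

```javascript
// Sketch of a range-free variant; toAsciiHtml is a hypothetical name.
function toAsciiHtml(text) {
    return text
        .normalize("NFKD")                    // split accented letters into base + combining marks
        .replace(/[\u0300-\u036f]/g, "")      // drop the combining marks
        .replace(/[^\x00-\x7F]/gu, (ch) =>    // the u flag matches whole code points
            `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`);
}

console.log(toAsciiHtml("Cr\u00E8me Brul\u00E9e \u00A9 2022"));
// "Creme Brulee &#xA9; 2022"
```

The trade-off is that nothing above U+007F is left untouched, which may be more aggressive than wanted if the page is already served as UTF-8.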
        Last edited by RayYates; 03-30-22, 07:48 AM.
        Ray Yates
        "If I have seen further, it is by standing on the shoulders of giants."
        --- Sir Isaac Newton
