Character Encoding and Non-ASCII Characters - What to Do?


    Extended ASCII, special characters, non-ASCII characters, whatever you want to call them, they're proving to be a pain with our store. Our store deals in domestics and bedding, and as such we [ab]use the word "décor" a lot.

    Note the e accent aigu there. As far as character encoding and data integrity goes, that and "smart quotes" are the bane of my existence. It's only recently that I've figured out what exactly is going on, and I figure I'd ask around for ideas on how to regulate our data uploads to make sure everything goes smoothly.

    By default, Miva seems to operate on the UTF-8 character encoding standard. The HTTP Content-Type header says "text/html; charset=utf-8" unless I use the HTTP Headers module to say otherwise, and the entirety of our MySQL database uses utf8_general_ci collation. However, it seems our data is being uploaded in either some part of ISO-8859 or Windows-1252 encoding: setting a browser to autodetect shows the non-ASCII characters as black diamonds with a question mark in them, but if we force ISO-8859-1 or Windows-1252 on our browsers, the text displays correctly.
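    To illustrate the symptom, here is a minimal Python sketch (not anything Miva itself runs) of what happens when Windows-1252 bytes are read as UTF-8. The sample word and variable names are only for illustration.

```python
# Sample text; the same word the store uses.
text = "décor"

# Bytes as a Windows-1252 (or ISO-8859-1) editor would save them.
legacy_bytes = text.encode("cp1252")  # b'd\xe9cor'

# A UTF-8 reader finds 0xE9 followed by a plain ASCII byte, which is not a
# valid UTF-8 sequence, so it substitutes U+FFFD (the replacement character
# that browsers draw as a black diamond with a question mark).
as_utf8 = legacy_bytes.decode("utf-8", errors="replace")

# Forcing the legacy charset, like the browser override, recovers the text.
as_cp1252 = legacy_bytes.decode("cp1252")

print(ascii(as_utf8))    # 'd\ufffdcor'
print(ascii(as_cp1252))  # 'd\xe9cor'
```

    The page is declared as UTF-8, so the only ways out are to change the declaration or to change the bytes.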

    Originally I had suggested that we entitize these characters (e.g. "é" -> "&eacute;") prior to uploading, though lately this has fallen through on account of our copywriters not bothering to do it, and it has caused problems with data feeds for third-party services. Plus, Miva seems to have been designed to be able to take in these special characters (going by the fact that you can render data with an "&mvte:" entity prefix to have the data entitized for output).
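    For reference, the entitizing step could be scripted rather than left to the copywriters. The following is a rough Python sketch under that assumption; `entitize` is a made-up helper name, not a Miva function, and it does nothing about the third-party feed problem mentioned above.

```python
from html.entities import codepoint2name


def entitize(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    Named entities are used where HTML 4 defines one (e.g. &eacute;),
    otherwise a numeric character reference (e.g. &#337;).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                       # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(entitize("décor"))  # d&eacute;cor
```

    Note that this sketch leaves "&", "<" and ">" alone, so it is only safe on descriptions that are already valid HTML.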

    Just wondering if anyone has had similar problems to this and has solutions to suggest that could be integrated into our production flow as painlessly as possible. I have ideas, but I have a tendency to take convoluted ways out instead of just cutting a path with Occam's Razor.
    Last edited by chrisb; 07-10-12, 07:29 AM.

    #2
    Re: Character Encoding and Non-ASCII Characters - What to Do?

    So in my time playing with all this and trying to figure things out, I noticed something odd. As I said, Miva seems to operate on UTF-8 encoding. My store's MySQL database tables are all collated as utf8_general_ci, yet character data is stored and retrieved as if it were ISO-8859-1/Latin-1 or Windows-1252. For example, if I type "décor" into a product's description field and save the product, in phpMyAdmin and from the command line, I see either "dÃ©cor" (phpMyAdmin) or "d├⌐cor" (command line). If I paste "dÃ©cor" into a new Notepad++ document in ANSI mode and then tell it to view as UTF-8, it looks fine.

    From a development standpoint, having the database claiming to be UTF-8 while actually holding Latin-1/Windows-1252 character data is really confusing.
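    As a sketch of what is probably going on, the UTF-8 bytes for "décor" can be decoded with the wrong code page in Python. That phpMyAdmin was decoding as Windows-1252 and that the command line was a Windows console using code page 437 are both assumptions here, not something confirmed in this thread.

```python
# The two bytes UTF-8 uses for "é" are 0xC3 0xA9.
utf8_bytes = "décor".encode("utf-8")  # b'd\xc3\xa9cor'

# Read as Windows-1252, 0xC3 is "Ã" and 0xA9 is "©".
as_cp1252 = utf8_bytes.decode("cp1252")  # 'dÃ©cor'

# Read as code page 437, the same bytes are box-drawing and symbol glyphs.
as_cp437 = utf8_bytes.decode("cp437")    # 'd├⌐cor'

# The Notepad++ experiment in reverse: take the garbled text, write it out
# as Windows-1252 ("ANSI"), and read those bytes as UTF-8.
repaired = as_cp1252.encode("cp1252").decode("utf-8")  # 'décor'
```

    If that reading is right, the stored bytes are valid UTF-8 and it is the viewing tool, or the connection character set, that is interpreting them as a single-byte encoding.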

    As for a solution to my initial problem, it seems it's a matter of making sure all new product uploads are saved in UTF-8 encoding with no byte order mark (BOM).
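    One way to enforce that outside of the editor is a small conversion script run before upload. This is only a sketch: `to_utf8_no_bom` is a hypothetical helper, and the Windows-1252 fallback is an assumption about where the legacy files come from.

```python
import codecs
from pathlib import Path


def to_utf8_no_bom(src: Path, dst: Path, fallback: str = "cp1252") -> str:
    """Rewrite src as UTF-8 without a BOM and return the encoding it was read as."""
    raw = src.read_bytes()

    # Drop a leading UTF-8 byte order mark if the editor added one.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]

    try:
        # Already valid UTF-8: keep the text as-is.
        text = raw.decode("utf-8")
        detected = "utf-8"
    except UnicodeDecodeError:
        # Assume a legacy single-byte file. This raises on the few bytes
        # Windows-1252 leaves undefined, which is better than guessing.
        text = raw.decode(fallback)
        detected = fallback

    dst.write_bytes(text.encode("utf-8"))
    return detected
```

    Trying UTF-8 first works because text containing accented characters in Windows-1252 is very rarely also valid UTF-8, but it is a heuristic, not a guarantee.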
    Last edited by chrisb; 07-16-12, 12:59 PM.


      #3
      Re: Character Encoding and Non-ASCII Characters - What to Do?

      On the settings page is a prompt for Character Set:

      Also in the head tag content area try adding one of these.

      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
      <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
      Ray Yates
      "If I have seen further, it is by standing on the shoulders of giants."
      --- Sir Isaac Newton


        #4
        Re: Character Encoding and Non-ASCII Characters - What to Do?

        Thanks, I overlooked the store's encoding setting; didn't notice it until now.

        I've decided to stick with the UTF-8 character set and I've scrubbed our data to conform to this, and our product manager knows now to make sure our uploaded flat files are saved in UTF-8 with no BOM.

        In retrospect, I wish there was a warning about encoding, or a dropdown to choose the flat file's encoding, in the upload module when you go to upload something.
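        Until something like that exists, a check can be run on the flat file before it goes near the upload module. The sketch below is standalone Python, not part of Miva, and `check_upload_encoding` is a made-up name.

```python
import codecs


def check_upload_encoding(data: bytes) -> list[str]:
    """Return human-readable warnings; an empty list means the file looks fine."""
    warnings = []

    # A BOM is legal UTF-8 but is exactly what the upload should not contain.
    if data.startswith(codecs.BOM_UTF8):
        warnings.append("file starts with a UTF-8 byte order mark")
        data = data[len(codecs.BOM_UTF8):]

    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first bad byte (counted after any BOM).
        warnings.append(
            f"invalid UTF-8 at byte offset {exc.start} (0x{data[exc.start]:02X})"
        )
    return warnings


print(check_upload_encoding(b"d\xe9cor"))
# ['invalid UTF-8 at byte offset 1 (0xE9)']
```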


          #5
          Re: Character Encoding and Non-ASCII Characters - What to Do?

          Originally posted by chrisb
          ...In retrospect, I wish there was a warning about encoding, or a dropdown to choose the flat file's encoding, in the upload module when you go to upload something.
          Agreed. It's a real pain when converting from MM4 to MM5 when the customer has encoded lots of descriptions this way.
          Ray Yates


            #6
            Re: Character Encoding and Non-ASCII Characters - What to Do?

            I want to tag along on this thread because I've got a store owner that also uses ANSI characters. I would agree the best practice would be to convert them. With that said - what is best to convert them to? HTML? What will be the best thing to help while they are converting them?

            Use the previously suggested head tags?

            Leslie
            Leslie Kirk
            Miva Certified Developer
            Miva Merchant Specialist since 1997
            Previously of Webs Your Way
            (aka Leslie Nord leslienord)

            www.lesliekirk.com