Extended ASCII, special characters, non-ASCII characters, whatever you want to call them, they're proving to be a pain with our store. Our store deals in domestics and bedding, and as such we [ab]use the word "décor" a lot.
Note the e accent aigu there. As far as character encoding and data integrity go, that and "smart quotes" are the bane of my existence. It's only recently that I've figured out what exactly is going on, so I figured I'd ask around for ideas on how to regulate our data uploads to make sure everything goes smoothly.
By default, Miva seems to operate on the UTF-8 character encoding standard. HTTP Content-Type headers say "text/html; charset=utf-8" unless I use the HTTP Headers module to say otherwise, and the entirety of our MySQL database uses the utf8_general_ci collation. However, it seems our data is being uploaded in either some part of ISO-8859 or Windows-1252. If the browser reads the page as UTF-8 (which is what autodetect picks), the non-ASCII characters show up as black diamonds with a question mark in them (the Unicode replacement character), but if we force ISO-8859-1 or Windows-1252 in our browsers, the text displays correctly.
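To illustrate the mismatch described above, here is a minimal Python sketch, not anything Miva provides. It shows Windows-1252 bytes turning into replacement characters when read as UTF-8, and one way to normalize a file to UTF-8 before upload. The function name `to_utf8` and the fallback order are my own assumptions. The guess is a heuristic too: a Windows-1252 file whose bytes happen to form valid UTF-8 would be left alone.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8, guessing the source encoding.

    Hypothetical helper: tries strict UTF-8 first, then Windows-1252,
    then ISO-8859-1 (which accepts every byte) as a last resort.
    """
    try:
        text = raw.decode("utf-8")  # already valid UTF-8 (plain ASCII included)
    except UnicodeDecodeError:
        try:
            text = raw.decode("cp1252")  # covers smart quotes at 0x93/0x94
        except UnicodeDecodeError:
            text = raw.decode("latin-1")  # a few bytes are undefined in cp1252
    return text.encode("utf-8")


# Simulate a product description saved by a Windows editor.
legacy = "décor “sale”".encode("cp1252")

# Reading those bytes as UTF-8 yields U+FFFD, the "black diamond".
broken = legacy.decode("utf-8", errors="replace")
print("\ufffd" in broken)

# After normalizing, the same bytes decode cleanly as UTF-8.
print(to_utf8(legacy).decode("utf-8"))
```

Running something like this over export files before they go into the store would keep the copywriters out of the loop entirely, which is the main appeal.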
Originally I had suggested that we entitize these characters (e.g. "é" -> "&eacute;") prior to uploading, though lately this has fallen through on account of our copywriters not bothering to do it, and has caused problems with data feeds for third-party services. Plus, Miva seems to have been designed to be able to take in these special characters (going by the fact that you can render data with an "&mvte;" entity prefix to have the data entitized for output).
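For reference, the entitizing step could be scripted rather than left to the copywriters. This is a rough Python sketch under my own assumptions: the name `entitize` is hypothetical, it is not how Miva's "&mvte;" works internally, and it deliberately leaves ASCII (including &, < and >) untouched.

```python
from html.entities import codepoint2name


def entitize(text: str) -> str:
    """Replace each non-ASCII character with an HTML entity.

    Uses a named entity where the standard library knows one,
    otherwise a numeric character reference.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(entitize("“décor”"))
```

Note that this only helps for HTML output. A third-party feed that expects plain UTF-8 text would still need the unentitized data, so it probably makes more sense to store clean UTF-8 and entitize at render time.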
Just wondering if anyone has had similar problems to this and has solutions to suggest that could be integrated into our production flow as painlessly as possible. I have ideas, but I have a tendency to take convoluted ways out instead of just cutting a path with Occam's Razor.