Rich Text Editor Nightmare - Options? - Miva Merchant Community Forums

lesliekirk replied

09-28-20, 03:55 AM
Originally posted by William Davis

The cleaner that worked best for us, given how extensive the MS Word formatting code was, turned out to be one of the simpler online tools we came across, HtmlWasher. It worked 100% of the time.

They also have an upload option which I am testing.

I'll have to pass this one along to a couple of store owners who still have a LOT of old copied-and-pasted "code" from Expressions.
William Davis replied

09-27-20, 06:57 AM
The cleaner that worked best for us, given how extensive the MS Word formatting code was, turned out to be one of the simpler online tools we came across, HtmlWasher. It worked 100% of the time.

They also have an upload option which I am testing.

Update:

Best Online MS Word to HTML Decoder / Encoder Tool:

Professional Online HTML Editor | htmlg.com - Word Decoder / Encoder

https://htmlg.com/html-editor/

Last edited by William Davis; 12-24-21, 09:55 AM.
Bruce - PhosphorMedia replied

09-25-20, 11:26 AM
I know there are some tools out there that will strip ALL HTML, but that would also mean no <p> tags. There might be one out there that lets you exempt certain tags from removal. MivaScript's "stripHtml()" function does this, and a "Tool Page" could be created, but the actual steps are beyond the scope of a forum post.
William Davis replied

09-25-20, 10:57 AM
Well, we first learned there was an issue when trying to import data into Excel, but for a very long time no one had any idea what was causing it, not even Miva.

How did we identify which products were contaminated with the Word formatting codes? We found a Word formatting code common to the affected description fields: <xml>.

Since the Word formatting codes were rendering Miva export files useless, we then used old man Weiland's "Find and Replace" to locate every product that contained the <xml> Word formatting code in the product description field. We then simply copied and pasted the entire search results into Excel, kept only the product code column, added a second column, and concatenated "https://www.site.com/[ProdCode].html" to create a list file, product-corrupt-desc.csv.
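
For anyone who would rather script the flag-and-list step than do it by hand in Excel, here is a minimal Python sketch. It assumes the product export is a CSV with columns named `code` and `descrip` (hypothetical names, so match them to your own export header) and reuses the placeholder domain from the post.

```python
import csv
import io

# Marker reported in the post as the common Word formatting code.
MARKER = "<xml>"
# Placeholder URL pattern from the post; replace with the real store domain.
URL_TEMPLATE = "https://www.site.com/{code}.html"


def corrupt_product_urls(export_file, code_col="code", desc_col="descrip"):
    """Return (code, url) pairs for rows whose description contains MARKER.

    The column names are assumptions, not Miva's guaranteed export headers.
    """
    hits = []
    for row in csv.DictReader(export_file):
        description = row.get(desc_col) or ""
        if MARKER in description.lower():  # case-insensitive match
            code = row[code_col]
            hits.append((code, URL_TEMPLATE.format(code=code)))
    return hits


def write_url_list(pairs, out_file):
    """Write the pairs as a two-column CSV, e.g. product-corrupt-desc.csv."""
    writer = csv.writer(out_file)
    writer.writerow(["code", "url"])
    writer.writerows(pairs)


if __name__ == "__main__":
    sample = io.StringIO(
        "code,descrip\n"
        'A100,"<p>Clean copy</p>"\n'
        'B200,"<xml><w:WordDocument></w:WordDocument></xml><p>Pasted</p>"\n'
    )
    for code, url in corrupt_product_urls(sample):
        print(code, url)
```

With a real export you would open the file with `newline=""` and pass the file object in place of the in-memory sample.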

We then used this list of corrupted product descriptions to scrape the affected product pages with a data scraping app for Chrome called Data Miner (amazing), which built us a new file (e.g., Product Code, Product Title, Product Description without HTML codes) and a separate file with the HTML codes. Apart from the search for a good tool to scrape the affected pages on our site's frontend, everything came together fast and easy up to this point.

The problem then became finding another batch tool that would surgically remove unwanted Word formatting codes while retaining others, so I can import the descriptions back into Miva. That is where I am currently stuck. I found some hit-and-miss tools online, and some that work great, but no batch solution.
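
One possible shape for such a batch cleaner is an allow-list filter built on Python's standard `html.parser`. This is a rough sketch, not a tested production tool: the allow-list mirrors the 'p,li,ul,b,a,br' list used elsewhere in this thread, the set of elements dropped together with their contents is my own assumption, and it expects those elements to be properly closed.

```python
from html import escape
from html.parser import HTMLParser

# Tags to keep (same list as the miva_html_strip call in this thread).
ALLOWED_TAGS = {"p", "li", "ul", "b", "a", "br"}
# Assumed: elements removed together with everything inside them.
DROP_WITH_CONTENT = {"xml", "style", "script", "head", "title"}
# Attributes kept per tag; class="MsoNormal", style="..." etc. are dropped.
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}


class WordHtmlCleaner(HTMLParser):
    """Keep allow-listed tags and text; drop all other markup.

    Comments (including Word's <!--[if gte mso 9]> blocks) and declarations
    disappear because the inherited handlers for them do nothing.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_WITH_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        allowed = ALLOWED_ATTRS.get(tag, ())
        attr_text = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in allowed and value is not None
        )
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives decoded, so re-escape &, < and > on the way out.
            self.out.append(escape(data, quote=False))


def clean_description(html_text):
    """Return html_text with only the allow-listed markup left in place."""
    cleaner = WordHtmlCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.out)
```

Batch use is then a loop over the exported descriptions, for example `[clean_description(d) for d in descriptions]`, with the results written back to the import file. Spot-check a handful of products before re-importing everything.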

Suggestions anyone?
lesliekirk replied

05-06-19, 05:41 AM
Originally posted by Bruce - PhosphorMedia

RE to Miva Team: Issues with product descriptions

This is a pretty common occurrence. Any chance that at least the Product Update process could detect some of these issues (i.e., character encoding/high-order ASCII characters, embedded pages, AKA <html><head></head>, etc.)?

To really do it justice, it should pop a warning when someone attempts to add those tags.
Bruce - PhosphorMedia replied

05-03-19, 09:56 AM
RE to Miva Team: Issues with product descriptions

This is a pretty common occurrence. Any chance that at least the Product Update process could detect some of these issues (i.e., character encoding/high-order ASCII characters, embedded pages, AKA <html><head></head>, etc.)?
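
Until something like that exists in the admin, a similar check can be run against an export outside of Miva. The Python sketch below is only an illustration of the two checks named above (high-order characters and embedded page markup). The function name and the list of suspicious tags are my own choices, not anything Miva provides.

```python
import re

# Tags suggesting a whole page or an Office payload was pasted into a field.
EMBEDDED_PAGE_RE = re.compile(r"<\s*(html|head|body|xml|style)\b", re.IGNORECASE)


def description_issues(text):
    """Return a list of warnings for one product description."""
    issues = []

    # Anything above code point 127 is outside plain ASCII.
    high = sorted({ch for ch in text if ord(ch) > 127})
    if high:
        shown = ", ".join(f"U+{ord(ch):04X}" for ch in high[:5])
        issues.append(f"non-ASCII characters: {shown}")

    tags = sorted({m.group(1).lower() for m in EMBEDDED_PAGE_RE.finditer(text)})
    if tags:
        issues.append("embedded page markup: " + ", ".join(f"<{t}>" for t in tags))

    return issues
```

An empty list means neither check fired. A non-ASCII character is not necessarily an error (it may be a legitimate accented letter), so treat the first warning as a prompt to look, not as proof of corruption.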
Leanne replied

05-02-19, 09:17 PM
Thank you! For anyone's future reference, that worked. Here is the full code I placed on the PLST page.

Code:

<mvt:foreach iterator="product" array="all_products:products">
<mvt:assign name="g.thisDescription" value="glosub(l.settings:product:descrip, asciichar(10), '')" />
<mvt:assign name="g.thisDescription" value="glosub(g.thisDescription, asciichar(11), '')" />
<mvt:assign name="g.thisDescription" value="glosub(g.thisDescription, asciichar(12), '')" />
<mvt:assign name="g.thisDescription" value="glosub(g.thisDescription, asciichar(13), '')" />
<mvt:assign name="g.thisDescription" value="glosub(g.thisDescription, '\r', '')" />
<mvt:assign name="g.thisDescription" value="glosub(g.thisDescription, '\n', '')" />
<mvt:assign name="g.thisDescription" value="g.thisDescription $ asciichar(10) $ asciichar(13)" />
&mvte:product:code;|&mvte:product:name;|<mvt:assign name="g.allowed_tags" value="'p,li,ul,b,a,br'" /><mvt:eval expr="miva_html_strip( g.thisDescription, g.allowed_tags )" />
</mvt:foreach>

Unfortunately, getting the resulting code neatly into Excel was less successful. At some point, thanks to the encoding issues, the content started nesting (it does that on the display of the PLST page as well, although the source code is a nice series of single lines.) In many cases it was only descriptions that combined, but in one case I had 28 products (code, name, description) appended to another product's description. It was not a quick and easy copy and paste to delete the encoding, although I was able to do a certain amount of that. But there were enough variations that it wasn't possible to completely remove the code cleanly. It took me pretty much the entire day to clean up the file and prep it for re-importing, but I am happy to say that it is done, and done successfully!
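
For anyone facing the same nesting problem, part of the cleanup could be done with a small script before the data ever reaches Excel. The Python sketch below assumes the PLST page source has been saved as text with one `code|name|description` record per line. It treats any line with fewer than two pipes as a stray continuation of the previous description, which is a heuristic and will misfire if a continuation line happens to contain two pipes.

```python
import csv
import io


def parse_plst_output(lines):
    """Turn 'code|name|description' lines into [code, name, description] rows.

    Heuristic: a line with fewer than two pipes is folded into the previous
    description (a stray line break) instead of starting a new product.
    """
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        parts = line.split("|", 2)  # the description itself may contain pipes
        if len(parts) == 3:
            rows.append([part.strip() for part in parts])
        elif rows:
            rows[-1][2] = f"{rows[-1][2]} {line}"
        # A continuation seen before any record is silently dropped here.
    return rows


def rows_to_csv(rows):
    """Return CSV text with every field quoted, so Excel keeps fields intact."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerow(["code", "name", "description"])
    writer.writerows(rows)
    return out.getvalue()
```

Quoting every field means commas and leftover markup inside a description stay in their own cell when the file is opened in a spreadsheet.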

Thank you so much to everyone who chimed in. For those who requested a "how-to" I'm reluctant to post one because it really is a hands-on project, but if you can get your data at least mostly cleanly into Excel with the use of the PLST page and Bruce's generous code provision, using the search function and "mso" as the search term is helpful, as is spell check for finding places where words have been run together because line breaks from the text editor have been removed.

Also, a note: This issue was discovered because the store owner hired a new ad agency and they remarked that multiple product descriptions were missing from the Google shopping feed. It was in troubleshooting the feed that I discovered what was going on. So if you're using the Rich Text Editor, and also have feeds set up, double check your feeds to be sure you aren't having issues with descriptions.
Bruce - PhosphorMedia replied

05-02-19, 11:50 AM
Oh, sorry, what I should have added is something like:

<mvt:assign name="g.thisDescription" value="g.thisDescription $ asciichar(10) $ asciichar(13)" />

as the last g.thisDescription assignment.
Leanne replied

05-02-19, 11:47 AM
Thanks, that works, but it also removes the line breaks at the end of each description, so all of the products ran together.

But I wound up downloading Notepad++ and using the instructions from here to remove the line breaks, so I now have a functioning Excel file to work with. Progress!!!

https://stackoverflow.com/questions/...-text/13990281
Bruce - PhosphorMedia replied

05-02-19, 11:12 AM
The way I've solved line breaks in the past is by pasting the offending column into a text editor and changing the character encoding until they disappear (not very elegant, but it works when you don't know what the line break encoding is).

You could try using:

<mvt:assign name="g.thisDescription" value="glosub(l.settings:product:descrip, asciichar(10), '')" />
<mvt:assign name="g.thisDescription" value="glosub(g.thisDescription, asciichar(11), '')" />
<mvt:assign name="g.thisDescription" value="glosub(g.thisDescription, asciichar(12), '')" />
<mvt:assign name="g.thisDescription" value="glosub(g.thisDescription, asciichar(13), '')" />

<mvt:assign name="g.thisDescription" value="glosub(g.thisDescription, '\r', '')" />
<mvt:assign name="g.thisDescription" value="glosub(g.thisDescription, '\n', '')" />

&mvt:global:thisDescription;

<mvt:assign name="g.thisDescription" value="''" />
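
If the descriptions are already sitting in an export file, the same four character codes can be handled outside of Miva. Here is a short Python sketch of that idea. It deliberately differs from the glosub calls above in one respect: by default it replaces each run of line-break characters with a single space instead of nothing, so that words on either side of a break do not get run together. The function name is hypothetical.

```python
import re

# ASCII 10-13: line feed, vertical tab, form feed, carriage return,
# the same codes targeted by the asciichar() calls above.
LINE_BREAK_RE = re.compile(r"[\n\x0b\x0c\r]+")


def flatten_line_breaks(text, replacement=" "):
    """Collapse each run of line-break characters into `replacement`."""
    flat = LINE_BREAK_RE.sub(replacement, text)
    # Tidy up any doubled spaces left behind, then trim the ends.
    return re.sub(r" {2,}", " ", flat).strip()
```

Passing `replacement=""` reproduces the strip-everything behaviour of the template code.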
Leanne replied

05-02-19, 09:20 AM
Originally posted by lesliekirk

Which suggestion worked for you? https://html-cleaner.com/ or

It was the PLST page idea. There is still MS Office encoding coming through, but it's significantly cleaned up, and seems to follow a pattern that I could search and replace for if I could get the content nicely into a spreadsheet, but the line breaks are messing me up.
William Davis replied

05-02-19, 08:34 AM
Once you find a solution, please post it here. I have a similar problem with over 10K products.
ids replied

05-02-19, 08:31 AM
I think I've run into this before. And ideally, I think the RTE wants UTF-8 encoding.

So the export is breaking because of the line feeds? If that isn't it, I don't understand how the exported file is broken.

If my assumption is right, copying from Word seems to be the problem. Isn't there a way in Word to save the file in another format that also saves the content with UTF-8 encoding? That is how I think the formatting will be preserved and entities like line feeds won't be interpreted literally.

It's possible that the content from Word simply needs to be read into a different app, like a spreadsheet, that could interpret the formatting and encode it correctly.

I can't give the exact steps to take; I'm not a Word guru. But this seems worth a try before trying to code/script a band-aid on a gash that really needs stitches.

Scott
lesliekirk replied

05-02-19, 05:31 AM
Originally posted by Kent Multer

With 2300 descriptions to fix, even at just one minute per product, that's almost 40 hours of someone's time. It might be worthwhile to have a custom script written that can do them all with a few clicks. Depending on the specifics of what you need removed or replaced, it could be a huge time saver. If you're interested, you can drop me a line by email, and send me a sample of the text.

Thanks --

I'll drop you a line. My problem is there is no consistency to the usage of HTML in the product description. Some descriptions have a complete HTML page pasted in the field. Others start with just <style> ... </style> that was created by MS Expressions. Some of the HTML is as old as FrontPage. Then there is a ton of inline styling. I'll send you samples. - Leslie
lesliekirk replied

05-02-19, 03:46 AM
Originally posted by Leanne

So Bruce's solution is very close to brilliant and allows me to keep the basic html*, but I've been beating my head against the wall trying to remove the line breaks from the descriptions, with no success.

Which suggestion worked for you? https://html-cleaner.com/ or

Create or use the existing Product List page.

In the product loop for that page, output product_code, product_name, and then

Code:
<mvt:assign name="g.allowed_tags" value="''" /> <mvt:eval expr="miva_html_strip( l.settings:product:descrip, g.allowed_tags )" />