So Bruce's solution is very close to brilliant and allows me to keep the basic HTML*, but I've been beating my head against the wall trying to remove the line breaks from the descriptions, with no success. I have searched all over for a solution and nothing I have tried is working. I can't import the results into Excel with the existing line breaks. I'm sure it's something simple, but I've been fighting with this site all day and my brain is dead. Any help is appreciated.
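For the line-break problem, here is a minimal sketch in Python, assuming the descriptions can be read as plain strings. The function names `flatten` and `to_csv` and the `model`/`description` column names are made up for illustration and are not part of any cart's export format. It collapses every run of whitespace (including CR/LF) to a single space, which browsers render the same way for ordinary HTML, then writes a CSV that Excel can open.

```python
import csv
import io


def flatten(text):
    """Collapse all whitespace runs (spaces, tabs, CR, LF) to one space."""
    # str.split() with no arguments splits on any Unicode whitespace
    # and discards leading/trailing whitespace.
    return " ".join(text.split())


def to_csv(rows):
    """Write (model, description) pairs as CSV text.

    The two-column layout is an assumption for this example only.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["model", "description"])
    for model, description in rows:
        writer.writerow([model, flatten(description)])
    return buf.getvalue()


if __name__ == "__main__":
    sample = "<p>Line one\r\nline two</p>\n\n<ul>\n<li>A</li>\n</ul>"
    print(to_csv([("SKU-1", sample)]))
```

Note that this would also flatten text inside a `<pre>` block, where line breaks are meaningful, so it is only safe if the descriptions don't use one.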
Kent, fortunately only later-added products are affected, and that 2300 number includes product variants and other inactive products so it's not quite at that scope, but you make an excellent point. I will definitely be in touch if needed.
*But I am now discovering that some of the crap code is making its way through. Urgh.
With 2300 descriptions to fix, even at just one minute per product, that's almost 40 hours of someone's time. It might be worthwhile to have a custom script written that can do them all with a few clicks. Depending on the specifics of what you need removed or replaced, it could be a huge time saver. If you're interested, you can drop me a line by email, and send me a sample of the text.
I was hoping you'd chime in, you always have good ideas. I may try that. Unfortunately, all of the product descriptions have <p>, <b>, <ul>, and <li> tags in them that will have to be put back in, so I'm not sure how much time it would save.
Well, it should get rid of quite a bit of code bloat...you can add p, ul, li, etc. to the allowed tags, then perhaps create your own 'parser' (to go with what Leslie was suggesting) to remove all 'style' attributes.
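One way the "allowed tags plus your own parser" idea could look, as a rough sketch in Python using only the standard library's `html.parser`. This is not the method from the earlier post. The names `ALLOWED_TAGS`, `AllowListCleaner`, and `clean_description` are invented for this example, and the tag list is an assumption to adjust. Allowed tags are re-emitted with no attributes at all, so `style` and `class` disappear; every other tag is dropped while its text is kept; HTML comments and the contents of `<style>`/`<script>` are discarded.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "b", "ul", "li"}   # assumption: tags worth keeping
SKIP_CONTENT = {"style", "script"}      # drop these tags AND their contents


class AllowListCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self._skip += 1
        elif tag in ALLOWED_TAGS:
            # Re-emit the bare tag; attributes are deliberately ignored.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            if self._skip:
                self._skip -= 1
        elif tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            # Entities were decoded by the parser, so re-escape the text.
            self.parts.append(escape(data, quote=False))

    # handle_comment is left at its default, which ignores comments.


def clean_description(html_text):
    cleaner = AllowListCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.parts)


if __name__ == "__main__":
    dirty = ('<p class="MsoNormal" style="margin:0in">'
             '<span style="font-size:11pt">Hello <b>world</b></span>'
             '<o:p></o:p></p>')
    print(clean_description(dirty))
```

It would be worth running this on a handful of real descriptions and comparing the output by eye before touching all 2,300.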
It might be tricky...but it's doable. Or you could try running the results through something like this:
Can the method be used to strip "stuff" out from inside a tag? MS Expressions builds its pages using classes inside tags like <p>, <ul>, and <li>.
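On stripping things from inside a tag while leaving the tag itself in place, here is a small regex-based sketch in Python. It is an illustration only: `strip_attrs` and the list of attribute names (`class`, `style`, `lang`) are assumptions, and regular expressions are a blunt tool for HTML. In particular, it will misbehave if an attribute value contains a literal `>`.

```python
import re

# Matches one unwanted attribute inside a tag: quoted or unquoted value.
ATTR_RE = re.compile(
    r"""\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)
# Matches an opening tag such as <p ...>; closing tags are left alone.
TAG_RE = re.compile(r"<[a-zA-Z][^<>]*>")


def strip_attrs(html_text):
    """Remove class/style/lang attributes from opening tags only."""
    # Attributes are removed only inside matched tags, so ordinary text
    # that happens to contain class="..." is not touched.
    return TAG_RE.sub(lambda m: ATTR_RE.sub("", m.group(0)), html_text)


if __name__ == "__main__":
    print(strip_attrs('<p class="MsoNormal" style="margin: 0in">Some text</p>'))
    print(strip_attrs('<a href="page.html" class="x">link</a>'))
```

Unlike an allow-list of tags, this keeps every tag and every other attribute (an `href`, for example) as it was.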
You should be able to use that output as a product import file. You might use the "allowed" tags to preserve <p> or <h1>, but seeing how it's MS Office HTML, that probably won't help much.
I am working with a site that needs to be able to modify product descriptions via an export file. Unfortunately, they have copied formatted text from Word and pasted it into the rich text editor, and now the source code of the descriptions is full of MS Office encoding and it is breaking the export file in a big way.
Are there any options for fixing this short of going in and recoding the messed up descriptions? The site has 2,300 products, and the export file is 19,000 lines long.
OMG - or a site that uses Microsoft Expressions, then copies and pastes everything from <html> to </html> into the Product Description field.