An HTML entity decoding solution, for both new and existing data, that detects and finds errors (especially U+FFFD REPLACEMENT CHARACTER, which is used to replace an unknown, unrecognized, or unrepresentable character), with an option to correct them. This is a very big problem for multilingual sites or sites that use Latin characters, because there is no way (that I know of) to search for the symbol below.
Example:
330px-Replacement_character.svg.png
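To make the request concrete, here is a minimal sketch of the detection step in Python, using only the standard library. The function name `find_replacement_chars` and the `context` parameter are hypothetical, chosen for illustration; this is not an existing feature of any particular platform. It decodes HTML entities first, so that an encoded `&#xFFFD;` is caught as well as a literal U+FFFD, then reports where each one occurs.

```python
import html

# U+FFFD REPLACEMENT CHARACTER
REPLACEMENT_CHAR = "\ufffd"


def find_replacement_chars(text, context=10):
    """Decode HTML entities, then locate every U+FFFD in the result.

    Returns the decoded text and a list of (position, snippet) pairs,
    where snippet is the surrounding text for manual review.
    Hypothetical helper, for illustration only.
    """
    decoded = html.unescape(text)
    hits = []
    for i, ch in enumerate(decoded):
        if ch == REPLACEMENT_CHAR:
            start = max(0, i - context)
            hits.append((i, decoded[start:i + context + 1]))
    return decoded, hits


if __name__ == "__main__":
    sample = "Caf&#xFFFD; and na\ufffdve &amp; more"
    decoded, hits = find_replacement_chars(sample)
    for position, snippet in hits:
        print(position, repr(snippet))
```

Note that detection is the easy part. Once a character has been replaced with U+FFFD the original byte is lost, so the "option to correct" would probably need a human to review each hit, or access to the original source data.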