How to Clean Up Microsoft Word HTML Special Characters with PHP

If you’re like everyone else in programming, you’ve learned to HATE Microsoft’s implementation of HTML and how badly it has screwed up the web. Here is a quick function that replaces a few common special characters (curly quotes, en and em dashes, and the ellipsis) with plain ASCII equivalents. It’s quick and dirty, but it works great. If you know of extra resources or more characters that I should include, post a comment down below.

Go from this:

There’s a “Problem” with Microsoft Word… it posts a – bunch of crap into the text.

To this:

There's a "Problem" with Microsoft Word... it posts a - bunch of crap into the text.

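function SanitizeFromWord($src = '') {
    // Assumes this file and $src are both UTF-8 encoded.
    $src = str_replace("‘", "'", $src);   // left single quote
    $src = str_replace("’", "'", $src);   // right single quote / apostrophe
    $src = str_replace("”", '"', $src);   // right double quote
    $src = str_replace("“", '"', $src);   // left double quote
    $src = str_replace("–", "-", $src);   // en dash
    $src = str_replace("—", "-", $src);   // em dash
    $src = str_replace("…", "...", $src); // ellipsis
    return $src;
}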
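
If pasting the literal curly characters into your source file worries you (they only match when the file itself is saved as UTF-8), here is a rough sketch of the same idea using Unicode escapes and a single lookup table. The name sanitizeFromWordMap is made up for this example, and it assumes PHP 7.0 or newer (for the "\u{...}" escapes) and UTF-8 input. It is a variation on the function above, not a replacement for it.

```php
<?php
// Hypothetical variant of SanitizeFromWord(); the name is illustrative only.
// Assumes PHP 7.0+ (for "\u{...}" escapes) and UTF-8 encoded input.
function sanitizeFromWordMap(string $src = ''): string
{
    // Map each typographic character to a plain ASCII stand-in.
    $map = [
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark / apostrophe
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // horizontal ellipsis
    ];

    // strtr() with an array performs all replacements in one pass.
    return strtr($src, $map);
}

// Usage: the same sentence as the example above, written with escapes.
$dirty = "There\u{2019}s a \u{201C}Problem\u{201D} with Microsoft Word\u{2026} "
       . "it posts a \u{2013} bunch of crap into the text.";
echo sanitizeFromWordMap($dirty), PHP_EOL;
```

Keeping the characters in one array means adding another one is a single new line, and the escapes keep working even if an editor re-saves the file in a different encoding.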
