How to Clean Up Microsoft Word HTML Special Characters with PHP

If you're like everyone else in programming, you've learned to HATE Microsoft's implementation of HTML and how badly it has screwed up the web. Here is a quick function that replaces a few common Word special characters (curly quotes, dashes, and the ellipsis) with standard ASCII ones. It's quick and dirty, but it works great. If you know of other resources, or more characters I should include, post a comment below.

Go from this:

There’s a “Problem” with Microsoft Word… it posts a – bunch of crap into the text.

To this:

There's a "Problem" with Microsoft Word... it posts a - bunch of crap into the text.


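function SanitizeFromWord($src = '') {
    // Swap Word's "smart" punctuation for plain ASCII equivalents.
    // Save this file as UTF-8 so the literals below match UTF-8 input.
    $src = str_replace("‘", "'", $src);   // left single quote
    $src = str_replace("’", "'", $src);   // right single quote / apostrophe
    $src = str_replace("”", '"', $src);   // right double quote
    $src = str_replace("“", '"', $src);   // left double quote
    $src = str_replace("–", "-", $src);   // en dash
    $src = str_replace("—", "-", $src);   // em dash
    $src = str_replace("…", "...", $src); // horizontal ellipsis
    return $src;
}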
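Word-generated HTML doesn't always contain the literal characters. Depending on how the HTML was produced and which tools touched it afterwards, the same punctuation may show up as named entities (&rsquo;) or numeric ones (&#8217;), which the function above won't match. Here is a minimal sketch, assuming UTF-8 input and PHP 7 or later, of a hypothetical sanitizeFromWordHtml() that handles all three forms in a single strtr() call. It writes the characters with "\u{...}" escapes so the en dash (U+2013) and em dash (U+2014) can't be mixed up in the source file. One caveat, shared with the original: turning a curly quote into a straight " inside an HTML attribute value can break the markup, so this is meant for text content.

```php
<?php
// Hypothetical variant of SanitizeFromWord (not the post's original code).
// Assumes UTF-8 input; maps literal characters AND their HTML entity forms.
function sanitizeFromWordHtml(string $src = ''): string
{
    $map = [
        // Literal UTF-8 characters, written as escapes so they are unambiguous
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // horizontal ellipsis
        // Named HTML entities for the same characters
        '&lsquo;' => "'", '&rsquo;' => "'",
        '&ldquo;' => '"', '&rdquo;' => '"',
        '&ndash;' => '-', '&mdash;' => '-',
        '&hellip;' => '...',
        // Decimal numeric entities (8216 = U+2018, ..., 8230 = U+2026)
        '&#8216;' => "'", '&#8217;' => "'",
        '&#8220;' => '"', '&#8221;' => '"',
        '&#8211;' => '-', '&#8212;' => '-',
        '&#8230;' => '...',
    ];
    // With an array argument, strtr() tries longer keys first and never
    // re-scans text it has already substituted.
    return strtr($src, $map);
}
```

For example, sanitizeFromWordHtml('It&rsquo;s &ndash; fine') returns It's - fine. Using one strtr() map instead of a chain of str_replace() calls keeps every mapping in one table, which makes adding new characters a one-line change.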

3 thoughts on “How to Clean Up Microsoft Word HTML Special Characters with PHP”

  1. Still no go. I copy and paste the above and they both look like regular dashes; both dashes are the same length. How did you get the long one?

  2. function SanitizeFromWord($src = '') {
         $f = array("‘", "’", "”", "“", "–", "—", "…");
         $t = array("'", "'", '"', '"', "-", "-", "...");
         return str_replace($f, $t, $src);
     }

     I always prefer shorter code.
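On the dash question in the first comment: an en dash (U+2013) and an em dash (U+2014) can render at nearly the same width in some fonts, and a blog engine may rewrite characters when it displays them, so copying from a web page is an unreliable way to tell them apart. Checking code points is more dependable. Below is a hypothetical helper, listNonAscii() (not part of the original post), that lists every non-ASCII character in a UTF-8 string together with its code point; it is also a handy way to spot additional characters worth adding to the replacement list. It assumes UTF-8 input and throws if the string isn't valid UTF-8, which often means the text is still in a legacy encoding such as Windows-1252.

```php
<?php
// Hypothetical debugging helper: report each non-ASCII character in a
// UTF-8 string with its Unicode code point, e.g. "U+2013" for an en dash.
function listNonAscii(string $text): array
{
    // The /u modifier makes PCRE treat $text as UTF-8; preg_match_all()
    // returns false when the input is not valid UTF-8.
    if (preg_match_all('/[^\x00-\x7F]/u', $text, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $found = [];
    foreach ($matches[0] as $ch) {
        // Decode the UTF-8 byte sequence by hand so no extension is needed.
        $bytes = array_values(unpack('C*', $ch));
        $n = count($bytes);                    // 2, 3 or 4 bytes here
        $cp = $bytes[0] & (0xFF >> ($n + 1));  // payload bits of the lead byte
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3F); // 6 bits per continuation byte
        }
        $found[$ch] = sprintf('U+%04X', $cp);
    }
    return $found;
}
```

For example, listNonAscii("a \u{2013} b \u{2014} c") returns a two-entry array that maps the en dash to "U+2013" and the em dash to "U+2014", which shows that the two characters differ even when they look identical on screen.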
