I am working with HTML documents generated by Microsoft Word 2007/2010. Besides generating incredibly dirty HTML, Word also tends to mix block (stylesheet) styles and inline styles. I am looking for a PHP library that would merge the block styles into each element's inline style, keeping any inline styles that already exist.
Edit: The goal is to construct an HTML block that preserves the original formatting and is editable in a WYSIWYG editor such as TinyMCE.
Example
If the original html is:
<html>
<head>
<style>
.normaltext {color:black;font-weight:normal;font-size:10pt}
.important {color:red;font-weight:bold;font-size:11pt}
</style>
</head>
<body>
<p class="normaltext" style="font-family:arial">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
In ut erat id dui mollis faucibus. Mauris eu neque et eros tempus placerat.
<span class="important">Nam in purus nisi</span>, vitae dictum ligula.
Morbi mattis eros eget diam vulputate imperdiet.
<span class="important" style="color:green">Integer</span> a metus eros.
Sed iaculis porta imperdiet.
</p>
</body>
</html>
Should become:
<html>
<head>
</head>
<body>
<p style="font-family:arial;color:black;font-weight:normal;font-size:10pt">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
In ut erat id dui mollis faucibus. Mauris eu neque et eros tempus placerat.
<span style="color:red;font-weight:bold;font-size:11pt">Nam in purus nisi</span>, vitae dictum ligula.
Morbi mattis eros eget diam vulputate imperdiet.
<span style="color:green;font-weight:bold;font-size:11pt">Integer</span> a metus eros.
Sed iaculis porta imperdiet.
</p>
</body>
</html>
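The transformation above can be sketched with PHP's built-in DOM extension. This is a minimal illustration, not a full CSS engine: it only understands plain `.class { ... }` rules inside `<style>` blocks, ignores specificity, and the function names `parseDeclarations` and `inlineClassStyles` are made up for this example. The property order in the resulting `style` attribute may differ from the expected output shown above, but the effective styles should be the same.

```php
<?php
// Sketch only: handles plain ".class { ... }" rules and nothing else.

/** Turn "color:red;font-size:11pt" into ['color' => 'red', 'font-size' => '11pt']. */
function parseDeclarations(string $css): array
{
    $properties = [];
    foreach (explode(';', $css) as $declaration) {
        if (strpos($declaration, ':') === false) {
            continue; // skip empty or malformed fragments
        }
        [$name, $value] = explode(':', $declaration, 2);
        $properties[strtolower(trim($name))] = trim($value);
    }
    return $properties;
}

/** Inline class rules from <style> blocks and drop the class attributes. */
function inlineClassStyles(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // Word markup is rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // 1. Collect ".class { declarations }" rules, then remove the <style> blocks.
    $rules = [];
    foreach (iterator_to_array($xpath->query('//style')) as $style) {
        preg_match_all('/\.([\w-]+)\s*\{([^}]*)\}/', $style->textContent, $matches, PREG_SET_ORDER);
        foreach ($matches as $match) {
            $rules[$match[1]] = parseDeclarations($match[2]);
        }
        $style->parentNode->removeChild($style);
    }

    // 2. For every element with a class, start from the rule properties
    //    and let the element's own inline style override them.
    foreach (iterator_to_array($xpath->query('//*[@class]')) as $element) {
        $properties = [];
        foreach (preg_split('/\s+/', trim($element->getAttribute('class'))) as $class) {
            foreach ($rules[$class] ?? [] as $name => $value) {
                $properties[$name] = $value;
            }
        }
        foreach (parseDeclarations($element->getAttribute('style')) as $name => $value) {
            $properties[$name] = $value;
        }

        // 3. Write the merged declarations back and drop the class.
        $parts = [];
        foreach ($properties as $name => $value) {
            $parts[] = $name . ':' . $value;
        }
        if ($parts) {
            $element->setAttribute('style', implode(';', $parts));
        }
        $element->removeAttribute('class');
    }

    return $doc->saveHTML();
}
```

Calling `inlineClassStyles($wordHtml)` on the original document returns the whole page with the `<style>` block removed and each `class` replaced by an equivalent `style` attribute. Real Word output uses more complex selectors, so a dedicated library is likely the better choice outside of simple cases.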
I finally managed to get it to work. The code is based on http://blog.verkoyen.eu/blog/p/detail/convert-css-to-inline-styles-with-php with one simple change: moving the line
// add new properties into the list
foreach($rule['properties'] as $key => $value) $properties[$key] = $value;
up to the beginning of the loop, right after $properties is declared.
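Why the position of that line matters can be shown with a small sketch. It assumes the original loop read the element's existing inline style into $properties before adding the rule's properties, so the stylesheet rule overwrote the inline value. The arrays below are hypothetical stand-ins for what the converter would see for `<span class="important" style="color:green">`; they are not taken from the linked code.

```php
<?php
// Hypothetical data for <span class="important" style="color:green">.
$rule   = ['properties' => ['color' => 'red', 'font-weight' => 'bold', 'font-size' => '11pt']];
$inline = ['color' => 'green'];

// Assumed original order: inline styles first, rule properties last.
$properties = [];
foreach ($inline as $key => $value) $properties[$key] = $value;
foreach ($rule['properties'] as $key => $value) $properties[$key] = $value;
$before = $properties; // $before['color'] is 'red': the rule overwrote the inline style

// After the change: rule properties first, inline styles last.
$properties = [];
foreach ($rule['properties'] as $key => $value) $properties[$key] = $value;
foreach ($inline as $key => $value) $properties[$key] = $value;
$after = $properties; // $after['color'] is 'green': the inline style wins
```

Later assignments to the same array key overwrite earlier ones, so whichever source is applied last takes precedence. Applying the inline style last matches the expected output in the question, where the second span stays green.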
To make this work for WordPress, however, one additional change is needed. DOMDocument replaces &nbsp; in the document with blanks, which breaks WordPress's update statement and leads to content being cut off. Please refer to my other question for the solution: DOMDocument->saveHTML() converting &nbsp; to space
This problem is detailed in https://wordpress.stackexchange.com/questions/48692/post-content-getting-cut-off-on-blank-space-on-wpdb-update. If you know why this happens in WordPress, please post your answer there, as I would very much like to find out.
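One possible workaround, which is not necessarily the solution given in the linked question, is to turn the non-breaking space characters back into `&nbsp;` entities after serializing. The sketch below assumes the output is UTF-8, where the non-breaking space U+00A0 is the two bytes 0xC2 0xA0; the function name `restoreNbspEntities` is made up for this example.

```php
<?php
/**
 * Replace raw UTF-8 non-breaking spaces with the &nbsp; entity.
 * Assumes $html is UTF-8 encoded.
 */
function restoreNbspEntities(string $html): string
{
    // U+00A0 encoded as UTF-8 is the byte sequence 0xC2 0xA0.
    return str_replace("\xC2\xA0", '&nbsp;', $html);
}
```

It would typically be applied to the serialized markup, for example `restoreNbspEntities($doc->saveHTML())`, before the content is passed to the WordPress update.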
No, but try this instead: copying and pasting from Word into http://ckeditor.com/ or TinyMCE, etc. does clean it up A LOT. Though it's still not perfect, it will get you much closer.
Check out:
Porting code from either of the sources to PHP, or using any of the available APIs should do the trick of getting your CSS styling inline.
See the CssToInlineStyles project which does exactly what you want.