Formatted PDF/Word to HTML

I need to convert formatted PDF and Word documents to HTML so they can be displayed in web browsers, where users should also be able to select the text. I don't know whether it is better to do this on the backend (with Java, for example) or with PHP, or whether there is a jQuery/JavaScript plugin for it.

My goal is to display these documents in a web browser, the way iPaper does.

Thanks for the help

You can use pdftohtml and run it automatically on the server, or batch-process your PDFs with it.
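Since Java was mentioned as a backend option, here is a minimal sketch of driving pdftohtml from Java, both per file and as a batch over a directory. It assumes pdftohtml (e.g. the Poppler build) is installed and on the `PATH`, and Java 11+. The class and method names (`PdfToHtml`, `buildCommand`, `convert`, `convertAll`) are hypothetical, and the `-c`/`-noframes` flag choice is an assumption; check `pdftohtml -h` on your server, since options differ between versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PdfToHtml {

    // Builds the pdftohtml command line for one file.
    // "-c" requests "complex" output (absolutely positioned text, with page
    // graphics rendered as background images, closer to the printed layout);
    // "-noframes" writes a single HTML page instead of a frameset.
    // Assumption: these flags exist in your pdftohtml version.
    static List<String> buildCommand(Path pdf, Path outBase) {
        return List.of("pdftohtml", "-c", "-noframes", pdf.toString(), outBase.toString());
    }

    // Runs pdftohtml on one PDF, using outDir/<name> as the output base name.
    static int convert(Path pdf, Path outDir) throws IOException, InterruptedException {
        Files.createDirectories(outDir);
        String name = pdf.getFileName().toString().replaceFirst("(?i)\\.pdf$", "");
        Process p = new ProcessBuilder(buildCommand(pdf, outDir.resolve(name)))
                .inheritIO()   // show pdftohtml's own messages in our console
                .start();
        return p.waitFor();    // 0 means success
    }

    // Batch mode: convert every *.pdf directly inside inDir (not recursive).
    static void convertAll(Path inDir, Path outDir) throws IOException, InterruptedException {
        List<Path> pdfs;
        try (Stream<Path> files = Files.list(inDir)) {
            pdfs = files.filter(f -> f.getFileName().toString().toLowerCase().endsWith(".pdf"))
                        .collect(Collectors.toList());
        }
        for (Path pdf : pdfs) {
            int code = convert(pdf, outDir);
            if (code != 0) {
                System.err.println("pdftohtml failed for " + pdf + " (exit code " + code + ")");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java PdfToHtml <pdf-dir> <html-dir>");
            return;
        }
        convertAll(Path.of(args[0]), Path.of(args[1]));
    }
}
```

Running `java PdfToHtml uploads/ html/` should leave one HTML page per PDF in `html/` (plus any generated images), which a web server can then serve directly. Because the text ends up as real HTML text rather than an image, it stays selectable in the browser.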

Here's a PowerShell solution I'm still refining:

https://github.com/suzumakes/ReplaceIT

If your problem is that Word outputs huge amounts of extraneous markup and calls it an HTML file, this should help a lot. There's a reason iPaper has such a large team, though: you're trying to build a web page out of a document and "print to web" with the click of a button, and that turns out to be quite difficult.
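For readers who would rather stay in Java than use PowerShell, here is a rough sketch of the same kind of find-and-replace cleanup applied to Word's "Web Page" export. This is not how ReplaceIT works; it is a hypothetical `WordHtmlCleaner` class that strips a few well-known kinds of Word-specific markup (`<!--[if ...]>` conditional comments, namespaced tags like `<o:p>`, `mso-*` style declarations, and `Mso*` classes) with regular expressions. Regex cleanup of HTML is fragile, so treat it as a starting point; an HTML parser such as jsoup would be more robust. The default `windows-1252` input encoding is also an assumption.

```java
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

public class WordHtmlCleaner {

    // Conditional comments such as <!--[if gte mso 9]><xml>...</xml><![endif]-->
    private static final Pattern CONDITIONAL =
            Pattern.compile("<!--\\[if[^\\]]*\\]>.*?<!\\[endif\\]-->", Pattern.DOTALL);
    // Namespaced Office tags such as <o:p>, </o:p>, <st1:place>; only the tags
    // are removed, any text between them is kept.
    private static final Pattern OFFICE_TAGS =
            Pattern.compile("</?[a-z][a-z0-9]*:[a-z][a-z0-9]*[^>]*>", Pattern.CASE_INSENSITIVE);
    // Single mso-* declarations with unquoted values, e.g. mso-bidi-font-size:12.0pt;
    // Declarations whose value starts with a quote are deliberately left alone.
    private static final Pattern MSO_DECL =
            Pattern.compile("mso-[a-z-]+\\s*:\\s*[^;\"'>\\s][^;\"'>]*;?", Pattern.CASE_INSENSITIVE);
    // style="" or style='' left behind once every declaration was removed
    private static final Pattern EMPTY_STYLE =
            Pattern.compile("\\s+style=([\"'])\\s*\\1", Pattern.CASE_INSENSITIVE);
    // class=MsoNormal, class="MsoListParagraph", ... (only when it is the sole class)
    private static final Pattern MSO_CLASS =
            Pattern.compile("\\s+class=([\"']?)Mso[a-z]*\\1", Pattern.CASE_INSENSITIVE);

    static String clean(String html) {
        String out = CONDITIONAL.matcher(html).replaceAll("");
        out = OFFICE_TAGS.matcher(out).replaceAll("");
        out = MSO_DECL.matcher(out).replaceAll("");
        out = EMPTY_STYLE.matcher(out).replaceAll("");
        return MSO_CLASS.matcher(out).replaceAll("");
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java WordHtmlCleaner <in.htm> <out.html> [charset]");
            return;
        }
        // Assumption: the Word export is windows-1252 unless a charset is given.
        Charset cs = Charset.forName(args.length > 2 ? args[2] : "windows-1252");
        String html = Files.readString(Path.of(args[0]), cs);
        Files.writeString(Path.of(args[1]), clean(html), cs);
    }
}
```

For example, `<p class=MsoNormal style='margin:0;mso-bidi-font-size:12.0pt'>Hi<o:p></o:p></p>` comes out as `<p style='margin:0;'>Hi</p>`: ordinary CSS such as margins and fonts survives, so the visible formatting is kept while the Office-only noise goes away.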