The text is converted and displayed correctly, but the images are neither extracted nor shown in the HTML.
Here is the code:
import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.core.FileURIResolver;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class DocxTransUtil {
    public static void trans2htm(String docxfile) throws Throwable {
        long startTime = System.currentTimeMillis();
        // Strip the directory (either separator) and the extension from the input path
        String fileName = docxfile.substring(Math.max(docxfile.lastIndexOf('\\'), docxfile.lastIndexOf('/')) + 1);
        int dot = fileName.lastIndexOf('.');
        String file = dot > 0 ? fileName.substring(0, dot) : fileName;
        // Output directory: "News/" under the classpath root
        String path = DocxTransUtil.class.getResource("/").getPath() + "News/";
        File outFile = new File(path + file + ".html");
        outFile.getParentFile().mkdirs();
        // Close both streams even if the conversion fails
        try (InputStream in = new FileInputStream(docxfile);
             OutputStream out = new FileOutputStream(outFile)) {
            XWPFDocument document = new XWPFDocument(in);
            XHTMLOptions options = XHTMLOptions.create().indent(4);
            // Export the images into the output folder
            File imageFolder = new File(path);
            options.setExtractor(new FileImageExtractor(imageFolder));
            // URI resolver: the directory used for image paths in the generated HTML
            options.URIResolver(new FileURIResolver(imageFolder));
            XHTMLConverter.getInstance().convert(document, out, options);
        }
        System.out.println("Generated " + outFile + " in " + (System.currentTimeMillis() - startTime) + " ms.");
    }
}
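Before blaming the converter, it may be worth checking whether the .docx actually contains embedded image parts. A .docx file is a ZIP package, and embedded pictures normally live under word/media/. The sketch below only uses the JDK; the class name DocxMediaCheck and the method listMedia are made up for this illustration and are not part of POI or the converter library. If the list comes back empty, the pictures are probably linked or stored in some other form, and no image extractor will find them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for diagnosis only; not part of POI or the XHTML converter.
public class DocxMediaCheck {

    // Returns the names of all file entries stored under word/media/ in the package.
    static List<String> listMedia(String docxPath) throws IOException {
        List<String> media = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().startsWith("word/media/")) {
                    media.add(entry.getName());
                }
            }
        }
        return media;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java DocxMediaCheck <file.docx>");
            return;
        }
        List<String> media = listMedia(args[0]);
        System.out.println(media.size() + " embedded media part(s) found");
        for (String name : media) {
            System.out.println(name);
        }
    }
}
```

Run it against the same file you pass to trans2htm. If media parts are listed but still do not show up next to the generated HTML, the problem is more likely on the conversion side (library version, image format, or output path).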
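What is the "深蓝" mentioned in the other reply? Do you mean 冰蓝 (E-iceblue)? Their product would be spire.doc.jar, a library dedicated to working with Word documents. Word-to-HTML conversion looks like this: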
import com.spire.doc.*;

public class WordtoHtml {
    public static void main(String[] args) {
        Document doc = new Document();
        doc.loadFromFile("样本.docx");
        doc.saveToFile("wordtohtml.html", FileFormat.Html);
        doc.dispose();
    }
}
The code above was run with free spire.doc.jar 3.9.0, i.e. the free edition.
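If this is just for your own use, don't bother with POI! If you only need to handle docx, you can use docx4j. In China there is a vendor called 深蓝 that offers an Office toolkit, and abroad there is Aspose. The conversion fidelity of all of these is decent.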