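Converting docx to HTML with POI, but the images cannot be extracted

The text is displayed correctly, but the images are neither extracted to disk nor shown in the HTML.

Here is the code: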

import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.core.FileURIResolver;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.*;

public class DocxTransUtil {

    public static void trans2htm(String docxfile) throws Throwable {
        long startTime = System.currentTimeMillis();
        // Base name of the input file, accepting both "\" and "/" as separators
        int sep = Math.max(docxfile.lastIndexOf('\\'), docxfile.lastIndexOf('/'));
        String fileName = docxfile.substring(sep + 1);
        String file = fileName.substring(0, fileName.lastIndexOf("."));
        // Output folder "News/" under the classpath root
        String path = DocxTransUtil.class.getResource("/").getPath() + "News/";
        File imageFolder = new File(path);
        File outFile = new File(path + file + ".html");
        outFile.getParentFile().mkdirs();
        // try-with-resources closes both streams once the conversion is done
        try (InputStream in = new FileInputStream(docxfile);
             OutputStream out = new FileOutputStream(outFile)) {
            XWPFDocument document = new XWPFDocument(in);
            XHTMLOptions options = XHTMLOptions.create().indent(4);
            // Export the images into the output folder
            options.setExtractor(new FileImageExtractor(imageFolder));
            // URI resolver: directory path used for the images in the generated HTML
            options.URIResolver(new FileURIResolver(imageFolder));
            XHTMLConverter.getInstance().convert(document, out, options);
        }
        System.out.println("Generated " + outFile + " in " + (System.currentTimeMillis() - startTime) + " ms.");
    }
}
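Before blaming the converter, it may help to confirm that the docx actually embeds image files. The following is only a minimal diagnostic sketch, assuming the file is a standard OOXML zip package in which embedded pictures are stored under `word/media/`. The class name `DocxMediaLister` and the method `listMedia` are hypothetical names for this illustration and are not part of POI.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists the image parts stored inside a docx package.
public class DocxMediaLister {

    // Returns the names of all file entries under word/media/ in the given docx.
    public static List<String> listMedia(String docxPath) throws IOException {
        List<String> media = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().startsWith("word/media/")) {
                    media.add(entry.getName());
                }
            }
        }
        return media;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to the docx as the first argument
        for (String name : listMedia(args[0])) {
            System.out.println(name);
        }
    }
}
```

If the list comes back empty, the pictures are probably linked or stored in some other way, and the image extractor has nothing to write. If it is not empty, the output folder and the extractor settings are the more likely places to look.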

 

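What is the "深蓝" that the other reply mentioned? Do you mean 冰蓝 (E-iceblue)? Their library should be spire.doc.jar, which is dedicated to working with Word documents. The Word-to-HTML code is as follows: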

import com.spire.doc.*;

public class WordtoHtml {
    public static void main(String[] args) {
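        Document doc = new Document();
        // Load the source document ("样本.docx" is a sample file name)
        doc.loadFromFile("样本.docx");
        // Save it as HTML
        doc.saveToFile("wordtohtml.html", FileFormat.Html);
        // Release the resources held by the document
        doc.dispose();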
    }
}

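The code above was run with free spire.doc.jar 3.9.0, i.e. the free edition.

If this is just a personal project, don't bother with POI! If you only need to handle docx, you can use docx4j. In China there is a vendor called 深蓝 that offers an Office toolkit, and abroad there is Aspose. All of them reproduce the original document reasonably well.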