Converting Between HTML and Word in Java with POI

Published: 2020-08-25 22:46:20 | Source: 腳本之家 | Views: 3860 | Author: 追逐盛夏流年 | Category: Programming Languages

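The project's back end uses Spring Boot and Maven, and the front end uses the CKEditor rich-text editor. At present, the Word file generated from HTML is in doc format, while the image handling only supports docx, so the doc file has to be saved as docx manually before the images can be replaced.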

1. Add the Maven dependencies

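The following POI-related dependencies are the main ones used. jsoup is added as well, to make it easier to pick the image elements out of the HTML: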

<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi</artifactId>
  <version>3.14</version>
</dependency>

<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-scratchpad</artifactId>
  <version>3.14</version>
</dependency>

<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>3.14</version>
</dependency>

<dependency>
  <groupId>fr.opensagres.xdocreport</groupId>
  <artifactId>xdocreport</artifactId>
  <version>1.0.6</version>
</dependency>

<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml-schemas</artifactId>
  <version>3.14</version>
</dependency>

<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>ooxml-schemas</artifactId>
  <version>1.3</version>
</dependency>

<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.11.3</version>
</dependency>

2. Converting Word to HTML

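Create a static folder under the resources directory of the Spring Boot project and copy the Word file to be converted into it (temp.docx here; the code below reads test.doc and test.docx, so adjust the file name to match). Because static is one of Spring Boot's default static-resource locations, no extra configuration is needed. If you use a different folder name, configure it accordingly in application.yml.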

Converting doc to HTML:

public static String docToHtml() throws Exception {
  File path = new File(ResourceUtils.getURL("classpath:").getPath());
  String imagePathStr = path.getAbsolutePath() + "\\static\\image\\";
  String sourceFileName = path.getAbsolutePath() + "\\static\\test.doc";
  String targetFileName = path.getAbsolutePath() + "\\static\\test2.html";
  File file = new File(imagePathStr);
  if(!file.exists()) {
    file.mkdirs();
  }
  HWPFDocument wordDocument = new HWPFDocument(new FileInputStream(sourceFileName));
  org.w3c.dom.Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
  WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(document);
  // Save each picture and return its relative path
  wordToHtmlConverter.setPicturesManager((content, pictureType, name, width, height) -> {
    try (FileOutputStream out = new FileOutputStream(imagePathStr + name)) {
      out.write(content);
    } catch (Exception e) {
      e.printStackTrace();
    }
    return "image/" + name;
  });
  wordToHtmlConverter.processDocument(wordDocument);
  org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
  DOMSource domSource = new DOMSource(htmlDocument);
  StreamResult streamResult = new StreamResult(new File(targetFileName));
  TransformerFactory tf = TransformerFactory.newInstance();
  Transformer serializer = tf.newTransformer();
  serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
  serializer.setOutputProperty(OutputKeys.INDENT, "yes");
  serializer.setOutputProperty(OutputKeys.METHOD, "html");
  serializer.transform(domSource, streamResult);
  return targetFileName;
}
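
The paths in the method above are assembled with Windows separators ("\\static\\image\\"), so it only works as written on Windows. As a minimal sketch of a portable alternative, the same locations can be built with java.nio.file. The class name StaticPaths and the helper underStatic are hypothetical names for illustration, and the temp directory merely stands in for the classpath root:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class StaticPaths {
    // Resolve a location under <base>/static using the platform's own separator.
    static Path underStatic(String base, String... parts) {
        Path p = Paths.get(base, "static");
        for (String part : parts) {
            p = p.resolve(part);
        }
        return p;
    }

    public static void main(String[] args) {
        // Assumption: the temp directory stands in for the classpath root here.
        String base = System.getProperty("java.io.tmpdir");
        System.out.println(underStatic(base, "image"));
        System.out.println(underStatic(base, "test.doc"));
        System.out.println(underStatic(base, "test2.html"));
    }
}
```

Calling toString() on the resulting Path gives a string that can be passed to FileInputStream or File in place of the hand-built one.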

Converting docx to HTML:

public static String docxToHtml() throws Exception {
  File path = new File(ResourceUtils.getURL("classpath:").getPath());
  String imagePath = path.getAbsolutePath() + "\\static\\image";
  String sourceFileName = path.getAbsolutePath() + "\\static\\test.docx";
  String targetFileName = path.getAbsolutePath() + "\\static\\test.html";

  OutputStreamWriter outputStreamWriter = null;
  try {
    XWPFDocument document = new XWPFDocument(new FileInputStream(sourceFileName));
    XHTMLOptions options = XHTMLOptions.create();
    // Folder in which the extracted images are stored
    options.setExtractor(new FileImageExtractor(new File(imagePath)));
    // Path used for the images inside the generated HTML
    options.URIResolver(new BasicURIResolver("image"));
    outputStreamWriter = new OutputStreamWriter(new FileOutputStream(targetFileName), "utf-8");
    XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
    xhtmlConverter.convert(document, outputStreamWriter, options);
  } finally {
    if (outputStreamWriter != null) {
      outputStreamWriter.close();
    }
  }
  return targetFileName;
}

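A successful conversion produces the corresponding HTML file. To display it on the front end, read the file into a String and return it to the client.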

public static String readfile(String filePath) {
  // Uses java.nio.file.Files, java.nio.file.Paths and java.nio.charset.StandardCharsets.
  // The whole file is read before decoding, so a multi-byte UTF-8 character
  // is never split across two buffer reads, and no stream is left open.
  try {
    byte[] bytes = Files.readAllBytes(Paths.get(filePath));
    return new String(bytes, StandardCharsets.UTF_8);
  } catch (IOException e) {
    e.printStackTrace();
    return "";
  }
}
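
When the HTML contains non-ASCII text, the bytes should be decoded once over the whole content rather than buffer by buffer, because a UTF-8 character can straddle two reads. The sketch below illustrates the difference. The class and method names are hypothetical, and the 4-byte buffer is deliberately tiny so that the split is guaranteed to occur:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class Utf8ReadDemo {
    // Decodes every buffer on its own; a character cut in half by a read is corrupted.
    static String decodePerChunk(InputStream in, int bufferSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[bufferSize];
        for (int n; (n = in.read(buf)) != -1;) {
            sb.append(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Collects all bytes first and decodes them in one step.
    static String decodeOnce(InputStream in, int bufferSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufferSize];
        for (int n; (n = in.read(buf)) != -1;) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "<p>" + two CJK characters (3 bytes each in UTF-8) + "</p>"
        String html = "<p>\u5716\u7247</p>";
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        System.out.println("per chunk intact: "
            + html.equals(decodePerChunk(new ByteArrayInputStream(bytes), 4)));
        System.out.println("decode once intact: "
            + html.equals(decodeOnce(new ByteArrayInputStream(bytes), 4)));
    }
}
```

With a 1024-byte buffer the corruption only shows up when a character happens to fall on a buffer boundary, which makes it easy to miss in testing.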

[Figure: the converted HTML displayed in the CKEditor rich-text editor]

3. Converting HTML to Word

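The approach is to first extract every image element from the HTML and replace it with a placeholder string of the form "${imgReplace}". With several images the placeholders are numbered in order (${imgReplace1}, ${imgReplace2}, and so on). A doc file is then generated. (Generating a docx file directly produced a file that would not open, and I have not yet found a good fix for this.) After that file is saved as docx manually, the placeholders can be replaced with the images: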

public static String writeWordFile(String content) {
    String path = "D:/wordFile";
    Map<String, Object> param = new HashMap<String, Object>();

    if (!"".equals(path)) {
      File fileDir = new File(path);
      if (!fileDir.exists()) {
        fileDir.mkdirs();
      }
      content = HtmlUtils.htmlUnescape(content);
      List<HashMap<String, String>> imgs = getImgStr(content);
      int count = 0;
      for (HashMap<String, String> img : imgs) {
        count++;
        // Replace img tags that end with "/>"
        content = content.replace(img.get("img"), "${imgReplace" + count + "}");
        // Replace img tags that end with ">"
        content = content.replace(img.get("img1"), "${imgReplace" + count + "}");
        Map<String, Object> header = new HashMap<String, Object>();

        try {
          File filePath = new File(ResourceUtils.getURL("classpath:").getPath());
          String imagePath = filePath.getAbsolutePath() + "\\static\\";
          imagePath += img.get("src").replaceAll("/", "\\\\");
          // If the width or height attribute is missing, default to 400x300
          if(img.get("width") == null || img.get("height") == null) {
            header.put("width", 400);
            header.put("height", 300);
          }else {
            header.put("width", (int) (Double.parseDouble(img.get("width"))));
            header.put("height", (int) (Double.parseDouble(img.get("height"))));
          }
          header.put("type", "jpg");
          header.put("content", OfficeUtil.inputStream2ByteArray(new FileInputStream(imagePath), true));
        } catch (FileNotFoundException e) {
          e.printStackTrace();
        }
        param.put("${imgReplace" + count + "}", header);
      }
      try {
        // Generate the Word document in doc format; it has to be saved as docx manually
        byte by[] = content.getBytes("UTF-8");
        ByteArrayInputStream bais = new ByteArrayInputStream(by);
        POIFSFileSystem poifs = new POIFSFileSystem();
        DirectoryEntry directory = poifs.getRoot();
        DocumentEntry documentEntry = directory.createDocument("WordDocument", bais);
        FileOutputStream ostream = new FileOutputStream("D:\\wordFile\\temp.doc");
        poifs.writeFilesystem(ostream);
        bais.close();
        ostream.close();

        // Temporary file (the docx saved manually from temp.doc)
        CustomXWPFDocument doc = OfficeUtil.generateWord(param, "D:\\wordFile\\temp.docx");
        // The final Word file with the images inserted
        FileOutputStream fopts = new FileOutputStream("D:\\wordFile\\final.docx");
        doc.write(fopts);
        fopts.close();
      } catch (Exception e) {
        e.printStackTrace();
      }

    }
    return "D:/wordFile/final.docx";
  }

  // Collect information about the img elements in the HTML
  public static List<HashMap<String, String>> getImgStr(String htmlStr) {
    List<HashMap<String, String>> pics = new ArrayList<HashMap<String, String>>();

    Document doc = Jsoup.parse(htmlStr);
    Elements imgs = doc.select("img");
    for (Element img : imgs) {
      HashMap<String, String> map = new HashMap<String, String>();
      // Sizes are expected in the form "400px"; drop the unit so the value parses as a number
      if(!"".equals(img.attr("width"))) {
        map.put("width", img.attr("width").replace("px", ""));
      }
      if(!"".equals(img.attr("height"))) {
        map.put("height", img.attr("height").replace("px", ""));
      }
      map.put("img", img.toString().substring(0, img.toString().length() - 1) + "/>");
      map.put("img1", img.toString());
      map.put("src", img.attr("src"));
      pics.add(map);
    }
    return pics;
  }
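
The placeholder step can be seen in isolation in the sketch below. It is a simplification rather than the code above: it uses a regular expression from the standard library instead of jsoup, and the class name, the patterns, and the helper replaceImgs are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgPlaceholderDemo {
    // Simplified matchers for <img ...> / <img .../> tags and their src attribute.
    private static final Pattern IMG =
        Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern SRC =
        Pattern.compile("src\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Replaces the n-th img tag with ${imgReplaceN} and appends its src to srcs.
    static String replaceImgs(String html, List<String> srcs) {
        Matcher m = IMG.matcher(html);
        StringBuffer out = new StringBuffer();
        int count = 0;
        while (m.find()) {
            count++;
            Matcher s = SRC.matcher(m.group());
            srcs.add(s.find() ? s.group(1) : "");
            // quoteReplacement keeps "$" from being read as a group reference
            m.appendReplacement(out,
                Matcher.quoteReplacement("${imgReplace" + count + "}"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> srcs = new ArrayList<>();
        String html = "<p>a<img src=\"image/1.jpg\" />b<img src=\"image/2.png\"></p>";
        System.out.println(replaceImgs(html, srcs));
        System.out.println(srcs);
    }
}
```

Matching the tags directly in the input avoids having to guess how the parser re-serializes each tag, which is why the jsoup version above needs both the "/>" and ">" variants.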

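The OfficeUtil utility class. The versions of this code found online only handle a single image and fail when there are several. The reason is that adding an image changes the size of runs inside processParagraphs, so iterating over it throws an exception from ArrayList, for the same reason that removing elements from a list while looping over it does (typically a ConcurrentModificationException). The fix is to copy the runs into a new ArrayList and iterate over the copy: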

package com.example.demo.util; 

import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.poi.POIXMLDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow; 

/** 
 * Applies to Word 2007 (docx)
 */ 
public class OfficeUtil { 

  /** 
   * Generates a Word document from the given parameters and template. 
   * @param param the placeholders to replace 
   * @param template path of the template file 
   */ 
  public static CustomXWPFDocument generateWord(Map<String, Object> param, String template) { 
    CustomXWPFDocument doc = null;
    try { 
      OPCPackage pack = POIXMLDocument.openPackage(template); 
      doc = new CustomXWPFDocument(pack); 
      if (param != null && param.size() > 0) { 

        // Process the paragraphs 
        List<XWPFParagraph> paragraphList = doc.getParagraphs(); 
        processParagraphs(paragraphList, param, doc); 

        // Process the tables 
        Iterator<XWPFTable> it = doc.getTablesIterator(); 
        while (it.hasNext()) {
          XWPFTable table = it.next(); 
          List<XWPFTableRow> rows = table.getRows(); 
          for (XWPFTableRow row : rows) { 
            List<XWPFTableCell> cells = row.getTableCells(); 
            for (XWPFTableCell cell : cells) { 
              List<XWPFParagraph> paragraphListTable = cell.getParagraphs(); 
              processParagraphs(paragraphListTable, param, doc); 
            } 
          } 
        } 
      } 
    } catch (Exception e) { 
      e.printStackTrace(); 
    } 
    return doc; 
  } 
  /** 
   * Processes paragraphs: adjusts spacing and replaces placeholders. 
   * @param paragraphList the paragraphs to process 
   */ 
  public static void processParagraphs(List<XWPFParagraph> paragraphList,Map<String, Object> param,CustomXWPFDocument doc){ 
    if(paragraphList != null && paragraphList.size() > 0){ 
      for(XWPFParagraph paragraph:paragraphList){
        // The spacing produced by the POI conversion is too large and has to be reset manually
        if(paragraph.getSpacingBefore() >= 1000 || paragraph.getSpacingAfter() > 1000) {
          paragraph.setSpacingBefore(0);
          paragraph.setSpacingAfter(0);
        }
        // Set the left and right indentation in the Word document
        paragraph.setIndentationLeft(0);
        paragraph.setIndentationRight(0);
        List<XWPFRun> runs = paragraph.getRuns();
        // Adding a picture changes the size of the paragraph's runs, so do not loop over runs itself
        List<XWPFRun> allRuns = new ArrayList<XWPFRun>(runs);
        for (XWPFRun run : allRuns) {
          String text = run.getText(0); 
          if(text != null){
            boolean isSetText = false; 
            for (Entry<String, Object> entry : param.entrySet()) { 
              String key = entry.getKey(); 
              if(text.indexOf(key) != -1){ 
                isSetText = true; 
                Object value = entry.getValue(); 
                if (value instanceof String) { // text replacement 
                  text = text.replace(key, value.toString()); 
                } else if (value instanceof Map) { // image replacement 
                  text = text.replace(key, ""); 
                  Map pic = (Map)value; 
                  int width = Integer.parseInt(pic.get("width").toString()); 
                  int height = Integer.parseInt(pic.get("height").toString()); 
                  int picType = getPictureType(pic.get("type").toString()); 
                  byte[] byteArray = (byte[]) pic.get("content"); 
                  ByteArrayInputStream byteInputStream = new ByteArrayInputStream(byteArray); 
                  try { 
                    String blipId = doc.addPictureData(byteInputStream,picType); 
                    doc.createPicture(blipId,doc.getNextPicNameNumber(picType), width, height,paragraph);
                  } catch (Exception e) { 
                    e.printStackTrace(); 
                  } 
                } 
              } 
            } 
            if(isSetText){ 
              run.setText(text,0); 
            } 
          } 
        } 
      } 
    } 
  } 
  /** 
   * Returns the POI picture type code for a picture type name. 
   * @param picType picture type name, e.g. "png" or "jpg" 
   * @return the picture type code 
   */ 
  private static int getPictureType(String picType){ 
    int res = CustomXWPFDocument.PICTURE_TYPE_PICT; 
    if(picType != null){ 
      if(picType.equalsIgnoreCase("png")){ 
        res = CustomXWPFDocument.PICTURE_TYPE_PNG; 
      }else if(picType.equalsIgnoreCase("dib")){ 
        res = CustomXWPFDocument.PICTURE_TYPE_DIB; 
      }else if(picType.equalsIgnoreCase("emf")){ 
        res = CustomXWPFDocument.PICTURE_TYPE_EMF; 
      }else if(picType.equalsIgnoreCase("jpg") || picType.equalsIgnoreCase("jpeg")){ 
        res = CustomXWPFDocument.PICTURE_TYPE_JPEG; 
      }else if(picType.equalsIgnoreCase("wmf")){ 
        res = CustomXWPFDocument.PICTURE_TYPE_WMF; 
      } 
    } 
    return res; 
  } 
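  /** 
   * Reads all data from the input stream into a byte array. 
   * @param in the stream to read 
   * @param isClose whether to close the stream afterwards 
   * @return the bytes read, or null if reading failed 
   */ 
  public static byte[] inputStream2ByteArray(InputStream in,boolean isClose){ 
    byte[] byteArray = null; 
    try { 
      // available() is only an estimate and a single read() may return early, 
      // so copy in a loop until the end of the stream 
      java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream(); 
      byte[] buffer = new byte[4096]; 
      for (int n; (n = in.read(buffer)) != -1;) { 
        out.write(buffer, 0, n); 
      } 
      byteArray = out.toByteArray(); 
    } catch (IOException e) { 
      e.printStackTrace(); 
    }finally{ 
      if(isClose){ 
        try { 
          in.close(); 
        } catch (Exception e2) { 
          System.out.println("Failed to close the stream"); 
        } 
      } 
    } 
    return byteArray; 
  } 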
}
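
The copy-before-iterating fix used in processParagraphs can be reproduced without POI. In this minimal sketch plain strings stand in for XWPFRun objects and appending "picture" stands in for the run that createPicture adds. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class CopyIterationDemo {
    // Adds to the list while iterating over it; reports whether that failed.
    static boolean failsOnLiveList(List<String> runs) {
        try {
            for (String run : runs) {
                if (run.startsWith("${")) {
                    runs.add("picture"); // structural change during iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates over a snapshot, so additions to the original list are safe.
    static void addOverSnapshot(List<String> runs) {
        for (String run : new ArrayList<>(runs)) {
            if (run.startsWith("${")) {
                runs.add("picture");
            }
        }
    }

    public static void main(String[] args) {
        List<String> live =
            new ArrayList<>(Arrays.asList("text", "${imgReplace1}", "more"));
        System.out.println("live iteration failed: " + failsOnLiveList(live));

        List<String> safe =
            new ArrayList<>(Arrays.asList("text", "${imgReplace1}", "more"));
        addOverSnapshot(safe);
        System.out.println(safe);
    }
}
```

The snapshot costs one extra list per paragraph, which is negligible next to the work of parsing the document.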

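I believe the main reason image replacement does not work for Word 2003 is that HWPFDocument, the class that handles the 2003 format, is declared final, so its methods cannot be overridden. The class for the 2007 format, XWPFDocument, can be subclassed. By extending XWPFDocument and supplying our own createPicture method, image replacement becomes possible. The corresponding CustomXWPFDocument class follows: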

package com.example.demo.util;  

import java.io.IOException; 
import java.io.InputStream; 
import org.apache.poi.openxml4j.opc.OPCPackage; 
import org.apache.poi.xwpf.usermodel.XWPFDocument; 
import org.apache.poi.xwpf.usermodel.XWPFParagraph; 
import org.apache.xmlbeans.XmlException; 
import org.apache.xmlbeans.XmlToken; 
import org.openxmlformats.schemas.drawingml.x2006.main.CTNonVisualDrawingProps; 
import org.openxmlformats.schemas.drawingml.x2006.main.CTPositiveSize2D; 
import org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTInline; 

/** 
 * Custom XWPFDocument that provides its own createPicture() method 
 */ 
public class CustomXWPFDocument extends XWPFDocument {  
  public CustomXWPFDocument(InputStream in) throws IOException {  
    super(in);  
  }  

  public CustomXWPFDocument() {  
    super();  
  }  

  public CustomXWPFDocument(OPCPackage pkg) throws IOException {  
    super(pkg);  
  }  

  /** 
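   * @param blipId relationship id of the picture data 
   * @param ind picture index, used as the drawing id 
   * @param width width in pixels 
   * @param height height in pixels 
   * @param paragraph paragraph that receives the picture 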
   */ 
  public void createPicture(String blipId, int ind, int width, int height,XWPFParagraph paragraph) {  
    final int EMU = 9525;  
    width *= EMU;  
    height *= EMU;  
    CTInline inline = paragraph.createRun().getCTR().addNewDrawing().addNewInline();  
    String picXml = ""  
        + "<a:graphic xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\">"  
        + "  <a:graphicData uri=\"http://schemas.openxmlformats.org/drawingml/2006/picture\">"  
        + "   <pic:pic xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\">"  
        + "     <pic:nvPicPr>" + "      <pic:cNvPr id=\""  
        + ind  
        + "\" name=\"Generated\"/>"  
        + "      <pic:cNvPicPr/>"  
        + "     </pic:nvPicPr>"  
        + "     <pic:blipFill>"  
        + "      <a:blip r:embed=\""  
        + blipId  
        + "\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\"/>"  
        + "      <a:stretch>"  
        + "        <a:fillRect/>"  
        + "      </a:stretch>"  
        + "     </pic:blipFill>"  
        + "     <pic:spPr>"  
        + "      <a:xfrm>"  
        + "        <a:off x=\"0\" y=\"0\"/>"  
        + "        <a:ext cx=\""  
        + width  
        + "\" cy=\""  
        + height  
        + "\"/>"  
        + "      </a:xfrm>"  
        + "      <a:prstGeom prst=\"rect\">"  
        + "        <a:avLst/>"  
        + "      </a:prstGeom>"  
        + "     </pic:spPr>"  
        + "   </pic:pic>"  
        + "  </a:graphicData>" + "</a:graphic>";  

    inline.addNewGraphic().addNewGraphicData();  
    XmlToken xmlToken = null;  
    try {  
      xmlToken = XmlToken.Factory.parse(picXml);  
    } catch (XmlException xe) {  
      xe.printStackTrace();  
    }  
    inline.set(xmlToken);  

    inline.setDistT(0);   
    inline.setDistB(0);   
    inline.setDistL(0);   
    inline.setDistR(0);   

    CTPositiveSize2D extent = inline.addNewExtent();  
    extent.setCx(width);  
    extent.setCy(height);  

    CTNonVisualDrawingProps docPr = inline.addNewDocPr();   
    docPr.setId(ind);   
    docPr.setName("Picture " + ind);   
    docPr.setDescr("Test");  
  }  
}
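
The constant 9525 in createPicture converts pixels to EMU (English Metric Units), the unit DrawingML uses for sizes. One inch is 914400 EMU, so at 96 pixels per inch one pixel is 914400 / 96 = 9525 EMU. The sketch below works through that arithmetic. The 96 dpi figure is an assumption about how the pixel sizes are meant, and the class and method names are hypothetical:

```java
class EmuDemo {
    static final int EMU_PER_INCH = 914400;
    // Assumption: pixel sizes refer to a 96 dpi display.
    static final int DPI = 96;

    // Converts a length in pixels to EMU.
    static int pixelsToEmu(int pixels) {
        return pixels * (EMU_PER_INCH / DPI);
    }

    public static void main(String[] args) {
        System.out.println(EMU_PER_INCH / DPI); // 9525
        System.out.println(pixelsToEmu(400));   // 3810000, the default width
        System.out.println(pixelsToEmu(300));   // 2857500, the default height
    }
}
```

The int arithmetic is safe for ordinary image sizes, but it overflows once a dimension exceeds roughly 225,000 pixels.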

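That is how HTML and Word can be converted into each other with POI. The problem that HTML cannot be converted into a readable docx directly remains unsolved; if you know a good solution, I would be glad to hear about it.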
