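【Requirement】
A recent project required reading the content of an uploaded Word file in Java.
【Implementation】
Suppose we have a document with the following content:
The implementation code is as follows: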
- Add the dependencies (poi-ooxml handles .docx, poi-scratchpad handles .doc):
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>4.1.2</version>
</dependency>
- Write the utility class:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.ooxml.POIXMLDocument;
import org.apache.poi.ooxml.extractor.POIXMLTextExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordUtil {

    // Reads the text of a Word file given its path on disk.
    public static String readDocContent(String wordPath) throws Exception {
        if (wordPath.endsWith(".doc")) {
            // Legacy binary format: use the HWPF text extractor.
            // try-with-resources closes the stream and extractor even if getText() fails.
            try (FileInputStream in = new FileInputStream(wordPath);
                 WordExtractor extractor = new WordExtractor(in)) {
                return extractor.getText();
            }
        } else if (wordPath.endsWith(".docx")) {
            // OOXML format: open the package and use the XWPF text extractor.
            try (POIXMLTextExtractor extractor =
                         new XWPFWordExtractor(POIXMLDocument.openPackage(wordPath))) {
                return extractor.getText();
            }
        } else {
            // SysException is the project's own unchecked exception (not shown here).
            throw new SysException("This file is not a Word file");
        }
    }

    // Reads the text of a Word file from a stream; fileName is used only to pick the format.
    public static String readDocContent(InputStream inputStream, String fileName) throws IOException {
        if (fileName.endsWith(".doc")) {
            // Legacy binary format: use the HWPF text extractor.
            try (WordExtractor extractor = new WordExtractor(inputStream)) {
                return extractor.getText();
            }
        } else if (fileName.endsWith(".docx")) {
            // OOXML format: load the document, then use the XWPF text extractor.
            try (POIXMLTextExtractor extractor =
                         new XWPFWordExtractor(new XWPFDocument(inputStream))) {
                return extractor.getText();
            }
        } else {
            throw new SysException("This file is not a Word file");
        }
    }
}
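The utility class above picks the parser from the file-name suffix, which an uploaded file may not carry reliably (for example `REPORT.DOCX`, or a renamed file). As a rough sketch of an alternative, the format could be guessed from the first bytes of the stream: legacy .doc files are OLE2 compound files and .docx files are ZIP packages. The class name `WordFormatSniffer` and its `Format` enum are hypothetical names introduced here for illustration, not part of the original code or of Apache POI, and the check is only coarse, since .xls or .xlsx files share the same container signatures.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper (not part of POI): guesses the container type from magic bytes.
public class WordFormatSniffer {

    /** Coarse container type inferred from the leading bytes. */
    public enum Format { DOC_OLE2, DOCX_ZIP, UNKNOWN }

    // OLE2 compound file signature, used by legacy .doc files.
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature, used by .docx packages.
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a byte array holding the first bytes of a file. */
    public static Format detect(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return Format.DOC_OLE2;
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return Format.DOCX_ZIP;
        }
        return Format.UNKNOWN;
    }

    /**
     * Peeks at up to 8 bytes of the stream, then rewinds it so a parser can
     * still read from the start. The stream must support mark/reset
     * (wrap it in a BufferedInputStream if it does not).
     */
    public static Format detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(8);
        byte[] header = new byte[8];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                break; // stream ended before 8 bytes were available
            }
            read += n;
        }
        in.reset();
        return detect(Arrays.copyOf(header, read));
    }

    // True if data begins with the given signature bytes.
    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(detect(new ByteArrayInputStream(zipHeader)));
    }
}
```

With a check like this, `readDocContent(InputStream, String)` could fall back to the sniffed format when the suffix is missing or upper-case; whether that is worth the extra step depends on how much the upload source can be trusted.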
- Write a test class to try it out:
@Test
public void testReadDoc() {
    String wordPath = "C:\\Users\\Administrator\\Desktop\\ktest.docx";

    // Read the content from a file path
    try {
        String content = WordUtil.readDocContent(wordPath);
        System.err.println(content);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }

    // Read the content from an input stream (closed automatically when done)
    try (InputStream in = new FileInputStream(wordPath)) {
        String content2 = WordUtil.readDocContent(in, "ktest.docx");
        System.err.println(content2);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
Running the test produces the following output:
Source: https://blog.csdn.net/weixin_44117737/article/details/131451747