Steps for Parsing HTML Pages with Jsoup in Java

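This article walks through the steps for parsing HTML pages with Jsoup in Java: adding the dependency, loading a document, selecting elements, extracting text and attributes, handling fragments, modifying the tree, and saving the result.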

Step 1: Add the Jsoup dependency (Maven, in pom.xml)

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.15.2</version>
</dependency>

Step 2: Load the HTML document

From a URL (get() performs the HTTP request and throws IOException on failure):

Document doc = Jsoup.connect("https://example.com").get();

From a file path:

Document doc = Jsoup.parse(new File("path/to/file.html"), "UTF-8");

From a String:

String html = "<html><body><h1>Hello, world!</h1></body></html>";
Document doc = Jsoup.parse(html);
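
When the HTML comes from a String, relative links have no address to resolve against. The following is a minimal sketch, assuming a hypothetical base URI of https://example.com/ and an illustrative class name BaseUriExample (neither comes from the original article). It passes a base URI to Jsoup.parse and reads an absolute URL back with absUrl:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Illustrative helper (not part of Jsoup): resolves the first link in an HTML string.
class BaseUriExample {

    static String firstAbsoluteLink(String html, String baseUri) {
        // The second argument tells Jsoup how to resolve relative URLs.
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.selectFirst("a[href]");
        // absUrl returns an empty string when the URL cannot be resolved.
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<html><body><a href=\"/about\">About</a></body></html>";
        System.out.println(firstAbsoluteLink(html, "https://example.com/"));
    }
}
```

Without a base URI, absUrl("href") on a relative link would return an empty string, so supplying one is worthwhile whenever the links will be followed later.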

Step 3: Select HTML elements

By tag name (first() returns null when nothing matches):

Element title = doc.select("title").first();

By class name:

Elements links = doc.select("a.link");

By ID:

Element header = doc.getElementById("header");
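
Lookups that return a single Element, such as getElementById and first(), give back null when nothing matches, so calling text() on the result directly can throw a NullPointerException. Here is a small sketch of a null-safe lookup; the class name SafeLookupExample and the fallback parameter are illustrative assumptions, not part of Jsoup:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Illustrative helper (not part of Jsoup): reads an element's text without risking a null dereference.
class SafeLookupExample {

    // Returns the text of the element with the given id, or the fallback if it is absent.
    static String textById(Document doc, String id, String fallback) {
        Element el = doc.getElementById(id); // null when no element has this id
        return el == null ? fallback : el.text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<html><body><div id=\"header\">Welcome</div></body></html>");
        System.out.println(textById(doc, "header", "(none)"));
        System.out.println(textById(doc, "footer", "(none)"));
    }
}
```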

Step 4: Extract text and attributes

Get the text content:

System.out.println(title.text()); // prints the page title

Get an attribute value:

String href = links.attr("href"); // href of the first matched link that has one
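
Calling attr on an Elements collection only reads from the first matching element that has the attribute. To collect the value from every match, iterate over the collection. This is a minimal sketch; the class name LinkListExample and the sample markup are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Illustrative helper (not part of Jsoup): collects the href of every <a class="link">.
class LinkListExample {

    static List<String> hrefs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // Elements is iterable, so each matched element can be read individually.
        for (Element link : doc.select("a.link")) {
            result.add(link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a class=\"link\" href=\"/a\">A</a>"
                + "<a class=\"link\" href=\"/b\">B</a>"
                + "<a href=\"/c\">C</a>"; // no "link" class, so it is skipped
        System.out.println(hrefs(html));
    }
}
```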

Step 5: Work with HTML fragments

Create a Document from an HTML fragment (Jsoup wraps it in html and body elements):

String fragment = "<div><p>Hello, world!</p></div>";
Document doc = Jsoup.parseBodyFragment(fragment);
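
Because parseBodyFragment places the fragment inside a generated body element, the parsed content is reached through doc.body(). The sketch below shows one way to inspect it; the class and method names are illustrative assumptions:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Illustrative helper (not part of Jsoup): inspects a parsed body fragment.
class FragmentExample {

    // Number of top-level elements the fragment contributed to <body>.
    static int topLevelCount(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        return doc.body().children().size();
    }

    // Combined text of all <p> elements in the fragment.
    static String paragraphText(String fragment) {
        return Jsoup.parseBodyFragment(fragment).select("p").text();
    }

    public static void main(String[] args) {
        String fragment = "<div><p>Hello, world!</p></div>";
        System.out.println(topLevelCount(fragment));
        System.out.println(paragraphText(fragment));
    }
}
```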

Step 6: Traverse and modify the HTML

Traverse the element tree:

for (Element element : doc.getAllElements()) {
    // Process each element here
}

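Modify the HTML:

// Append a new element to the end of <body>
doc.body().append("<p>This is a new paragraph.</p>");

// Remove an element: here, the first <p>, if the document has one
Element firstParagraph = doc.selectFirst("p");
if (firstParagraph != null) {
    firstParagraph.remove();
}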
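
Removal is often applied to a whole selection instead of a single element. As a hedged sketch, assuming a hypothetical "ad" class marks the paragraphs to drop (that class name and ModifyExample are illustrative), the following removes every match and then appends a new paragraph:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Illustrative helper (not part of Jsoup): strips unwanted paragraphs and appends a new one.
class ModifyExample {

    static Document cleanUp(String html) {
        Document doc = Jsoup.parse(html);
        // Elements.remove() detaches every matched element from the tree.
        doc.select("p.ad").remove();
        // appendElement creates the child and returns it, so its text can be set directly.
        doc.body().appendElement("p").text("This is a new paragraph.");
        return doc;
    }

    public static void main(String[] args) {
        Document doc = cleanUp("<body><p class=\"ad\">Buy now</p><p>Keep me</p></body>");
        System.out.println(doc.body().html());
    }
}
```

Setting the content with text() instead of building an HTML string means Jsoup escapes any special characters in it.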

Step 7: Save the modified HTML

Output to a String:

String html = doc.outerHtml();

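Write to a file:

// Serialize the document and write it with java.nio (Java 11+; UTF-8 by default)
java.nio.file.Files.writeString(java.nio.file.Path.of("path/to/file.html"), doc.outerHtml());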
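
A quick way to check that saving works is to write the document out and parse it back in. The sketch below does that with the standard java.nio.file API and a temporary file; the class name SaveExample, its method names, and the choice of UTF-8 are assumptions made for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Illustrative helper (not part of Jsoup): saves a Document to disk and reads it back.
class SaveExample {

    // Serializes the whole document and writes it as UTF-8 bytes.
    static void save(Document doc, Path target) throws IOException {
        Files.write(target, doc.outerHtml().getBytes(StandardCharsets.UTF_8));
    }

    // Saves the parsed HTML to a temporary file, reloads it, and returns the <h1> text.
    static String roundTripHeading(String html) {
        try {
            Path tmp = Files.createTempFile("jsoup-example", ".html");
            try {
                save(Jsoup.parse(html), tmp);
                Document reloaded = Jsoup.parse(tmp.toFile(), "UTF-8");
                return reloaded.select("h1").text();
            } finally {
                Files.deleteIfExists(tmp); // clean up the temporary file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripHeading("<html><body><h1>Hello, world!</h1></body></html>"));
    }
}
```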
