How to Write a PDF Splitting Tool in Python

This article shows how to write a PDF splitting tool in Python, using a simple approach you can follow step by step.

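Requirements

We need to pull a few pages out of a PDF and save them as a new PDF. To make the tool easy to reuse later, it should be a foolproof application with a GUI.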


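Choose the source PDF, specify the name and location of the new PDF, enter the pages to extract, and you get the new PDF file.

Requirements Analysis

There are many options for building a Python GUI, so let's start with a quick side-by-side comparison.

At a high level, the major GUI toolkits are: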

Qt
wxWidgets (formerly wxWindows)
Tkinter
Custom libraries (Kivy, Toga, etc.)
Web-based (HTML, Flask, etc.)

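Today, though, we will use appJar. It was created by an educator, so it is designed to make building a GUI as simple as possible. It is built entirely on Tkinter, which ships with standard Python installations.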

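Implementation

For the PDF handling, I chose the PyPDF2 library. Note that the code below uses the legacy PyPDF2 names (PdfFileWriter, PdfFileReader, addPage, getPage), which were deprecated in PyPDF2 2.x and removed in 3.0, so it needs an older PyPDF2 release to run unchanged.

Let's start with a hard-coded example of the input and output: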

from PyPDF2 import PdfFileWriter, PdfFileReader

infile = "Input.pdf"
outfile = "Output.pdf"
page_range = "1-2,6"

Next, we instantiate the PdfFileWriter and PdfFileReader objects and create the actual Output.pdf file:

output = PdfFileWriter()
input_pdf = PdfFileReader(open(infile, "rb"))
output_file = open(outfile, "wb")

The slightly tricky part comes next: parsing the page range string and expanding it into a list of page numbers to extract.

page_ranges = (x.split("-") for x in page_range.split(","))
range_list = [i for r in page_ranges for i in range(int(r[0]), int(r[-1]) + 1)]
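To make the two lines above easier to follow, here is a minimal sketch of the same expansion written as a plain loop. The function name parse_page_ranges is hypothetical and not part of the original script; the logic mirrors the comprehension.

```python
def parse_page_ranges(page_range):
    """Expand a string such as "1-2,6" into a list of 1-based page numbers."""
    pages = []
    for part in page_range.split(","):
        # "1-2" splits into ["1", "2"]; a single page "6" splits into ["6"],
        # so the first and last items are the same value in that case.
        bounds = part.split("-")
        start, end = int(bounds[0]), int(bounds[-1])
        pages.extend(range(start, end + 1))
    return pages


print(parse_page_ranges("1-2,6"))     # [1, 2, 6]
print(parse_page_ranges("1,3,4-10"))  # [1, 3, 4, 5, 6, 7, 8, 9, 10]
```

Like the original comprehension, this sketch does no error handling: non-numeric input raises ValueError.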

Finally, copy the selected pages from the original file into the new file:

for p in range_list:
    # Pages are 0-indexed, so subtract 1 from the page number
    output.addPage(input_pdf.getPage(p - 1))
output.write(output_file)
output_file.close()
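For readers on a current library release, the same copy step might look like the sketch below. It assumes the pypdf package (the successor to PyPDF2) is installed, and the function name extract_pages is hypothetical. This is not the article's code, only a rough equivalent using the newer PdfReader and PdfWriter names.

```python
def extract_pages(infile, outfile, page_numbers):
    """Copy the given 1-based page numbers from infile into outfile."""
    # Imported here so the sketch only needs pypdf when it is actually called.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(infile)
    writer = PdfWriter()
    for number in page_numbers:
        # reader.pages is 0-indexed, so page 1 is reader.pages[0]
        writer.add_page(reader.pages[number - 1])
    # The with block closes the output file even if writing fails
    with open(outfile, "wb") as output_file:
        writer.write(output_file)
```

A call such as extract_pages("Input.pdf", "Output.pdf", [1, 2, 6]) would correspond to the hard-coded example above.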

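Now let's build the GUI.

This small PDF splitting tool needs the following features:

Select the PDF file through a standard file browser
Choose the location and name of the output file
Specify which pages to extract
Perform some error checking

After installing appJar with pip, we can start coding: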

from appJar import gui
from PyPDF2 import PdfFileWriter, PdfFileReader
from pathlib import Path

Create the GUI window:

app = gui("PDF Splitter", useTtk=True)
app.setTtkTheme("default")
app.setSize(500, 200)

I used the default theme here, but other themes can be swapped in as well.


Next, add the labels and the data entry widgets:

app.addLabel("Choose Source PDF File")
app.addFileEntry("Input_File")
app.addLabel("Select Output Directory")
app.addDirectoryEntry("Output_Directory")
app.addLabel("Output file name")
app.addEntry("Output_name")
app.addLabel("Page Ranges: 1,3,4-10")
app.addEntry("Page_Ranges")

Next, add the Process and Quit buttons. Pressing either one calls the press function:

app.addButtons(["Process", "Quit"], press)

Finally, run the app:

# start the GUI
app.go()

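That completes the GUI layout. Now for the processing logic: the program reads the inputs, checks that the source file is a PDF, and splits it. In the final script, the functions below must be defined before the app.addButtons call that references press.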

def press(button):
    if button == "Process":
        src_file = app.getEntry("Input_File")
        dest_dir = app.getEntry("Output_Directory")
        page_range = app.getEntry("Page_Ranges")
        out_file = app.getEntry("Output_name")
        errors, error_msg = validate_inputs(src_file, dest_dir, page_range, out_file)
        if errors:
            app.errorBox("Error", "\n".join(error_msg), parent=None)
        else:
            split_pages(src_file, page_range, Path(dest_dir, out_file))
    else:
        app.stop()

When the Process button is clicked, app.getEntry() retrieves each input value. The values are stored in variables and then checked by calling validate_inputs().
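The handler joins the directory and file name with Path(dest_dir, out_file), so a name typed without an extension is saved without one. One possible refinement, sketched here with a hypothetical helper build_output_path that is not in the original script, is to append ".pdf" when it is missing.

```python
from pathlib import Path


def build_output_path(dest_dir, out_name):
    """Join directory and file name, adding ".pdf" when it is missing."""
    out_path = Path(dest_dir, out_name.strip())
    if out_path.suffix.lower() != ".pdf":
        # Append rather than replace, so "report.v2" becomes "report.v2.pdf"
        out_path = out_path.with_name(out_path.name + ".pdf")
    return out_path


print(build_output_path("exports", "chapter1").name)      # chapter1.pdf
print(build_output_path("exports", "chapter1.pdf").name)  # chapter1.pdf
```

The sketch assumes the name is non-empty, which the validation step is meant to guarantee before this point.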

Here is the validate_inputs function:

def validate_inputs(input_file, output_dir, page_range, file_name):
    # page_range is named to avoid shadowing the built-in range()
    errors = False
    error_msgs = []
    # Make sure a PDF is selected
    if Path(input_file).suffix.upper() != ".PDF":
        errors = True
        error_msgs.append("Please select a PDF input file")
    # Make sure a range is selected
    if len(page_range) < 1:
        errors = True
        error_msgs.append("Please enter a valid page range")
    # Check for a valid directory
    if not Path(output_dir).exists():
        errors = True
        error_msgs.append("Please Select a valid output directory")
    # Check for a file name
    if len(file_name) < 1:
        errors = True
        error_msgs.append("Please enter a file name")
    return errors, error_msgs

This function runs a few checks to make sure each input is present and valid.
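The page range check in validate_inputs only tests that the field is not empty, so input such as "a-b" would still reach the splitting code and raise ValueError there. A stricter check could look like the following sketch; the helper is_valid_page_range and its pattern are assumptions for illustration, not part of the original tool.

```python
import re

# One item is a page number ("3") or a span ("4-10"); items are comma-separated.
_ITEM = r"\s*\d+\s*(?:-\s*\d+\s*)?"
PAGE_RANGE_PATTERN = re.compile(rf"{_ITEM}(?:,{_ITEM})*")


def is_valid_page_range(text):
    """Return True when text looks like "1,3,4-10" with ascending spans."""
    if not PAGE_RANGE_PATTERN.fullmatch(text):
        return False
    for part in text.split(","):
        bounds = [int(b) for b in part.split("-")]
        # Page numbers start at 1, and a span must not run backwards
        if bounds[0] < 1 or bounds[-1] < bounds[0]:
            return False
    return True


print(is_valid_page_range("1,3,4-10"))  # True
print(is_valid_page_range("a-b"))       # False
print(is_valid_page_range("5-2"))       # False
```

Such a helper could replace the len() test, with the same error message appended when it returns False.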

Once all the data has been collected and validated, we can call the split function to process the file:

def split_pages(input_file, page_range, out_file):
    output = PdfFileWriter()
    page_ranges = (x.split("-") for x in page_range.split(","))
    range_list = [i for r in page_ranges for i in range(int(r[0]), int(r[-1]) + 1)]
    # Both files are closed automatically; the source stays open until the output is written
    with open(input_file, "rb") as source, open(out_file, "wb") as output_file:
        input_pdf = PdfFileReader(source)
        for p in range_list:
            # Need to subtract 1 because pages are 0 indexed
            try:
                output.addPage(input_pdf.getPage(p - 1))
            except IndexError:
                # Alert the user and stop adding pages
                app.infoBox("Info", "Range exceeded number of pages in input.\nFile will still be saved.")
                break
        output.write(output_file)
    if app.questionBox("File Save", "Output PDF saved. Do you want to quit?"):
        app.stop()
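split_pages relies on IndexError to detect pages past the end of the document. That does not cover an entry of 0, which becomes index -1; for a Python list that means the last element rather than an error, and whether getPage behaves the same way depends on the PyPDF2 version. A hedged alternative is to sort the requested pages before copying, as in this sketch. The helper partition_pages is hypothetical, and total_pages would come from the reader's page count.

```python
def partition_pages(page_numbers, total_pages):
    """Split 1-based page numbers into those that exist and those that do not."""
    valid, skipped = [], []
    for number in page_numbers:
        if 1 <= number <= total_pages:
            valid.append(number)
        else:
            skipped.append(number)
    return valid, skipped


valid, skipped = partition_pages([0, 1, 2, 6], total_pages=4)
print(valid)    # [1, 2]
print(skipped)  # [0, 6]
```

With this approach the tool could copy only the valid pages and list the skipped ones in the info box, instead of stopping at the first failure.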

