One-Line Summary
pdf is a Swiss Army knife for PDF processing: it covers eight operations (reading, extraction, merging, splitting, rotation, watermarking, form filling, and OCR) through 8 independent Python scripts.
Core Capabilities
- 📖 Basic PDF operations with pypdf (read/merge/split/rotate/encrypt)
- 🔍 Text and table extraction with pdfplumber
- 📝 PDF creation with reportlab
- 📋 Form processing subsystem (5 scripts: check fillable fields, extract field structure, fill forms, validate bounding boxes, create validation images)
- 🖼️ PDF to image conversion + OCR
- 🔐 Password protection and watermarking
Trigger Scenarios
The skill triggers when the user mentions PDF files or needs to process PDF tables, fill PDF forms, merge/split PDFs, or OCR scanned PDFs.
File Inventory
Directory Structure Analysis
The pdf skill follows a “Script Toolkit” pattern: a core SKILL.md provides the overview and quick reference, forms.md and reference.md provide extended guidance, and the scripts/ directory holds 8 independent Python scripts, each handling one PDF operation.
SKILL.md Structure Analysis
SKILL.md is about 315 lines long, with these core sections:
- Quick Start: pypdf-based quick start code
- Python Libraries: Usage instructions for 3 libraries
  - pypdf: Basic operations (merge, split, rotate, metadata, watermark, encrypt)
  - pdfplumber: Text and table extraction
  - reportlab: PDF creation (Canvas/Platypus)
- Command-Line Tools: Alternatives with pdftotext, qpdf, pdftk
- Common Tasks: Scanned PDF OCR, watermarking, image extraction, password protection
- Quick Reference: Tool recommendation table for 7 task types
YAML Frontmatter Analysis
The TRIGGER conditions are extremely broad (“user wants to do anything with PDF files”) and include reading, extraction, merging, splitting, rotation, watermarking, creation, form filling, encryption/decryption, image extraction, and OCR. This broad trigger strategy suits a domain with wide coverage but relatively standardized operations.
Module Relationships
The 8 scripts are independent, each handling a specific PDF operation. Five of them form the form processing subsystem. SKILL.md provides tool selection guidance (pypdf vs pdfplumber vs reportlab), and the specific instructions are given inline within SKILL.md rather than through separate reference documents.
Design Pattern Classification
pdf is a typical “Script Toolkit” Skill:
| Feature | Description |
|---|---|
| Script Count | 8 independent Python scripts |
| Reference Docs | SKILL.md + forms.md + reference.md |
| Tool Selection | Built into SKILL.md Quick Reference table |
| Script Relationships | Independent, partially connected by form processing subsystem |
| Library Strategy | Triple library: pypdf (basic) + pdfplumber (extraction) + reportlab (creation) |
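The triple library strategy can be pictured as a simple lookup from task type to library. The sketch below is only an illustration: the task names and the `pick_library` helper are hypothetical and are not taken from SKILL.md; only the library assignments follow the table above.

```python
# Hypothetical task names; only the library assignments come from the table above.
LIBRARY_BY_TASK = {
    "merge": "pypdf",
    "split": "pypdf",
    "rotate": "pypdf",
    "encrypt": "pypdf",
    "extract_text": "pdfplumber",
    "extract_tables": "pdfplumber",
    "create": "reportlab",
}


def pick_library(task: str) -> str:
    """Return the library suggested for a task, or raise for an unknown task."""
    try:
        return LIBRARY_BY_TASK[task]
    except KeyError:
        raise ValueError(f"no library registered for task {task!r}") from None


if __name__ == "__main__":
    for task in ("merge", "extract_tables", "create"):
        print(task, "->", pick_library(task))
```

Keeping the mapping in data rather than in branching code makes it easy to extend when a new task type is added.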
Script Inventory
pdf contains 8 Python scripts, all independent and each handling a specific PDF operation.
Detailed Analysis
fill_pdf_form_with_annotations.py
The core form-filling script. It implements “annotation-style” form filling, which suits PDFs that don’t support standard AcroForm fields.
Core logic in three steps:
- Coordinate transformation: Supports two coordinate conversions (image → PDF and PDF → pypdf) via the transform_from_image_coords and transform_from_pdf_coords functions
- Annotation creation: Uses pypdf’s FreeText annotation class to create a text annotation with the correct font, size, and color at each field location
- Write output: Writes the original PDF together with all annotations to a new file
The key design highlight is the coordinate transformation layer — it handles the mapping from “visual space” to “PDF internal space”.
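To make the “visual space” to “PDF internal space” mapping concrete, here is a minimal sketch of an image-to-PDF conversion. It is not the script's actual code: the function name `image_to_pdf_rect`, its signature, and the box layout `(left, top, right, bottom)` are assumptions chosen for illustration. It assumes the image covers the whole page, image pixels have a top-left origin, and PDF points have a bottom-left origin.

```python
def image_to_pdf_rect(box, image_size, page_size):
    """Convert an image-space box to a PDF-space rectangle.

    box:        (left, top, right, bottom) in pixels, origin at top-left
    image_size: (width, height) of the rendered page image in pixels
    page_size:  (width, height) of the PDF page in points
    Returns (x0, y0, x1, y1) in points, origin at bottom-left.
    """
    left, top, right, bottom = box
    img_w, img_h = image_size
    page_w, page_h = page_size

    # Scale factors from pixels to points.
    sx = page_w / img_w
    sy = page_h / img_h

    # The x axis only needs scaling; the y axis is scaled and then flipped,
    # so the image's bottom edge becomes the lower y value in PDF space.
    x0 = left * sx
    x1 = right * sx
    y0 = page_h - bottom * sy
    y1 = page_h - top * sy
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    # A US Letter page (612 x 792 pt) rendered at twice that size in pixels.
    print(image_to_pdf_rect((100, 200, 300, 240), (1224, 1584), (612, 792)))
```

Forgetting the flip on the y axis places text at the mirrored vertical position, which is why this step benefits from being isolated in one function.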
| Script | Function | Dependency |
|---|---|---|
| fill_pdf_form_with_annotations.py | Fill forms via FreeText annotations | pypdf |
| fill_fillable_fields.py | Fill standard fillable form fields | pypdf |
| extract_form_field_info.py | Extract form field structure (with positioning) | pypdf |
| extract_form_structure.py | Extract overall form structure | pypdf |
| check_fillable_fields.py | Check if PDF has fillable fields | pypdf |
| check_bounding_boxes.py | Validate field bounding boxes | pypdf |
| create_validation_image.py | Create form fill validation images | pypdf, Pillow |
| convert_pdf_to_images.py | PDF to PNG conversion (with size scaling) | pdf2image, Pillow |
convert_pdf_to_images.py
A PDF-to-image utility that uses pdf2image to render PDF pages as PNG images.
Core logic: Converts PDF to PIL Image objects via pdf2image, proportionally scales images exceeding maximum size, then saves as PNG files. Supports custom output directory and maximum dimension limits (default 1000px).
This script’s output plays a key role in the form processing visual verification pipeline — the converted images are used by create_validation_image.py for visual comparison.
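The proportional scaling step can be sketched as a small pure function. This is an illustration rather than the script's actual code: the helper name `scaled_size` is hypothetical, and it assumes the 1000px limit applies to the longer side of the image. With Pillow, the returned size could then be passed to `Image.resize`.

```python
def scaled_size(width, height, max_dim=1000):
    """Return (width, height) shrunk proportionally so the longer side fits max_dim.

    Images that already fit are returned unchanged (no upscaling).
    """
    largest = max(width, height)
    if largest <= max_dim:
        return width, height
    # Multiply before dividing so the longer side lands exactly on max_dim.
    new_w = max(1, round(width * max_dim / largest))
    new_h = max(1, round(height * max_dim / largest))
    return new_w, new_h


if __name__ == "__main__":
    print(scaled_size(1700, 2200))  # a Letter page rendered at 200 DPI
    print(scaled_size(800, 600))    # already within the limit
```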
extract_form_field_info.py
A form structure parsing tool that analyzes AcroForm fields in PDFs.
Key features:
- Recursively traverses the field tree, extracting fully qualified field names
- Parses field types (text/checkbox/radio group/dropdown)
- Locates field positions on pages (Rect coordinates)
- For radio groups, extracts position coordinates of all options
- Sorts output by page + Y-axis position
This script is the starting point of the form filling pipeline — its JSON output serves as input to fill_pdf_form_with_annotations.py.
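The traversal and ordering described above can be sketched on plain dictionaries, without touching a real PDF. Everything here is an assumption made for illustration: the node layout (`name`, `kids`, `page`, `rect`), the output key `field_id`, and the choice to sort top-to-bottom within a page. The source only says the output is sorted by page and Y-axis position.

```python
def flatten_fields(nodes, prefix=""):
    """Yield leaf fields with fully qualified, dot-joined names."""
    for node in nodes:
        name = f"{prefix}.{node['name']}" if prefix else node["name"]
        kids = node.get("kids")
        if kids:
            # Intermediate node: recurse and carry the qualified prefix down.
            yield from flatten_fields(kids, name)
        else:
            yield {"field_id": name, "page": node["page"], "rect": node["rect"]}


def sort_for_reading_order(fields):
    """Sort by page, then top-to-bottom within a page.

    PDF y values grow upward, so a larger top edge (rect[3]) is higher on the page.
    """
    return sorted(fields, key=lambda f: (f["page"], -f["rect"][3]))


if __name__ == "__main__":
    tree = [
        {"name": "applicant", "kids": [
            {"name": "last_name", "page": 1, "rect": [50, 600, 250, 620]},
            {"name": "first_name", "page": 1, "rect": [50, 700, 250, 720]},
        ]},
        {"name": "signature", "page": 2, "rect": [50, 100, 250, 130]},
    ]
    for field in sort_for_reading_order(flatten_fields(tree)):
        print(field["page"], field["field_id"])
```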
Brief Summary of Remaining Scripts
- check_bounding_boxes.py: Validates that field bounding boxes are within page boundaries
- check_fillable_fields.py: Detects if a PDF contains fillable fields
- create_validation_image.py: Visualizes filled forms for manual verification
- extract_form_structure.py: Extracts overall form structure (hierarchy, grouping, nesting)
- fill_fillable_fields.py: Fills form fields using standard AcroForm API (complementary to annotation method)
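As an illustration of the kind of check that check_bounding_boxes.py is described as performing, the sketch below tests whether a rectangle lies inside a page. The helper name `box_within_page` and the `(x0, y0, x1, y1)` layout are assumptions; the real script may apply additional checks.

```python
def box_within_page(box, page_size):
    """Return True if the box is non-empty and lies fully inside the page.

    box:       (x0, y0, x1, y1) in points, with x0 < x1 and y0 < y1
    page_size: (width, height) in points
    """
    x0, y0, x1, y1 = box
    width, height = page_size
    return 0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height


if __name__ == "__main__":
    page = (612, 792)
    print(box_within_page((50, 672, 150, 692), page))   # inside the page
    print(box_within_page((600, 10, 650, 30), page))    # spills past the right edge
```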
Form Processing Subsystem
Five scripts form a complete pipeline: check → extract structure → extract field details → fill (2 methods) → validate. This is the most sophisticated subsystem design in the pdf skill.
Design Highlights
- “Script Toolkit” Pattern: 8 independent, single-purpose scripts with no complex module dependencies. Users can run only the one they need without pulling in unrelated dependencies
- Form Processing Subsystem: 5 scripts build a complete pipeline from analysis to validation, demonstrating how independent tools combine into a workflow
- Triple Library Strategy: Rather than using one library for everything, choose the best tool per operation type — pypdf for basic, pdfplumber for extraction, reportlab for creation
- Coordinate Transformation Layer: The coordinate transform abstraction in form filling scripts shields differences between image and PDF coordinate systems
Reusable Patterns
Porting Guide
“If you want to create a similar processing Skill for another file format (e.g., images, audio, video)…”
- Analyze operation types: List all operations users might need (read, convert, edit, validate)
- Create independent scripts per operation: Follow “one script, one operation” principle
- Establish core reference: SKILL.md provides quick reference table, linking to each script
- Identify subsystems: If certain operations form a workflow (analyze → edit → validate), organize them as subsystems
- Choose the right libraries: Like pdf’s triple library strategy, don’t try to solve everything with one tool
Common Pitfalls
⚠️ pypdf version compatibility: The API changes significantly between pypdf versions (especially the annotation API), so specify a version range
⚠️ Form type detection: Not all PDFs use standard AcroForm — some use XFA forms (unsupported by pypdf), requiring fallback to annotation-style filling
⚠️ Coordinate system confusion: PDF uses a coordinate system with its origin at the bottom-left, while images use the top-left, so coordinate conversion errors are the #1 source of form processing bugs
⚠️ reportlab font limitations: Built-in fonts don’t support Unicode subscript/superscript characters — must use XML tags or manual positioning
| Pattern | Description | Applies to... |
|---|---|---|
| Script Toolkit | Collection of independent, single-purpose scripts | Domains with diverse but independent operations |
| Pipeline Workflow | Multiple tool scripts chained into a complete process | Complex tasks requiring analysis → processing → validation |
| Triple Library Strategy | Different libraries for different operation types | Scenarios where no single library covers all needs |
| Coordinate Abstraction Layer | Shields differences between coordinate systems | Scenarios involving image/document coordinate conversion |