@bsorrentino/pdf-tools
v1.2.1
Published
[![npm](https://img.shields.io/npm/v/@bsorrentino/pdf-tools.svg)](https://www.npmjs.com/package/@bsorrentino/pdf-tools) <img src="https://img.shields.io/github/forks/bsorrentino/pdf-tools.svg"> <img src="https://img.shields.io/github/stars/bso
pdf-tools
Tools to extract/transform data from PDF
inspired by project: pdf-to-markdown
Installation
npm install @bsorrentino/pdf-tools -g
Requirements
- Node.js >= 16
- Since pdf-tools uses canvas, a Cairo-backed Canvas implementation for Node.js, take a look at its requirements
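As a minimal sketch of the Node.js requirement above, a script that wraps pdftools could check the runtime version before calling it. The function name `meetsNodeRequirement` is hypothetical and not part of pdf-tools.

```typescript
// Hypothetical helper (not part of pdf-tools): returns true when a Node.js
// version string such as "18.17.1" or "v18.17.1" has a major version that
// is at least minMajor (16, per the requirement listed above).
function meetsNodeRequirement(version: string, minMajor: number = 16): boolean {
  const match = /^v?(\d+)/.exec(version);
  if (match === null) {
    return false; // not a recognisable version string
  }
  return Number(match[1]) >= minMajor;
}
```

In a Node.js script the current version is available as `process.versions.node`, so the check would read `meetsNodeRequirement(process.versions.node)`.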
pdftools Commands
common options
-o, --outdir [folder] output folder (default: "out")
pdfximages
extract images (as png) from a pdf and save them to the given folder
Usage:
pdftools pdfximages|pxi [options] <pdf>
pdf2images
create an image (as png) for each pdf page
Usage:
pdftools pdf2images|p2i <pdf>
pdf2md
convert pdf to markdown format.
Usage:
pdftools pdf2md|p2md [options] <pdf>
Options:
-ps, --pageseparator [separator] add page separator (default: "---")
--imageurl [url prefix] image url prefix
--stats print stats information
--debug print debug information
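When pdf2md is driven from a script, the options above can be assembled into an argument list. The sketch below is illustrative only: the `Pdf2mdOptions` type and `buildPdf2mdArgs` function are hypothetical names, and it assumes each option value is passed as the argument that follows its flag.

```typescript
// Hypothetical option bag; the fields mirror the flags documented above.
interface Pdf2mdOptions {
  outdir?: string;   // -o, --outdir [folder]
  imageurl?: string; // --imageurl [url prefix]
  stats?: boolean;   // --stats
  debug?: boolean;   // --debug
}

// Build the argument list for `pdftools pdf2md [options] <pdf>`.
// Assumption: each option value follows its flag as a separate argument.
function buildPdf2mdArgs(pdf: string, opts: Pdf2mdOptions = {}): string[] {
  const args: string[] = ["pdf2md"];
  if (opts.outdir !== undefined) args.push("--outdir", opts.outdir);
  if (opts.imageurl !== undefined) args.push("--imageurl", opts.imageurl);
  if (opts.stats) args.push("--stats");
  if (opts.debug) args.push("--debug");
  args.push(pdf); // the input PDF comes last, as in the usage line
  return args;
}
```

For a placeholder input `report.pdf`, `buildPdf2mdArgs("report.pdf", { outdir: "out", stats: true })` yields `["pdf2md", "--outdir", "out", "--stats", "report.pdf"]`, which could then be handed to Node's `child_process.spawnSync("pdftools", args)`.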
Conversion to Markdown
supported features
- Detect headers
- Detect and extract images
- Extract plain text
- Extract fonts and allow custom mapping through a generated file <document name>.font.json. Supported fonts: bold, italic, monospace, bold+italic
- Detect code block (i.e. ```)
- Detect external link
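If the page separator option of pdf2md is used, the generated Markdown can be split back into per-page chunks. The sketch below assumes the separator is written on a line of its own between pages, which the source does not confirm. `splitPages` is a hypothetical helper, not part of pdf-tools.

```typescript
// Hypothetical helper: split Markdown text into pages on lines that consist
// solely of the separator (default "---", the pdf2md default).
// Assumption: the separator appears on its own line between pages.
function splitPages(markdown: string, separator: string = "---"): string[] {
  const pages: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    if (line.trim() === separator) {
      // Separator reached: close the page collected so far.
      pages.push(current.join("\n").trim());
      current = [];
    } else {
      current.push(line);
    }
  }
  pages.push(current.join("\n").trim());
  // Drop pages that ended up empty (for example after a trailing separator).
  return pages.filter((page) => page.length > 0);
}
```

Because `---` is also a Markdown horizontal rule, a document containing its own rules would be over-split by this approach. Choosing a more distinctive separator avoids that ambiguity.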
TO DO
- Detect TOC