@rocknerve/pdftotext
v0.9.1
Another simple Node.js wrapper for the popular `pdftotext` command-line tool.
This one can stop parsing after a time limit or once the extracted text reaches a maximum size.
It also automatically installs pdftotext
if it runs as the root user on a Debian/Ubuntu/Mint system, which is pretty nice.
No intermediate files are used.
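How the two limits and the no-intermediate-files behavior could fit together is easier to see in code. The sketch below is not this package's implementation. It is a minimal illustration that assumes the Poppler build of pdftotext (which accepts `-` for both the input and the output file) and uses two hypothetical helpers, `makeLimitedCollector` and `convertSketch`. The PDF goes in on stdin, text comes back on stdout, and the child process is killed when either limit is hit.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: collects output chunks up to a byte limit.
function makeLimitedCollector(sizelimit_bytes) {
  const chunks = [];
  let total = 0;
  return {
    // Stores as much of the chunk as still fits.
    // Returns false once the limit has been reached.
    push(chunk) {
      const room = sizelimit_bytes - total;
      if (room <= 0) return false;
      const piece = chunk.length > room ? chunk.subarray(0, room) : chunk;
      chunks.push(piece);
      total += piece.length;
      return total < sizelimit_bytes;
    },
    text() {
      return Buffer.concat(chunks).toString("utf8");
    },
  };
}

// Hypothetical wrapper: pipes a PDF Buffer through `pdftotext - -`.
// Assumes pdftotext (Poppler) is on PATH.
function convertSketch(pdfBuffer, { timelimit_ms = 10_000, sizelimit_bytes = 65535 } = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn("pdftotext", ["-", "-"]);
    const collector = makeLimitedCollector(sizelimit_bytes);

    // Time limit: stop the child and keep whatever text arrived so far.
    const timer = setTimeout(() => child.kill(), timelimit_ms);

    // Size limit: stop the child as soon as enough text has been collected.
    child.stdout.on("data", (chunk) => {
      if (!collector.push(chunk)) child.kill();
    });

    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err); // e.g. pdftotext is not installed
    });
    child.on("close", () => {
      clearTimeout(timer);
      resolve(collector.text());
    });

    // Ignore EPIPE if the child exits before reading all of its input.
    child.stdin.on("error", () => {});
    child.stdin.end(pdfBuffer);
  });
}
```

Cutting at a byte boundary can split a multi-byte UTF-8 character, so the last character of a truncated result may decode as U+FFFD.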
Install:
npm i @rocknerve/pdftotext
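Since the automatic pdftotext install only applies when running as root on Debian/Ubuntu/Mint, it can help to know up front whether a given machine qualifies. The following is a rough sketch of such a check, not code taken from this package. `isDebianFamily` and `autoInstallWouldApply` are hypothetical names, and reading `/etc/os-release` is an assumption about how one might detect the distribution.

```javascript
const { readFileSync } = require("node:fs");

// Hypothetical helper: true when os-release text describes Debian,
// Ubuntu, Mint, or another distribution that lists them in ID_LIKE.
function isDebianFamily(osReleaseText) {
  const fields = {};
  for (const line of osReleaseText.split("\n")) {
    const match = /^([A-Z0-9_]+)=(.*)$/.exec(line.trim());
    // Strip one pair of surrounding quotes, if present.
    if (match) fields[match[1]] = match[2].replace(/^["']|["']$/g, "");
  }
  const ids = [fields.ID, ...(fields.ID_LIKE || "").split(/\s+/)].filter(Boolean);
  return ids.some((id) => ["debian", "ubuntu", "linuxmint"].includes(id));
}

// Hypothetical helper: true when the process is root on a Debian-family system.
function autoInstallWouldApply() {
  if (typeof process.getuid !== "function" || process.getuid() !== 0) return false;
  try {
    return isDebianFamily(readFileSync("/etc/os-release", "utf8"));
  } catch {
    return false; // no os-release file, e.g. on macOS or Windows
  }
}
```

When this returns false, pdftotext has to be installed by hand. On Debian-family systems it ships in the poppler-utils package.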
Example usage:
const { readFile } = require("node:fs/promises");
const ConvertPDFToText = require("@rocknerve/pdftotext");

(async () => {
  // Read the PDF from disk...
  let your_pdf_data_buffer = await readFile("example.pdf");
  // ...or fetch it over HTTP instead:
  // your_pdf_data_buffer = await (await fetch("https://example.com/example.pdf")).arrayBuffer();
  const your_plain_text_string = await ConvertPDFToText({
    input: { body: your_pdf_data_buffer },
    timelimit_ms: 10_000, // optional; limit processing to 10 seconds
    sizelimit_bytes: 65535, // optional; limit text output to about 64 KB
    logger: (line) => console.log(`--- PDF parsing status: ${line}`), // optional; you can also pass `false` to avoid default logging to stdout
  });
  console.log(your_plain_text_string);
})();
Potential future features:
- Allow streaming IO
- Allow PDF URLs to be passed and fetched automatically
- Make nice for Deno 2
- Support DOCX or other types too