pietro

v0.6.1

Convert PDF files to PNG and text files

π Pietro

Utilities to split PDF files into smaller files, generate thumbnails and extract textual content.

This module is a wrapper on top of pdftk, xpdf and other tools. It transforms a raw PDF into formats that can be parsed by a machine.

⚠ This package was created out of a personal need to share these methods across projects. The default (or sometimes hardcoded) values of some of the methods make sense to me. If they do not fit your needs, feel free to open issues or pull requests to adapt the code. I'd be happy to make it more extensible.

const pdf = await pietro.init('./path/to/file.pdf');

// Split a large file into smaller files
await pdf.extractAllPages('./dist');

// Convert one page to PNG
const page42 = await pietro.init('./dist/0042.pdf');
await page42.toImage('./images/0042.png');

// Get textual content of one page
await page42.getText();

// Extract all images from a pdf
await page42.extractImages('./images');

Requirements

The module internally calls command-line tools to do the grunt work, so the commands it relies on (pdftk, xpdf and the other tools mentioned above) need to be available in your $PATH.
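
One way to catch a missing binary early is to look for each command in $PATH before calling any method. The following is a minimal sketch and not part of pietro: `findInPath` and `missingCommands` are hypothetical helpers, the check assumes a Unix-like system, and the command list contains only `pdftk` for illustration, so adjust it to the tools your setup needs.

```javascript
const fs = require('fs');
const path = require('path');

// Return the full path of `command` if an executable file with that name
// exists in one of the directories of `pathVar`, or null otherwise.
function findInPath(command, pathVar = process.env.PATH || '') {
  for (const dir of pathVar.split(path.delimiter)) {
    if (!dir) continue;
    const candidate = path.join(dir, command);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch (err) {
      // Not found or not executable in this directory, try the next one
    }
  }
  return null;
}

// Return the subset of `commands` that cannot be found in `pathVar`.
function missingCommands(commands, pathVar = process.env.PATH || '') {
  return commands.filter((command) => findInPath(command, pathVar) === null);
}

// Assumption: this list is illustrative, not the module's full requirements.
const missing = missingCommands(['pdftk']);
if (missing.length > 0) {
  console.warn(`Missing commands: ${missing.join(', ')}`);
}
```

Running this at startup turns a cryptic failure deep inside a conversion into a readable warning.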

Methods

.init(pathToPdf)

Create a Pietro instance of a given PDF file.

const pdf = await pietro.init('./path/to/file.pdf');

.pageCount()

Get the number of pages in the PDF.

const count = await pdf.pageCount();
console.info(`The file has ${count} pages`);

.getText()

Return the textual content of the PDF.

const content = await pdf.getText();

.extractPage(pageIndex, destination)

Extract one specific page of the PDF.

await pdf.extractPage(42, './page-42.pdf');
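
To extract a handful of consecutive pages, `.pageCount()` and `.extractPage()` can be combined. The sketch below is a hypothetical helper, not part of the module. It assumes page indexes start at 1, that the destination directory already exists, and it picks the `page-N.pdf` naming itself.

```javascript
// Extract pages `first` to `last` (inclusive) of `pdf` into `directory`.
// `pdf` is expected to expose pageCount() and extractPage() as documented.
// Assumption: page indexes start at 1.
async function extractPageRange(pdf, first, last, directory) {
  const count = await pdf.pageCount();
  const valid =
    Number.isInteger(first) &&
    Number.isInteger(last) &&
    first >= 1 &&
    first <= last &&
    last <= count;
  if (!valid) {
    throw new RangeError(`Invalid range ${first}-${last} for a ${count}-page file`);
  }

  const created = [];
  for (let index = first; index <= last; index += 1) {
    const destination = `${directory}/page-${index}.pdf`;
    await pdf.extractPage(index, destination);
    created.push(destination);
  }
  return created;
}
```

With a real instance, `await extractPageRange(pdf, 10, 12, './excerpt')` would ask for three one-page files and return their paths.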

.extractAllPages(destinationDirectory)

Split the PDF into one file per page, in the specified directory.

await pdf.extractAllPages('./pages');
// Will create ./pages/0001.pdf, ./pages/0002.pdf, etc

.toImage(destination)

Convert the PDF to an image. This works best on one-page PDFs.

await pdf.toImage('./thumbnail.png');
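
Since `.toImage()` is best suited to one-page PDFs, a larger file can be split first and each page converted in turn. This is a sketch under assumptions: `renderAllPages` and `pageBaseName` are hypothetical helpers, the `0001.pdf` naming is inferred from the `.extractAllPages()` example, and both directories are assumed to exist.

```javascript
// Assumption: extractAllPages() names files 0001.pdf, 0002.pdf, ...
// (1-based, zero-padded to four digits), as in its documented example.
function pageBaseName(pageNumber) {
  return String(pageNumber).padStart(4, '0');
}

// Split `source` into one-page PDFs, then convert each page to a PNG.
// `pietro` is passed in so the function works with the real module or a stub.
async function renderAllPages(pietro, source, pagesDir, imagesDir) {
  const pdf = await pietro.init(source);
  const count = await pdf.pageCount();
  await pdf.extractAllPages(pagesDir);

  const images = [];
  for (let pageNumber = 1; pageNumber <= count; pageNumber += 1) {
    const base = pageBaseName(pageNumber);
    const page = await pietro.init(`${pagesDir}/${base}.pdf`);
    const destination = `${imagesDir}/${base}.png`;
    await page.toImage(destination);
    images.push(destination);
  }
  return images;
}
```

Pages are converted one at a time on purpose: each conversion spawns an external process, so running them sequentially keeps the load predictable.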

.extractImages(destinationDirectory)

Extract all images embedded in the PDF into the specified directory.

await pdf.extractImages('./images');
// Will create ./images/000.png, ./images/001.png, etc

.imageList()

Return an array of metadata about all images in the file.

const images = await pdf.imageList();
// images looks like:
//  [
//    {
//      pageIndex: 1,
//      imageIndex: 0,
//      type: 'image',
//      width: 2625,
//      height: 1688,
//      color: 'icc',
//      objectID: 72,
//      size: '45.5K'
//    },
//    […]
//  ]
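
The metadata can be used to decide which images are worth extracting, for example by skipping small icons. The sketch below is a hypothetical helper that filters by pixel dimensions and groups the result by page. It relies only on the `pageIndex`, `width` and `height` fields shown above.

```javascript
// Keep only images at least `minWidth` x `minHeight` pixels and group them
// by page. `images` has the shape returned by imageList() shown above.
function largeImagesByPage(images, minWidth, minHeight) {
  const byPage = new Map();
  for (const image of images) {
    if (image.width < minWidth || image.height < minHeight) continue;
    if (!byPage.has(image.pageIndex)) byPage.set(image.pageIndex, []);
    byPage.get(image.pageIndex).push(image);
  }
  return byPage;
}

// Sample data: the first entry is copied from the example above,
// the other two are made up for illustration.
const sample = [
  { pageIndex: 1, imageIndex: 0, type: 'image', width: 2625, height: 1688, color: 'icc', objectID: 72, size: '45.5K' },
  { pageIndex: 1, imageIndex: 1, type: 'image', width: 32, height: 32, color: 'icc', objectID: 73, size: '1.2K' },
  { pageIndex: 2, imageIndex: 2, type: 'image', width: 1200, height: 800, color: 'icc', objectID: 80, size: '30.1K' },
];

const grouped = largeImagesByPage(sample, 500, 500);
for (const [pageIndex, pageImages] of grouped) {
  console.info(`Page ${pageIndex}: ${pageImages.length} large image(s)`);
}
```

With a real file, the array returned by `await pdf.imageList()` would take the place of `sample`.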

Development

As this module requires binaries to be installed on the machine, the CircleCI instance used to run the tests has those binaries installed (check .circleci/config.yml and ./scripts/install-dependencies).

Upgrading the base CircleCI image to have a newer version of Node usually also upgrades the underlying OS used in the image, which can affect the binaries installed. To better troubleshoot how to install the needed binaries on CircleCI, a helper script (yarn run simulate-circleci) is included, which spawns a Docker image similar to the one used on CircleCI.

Once inside the container, you can run the tests. If any of them fail, run the underlying command directly to check whether all dependencies are present. Some versions of Ubuntu no longer have xpdf in their package repositories and require a manual install, and the default ImageMagick policy might disable converting PDFs.