pdfutils

v0.3.2

tool for analyzing and converting PDF documents.

PDF Utils for node

This library contains tools for analysing and converting PDF files. You can get metadata, extract text, render pages to svg or png, all with our beloved asynchronous programming style.

Planned features include extracting links from the document and creating ImageMaps (you remember them, don't you?) on the fly. pdfutils should also support password-locked files, but that is still on the todo list.

The library is currently in beta. This means that error handling is incomplete and there is no test suite.

Installation

To install pdfutils you have to install libpoppler-glib first.

On Debian, run:

apt-get install libpoppler-glib-dev libpoppler-glib8 libcairo2-dev libcairo2

On macOS with MacPorts:

port install poppler

or, if you prefer brew:

brew install poppler --with-glib
export PKG_CONFIG_PATH=/usr/X11/lib/pkgconfig

Then install pdfutils:

npm install pdfutils

Usage

See this very basic example:

var pdfutils = require('pdfutils').pdfutils;

pdfutils("document.pdf", function(err, doc) {
	doc[0].asPNG({maxWidth: 100, maxHeight: 100}).toFile("firstpage.png");
});

3 SLOC to generate thumbnails of PDFs. Awesome!

Here is a bit more documentation:

pdfutils(source, callback)

This function is a factory for Documents.

arguments:

  • source: can be a Buffer or a String. If it is a String, it is treated as a path and the PDF is read from that file. If it is a Buffer, its content is treated as an in-memory PDF. Make sure not to modify the buffer while pdfutils is using it!
  • callback(err, doc): a callback with the following arguments:
    • err: an error string when the pdf couldn't be loaded successfully, otherwise null
    • doc: an instance of Document when the pdf is loaded successfully, otherwise undefined
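
As a rough illustration of the callback contract above, the following sketch wraps the factory in an error-first helper. The helper name pageCount is hypothetical and not part of pdfutils, and the factory is passed in as a parameter so the sketch does not depend on the native module being installed. It relies only on what is documented here: err is a string or null, and doc has a length member.

```javascript
// Hypothetical helper (not part of pdfutils): reports the page count of a PDF.
// `factory` is assumed to be the function exported as require('pdfutils').pdfutils.
function pageCount(factory, source, callback) {
  factory(source, function (err, doc) {
    if (err) {
      // err is documented as a string; wrap it so callers receive an Error.
      return callback(new Error(String(err)));
    }
    // doc is only defined when loading succeeded.
    callback(null, doc.length);
  });
}

// Possible usage with the real module, first with a path, then with a Buffer:
//   var pdfutils = require('pdfutils').pdfutils;
//   pageCount(pdfutils, 'document.pdf', console.log);
//   pageCount(pdfutils, require('fs').readFileSync('document.pdf'), console.log);
```

When passing a Buffer, keep it unmodified until the callback has fired and any conversions on the document are finished.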

Class PDFDocument

This class is generated by pdfutils(source, callback) described above.

members:

  • 0, 1, 2, 3, 4, ... , n: instances of the Pages contained in the Document. See the description of Page below.
  • length: number of Pages in a document
  • author: the author of the document or null if not known
  • creationDate: the creation date as integer since 1970-01-01
  • creator: creator of the document or null if unknown
  • format: exact format of this PDF file or null if unknown
  • keywords: keywords of the document as string or null if unknown
  • linearized: true if document is linearized, otherwise false
  • metadata: Metadata as string
  • modDate: last modification of pdf as integer since 1970-01-01
  • pageLayout: the layout of the pages. Can be one of the following strings or null if unknown:
    • singlePage
    • oneColumn
    • twoColumnLeft
    • twoColumnRight
    • twoPageLeft
    • twoPageRight
  • pageMode: the suggested viewing mode of a page. Can be one of the following strings or null if unknown:
    • none
    • useOutlines
    • useThumbs
    • fullscreen
    • useOc
    • useAttachments
  • permissions: the permissions of this document. Is an object with the following members:
    • print: whether the user is allowed to print
    • modify: whether the user is allowed to modify the document
    • copy: whether the user is allowed to take copies of this document
    • notes: whether the user is allowed to make notes
    • fillForm: whether the user is allowed to fill out forms
  • producer: producer of a document or null if unknown
  • subject: subject of this document or null if unknown
  • title: title of the document or null if unknown
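
The members listed above can be read like ordinary properties. Here is a minimal sketch, assuming a doc object shaped as documented; the helper name describeDocument and the fallback strings are illustrative choices and not part of pdfutils.

```javascript
// Hypothetical helper (not part of pdfutils): builds a plain summary object
// from a loaded document. `doc` is the object passed to the pdfutils callback.
function describeDocument(doc) {
  var labels = [];
  for (var i = 0; i < doc.length; i++) {
    // Pages are exposed as numeric members; label may be null.
    var label = doc[i].label;
    // Fall back to the 1-based position when no label is defined.
    labels.push(label !== null && label !== undefined ? label : String(i + 1));
  }
  return {
    title: doc.title || '(untitled)',   // title is null if unknown
    author: doc.author || '(unknown)',  // author is null if unknown
    pages: doc.length,
    labels: labels,
    printable: Boolean(doc.permissions && doc.permissions.print)
  };
}
```

The date members (creationDate, modDate) are left out on purpose, because the unit of the integer is not stated here.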

Class PDFPage

This class represents a page of a document

members:

  • width: width of the page
  • height: height of the page
  • index: number of this page.
  • label: label of this page or null if no label was defined.
  • links: array containing links of a page
  • asSVG(opts): returns an instance of PageJob described below, opts is an optional argument with an Object with the following optional fields:
    • maxWidth: maximal width of the resulting SVG in px.
    • minWidth: minimal width of the resulting SVG in px.
    • maxHeight: maximal height of the resulting SVG in px.
    • minHeight: minimal height of the resulting SVG in px.
    • width: the width of the resulting SVG in px. Overwrites minWidth and maxWidth.
    • height: the height of the resulting SVG in px. Overwrites minHeight and maxHeight.
  • asPDF(opts): returns an instance of PageJob described below, opts is an optional argument with an Object with the following optional fields:
    • maxWidth: maximal width of the resulting PDF in pt.
    • minWidth: minimal width of the resulting PDF in pt.
    • maxHeight: maximal height of the resulting PDF in pt.
    • minHeight: minimal height of the resulting PDF in pt.
    • width: the width of the resulting PDF in pt. Overwrites minWidth and maxWidth.
    • height: the height of the resulting PDF in pt. Overwrites minHeight and maxHeight.
  • asPNG(opts): returns an instance of PageJob described below, opts is an optional argument with an Object with the following optional fields:
    • maxWidth: maximal width of the resulting PNG in px
    • minWidth: minimal width of the resulting PNG in px
    • maxHeight: maximal height of the resulting PNG in px
    • minHeight: minimal height of the resulting PNG in px
    • width: the width of the resulting PNG in px. Overwrites minWidth and maxWidth.
    • height: the height of the resulting PNG in px. Overwrites minHeight and maxHeight.
  • asText(opts): returns an instance of PageJob described below. opts is an optional argument with an Object, which is currently ignored.
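
Building on the basic example, the sketch below renders a thumbnail for every page rather than only the first. The helper name writeThumbnails and the page-N.png naming scheme are assumptions made for illustration. Only asPNG, its maxWidth and maxHeight options, and toFile come from this documentation.

```javascript
var path = require('path');

// Hypothetical helper (not part of pdfutils): requests one PNG thumbnail per
// page in `dir` and returns the file paths that were handed to toFile().
function writeThumbnails(doc, dir, size) {
  var requested = [];
  for (var i = 0; i < doc.length; i++) {
    var target = path.join(dir, 'page-' + (i + 1) + '.png');
    // maxWidth/maxHeight bound the output size; width/height would override them.
    doc[i].asPNG({ maxWidth: size, maxHeight: size }).toFile(target);
    requested.push(target);
  }
  return requested;
}

// Possible usage with the real module:
//   require('pdfutils').pdfutils('document.pdf', function (err, doc) {
//     if (err) throw new Error(err);
//     writeThumbnails(doc, 'thumbs', 100);
//   });
```

The returned paths are the files that were requested. Conversion is presumably asynchronous, so a file may not be fully written when the function returns.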

Class PDFPageJob

This class inherits from Stream. It handles converting a Page (described above) to SVG, PDF, PNG or Text.

members:

  • links: array containing links of a page, translated to fit the output page.

events:

  • data: emitted when a new chunk of the converted file is available
  • end: emitted when the file is successfully converted
  • error: emitted when the file cannot be converted. This event is not implemented yet.

methods:

  • toFile(path, [options]): writes a page to the file in the desired format.
  • see Stream for further members.
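
When the converted output is needed in memory rather than in a file, the data and end events can be collected by hand. The following is a minimal sketch, assuming that chunks arrive as Buffers or strings (this documentation does not say which). The helper name collectJob is hypothetical and not part of pdfutils.

```javascript
// Hypothetical helper (not part of pdfutils): buffers all chunks of a PageJob
// and passes the complete result to an error-first callback.
function collectJob(job, callback) {
  var chunks = [];
  var done = false;

  function finish(err, result) {
    if (done) return; // make sure the callback fires only once
    done = true;
    callback(err, result);
  }

  job.on('data', function (chunk) {
    // Assumption: chunks are Buffers or strings; normalise to Buffer.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(String(chunk)));
  });
  job.on('end', function () {
    finish(null, Buffer.concat(chunks));
  });
  // The error event is documented as not implemented yet; registering a
  // listener is harmless and keeps the helper ready for when it is.
  job.on('error', function (err) {
    finish(err);
  });
}

// Possible usage with the real module:
//   collectJob(doc[0].asText(), function (err, buf) {
//     if (!err) console.log(buf.toString());
//   });
```

Because error is not emitted yet, a failed conversion may never call the callback, so callers that need a guarantee should add their own timeout.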

License

This module is licensed under GPL.