pdf-lang

v0.1.2

Parse, modify and serialize PDF files with Node

idea & motivation

I was looking for a simple and small implementation of the PDF specs to manipulate PDFs on a basic level.

  • PDF.js is relatively heavy because it also includes rendering (which is great for use in a browser).
  • Other libraries depend on command line tools (like qpdf, pdftk or mutool) that are written in another language.

install

package has no dependencies

npm i pdf-lang

example usage

list all pdf objects ordered by id

const { PDF } = require('pdf-lang')
PDF.inspect('file.pdf')
/*
1 {
  Type: Symbol(Catalog),
  Pages: obj { id: 3 },
  PageLayout: Symbol(SinglePage),
  ViewerPreferences: { PageDirection: Symbol(L2R) }
}
2 {
  Creator: 'Scribus 1.5.5',
  Producer: 'Scribus PDF Library 1.5.5',
  Title: 'PDF-File',
  CreationDate: 'D:20220914071523Z'
}
3 {
  Type: Symbol(Pages),
  Kids: [
    obj { id: 4 },
  ],
  Count: 1
}
4 {
  Type: Symbol(Page),
  Parent: obj { id: 3 },
  MediaBox: [ 0, 0, 612.28346, 858.89764 ],
  TrimBox: [ 8.50394, 8.50394, 603.77953, 850.3937 ],
  Rotate: 0,
  Contents: obj { id: 5 }
}
5 {
  Length: obj { id: 6 },
  Filter: Symbol(FlateDecode),
  [Symbol(stream)]: <Buffer 78 da 6d 8e b1 0e 82 40 0c 86 ... 129 more bytes>
}
6 139
...
*/

trim all pages by certain amount

const { PDF, mm2pt } = require('pdf-lang')
const pdf = new PDF('file.pdf')
pdf.trim(mm2pt(3)) // trim each page by 3mm
pdf.toFile('file-trimmed.pdf')
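
As a rough sketch of the arithmetic behind this example: PDF boxes are measured in points (1 pt = 1/72 inch, 1 inch = 25.4 mm), so a millimetre value has to be converted first. `mmToPt` and `insetBox` below are hypothetical helpers written for illustration, not pdf-lang's code, and it is only an assumption that `trim()` insets the boxes this way. The sample numbers are the MediaBox and TrimBox from the object listing above, where the TrimBox sits 3mm inside the MediaBox.

```javascript
// Hypothetical helpers for illustration; not part of pdf-lang.
const PT_PER_MM = 72 / 25.4 // 72 points per inch, 25.4 mm per inch

function mmToPt (mm) {
  return mm * PT_PER_MM
}

// Shrink a [llx, lly, urx, ury] box by `amount` points on every side.
function insetBox ([llx, lly, urx, ury], amount) {
  return [llx + amount, lly + amount, urx - amount, ury - amount]
}

const mediaBox = [0, 0, 612.28346, 858.89764]
const trimBox = insetBox(mediaBox, mmToPt(3))
console.log(trimBox.map(v => +v.toFixed(5)))
```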

cut pages

const { PDF } = require('pdf-lang')
const pdf = new PDF('file.pdf')
pdf.cut(2) // cut horizontally into 2
// OR
pdf.cut(1, 2) // cut vertically into 2
// OR
pdf.cut(2, 2) // cut horizontally into 2 and then vertically into 2 (resulting in 4 pages)
pdf.toFile('file-cut.pdf')
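
To make the geometry of cutting concrete, here is a minimal sketch that splits one page box into a grid of equal sub-boxes. `splitBox` is a hypothetical helper, not part of pdf-lang, and how its `cols` and `rows` arguments map onto the two arguments of `cut()` is an assumption.

```javascript
// Hypothetical helper for illustration; not part of pdf-lang.
// Splits a [llx, lly, urx, ury] box into cols x rows equal sub-boxes,
// listed row by row starting from the top-left.
function splitBox ([llx, lly, urx, ury], cols = 1, rows = 1) {
  const w = (urx - llx) / cols
  const h = (ury - lly) / rows
  const boxes = []
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      const top = ury - r * h // PDF's y axis points up, so the top row has the largest y
      boxes.push([llx + c * w, top - h, llx + (c + 1) * w, top])
    }
  }
  return boxes
}

console.log(splitBox([0, 0, 200, 100], 2, 2))
```

Each resulting box could then serve as the MediaBox of one of the new pages.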

replace RGB colors with CMYK

naive implementation without profiles etc.

const { PDF } = require('pdf-lang')
const pdf = new PDF('file.pdf')
pdf.toCMYK() // replacing all RGB values inside all content streams by respective CMYK values
// OR using a callback
pdf.toCMYK((rgb, cmyk) => {
  // simply returning cmyk would do the same as above
  // rgb is an Array(3) with values 0...1 (may seem odd, but that is how it is represented in pdf content streams)
  // cmyk is an Array(4) with values 0...1

  // you could implement your own lookup mechanism here, e.g.
  // rgb = rgb.map(v => Math.round(v * 255))
  // ...
  // cmyk = [ ... ]

  // or do simple modifications like
  cmyk[1] *= 0.8
  return cmyk
})
pdf.toFile('file-cmyk.pdf')
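
For reference, a common profile-less conversion looks like the sketch below. This is the textbook formula, shown under the assumption that a "naive implementation without profiles" means something similar; the formula pdf-lang actually uses is not stated here. `rgbToCmyk` is a hypothetical name. Values are 0...1 on both sides, matching the callback arguments.

```javascript
// Naive, profile-less RGB -> CMYK conversion (illustrative; not pdf-lang's code).
// Input: [r, g, b] with values 0...1. Output: [c, m, y, k] with values 0...1.
function rgbToCmyk ([r, g, b]) {
  const k = 1 - Math.max(r, g, b)
  if (k === 1) return [0, 0, 0, 1] // pure black: avoid dividing by zero
  return [
    (1 - r - k) / (1 - k),
    (1 - g - k) / (1 - k),
    (1 - b - k) / (1 - k),
    k
  ]
}

console.log(rgbToCmyk([1, 0, 0])) // pure red
console.log(rgbToCmyk([0, 0, 0])) // pure black
```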

traverse through the internal tree structure

The base is <PDF>.tree which contains the PDF trailer. The nodes are Proxy objects, allowing the internal references to other objects (obj { id: 2 }) to be resolved.

const { PDF } = require('pdf-lang')
const pdf = new PDF('file.pdf')
pdf.tree.Info.Producer = 'my amazing PDF-tool'
console.log(pdf.tree.Root.Pages)
/*
{
  Type: Symbol(Pages),
  Kids: [ obj { id: 8 },  obj { id: 12 } ],
  Count: 12,
  Resources: obj { id: 53 }
}
*/
console.log(pdf.tree.Root.Pages.Kids[0])
/*
{
  Type: Symbol(Page),
  Parent: obj { id: 3 },
  MediaBox: [ 0, 0, 612.28346, 858.89764 ],
  TrimBox: [ 8.50394, 8.50394, 603.77953, 850.3937 ],
  Rotate: 0,
  Contents: obj { id: 7 }
}
*/

// To retrieve an object’s id you need to "bypass" the proxy by getting the "original" object. (This works for Arrays `[]` and Objects `{}`.)
const pageId = pdf.getOriginal(pdf.tree.Root.Pages.Kids)[0].id
const streamId = pdf.getOriginal(pdf.tree.Root.Pages.Kids[0]).Contents.id
// OR, via the `original` symbol:
// const pageId = pdf.tree.Root.Pages.Kids[Symbol.for('original')][0].id
// const streamId = pdf.tree.Root.Pages.Kids[0][Symbol.for('original')].Contents.id

// To get all page objects:
for (const page of pdf.getPages()) {
  // ...
}
// or a single page by its zero-based index
pdf.getPage(0) // first page
pdf.getPage(3) // fourth page
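
To illustrate how Proxy nodes can resolve references on access, here is a minimal, self-contained sketch. It is not pdf-lang's implementation: `Ref`, `wrap` and the `objects` map are hypothetical names, and only the `Symbol.for('original')` escape hatch is taken from the examples above.

```javascript
// Minimal sketch of reference-resolving proxies (illustrative; not pdf-lang's code).
const ORIGINAL = Symbol.for('original')

class Ref { // stands in for an indirect reference such as `obj { id: 3 }`
  constructor (id) { this.id = id }
}

function wrap (value, objects) {
  // Primitives and Buffers are returned as they are; only plain containers get a proxy.
  if (value === null || typeof value !== 'object' || Buffer.isBuffer(value)) return value
  return new Proxy(value, {
    get (target, prop) {
      if (prop === ORIGINAL) return target // escape hatch: the unwrapped object
      let v = target[prop]
      if (v instanceof Ref) v = objects.get(v.id) // follow the reference
      return wrap(v, objects)
    }
  })
}

const objects = new Map([
  [1, { Type: Symbol.for('Catalog'), Pages: new Ref(3) }],
  [3, { Type: Symbol.for('Pages'), Kids: [new Ref(4)], Count: 1 }],
  [4, { Type: Symbol.for('Page'), Parent: new Ref(3) }]
])
const tree = wrap({ Root: new Ref(1) }, objects)

console.log(tree.Root.Pages.Count) // resolved through two references
console.log(tree.Root[ORIGINAL].Pages.id) // the raw reference keeps its id
```

Because every read goes through the `get` trap, a reference is never visible from the outside unless the original object is requested explicitly, which is why an id can only be read after bypassing the proxy.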

get/modify stream contents

Example: move all text up a bit on the first page (not very reliable).

const { PDF } = require('pdf-lang')
const pdf = new PDF('file.pdf')
const streamId = pdf.tree.Root.Pages.Kids[0][Symbol.for('original')].Contents.id
let contents = pdf.getStream(streamId).toString() // decodes the stream if it is Flate encoded
contents = contents.replace(/-?[0-9.]+\s+-?[0-9.]+(?=\s+Tm)/g, (m) => {
  m = m.split(/\s+/).map(n => +n)
  m[1] += 3
  return m.join(' ')
})

// leave the Filter as it is (e.g. FlateDecode or None)
// and replace existing stream object by providing the old id
pdf.createStream(contents, {}, streamId)
// OR
// enforce deflating data
pdf.createStream(contents, { Filter: Symbol.for('FlateDecode') }, streamId)

pdf.toFile('file-modified.pdf')
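
The same decode, modify, encode round trip can be tried without a PDF file, using only Node's built-in `zlib` (FlateDecode data is zlib/deflate data). This is a sketch with a made-up content stream, and `shiftText` is a hypothetical helper. The regular expression is the one from the example above: it only matches the last two numbers written directly before a `Tm` operator, so text positioned with other operators such as `Td` is left untouched.

```javascript
const zlib = require('zlib')

// A made-up content stream, Flate-encoded the way a FlateDecode stream would be.
const encoded = zlib.deflateSync(Buffer.from('BT /F1 12 Tf 1 0 0 1 72 700 Tm (Hello) Tj ET'))

// Decode, shift the y value of every text matrix by `dy`, and encode again.
function shiftText (buffer, dy) {
  const contents = zlib.inflateSync(buffer).toString('latin1')
  const shifted = contents.replace(/-?[0-9.]+\s+-?[0-9.]+(?=\s+Tm)/g, (m) => {
    const [x, y] = m.split(/\s+/).map(Number)
    return `${x} ${y + dy}`
  })
  return zlib.deflateSync(Buffer.from(shifted, 'latin1'))
}

const result = zlib.inflateSync(shiftText(encoded, 3)).toString('latin1')
console.log(result)
```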

create file from scratch

This might not be very sensible, but it is (somewhat) possible

const { PDF, mm2pt } = require('pdf-lang')
const pdf = new PDF()
const A4 = [0, 0, mm2pt(210), mm2pt(297)]
const blueBox = pdf.createStream(
  '2.835 0 0 2.835 0 0 cm q 1 w 0 0 1 RG 10 10 190 277 re S Q',
  { Filter: Symbol.for('FlateDecode') }
)

pdf.tree.Root.Pages.Kids.push(pdf.createObject({
  Type: Symbol.for('Page'),
  Parent: pdf.getOriginal(pdf.tree.Root).Pages, // to get the "reference-only" object
  MediaBox: A4,
  Contents: blueBox
}))
pdf.tree.Root.Pages.Count++
// OR (more convenient)
pdf.addPage({
  MediaBox: A4, // optional with addPage(); defaults to A4
  Contents: blueBox
})

pdf.tree.Info.Title = 'Blue Box'
pdf.toFile('blue_box.pdf')
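
The content stream string in this example is a sequence of standard PDF operators. The sketch below rebuilds it piece by piece to show what each part does; `strokedRectMm` is a hypothetical helper for illustration, not part of pdf-lang. The factor 2.835 is 72 / 25.4 rounded to three decimals, so after the `cm` operator one unit is roughly one millimetre.

```javascript
// Hypothetical helper for illustration; not part of pdf-lang.
// Builds a content stream that strokes a rectangle given in millimetres.
function strokedRectMm ({ x, y, width, height, rgb }) {
  const scale = (72 / 25.4).toFixed(3) // points per millimetre, rounded
  return [
    `${scale} 0 0 ${scale} 0 0 cm`, // scale user space so 1 unit is about 1 mm
    'q', // save the graphics state
    '1 w', // line width: 1 unit
    `${rgb.join(' ')} RG`, // stroke colour (RGB, values 0...1)
    `${x} ${y} ${width} ${height} re`, // append a rectangle to the path
    'S', // stroke the path
    'Q' // restore the graphics state
  ].join(' ')
}

console.log(strokedRectMm({ x: 10, y: 10, width: 190, height: 277, rgb: [0, 0, 1] }))
```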

potential flaws

  • doesn’t decode or encode any stream filter other than Flate (FlateDecode)
  • doesn’t parse content streams (yet)
  • doesn’t handle encrypted files
  • loads complete file into memory
  • can create files that are not valid PDFs