percollate-fork

v0.7.3

A command-line tool to grab web pages as PDF

percollate fork

This is a fork of https://github.com/danburzo/percollate

Percollate is a command-line tool to turn web pages into beautifully formatted PDFs. See How it works.

Example spread from the generated PDF of a chapter in Dimensions of Colour; rendered here in black & white for a smaller image file size.

Installation

💡 percollate needs Node.js version 8.6.0 or later, as it uses new(ish) JavaScript syntax. If you get SyntaxError: Unexpected token errors, check your Node version with node --version.
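The version check can also be scripted. The sketch below is only an illustration: `version_ge` is a helper name made up here (it is not part of percollate or Node.js), and it assumes plain `MAJOR.MINOR.PATCH` version strings without pre-release suffixes.

```shell
# Succeeds if version $1 >= version $2.
# Assumes plain MAJOR.MINOR.PATCH strings with numeric parts only.
version_ge() {
	saved_ifs=$IFS
	IFS=.
	set -- $1 $2   # split on dots: $1 $2 $3 is the first version, $4 $5 $6 the second
	IFS=$saved_ifs
	[ "$1" -gt "$4" ] && return 0
	[ "$1" -lt "$4" ] && return 1
	[ "$2" -gt "$5" ] && return 0
	[ "$2" -lt "$5" ] && return 1
	[ "$3" -ge "$6" ]
}

required=8.6.0
if command -v node >/dev/null 2>&1; then
	current=$(node --version)   # prints something like "v10.15.3"
	current=${current#v}        # strip the leading "v"
	if version_ge "$current" "$required"; then
		echo "Node $current is new enough for percollate"
	else
		echo "Node $current is older than $required; please upgrade"
	fi
else
	echo "node was not found on PATH"
fi
```

The comparison is numeric per component, so `8.10.0` is correctly treated as newer than `8.6.0`.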

You can install percollate globally:

# using npm
npm install -g percollate

# using yarn
yarn global add percollate

To keep the package up-to-date, you can run:

# using npm, upgrading is the same command as installing
npm install -g percollate

# yarn has a separate command
yarn global upgrade --latest percollate

Usage

💡 Run percollate --help for a list of available commands. For a particular command, percollate <command> --help lists all available options.

Available commands

| Command | What it does |
| ----------------- | -------------------------------------------------- |
| percollate pdf | Bundles one or more web pages into a PDF |
| percollate epub | Bundles one or more web pages into an EPUB |
| percollate html | Bundles one or more web pages into an HTML file |
| percollate md | Bundles one or more web pages into a Markdown file |

Available options

The pdf, epub, html, and md commands have these options:

| Option | What it does |
| -------------- | -------------------------------------------------------------------------------------------------------------- |
| -o, --output | The path of the resulting bundle; when omitted, the output file name is derived from the title of the web page. |
| --individual | Export each web page as an individual file. |
| --template | Path to a custom HTML template |
| --style | Path to a custom CSS stylesheet |
| --css | Additional CSS styles you can pass from the command line to override the default/custom stylesheet styles |
| --no-amp | Don't prefer the AMP version of the web page |
| --debug | Print more detailed information |
| --toc | Include a Table of Contents page |

Examples

Basic PDF generation

To transform a single web page to PDF:

percollate pdf --output some.pdf https://example.com

To bundle several web pages into a single PDF, specify them as separate arguments to the command:

percollate pdf --output some.pdf https://example.com/page1 https://example.com/page2

Instead of a URL, you can provide a path to a local file:

percollate pdf --output my.pdf ./example.html

You can use common Unix commands and keep the list of URLs in a newline-delimited text file:

cat urls.txt | xargs percollate pdf --output some.pdf
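If the list grows, it can help to annotate it. The sketch below assumes a convention of our own, not a percollate feature: blank lines and lines starting with `#` in `urls.txt` are treated as comments and filtered out before `xargs` sees them. The leading `echo` makes it a dry run that only prints the command; remove it to actually generate the PDF.

```shell
# Create a sample list. Blank lines and "#" comments are our own convention;
# percollate itself only ever receives the URLs that survive the filter.
cat > urls.txt <<'EOF'
# reading list
https://example.com/page1

https://example.com/page2
EOF

# Drop blank lines and comment lines, then let xargs append the remaining
# URLs as arguments. `echo` turns this into a dry run.
grep -Ev '^[[:space:]]*(#|$)' urls.txt | xargs echo percollate pdf --output some.pdf
# prints: percollate pdf --output some.pdf https://example.com/page1 https://example.com/page2
```

Note that `xargs` gives special meaning to quotes and backslashes in its input, so URLs containing those characters may need extra care.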

To transform several web pages into individual PDF files at once, use the --individual flag:

percollate pdf --individual https://example.com/page1 https://example.com/page2

The --css option

The --css option lets you pass a small snippet of CSS to percollate. Here are some common use-cases:

Custom page size / margins

The default page size is A5 (portrait). You can use the --css option to override it using any supported CSS size:

percollate pdf --css "@page { size: A3 landscape }" http://example.com

Similarly, you can define:

  • custom margins, e.g. @page { margin: 0 }
  • the base font size: html { font-size: 10pt }
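These rules can be combined in a single `--css` value. The following is a sketch with arbitrarily chosen values (A4, 2cm, 10pt); it keeps the CSS in a shell variable so the quoting stays readable, and prints the resulting command instead of running it.

```shell
# Several rules in one --css value. Single quotes keep the shell from
# interpreting the braces, semicolons and spaces inside the CSS.
css='@page { size: A4 portrait; margin: 2cm } html { font-size: 10pt }'

# Dry run: show the command that would be executed. To generate the PDF,
# run the printed command, or call percollate directly with --css "$css".
printf 'percollate pdf --css "%s" --output some.pdf %s\n' "$css" https://example.com
```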

Changing the font stacks

The default stylesheet includes CSS variables for the fonts used in the PDF:

:root {
	--main-font: Palatino, 'Palatino Linotype', 'Times New Roman',
		'Droid Serif', Times, 'Source Serif Pro', serif, 'Apple Color Emoji',
		'Segoe UI Emoji', 'Segoe UI Symbol';
	--alt-font: 'helvetica neue', ubuntu, roboto, noto, 'segoe ui', arial,
		sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol';
	--code-font: Menlo, Consolas, monospace;
}

| CSS variable | What it does |
| ------------- | ------------------------------------- |
| --main-font | The font stack used for body text |
| --alt-font | Used in headings, captions, et cetera |
| --code-font | Used for code snippets |

To override them, use the --css option:

percollate pdf --css ":root { --main-font: 'PT Serif';  --alt-font: Roboto; }" http://example.com

💡 To work correctly, you must have the fonts installed on your machine. Custom web fonts currently require you to use a custom CSS stylesheet / HTML template.

Remove the appended hrefs from hyperlinks

The idea with percollate is to make PDFs that can be printed without losing where the hyperlinks point to. However, for some link-heavy pages, the appended hrefs can become bothersome. You can remove them using:

percollate pdf --css "a:after { display: none }" http://example.com

The --style option

The --style option lets you use your own CSS stylesheet instead of the default one. Here are some common use-cases for this option:

⚠️ TODO add examples here
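As one possible illustration (a sketch under assumptions, not a recipe from percollate's own documentation), a complete replacement stylesheet could be written to a file and passed with --style. The file name `my-style.css` and all values are arbitrary. Since the custom stylesheet is used instead of the default one, the font variables from the default stylesheet are presumably not defined unless you declare them yourself, so this sketch sets `font-family` directly.

```shell
# Write a small replacement stylesheet. The file name is arbitrary.
cat > my-style.css <<'EOF'
@page {
	size: A4 portrait;
	margin: 2cm;
}

html {
	font-size: 11pt;
	font-family: Georgia, serif;
}

/* The footer does not inherit page styles, so style it explicitly. */
.footer-template {
	font-size: 8pt;
}
EOF

# Dry run: print the command. Remove `echo` to generate the PDF.
echo percollate pdf --style my-style.css --output some.pdf https://example.com
```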

The --template option

The --template option lets you use a custom HTML template for the PDF.

💡 The HTML template is parsed with nunjucks, which is a close JavaScript relative of Twig for PHP, Jinja2 for Python and Liquid for Ruby.

Here are some common use-cases:

Customizing the page header / footer

Puppeteer can print some basic information about the page in the PDF. The following CSS class names are available for the header / footer, into which the appropriate content will be injected:

  • date — The formatted print date
  • title — The document title
  • url — document location (Note: this will print the path of the temporary html, not the original web page URL)
  • pageNumber — the current page number
  • totalPages — total pages in the document

👉 See the Chromium source code for details.

You place your header / footer template in a template element in your HTML:

<template class="header-template">
	My header
</template>

<template class="footer-template">
	<div class="text center">
		<span class="pageNumber"></span>
	</div>
</template>

See the default HTML for example usage.

You can add CSS styles to the header / footer with either the --css option or a separate CSS stylesheet (the --style option).

💡 The header / footer templates do not inherit their styles from the rest of the page (i.e. they are not part of the cascade), so you'll have to write the full CSS you want to apply to them.

An example from the default stylesheet:

.footer-template {
	font-size: 10pt;
	font-weight: bold;
}
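As a further sketch, the pageNumber and totalPages classes listed above can be combined into a "page / total" footer. The file name `footer-fragment.html` is made up for this example, and the fragment is not a complete template: it is meant to replace the footer `template` element in your own copy of the default HTML, which you would then pass with --template.

```shell
# Write a footer fragment that shows "current / total" page numbers.
# The empty spans are filled in when the PDF is printed.
cat > footer-fragment.html <<'EOF'
<template class="footer-template">
	<div class="text center">
		<span class="pageNumber"></span> / <span class="totalPages"></span>
	</div>
</template>
EOF

# Quick sanity check that both injected class names are present.
grep -c 'pageNumber\|totalPages' footer-fragment.html
```

Any styling for this footer still has to be supplied explicitly, for example through a `.footer-template` rule in --css or --style.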

How it works

  1. Fetch the page(s) using got
  2. If an AMP version of the page exists, use that instead (disable with --no-amp flag)
  3. Enhance the DOM using jsdom
  4. Pass the DOM through mozilla/readability to strip unnecessary elements
  5. Apply the HTML template and the print stylesheet to the resulting HTML
  6. Use puppeteer to generate a PDF from the page

Limitations

Percollate inherits the limitations of two of its main components, Readability and Puppeteer (headless Chrome).

The imperative approach Readability takes will not be perfect in every case, especially on HTML pages with atypical markup; you may occasionally notice that it either leaves in superfluous content or strips out parts of the content. You can confirm the problem against Firefox's Reader View. If it occurs there too, consider filing an issue on mozilla/readability.

Using a browser to generate the PDF is a double-edged sword. On the one hand, you get excellent support for web platform features. On the other hand, print CSS as defined by W3C specifications is only partially implemented, and it seems unlikely that support will be improved any time soon. However, even with modest print support, I think Chrome is the best (free) tool for the job.

Troubleshooting

On some Linux machines you'll need to install a few more Chrome dependencies before percollate works correctly. (Thanks to @ptica for sorting it out)

The percollate pdf command supports the --no-sandbox Puppeteer flag, but make sure you're aware of the implications before disabling the sandbox.

Contributing

Contributions of all kinds are welcome! See CONTRIBUTING.md for details.

See also

Here are some other projects to check out if you're interested in building books using the browser: