pdf-dicer

v0.4.4

Split PDF files into many based on barcode separators

Downloads

19

PDF-Dicer

Split PDF files into many based on barcode separators.

This is useful if scanning a large number of documents in a batch (e.g. via an automated office scanner) which then need to be split up again.

PDF-Dicer takes a single PDF file made up of multiple scanned documents. Each sub-document has a starting and ending barcode.

[Figure: Input file]

PDF-Dicer takes this file, splits on each barcode set, validates the barcodes and outputs back into individual files.

[Figure: Output process]

Installing

This module requires ImageMagick, Ghostscript, Poppler and PDFtk.

You can install them as follows:

  • Ubuntu Linux - sudo apt-get install imagemagick ghostscript poppler-utils pdftk
  • OSX (Yosemite) - brew install imagemagick ghostscript poppler
    • Install PDFTK from website.

Example

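var fs = require('fs');
var pdfDicer = require('pdf-dicer');

var dicer = new pdfDicer();

dicer
	// 'split' fires once per detected range; this simple example writes every range to the same file
	.on('split', (data, buffer) => {
		fs.writeFile('output.pdf', buffer, (err) => {
			if (err) console.log(`Could not write output: ${err}`);
		});
	})
	.split('input.pdf', function(err, output) {
		if (err) console.log(`Something went wrong: ${err}`);
	});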
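
Because the `split` event fires once per detected range, writing every buffer to one fixed file name keeps only the last one. The following is a minimal sketch of a listener that writes each range to its own numbered file. `makeSplitWriter` and the `split-N.pdf` naming scheme are hypothetical helpers introduced for illustration; they are not part of pdf-dicer.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of pdf-dicer): returns a 'split' listener
// that writes each buffer to its own numbered file inside outputDir.
function makeSplitWriter(outputDir) {
  let count = 0;
  return function onSplit(range, buffer) {
    count += 1;
    const target = path.join(outputDir, `split-${count}.pdf`);
    // Synchronous write keeps the sketch short; an async write works as well.
    fs.writeFileSync(target, buffer);
    return target;
  };
}

// Usage, assuming pdf-dicer is installed and ./out exists:
//   dicer.on('split', makeSplitWriter('./out')).split('input.pdf', callback);
```

The counter lives in the closure, so each writer instance numbers its own files independently.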

API

dicer (class)

The main class of this module.

The constructor takes an optional settings object which is used to populate the initial setup.

var dicer = new pdfDicer({driver: 'quagga'});

dicer.settings (object)

An object holding the instance settings. These can be set on construction, via a call to set(), or by assigning to the object directly.

The following settings are supported:

| Setting | Type | Default | Profile | Description |
|---|---|---|---|---|
| areas | Array | {top:'0%',right:'0%',left:'0%',bottom:'0%'} | Quagga | The areas of the input pages that Quagga should scan |
| imageFormat | String | png (Quagga), tif (Bardecode) | All | The intermediate image format to use before processing the barcode |
| magickOptions | Object | Various (Quagga), {} (Bardecode) | All | Additional options to pass to ImageMagick when converting the PDF to images |
| bardecode | Object | See below | Bardecode | Options specific to Bardecode |
| bardecode.bin | String | /opt/bardecoder/bin/bardecode | Bardecode | Path to the bardecode binary |
| bardecode.checkEvaluation | Boolean | true | Bardecode | Check that the barcode doesn't end in ??? and raise a warning if it does |
| bardecode.serial | String | "" | Bardecode | Your Bardecode serial number |
| filter | Function | (page) => true | All | Optional filter to discard pages before calculating ranges |
| quagga | Object | See below | Quagga | Options specific to Quagga |
| quagga.locate | Boolean | false | Quagga | Indicates if Quagga should try to detect the barcode or we should use areas |
| quagga.decoder | Object | {readers:['code_128_reader'],multiple: false} | Quagga | Options passed to the Quagga decoder |
| temp | Object | See below | All | Options passed to Temp when generating a temporary directory |
| tempClean | Boolean | true | All | Automatically erase the temporary directory when done |
| temp.prefix | String | pdfdicer- | All | The prefix used when generating a temporary directory |
| threads | Object | See below | All | Options used for async threading |
| threads.pages | Number | 1 | All | The number of threads allowed to run simultaneously when processing pages |
| threads.areas | Number | 1 | Quagga | The number of threads allowed to run simultaneously when processing page areas |
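
As an illustration of how the dotted names in the table map onto a nested settings object, here is a small sketch of a Bardecode-oriented configuration. It uses only keys and defaults listed above, apart from `threads.pages`, which is raised to 2 as an example. Whether the constructor deep-merges nested objects with the defaults is not stated here, so treat that as an assumption to verify.

```javascript
// Sketch of a settings object built only from keys in the table above.
// Nested objects correspond to the dotted names (e.g. bardecode.bin).
const bardecodeSettings = {
  imageFormat: 'tif',
  bardecode: {
    bin: '/opt/bardecoder/bin/bardecode',
    checkEvaluation: true,
    serial: '', // your Bardecode serial number goes here
  },
  filter: (page) => true, // keep every page (the documented default)
  tempClean: true,
  temp: { prefix: 'pdfdicer-' },
  threads: { pages: 2 }, // example value; the documented default is 1
};

// Usage, assuming pdf-dicer is installed:
//   var dicer = new pdfDicer(bardecodeSettings);
```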

dicer.set(setting, value)

Convenience function to quickly change a single setting. Dotted notation is allowed for the setting name (e.g. threads.pages).
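
To make the dotted notation concrete, the sketch below shows one way a path such as `threads.pages` could be resolved against a nested settings object. `setDotted` is a hypothetical stand-in written for illustration; it is not pdf-dicer's actual implementation of set().

```javascript
// Hypothetical illustration of dotted-path assignment (not pdf-dicer's code).
function setDotted(settings, setting, value) {
  const keys = setting.split('.');
  let target = settings;
  // Walk (and create, if missing) every intermediate object on the path.
  for (const key of keys.slice(0, -1)) {
    if (typeof target[key] !== 'object' || target[key] === null) {
      target[key] = {};
    }
    target = target[key];
  }
  // Assign the value on the final key.
  target[keys[keys.length - 1]] = value;
  return settings;
}

const example = { threads: { pages: 1, areas: 1 } };
setDotted(example, 'threads.pages', 4);
setDotted(example, 'temp.prefix', 'scan-');
```

With the real module, the equivalent call would presumably look like `dicer.set('threads.pages', 4)`.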

dicer.profile(profile)

Convenience function to configure the module with optimal settings for the supported barcode readers.

Supported profiles are:

  • quagga
  • bardecode

dicer.split(inputPath, callback)

Process the inputPath (usually a PDF) and split it into multiple PDF files.

Hook into the output of this function by trapping events.

Events

The following events are fired by this module:

| Event | Arguments | Description |
|---|---|---|
| stage | (stageName) | Fired for each stage of operation. ENUM: 'init', 'readPDF', 'readPages', 'extracted', 'filtering', 'loadRange', 'preSplit' |
| tempDir | (path) | Fired when a temp directory has been allocated |
| pageConverted | (page, pageOffset) | Fired for each page that is converted |
| pagesConverted | (pages) | Fired when all pages have been converted |
| pageAnalyze | (page) | Fired before an individual page is analyzed |
| barcodeFiltered | (page) | Fired if a page is filtered out |
| barcodePassed | (page) | Fired if a page passes filtering and is not filtered out |
| pageAnalyzed | (page) | Fired after a page has been analyzed |
| pagesAnalyzed | (pages) | Fired when all pages have been analyzed |
| split | (range, buffer) | Fired when a range has been detected and a buffer is ready |
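
The progress events above can be traced with a small helper. The sketch below assumes only that the dicer instance exposes an `.on(name, listener)` method, as the Example section shows; `traceEvents` is a hypothetical helper, and a plain Node.js `EventEmitter` stands in for the dicer so the snippet runs without pdf-dicer installed.

```javascript
const { EventEmitter } = require('events');

// Progress event names taken from the table above ('split' is left out
// because it carries the output buffer and is usually handled separately).
const PROGRESS_EVENTS = [
  'stage', 'tempDir', 'pageConverted', 'pagesConverted', 'pageAnalyze',
  'barcodeFiltered', 'barcodePassed', 'pageAnalyzed', 'pagesAnalyzed',
];

// Hypothetical helper: forwards every progress event to a log callback.
function traceEvents(emitter, log) {
  PROGRESS_EVENTS.forEach((name) => {
    emitter.on(name, (...args) => log(name, args));
  });
  return emitter;
}

// Stand-in emitter so the sketch is runnable on its own.
const fakeDicer = new EventEmitter();
const seen = [];
traceEvents(fakeDicer, (name, args) => seen.push(`${name}:${args.length}`));
fakeDicer.emit('stage', 'init');
fakeDicer.emit('pageConverted', {}, 0);
```

With the real module, `traceEvents(dicer, console.log)` before calling `split()` should print each stage as it happens.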