npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

documentspark

v1.6.2

Published

documentspark :: document -> picture_of_page[]

Downloads

13

Readme

:sparkling_heart: DocumentSpark

Simple secure document viewing server. Used by Viewfinder™

Converts a document to a picture of its pages. View a document from the internet without downloading or running it on your machine, and without needing a word processor, spreadsheet app, or PDF viewer installed. This provides content disarm and reconstruction, or CDR. Also known as p2., this code is deployed commercially by Dosyago in their ViewFinder cloud browser product.

From the comments

This is a very simple server in NodeJS to accept a document upload (or a URL) and convert that document (using ImageMagick, LibreOffice and GhostScript) into a series of images, one for each page of the document.

The point was originally to allow people to view documents securely (such as email attachments) without needing to run nor download said document to their own devices. It was successful in doing that, but its use grew to becoming ad-hoc document hosting where people were attracted to the ability to access a page of a document, without needing to download the entire document.

The code is shared as something you can build upon and adapt to your uses in the open. It's not meant as a finished solution, it's meant as a starting point, something to give you ideas for how to implement your own version, or something to plug in to your own open-source work. The project was originally called "p2." for "PDF to ...", but it works on a wide range of source documents, including DOCX and (often but not always) XLSX, and so on. It doesn't work on HTML or TXT.

Use it

$ git clone https://github.com/dosyago/documentspark.git
$ cd documentspark
$ ./scripts/setup.sh 
$ ./scripts/restart.sh

Or:

$ npm i documentspark@latest
$ cd node_modules/documentspark
$ ./scripts/setup.sh 
$ ./scripts/restart.sh

If you have SSL certs in $HOME/sslcerts/ these will be used (including mkcert localhost certs!), if not the server will run on HTTP. It will run under pm2 and default to port 443. You can supply a custom port with npm start <PORT>.

Navigate to yourserver:your_port/secretpage-canneverbefound.html to convert a document. You can input either a file, or a URL. It may not always be possible to obtain a document from the URL.

Document view pages are not protected by any authentication, they are simply chosen pseudo-randomly. You can modify the code to give document viewing pages longer, more securely random URLs.

By default, converted documents are cleaned out after 3 days. You can change this in /public/uploads/clean.sh which runs every few minutes and cleans any documents older than 4319 minutes (roughly 3 days).

Make it an API

There's a very simple "master key" secret parameter sent with the POST request. You can call this POST endpoint via a secure HTTPS API (using multitype/form encoding) and pass your custom secret= as a parameter to authorize the conversion.

System Requirements

You need a beefy machine. 4 cores, with 8 GB RAM for most documents. But more is better. Smaller machines will routinely run out of memory or take a long time when running the libreoffice, imagemagick and gs jobs.

Improving perf

You can try recompiling ImageMagick to have multicore support. I found this significantly improves performance.

Thanks to*

*No affiliation

License

Licensed under AGPL-3.0.

If you'd like to deploy this in your org without going open-source or for a for-profit project where youd want to release the source under AGPL-3.0 as well, write me ([email protected]) about a license exemption.