html-convert

v2.1.7

Render a webpage and get the image as a stream

npm install html-convert

It uses a pool of phantom processes so it doesn't need to spawn a new process for each website. New requests are added to the pool member with the shortest queue length.
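
The shortest-queue rule described above can be illustrated with a small sketch. This is not the module's actual implementation: the names createPool, pickWorker and enqueue are hypothetical, and each pool member is modeled as a plain object with an in-memory queue instead of a real PhantomJS process.

```javascript
// Hypothetical worker records: each pool member tracks its pending jobs.
function createPool(size) {
  var pool = [];
  for (var i = 0; i < size; i++) {
    pool.push({ id: i, queue: [] });
  }
  return pool;
}

// Pick the pool member with the fewest queued jobs (first one wins on ties).
function pickWorker(pool) {
  return pool.reduce(function (best, worker) {
    return worker.queue.length < best.queue.length ? worker : best;
  });
}

// Add a job to the least-loaded worker and return that worker.
function enqueue(pool, job) {
  var worker = pickWorker(pool);
  worker.queue.push(job);
  return worker;
}

var pool = createPool(3);
enqueue(pool, 'http://example.com/a'); // goes to worker 0
enqueue(pool, 'http://example.com/b'); // goes to worker 1
enqueue(pool, 'http://example.com/c'); // goes to worker 2
enqueue(pool, 'http://example.com/d'); // all tied again, goes to worker 0
```

With a pool size of 1 (the documented default) every job lands in the same queue, so raising the pool option is what lets renders proceed in parallel.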

Synopsis

This module depends on the phantomjs-prebuilt module, which will install PhantomJS for you if you don't already have it.

var htmlConvert = require('html-convert');
var fs = require('fs');

var convert = htmlConvert();

// convert a website url

convert('http://example.com/my-site')
  .pipe(fs.createWriteStream('out.png'));

// or as a transform stream

fs.createReadStream('some-html-file.html')
  .pipe(convert())
  .pipe(fs.createWriteStream('out.png'))

You can also pass some options:

var convert = htmlConvert({
  pool          : 5,           // Change the pool size. Defaults to 1
  queueTimeout  : 1000,        // Kills all worker jobs if the queue is not empty within queueTimeout ms. Defaults to 30 seconds.
  fetchTimeout  : 1000,        // Page url fetch timeout. Defaults to 30 seconds.
  renderTimeout : 1000,        // Render timeout in milliseconds. Defaults to 30 seconds.
  strictRender  : false,       // Flag to fail render on any js errors, resource errors or timeouts. Defaults to false.
  tmp           : '/tmp',      // Set the tmp where tmp data is stored when communicating with the phantom process.
                               //   Defaults to /tmp if it exists, or os.tmpDir()
  format        : 'jpeg',      // The default output format. Defaults to png
  quality       : 100,         // The default image quality. Defaults to 100. Only relevant for jpeg format.
  width         : 1280,        // Changes the width size. Defaults to 1280
  height        : 800,         // Changes the height size. Defaults to 960
  paperFormat   : 'A4',        // Defaults to A4. Also supported: 'A3', 'A4', 'A5', 'Legal', 'Letter', 'Tabloid'.
  orientation   : 'portrait',  // Defaults to portrait. 'landscape' is also valid
  margin        : '0cm',       // Defaults to 0cm. Supported dimension units are: 'mm', 'cm', 'in', 'px'. No unit means 'px'.
  userAgent     : '',          // No default.
  headers       : {Foo:'bar'}, // Additional headers to send with each upstream HTTP request
  paperSize     : null,        // Defaults to the paper format, orientation, and margin.
  crop          : false,       // Defaults to false. Set to true or {top:5, left:5} to add margin
  printMedia    : false,       // Defaults to false. Force the use of a print stylesheet.
  maxErrors     : 3,           // Number of errors a phantom process is allowed to throw before killing it. Defaults to 3.
  expects       : 'something', // No default. Do not render until window.renderable is set to 'something'
  retries       : 1,           // How many times to try a render before giving up. Defaults to 1.
  phantomFlags  : ['--ignore-ssl-errors=true'], // Defaults to []. Command line flags passed to PhantomJS
  maxRenders    : 500,         // How many renders a phantom process can make before being restarted. Defaults to 500
  injectJs      : ['./includes/my-polyfill.js'] // Array of paths to polyfill components or external scripts that will be injected when the page is initialized
});

Or override the options for each render stream:

convert(myUrl, {format:'jpeg', quality: 100, width: 1280, height: 960}).pipe(...)
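
One way to think about per-render overrides is as a shallow merge on top of the defaults given to htmlConvert. The sketch below only illustrates that idea and makes no claim about how the module merges options internally; resolveOptions is a hypothetical helper, and the default values are copied from the option list above.

```javascript
// Defaults as they might be passed to htmlConvert(...); values taken from the option list above.
var defaults = { format: 'png', quality: 100, width: 1280, height: 960 };

// Hypothetical helper: per-render options win over defaults; neither input is mutated.
function resolveOptions(defaults, overrides) {
  return Object.assign({}, defaults, overrides || {});
}

var perRender = resolveOptions(defaults, { format: 'jpeg', height: 800 });
// perRender.format and perRender.height come from the override;
// perRender.width and perRender.quality fall back to the defaults.
```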

Supported output formats

We support the output formats that PhantomJS's render method supports. At the time of this writing these are:

  • png
  • gif
  • jpg
  • pdf

Example

Since the interface is just a stream, you can pipe the website anywhere! Try installing picture-tube and running the following example:

var htmlConvert = require('html-convert');
var pictureTube = require('picture-tube');
var convert = htmlConvert();

convert('http://google.com')
  .pipe(pictureTube())
  .pipe(process.stdout);

Deferred convert

If your page needs to do something before phantom renders it, set window.renderable to false as early as possible. If it is set when the page is opened, the module waits until window.renderable is set to true and only then renders the page.

Here is an example to illustrate it better.

<!DOCTYPE HTML>
<html lang="en">
<head>
  ...
  <script type="text/javascript">window.renderable = false</script>
  <meta charset="UTF-8">
  <title></title>
</head>
<body>

</body>
...
<script type="text/javascript">
  doSomeAjaxLoading(function() {
    doSomeRendering();
    window.renderable = true;
  })
</script>
</html>

Adding Cookies

You can add any special cookies at render time. For format, see http://phantomjs.org/api/webpage/method/add-cookie.html. Example:

var convert = htmlConvert({
  pool: 5,
  format: 'pdf'
  // other opts
});

convert('http://somewhere.com', {
  cookies: [{
    'name'     : 'Valid-Cookie-Name',   /* required property */
    'value'    : 'Valid-Cookie-Value',  /* required property */
    'domain'   : 'localhost',
    'path'     : '/foo',                /* required property */
    'httponly' : true,
    'secure'   : false,
    'expires'  : (new Date()).getTime() + (1000 * 60 * 60)   /* <-- expires in 1 hour */
  }]
}).pipe(somewhereElse);

The cookie is used for that particular render job. You probably want to set the expires property to something fairly short: there may be no guarantee that a pooled phantom process won't pick up the cookie for a different render job, and you may want the session to be valid only for an individual job run.
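
A small helper can make the short expiry explicit. This is a minimal sketch: shortLivedCookie is a hypothetical name, and the domain and path values are placeholders to adapt. The property names follow the cookie example above.

```javascript
// Hypothetical helper: build a cookie object in the shape shown above,
// expiring ttlMs milliseconds after `now` (a millisecond timestamp).
function shortLivedCookie(name, value, ttlMs, now) {
  var start = typeof now === 'number' ? now : Date.now();
  return {
    name: name,
    value: value,
    domain: 'localhost', // placeholder, use the domain of the page you render
    path: '/',           // placeholder
    httponly: true,
    secure: false,
    expires: start + ttlMs
  };
}

// A cookie that is only valid for the next 30 seconds.
var cookie = shortLivedCookie('session', 'abc123', 30 * 1000);
// Pass it per render: convert(url, { cookies: [cookie] })
```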

Injecting JavaScript

Sometimes you need to inject polyfills, for example because Date.parse is broken in PhantomJS. You can polyfill broken or missing PhantomJS features by listing paths to local files in the opts.injectJs property. Example:

var convert = htmlConvert({
  injectJs: ['./includes/my-date-polyfill.js']
});

Make sure the path './includes/my-date-polyfill.js' is resolvable from the project root, or pass in an absolute path. When the page is initialized, any scripts you listed there will be injected before any rendering happens.

Extra Dependencies

For rendering, PhantomJS requires the fontconfig library, which may be missing if you're using Ubuntu Server. To install on Ubuntu:

sudo apt-get install libfontconfig

Troubleshooting

The render stream emits a "log" event with useful debug details coming from the onError (JS error), onConsoleMessage, onResourceError, and onResourceTimeout webpage hooks.

var convert = htmlConvert();

convert('http://somewhere.com')
  .on('log', function(log) {
    // {type: 'error', data: {msg: 'ReferenceError: Can\'t find variable: a', trace: [..]}}
  })
  .pipe(res);
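
To act on these events, you could collect only the error entries. The sketch below uses a plain EventEmitter as a stand-in for the render stream so it runs without PhantomJS. collectErrors is a hypothetical helper, and the shape of non-error entries is an assumption, since only the error shape is shown above.

```javascript
var EventEmitter = require('events').EventEmitter;

// Collect the messages of error-type log entries from a stream's "log" events.
function collectErrors(stream) {
  var errors = [];
  stream.on('log', function (log) {
    if (log && log.type === 'error') {
      errors.push(log.data && log.data.msg);
    }
  });
  return errors;
}

// Stand-in for a render stream so the sketch runs without PhantomJS.
var fakeStream = new EventEmitter();
var errors = collectErrors(fakeStream);

fakeStream.emit('log', { type: 'error', data: { msg: "ReferenceError: Can't find variable: a", trace: [] } });
fakeStream.emit('log', { type: 'console', data: 'hello' }); // hypothetical non-error entry, ignored
```

With a real render stream you would call collectErrors(convert(url)) before piping and inspect the array once the stream ends.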

Also, some additional debugging output may be enabled by running your app with a DEBUG environment variable set as follows:

DEBUG=phantom-render-stream  node ./your-script.js

If you are getting undefined error codes and responses when attempting to render, it's likely a connection issue of some sort. If the URL uses SSL, adding --ignore-ssl-errors=true to phantomFlags may help. You can also try adding --debug=true to the phantomFlags array.

See Also

  • wkhtmltopdf is a Node module that uses wkhtmltopdf to convert HTML to PDF. It is similar in that it uses WebKit and produces output as a stream, and different in that it doesn't use PhantomJS. Also, wkhtmltopdf only supports PDF output.

License

MIT