siteshooter
v1.14.0
Siteshooter
Automate full website screenshots and PDF generation with multiple viewport support
Features
- Crawls specified host and generates a sitemap.xml on the fly
- Generates entire website screen shots based on sitemap.xml
- Define multiple view ports
- Automated PDF generation
- Includes crawled meta data in generated PDF
- Reports on broken website links (404 http response)
- Supports HTTP basic authentication
- Supports Microsoft Online 3 step authentication
- Supports Salesforce Visualforce 3 step authentication
- Supports site maps with HTTP, HTTPS, and FTP protocol URLs
- Follows HTTP 301 redirects
- Custom JavaScript inject file - injects into page prior to screen shooting
- Trigger page events by passing querystring values to custom inject.js file
Getting Started
Dependencies
Install the following prerequisite on your development machine:
Notable npm Modules
Quick Start
$ npm install siteshooter --global
If siteshooter is installed, make sure you have the latest version by running:
$ npm update siteshooter --global
- You may need to run these commands with elevated privileges, e.g. sudo; you will be prompted to do so if needed.
- Installing with the --global flag makes the siteshooter command available on your machine's command line at any path.
- Read more about the --global flag here.
Create a Siteshooter Configuration File
$ siteshooter --init
Update Siteshooter Configuration File
View the full siteshooter.yml example
Inside siteshooter.yml, add additional options.
- All Simple Web Crawler options can be added to sitecrawler_options and will pass through to the crawler process
- Generated screenshot image files are optimized using imagemin and imagemin-pngquant modules, which reduce the overall size of generated PDFs. To adjust the image quality, update the image_quality option in your siteshooter.yml file.
domain:
  name: https://www.devopsgroup.io
  auth:
    user:
    pwd:
pdf_options:
  excludeMeta: true
screenshot_options:
  delay: 2000
  image_quality: '60-80'
  transparent_background: false
sitecrawler_options:
  exclude:
    - "pdf"
  stripQuerystring: false
  ignoreInvalidSSL: true
viewports:
  - viewport: desktop-large
    width: 1600
    height: 1200
  - viewport: tablet-landscape
    width: 1024
    height: 768
  - viewport: iPhone5
    width: 320
    height: 568
  - viewport: iPhone6
    width: 375
    height: 667
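Because each viewports entry needs a name plus a width and height, it can help to sanity-check the list before a long screenshot run. The following is a minimal sketch, assuming the YAML has already been parsed into plain objects; validateViewports is a hypothetical helper for illustration and is not part of Siteshooter.

```javascript
// Hypothetical helper (not a Siteshooter API): checks a parsed
// `viewports` list for missing names, duplicates, and bad dimensions.
function validateViewports(viewports) {
  const errors = [];
  const seen = new Set();
  viewports.forEach((vp, index) => {
    const label = vp.viewport || `#${index}`;
    if (!vp.viewport) {
      errors.push(`entry ${index}: missing "viewport" name`);
    } else if (seen.has(vp.viewport)) {
      errors.push(`${label}: duplicate viewport name`);
    }
    seen.add(vp.viewport);
    // Width and height must both be positive whole numbers of pixels.
    for (const key of ['width', 'height']) {
      if (!Number.isInteger(vp[key]) || vp[key] <= 0) {
        errors.push(`${label}: "${key}" must be a positive integer`);
      }
    }
  });
  return errors;
}

// Two entries taken from the example configuration above.
const viewports = [
  { viewport: 'desktop-large', width: 1600, height: 1200 },
  { viewport: 'iPhone5', width: 320, height: 568 },
];
console.log(validateViewports(viewports)); // prints []
```

An empty array means every entry passed; otherwise each string names the offending entry and field.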
CLI Options
$ siteshooter --help
Usage: siteshooter [options]
OPTIONS
_______________________________________________________________________________________
  -c --config        Show configuration
  -C --cwd           Set working directory, which will load a siteshooter.yml file in the specified path
  -e --debug         Output exceptions
  -h --help          Print this help
  -i --init          Create siteshooter.yml template file in working directory
  -p --pdf           Generate PDFs, by defined view ports, based on screen shots created via Siteshooter
  -q --quiet         Only return final output
  -s --screenshots   Generate screen shots, by view ports, based on sitemap.xml file
  -S --sitemap       Crawl domain name specified in siteshooter.yml file and generate a local sitemap.xml file
  -v --version       Print version number
  -V --verbose       Verbose output
  -w --website       Report on website information based on Siteshooter crawled results
When running a siteshooter command without any options, the following options will run in order by default:
--sitemap
--screenshots
--pdf
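The default ordering can be pictured with a small sketch. This is an illustration only, not Siteshooter's source: resolveSteps and DEFAULT_STEPS are hypothetical names, and two behaviours are assumptions made for the example: that flags such as --quiet do not count as step options, and that a subset of step flags still runs in the default order.

```javascript
// Hypothetical sketch of the ordering rule; not Siteshooter's actual code.
const DEFAULT_STEPS = ['sitemap', 'screenshots', 'pdf'];

function resolveSteps(argv) {
  // Short aliases as listed in the --help output above.
  const aliases = { '-S': 'sitemap', '-s': 'screenshots', '-p': 'pdf' };
  const requested = new Set();
  for (const arg of argv) {
    const name = arg.startsWith('--') ? arg.slice(2) : aliases[arg];
    if (DEFAULT_STEPS.includes(name)) requested.add(name);
  }
  // Assumption: with no step flags, run all three steps.
  if (requested.size === 0) return DEFAULT_STEPS.slice();
  // Assumption: requested steps keep the default relative order.
  return DEFAULT_STEPS.filter((step) => requested.has(step));
}

console.log(resolveSteps([]));
console.log(resolveSteps(['--pdf', '-S']));
```

The order matters because each step consumes the previous step's output: screenshots are driven by sitemap.xml, and PDFs are built from the screenshots.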
Custom JavaScript Inject File
To manipulate the DOM prior to the screen shot process, add an inject.js file in the same working directory as siteshooter.yml.
Example: inject.js file
/**
 * @file:            inject.js
 * @description:     used to inject custom JavaScript into a web page prior to a screen shot.
 */
console.log('JavaScript injected into page.');

if ( typeof(jQuery) !== "undefined" ) {
    jQuery(document).ready(function() {
        console.log('jQuery loaded.');
    });
}
Trigger JavaScript Events
When using the optional inject.js file, events can be triggered by adding the pevent querystring parameter to a URL:
<!-- Add URL with pevent querystring parameter in the generated sitemap.xml -->
<url>
    <loc>https://www.devopsgroup.io?pevent=open-privacy-overlay</loc>
    <changefreq>weekly</changefreq>
</url>
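If several such entries are needed, the URLs can be built programmatically rather than by hand. The following is a minimal sketch that runs in Node.js (not inside inject.js); withPageEvent and sitemapEntry are hypothetical helpers, and the standard URL class handles the encoding.

```javascript
// Hypothetical helpers for building sitemap entries; not part of Siteshooter.
function withPageEvent(pageUrl, eventName) {
  const url = new URL(pageUrl);
  // Adds or overwrites ?pevent=... while preserving other parameters.
  url.searchParams.set('pevent', eventName);
  return url.toString();
}

function sitemapEntry(pageUrl, eventName) {
  // "&" must be escaped as "&amp;" inside XML text content.
  const loc = withPageEvent(pageUrl, eventName).replace(/&/g, '&amp;');
  return `<url>\n    <loc>${loc}</loc>\n</url>`;
}

console.log(sitemapEntry('https://www.devopsgroup.io', 'open-privacy-overlay'));
```

Note that the URL class normalizes a bare host by adding a trailing slash before the querystring.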
Example: Event detection & triggering
/**
 * @file:            inject.js
 * @description:     used to inject custom JavaScript into a web page prior to a screen shot.
 */

// Returns the decoded value of a querystring parameter, or undefined if it is absent.
function getQueryVariable(variable) {
    var query = window.location.search.substring(1);
    var vars = query.split('&');
    for (var i = 0; i < vars.length; i++) {
        var pair = vars[i].split('=');
        if (decodeURIComponent(pair[0]) === variable) {
            // Guard against parameters without a value, e.g. "?pevent"
            return decodeURIComponent(pair[1] || '');
        }
    }
}

if ( typeof(jQuery) !== "undefined" ) {
    jQuery(document).ready(function() {
        var pageName = window.location.pathname.replace('/', ''),
            pageEvent = getQueryVariable('pevent');

        console.log('document ready.');
        console.log('userAgent', navigator.userAgent);
        console.log('Page: ', pageName);
        console.log('Event: ', pageEvent);

        switch (pageName) {
            // home
            case '':
                switch (pageEvent) {
                    case 'open-privacy-overlay':
                        jQuery('a[data-target~="#modal-privacy"]').trigger('click');
                        break;
                }
                break;
        }
    });
}
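As the number of pages and events grows, the nested switch statements can become hard to follow. One alternative, shown here only as a sketch, is a lookup table keyed by page name and then by pevent value. The handlers table and findHandler function are hypothetical names, and the code uses ES5 syntax on the assumption that inject.js runs in PhantomJS.

```javascript
// Hypothetical handler table: page name -> pevent value -> function.
var handlers = {
    // home
    '': {
        'open-privacy-overlay': function () {
            jQuery('a[data-target~="#modal-privacy"]').trigger('click');
        }
    }
};

// Returns the matching handler, or null when the page or event is unknown.
function findHandler(table, pageName, pageEvent) {
    var has = Object.prototype.hasOwnProperty;
    if (!has.call(table, pageName)) {
        return null;
    }
    var page = table[pageName];
    if (!has.call(page, pageEvent)) {
        return null;
    }
    return page[pageEvent];
}
```

Inside the document-ready callback, the switch would then reduce to looking up findHandler(handlers, pageName, pageEvent) and calling the result when it is not null.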
Tests
Tests are written with Mocha and can be run with npm test.
Troubleshooting
If you're having issues with Siteshooter, submit a GitHub Issue.
- Make sure you have a siteshooter.yml file in your working directory and the YAML file is well formatted
- Experiencing font-loading issues? Try increasing the delay setting in your siteshooter.yml file
screenshot_options:
  delay: 2000
- Trying to take a screenshot of a page with a video? Unfortunately, PhantomJS does not support videos. As such, here's one approach to showing a video's poster image.
/**
 * @file:            inject.js
 * @description:     used to display a video's poster image
 */
if ( jQuery('video').length > 0 ) {
    jQuery('video').parent().prepend('<img src="' + jQuery('video').attr('poster') + '"/>');
    jQuery('video').remove();
}
- SimpleCrawler TypeError: The header content contains invalid characters
  - Try setting the acceptCookies option to false
sitecrawler_options:
  acceptCookies: false
Code of Conduct
Take a moment to read our Code of Conduct.
Contributing to the project
We are always looking for quality contributions! Please check the CONTRIBUTING.md for contribution guidelines.