@fnet/sitemap-to-pdf

v0.1.8

Downloads

32

This project provides a straightforward utility to convert the contents of a website's sitemap into PDF documents. By accessing the URLs listed in a sitemap, it generates PDFs of each page, which can be bundled together or saved as separate files. This tool is particularly useful for creating physical or offline copies of a website's content for archival or review purposes.

How It Works

Using the sitemap URL you provide, the tool fetches all the accessible links within that sitemap. It visits each of these pages and produces PDFs of their content. You have the option to bundle all pages into a single PDF file, or keep them as individual files based on your preference. Additionally, you can limit the number of pages to process or create bundled PDFs with a specified maximum size.
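The link-collection step described above can be illustrated with a minimal sketch. It assumes the sitemap is a plain XML document whose page addresses sit in <loc> elements, as in the sitemaps.org format. The helper names extractSitemapUrls and decodeXmlEntities are hypothetical and are not part of @fnet/sitemap-to-pdf; the library's actual crawling logic may differ (for example, it may follow nested sitemap indexes, which this sketch does not).

```javascript
// Illustrative only: hypothetical helpers, not the library's implementation.

// Decode the five predefined XML entities. "&amp;" is handled last so that
// a sequence such as "&amp;lt;" is not decoded twice.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

// Collect the contents of every <loc> element, stopping once `limit`
// URLs have been gathered. CDATA sections are not handled.
function extractSitemapUrls(xml, limit = Infinity) {
  const urls = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  for (const match of xml.matchAll(locPattern)) {
    if (urls.length >= limit) break;
    urls.push(decodeXmlEntities(match[1]));
  }
  return urls;
}
```

A regular expression keeps the sketch dependency-free; a real crawler would more likely use an XML parser.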

Key Features

  • Process URLs from a provided sitemap to generate PDFs.
  • Option to save each page as a separate PDF or combine them into one.
  • Capability to set a size limit for bundled PDFs, splitting larger collections into manageable parts.
  • Ability to limit the number of links processed, based on user specifications.
  • Automatically handles URL sanitization for file names.
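
To make the last feature concrete, here is one way a page URL could be turned into a safe file name. This is a sketch under assumptions: the function name urlToFileName and the exact replacement rules are hypothetical, and the library's own sanitization may produce different names.

```javascript
// Illustrative only: a hypothetical sanitizer, not the library's own rules.
function urlToFileName(url) {
  // The global URL class splits the address into its components.
  const { hostname, pathname, search } = new URL(url);
  const raw = `${hostname}${pathname}${search}`;
  const safe = raw
    .replace(/[^a-zA-Z0-9-]+/g, '_') // replace each run of unsafe characters with "_"
    .replace(/^_+|_+$/g, '');        // trim leading and trailing underscores
  // Fall back to "index" if nothing usable remains.
  return `${safe || 'index'}.pdf`;
}
```

Dropping the protocol and collapsing separators keeps names short while still distinguishing pages on the same host.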

Conclusion

The @fnet/sitemap-to-pdf project offers a practical solution for converting web pages into PDF format using a sitemap as the source. Whether you need a bundled document or distinct files for each page, this tool simplifies the task of generating offline versions of web content.

Developer Guide for @fnet/sitemap-to-pdf

Overview

The @fnet/sitemap-to-pdf library provides a convenient way to crawl a sitemap, extract webpage links, and generate PDFs of those pages. You can choose to either create a single PDF file by combining pages or generate individual PDFs for each webpage. This library is particularly useful for archiving purposes or for compiling web content into a neatly packaged PDF format.

Installation

Install @fnet/sitemap-to-pdf in your project with npm or yarn by running one of the following commands:

Using npm:

npm install @fnet/sitemap-to-pdf

Using yarn:

yarn add @fnet/sitemap-to-pdf

Usage

The core functionality of this library is a single function, the package's default export (imported as sitemapToPdf in the examples below). It takes one options object that configures the crawling and PDF generation process.

Function Signature

import sitemapToPdf from '@fnet/sitemap-to-pdf';

sitemapToPdf({
  sitemapUrl: '<SITEMAP_URL>',
  outputDirectory: '<OUTPUT_DIRECTORY>',
  bundle: true, // Optional: Defaults to true
  outputFile: 'output', // Optional: Default output file name
  bundleSize: Infinity, // Optional: Size limit for each PDF bundle in MB
  limit: Infinity // Optional: Max number of pages to process
});

Parameters

  • sitemapUrl: The URL of the sitemap you wish to crawl.
  • outputDirectory: The directory where the generated PDFs will be stored.
  • bundle: (Optional) Boolean value indicating whether to combine pages into a single PDF. Defaults to true.
  • outputFile: (Optional) The base name for the bundled PDF file(s). Defaults to 'output'.
  • bundleSize: (Optional) Maximum size for each bundle PDF in MB. Defaults to Infinity, which means no size limit.
  • limit: (Optional) The maximum number of links to process from the sitemap. Defaults to Infinity, which means all links are processed.
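
How bundleSize might split a large collection can be sketched as a greedy grouping: pages are added to the current bundle until the next one would push it over the limit. This is an assumption about the general approach, not the library's confirmed algorithm. The function groupIntoBundles and the { name, bytes } page shape are hypothetical, and summing per-page sizes only approximates the size of a merged PDF.

```javascript
// Illustrative only: a hypothetical greedy grouping, not the library's code.
// `pages` is an array of { name, bytes } objects; `maxBundleMb` mirrors the
// documented bundleSize option (in MB, Infinity meaning no limit).
function groupIntoBundles(pages, maxBundleMb = Infinity) {
  const maxBytes = maxBundleMb * 1024 * 1024;
  const bundles = [];
  let current = [];
  let currentBytes = 0;

  for (const page of pages) {
    // Start a new bundle when adding this page would exceed the limit.
    // A page larger than the limit still gets a bundle of its own.
    if (current.length > 0 && currentBytes + page.bytes > maxBytes) {
      bundles.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(page);
    currentBytes += page.bytes;
  }

  if (current.length > 0) bundles.push(current);
  return bundles;
}
```

With three 4 MB pages and a 10 MB limit, the first two pages share a bundle and the third starts a second one.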

Examples

Here are some practical examples to help demonstrate common use cases:

Example 1: Generate a Single Bundled PDF

import sitemapToPdf from '@fnet/sitemap-to-pdf';

sitemapToPdf({
  sitemapUrl: 'https://example.com/sitemap.xml',
  outputDirectory: './pdfs',
  bundle: true,
  outputFile: 'example-site',
  bundleSize: 10 // Limit each PDF to 10 MB
});

This example renders every page listed in the sitemap and bundles the results under the base name example-site. Because bundleSize is set, the output is split into as many files as needed, each limited to 10 MB.

Example 2: Generate Separate PDFs for Each Page

import sitemapToPdf from '@fnet/sitemap-to-pdf';

sitemapToPdf({
  sitemapUrl: 'https://example.com/sitemap.xml',
  outputDirectory: './pdfs',
  bundle: false // Generate separate PDFs for each webpage
});

In this scenario, each page is saved as a separate PDF file in the specified output directory.

Acknowledgement

This library relies on several key technologies, such as Puppeteer for page rendering and pdf-lib for PDF manipulation. Special thanks to the contributors and maintainers of these libraries, who make this tool possible.

Input Schema

$schema: https://json-schema.org/draft/2020-12/schema
type: object
properties:
  sitemapUrl:
    type: string
    description: The URL of the sitemap to crawl.
  outputDirectory:
    type: string
    description: The directory where the PDFs will be saved.
  bundle:
    type: boolean
    description: Whether to combine all pages into a single PDF. Defaults to true.
  outputFile:
    type: string
    description: The base name of the bundled PDF file(s).
  bundleSize:
    type: number
    description: Maximum size of each bundled PDF in MB.
  limit:
    type: number
    description: Maximum number of links to process. Defaults to all links.
required:
  - sitemapUrl
  - outputDirectory
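
The schema above can be checked by hand before calling the library. The following is a minimal sketch that only covers the required list and the property types shown in the schema. The function validateOptions is hypothetical and not exported by @fnet/sitemap-to-pdf; a full JSON Schema validator would be the more complete choice.

```javascript
// Illustrative only: a hypothetical pre-flight check mirroring the schema above.
function validateOptions(options) {
  // Property names and types copied from the input schema.
  const propertyTypes = {
    sitemapUrl: 'string',
    outputDirectory: 'string',
    bundle: 'boolean',
    outputFile: 'string',
    bundleSize: 'number',
    limit: 'number',
  };
  const required = ['sitemapUrl', 'outputDirectory'];
  const errors = [];

  for (const key of required) {
    if (options[key] === undefined) {
      errors.push(`missing required property: ${key}`);
    }
  }

  for (const [key, expected] of Object.entries(propertyTypes)) {
    const value = options[key];
    if (value !== undefined && typeof value !== expected) {
      errors.push(`${key} must be of type ${expected}`);
    }
  }

  return errors; // an empty array means the options passed these checks
}
```

Returning a list of messages instead of throwing lets a caller report every problem at once.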