@fnet/sitemap-to-pdf
v0.1.8
This project provides a straightforward utility to convert the contents of a website's sitemap into PDF documents. By accessing the URLs listed in a sitemap, it generates PDFs of each page, which can be bundled together or saved as separate files. This tool is particularly useful for creating physical or offline copies of a website's content for archival or review purposes.
How It Works
Using the sitemap URL you provide, the tool fetches all the accessible links within that sitemap. It visits each of these pages and produces PDFs of their content. You have the option to bundle all pages into a single PDF file, or keep them as individual files based on your preference. Additionally, you can limit the number of pages to process or create bundled PDFs with a specified maximum size.
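The link-collection step can be sketched as follows. This is a minimal illustration, not the library's actual code: the helper name extractSitemapUrls and the regex-based parsing are assumptions, and a production implementation would more likely use a real XML parser.

```javascript
// Hypothetical helper: pull <loc> URLs out of sitemap XML text.
// "limit" mirrors the documented option that caps how many links are processed.
function extractSitemapUrls(xml, limit = Infinity) {
  const urls = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/gi;
  for (const match of xml.matchAll(locPattern)) {
    if (urls.length >= limit) break;
    // Sitemaps escape "&" as "&amp;"; this sketch only undoes that one entity.
    urls.push(match[1].replace(/&amp;/g, '&'));
  }
  return urls;
}

const sampleXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/search?q=a&amp;page=2</loc></url>
</urlset>`;

// Only the first two links are kept because of the limit.
console.log(extractSitemapUrls(sampleXml, 2));
```

Each URL returned by such a helper would then be visited and rendered to PDF.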
Key Features
- Process URLs from a provided sitemap to generate PDFs.
- Option to save each page as a separate PDF or combine them into one.
- Capability to set a size limit for bundled PDFs, splitting larger collections into manageable parts.
- Ability to limit the number of links processed, based on user specifications.
- Automatically sanitizes URLs so they can be used as file names.
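To make the file-name sanitization feature concrete, here is one possible approach. The helper name urlToFileName and the exact replacement rules are assumptions made for illustration; the library's own naming scheme may differ.

```javascript
// Hypothetical helper: derive a file-system-safe PDF name from a page URL.
function urlToFileName(url) {
  const { hostname, pathname, search } = new URL(url);
  const raw = `${hostname}${pathname}${search}`;
  const safe = raw
    .replace(/[^a-zA-Z0-9.-]+/g, '_') // collapse each run of unsafe characters into "_"
    .replace(/^_+|_+$/g, '');         // trim leading and trailing "_"
  // Fall back to a fixed name if nothing usable is left.
  return `${safe || 'index'}.pdf`;
}

console.log(urlToFileName('https://example.com/docs/getting-started?lang=en'));
console.log(urlToFileName('https://example.com/'));
```

Keeping the host and path in the name makes it easy to trace each PDF back to its source page.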
Conclusion
The @fnet/sitemap-to-pdf project offers a practical solution for converting web pages into PDF format using a sitemap as the source. Whether you need a bundled document or distinct files for each page, this tool simplifies the task of generating offline versions of web content.
Developer Guide for @fnet/sitemap-to-pdf
Overview
The @fnet/sitemap-to-pdf library provides a convenient way to crawl a sitemap, extract webpage links, and generate PDFs of those pages. You can choose to either create a single PDF file by combining pages or generate individual PDFs for each webpage. This library is particularly useful for archiving purposes or for compiling web content into a neatly packaged PDF format.
Installation
To use @fnet/sitemap-to-pdf, install it via npm or yarn by running one of the following commands:
Using npm:
npm install @fnet/sitemap-to-pdf
Using yarn:
yarn add @fnet/sitemap-to-pdf
Usage
The core functionality of this library is the index function, imported as the package's default export, which takes various parameters to configure the crawling and PDF generation process.
Function Signature
import sitemapToPdf from '@fnet/sitemap-to-pdf';

sitemapToPdf({
  sitemapUrl: '<SITEMAP_URL>',
  outputDirectory: '<OUTPUT_DIRECTORY>',
  bundle: true,          // Optional: Defaults to true
  outputFile: 'output',  // Optional: Default output file name
  bundleSize: Infinity,  // Optional: Size limit for each PDF bundle in MB
  limit: Infinity        // Optional: Max number of pages to process
});
Parameters
- sitemapUrl: The URL of the sitemap you wish to crawl.
- outputDirectory: The directory where the generated PDFs will be stored.
- bundle: (Optional) Boolean value indicating whether to combine pages into a single PDF. Defaults to true.
- outputFile: (Optional) The base name for the bundled PDF file(s). Defaults to 'output'.
- bundleSize: (Optional) Maximum size for each bundle PDF in MB. Defaults to Infinity, which means no size limit.
- limit: (Optional) The maximum number of links to process from the sitemap. Defaults to Infinity, which means all links are processed.
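The way the documented defaults combine with caller-supplied options can be sketched like this. The names DEFAULT_OPTIONS and resolveOptions are hypothetical and only illustrate the idea; the library may apply its defaults differently.

```javascript
// Defaults as documented in the parameter list above.
const DEFAULT_OPTIONS = {
  bundle: true,
  outputFile: 'output',
  bundleSize: Infinity,
  limit: Infinity,
};

// Hypothetical helper: merge caller options over the documented defaults.
function resolveOptions(options) {
  // Drop keys explicitly set to undefined so they do not mask a default.
  const provided = Object.fromEntries(
    Object.entries(options).filter(([, value]) => value !== undefined)
  );
  return { ...DEFAULT_OPTIONS, ...provided };
}

const resolved = resolveOptions({
  sitemapUrl: 'https://example.com/sitemap.xml',
  outputDirectory: './pdfs',
  bundle: false,
  limit: undefined, // falls back to the default
});
console.log(resolved);
```

Only sitemapUrl and outputDirectory have no default, which matches their status as the two required parameters.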
Examples
Here are some practical examples to help demonstrate common use cases:
Example 1: Generate a Single Bundled PDF
import sitemapToPdf from '@fnet/sitemap-to-pdf';

sitemapToPdf({
  sitemapUrl: 'https://example.com/sitemap.xml',
  outputDirectory: './pdfs',
  bundle: true,
  outputFile: 'example-site',
  bundleSize: 10 // Limit each PDF to 10 MB
});
This example renders the pages listed in the sitemap and bundles them into one or more PDF files, each limited to 10 MB.
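One way to picture the size-based splitting is a greedy grouping of pages, shown below. This is a sketch under stated assumptions: the helper groupIntoBundles, the sizeMb field, and the rule that an oversized page gets a bundle of its own are all hypothetical, and the source does not say how the library measures or splits bundles.

```javascript
// Hypothetical helper: group rendered pages into bundles that stay under a size cap.
// Each page is assumed to look like { url, sizeMb }.
function groupIntoBundles(pages, bundleSizeMb = Infinity) {
  const bundles = [];
  let current = [];
  let currentSize = 0;
  for (const page of pages) {
    // Close the current bundle when this page would push it past the cap.
    // An empty bundle always accepts the page, so an oversized page stands alone.
    if (current.length > 0 && currentSize + page.sizeMb > bundleSizeMb) {
      bundles.push(current);
      current = [];
      currentSize = 0;
    }
    current.push(page);
    currentSize += page.sizeMb;
  }
  if (current.length > 0) bundles.push(current);
  return bundles;
}

const samplePages = [
  { url: 'https://example.com/', sizeMb: 4 },
  { url: 'https://example.com/about', sizeMb: 5 },
  { url: 'https://example.com/blog', sizeMb: 3 },
  { url: 'https://example.com/gallery', sizeMb: 12 },
  { url: 'https://example.com/contact', sizeMb: 1 },
];

// With a 10 MB cap the five pages end up in four bundles.
console.log(groupIntoBundles(samplePages, 10).map((bundle) => bundle.length));
```

With the default of Infinity, the same input collapses into a single bundle, which matches the documented "no size limit" behavior.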
Example 2: Generate Separate PDFs for Each Page
import sitemapToPdf from '@fnet/sitemap-to-pdf';

sitemapToPdf({
  sitemapUrl: 'https://example.com/sitemap.xml',
  outputDirectory: './pdfs',
  bundle: false // Generate separate PDFs for each webpage
});
In this scenario, each page is saved as a separate PDF file in the specified output directory.
Acknowledgement
This library relies on several key technologies, such as Puppeteer for page rendering and pdf-lib for PDF manipulation. Special thanks to the contributors and maintainers of these libraries, who make this tool possible.
Input Schema
$schema: https://json-schema.org/draft/2020-12/schema
type: object
properties:
  sitemapUrl:
    type: string
    description: The URL of the sitemap to crawl.
  outputDirectory:
    type: string
    description: The directory where the PDFs will be saved.
  bundle:
    type: boolean
    description: Whether to combine all pages into a single PDF. Defaults to true.
  outputFile:
    type: string
    description: The base name of the bundled PDF file(s).
  bundleSize:
    type: number
    description: Maximum size of each bundled PDF in MB.
  limit:
    type: number
    description: Maximum number of links to process. Defaults to all links.
required:
  - sitemapUrl
  - outputDirectory
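The schema above can be checked by hand before calling the library. The sketch below hard-codes the property types and required keys from the schema; validateInput is a hypothetical helper, not part of the package, and a full JSON Schema validator would be the more robust choice in practice.

```javascript
// Property types and required keys copied from the input schema above.
const inputSchema = {
  properties: {
    sitemapUrl: 'string',
    outputDirectory: 'string',
    bundle: 'boolean',
    outputFile: 'string',
    bundleSize: 'number',
    limit: 'number',
  },
  required: ['sitemapUrl', 'outputDirectory'],
};

// Hypothetical helper: return a list of human-readable problems (empty if valid).
function validateInput(input) {
  const errors = [];
  for (const key of inputSchema.required) {
    if (input[key] === undefined) {
      errors.push(`missing required property "${key}"`);
    }
  }
  for (const [key, expected] of Object.entries(inputSchema.properties)) {
    if (input[key] !== undefined && typeof input[key] !== expected) {
      errors.push(`"${key}" must be of type ${expected}`);
    }
  }
  return errors;
}

console.log(validateInput({ sitemapUrl: 'https://example.com/sitemap.xml', bundle: 'yes' }));
```

Returning all problems at once, rather than throwing on the first, lets a caller report every issue in a single pass.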