# docusaurus-to-pdf

`docusaurus-to-pdf` is a CLI tool that generates a PDF from a Docusaurus-based documentation website. The tool allows customization of the scraping process via a configuration file or CLI options.
## Installation

You can use `npx` to run the tool without installing it globally:

```bash
npx docusaurus-to-pdf
```
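If you prefer a one-time install over `npx`, a global install should also work; this assumes the published package exposes a `docusaurus-to-pdf` binary, which the `npx` usage above suggests:

```bash
# One-time global install (assumption: the package exposes a bin named docusaurus-to-pdf)
npm install -g docusaurus-to-pdf

# Then invoke it directly, using the flags documented below
docusaurus-to-pdf --baseUrl https://docusaurus.io --entryPoint https://docusaurus.io/docs
```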
## Usage

By default, the tool looks for a configuration file named `scraper.config.json`. However, you can override this by providing specific options through the CLI.
### CLI Options

| Option                       | Description                                                         | Default             |
| ---------------------------- | ------------------------------------------------------------------- | ------------------- |
| `--all`                      | Generate PDF for all directories                                     | `true`              |
| `--baseUrl <url>`            | Base URL of the site to scrape                                       |                     |
| `--entryPoint <url>`         | Entry point for scraping (starting URL)                              |                     |
| `--directories <dirs...>`    | Specific directories to include in the scraping process (optional)   |                     |
| `--customStyles <styles...>` | Add custom styles as a string to override defaults (optional)        |                     |
| `--output <path>`            | Output path for the generated PDF                                    | `./output/docs.pdf` |
| `--forceImages`              | Disable lazy loading for images                                      | `false`             |
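As a rough sketch of how these options compose (the flag names come from the table above; the directory names and output path here are illustrative placeholders, not from the original docs), a single invocation might look like:

```bash
# Hypothetical invocation combining several documented flags:
# scrape only the "guides" and "api" directories and write to a custom path
npx docusaurus-to-pdf \
  --baseUrl https://docusaurus.io \
  --entryPoint https://docusaurus.io/docs \
  --directories guides api \
  --output ./output/guides-and-api.pdf
```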
## Examples

Below, you'll find some example configurations that can be placed in a `scraper.config.json` file.
### Example 1: Scraping specific directories

Only paths that include `auth` and `support` will be included in the output.

CLI equivalent:

```bash
npx docusaurus-to-pdf --baseUrl https://hasura.io --entryPoint https://hasura.io/docs/3.0 --directories auth support
```

```json
{
  "baseUrl": "https://hasura.io",
  "entryPoint": "https://hasura.io/docs/3.0",
  "requiredDirs": ["auth", "support"]
}
```
### Example 2: Scraping all directories

CLI equivalent:

```bash
npx docusaurus-to-pdf --baseUrl https://hasura.io --entryPoint https://hasura.io/docs/3.0 --output ./output/all-docs.pdf
```

```json
{
  "baseUrl": "https://hasura.io",
  "entryPoint": "https://hasura.io/docs/3.0",
  "outputDir": "./output/all-docs.pdf"
}
```
### Example 3: Scraping without specifying the output directory

CLI equivalent:

```bash
npx docusaurus-to-pdf --baseUrl https://docusaurus.io --entryPoint https://docusaurus.io/docs
```

```json
{
  "baseUrl": "https://docusaurus.io",
  "entryPoint": "https://docusaurus.io/docs"
}
```
### Example 4: Scraping with custom styles

This will override the existing styles of tables to have a max-width of `3500px`, which is typical for an A4 sheet of paper.

CLI equivalent:

```bash
npx docusaurus-to-pdf --baseUrl https://hasura.io --entryPoint https://hasura.io/docs/3.0 --customStyles 'table { max-width: 3500px !important }'
```

```json
{
  "baseUrl": "https://hasura.io",
  "entryPoint": "https://hasura.io/docs/3.0",
  "customStyles": "table { max-width: 3500px !important }"
}
```
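Since `--customStyles <styles...>` is documented as variadic, passing several style strings at once should also work; a sketch (the second rule is an illustrative assumption, not from the original docs):

```bash
# Sketch: multiple style strings passed to the variadic --customStyles flag
npx docusaurus-to-pdf --baseUrl https://docusaurus.io --entryPoint https://docusaurus.io/docs \
  --customStyles 'table { max-width: 3500px !important }' 'img { max-width: 100% !important }'
```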
### Example 5: Scraping without lazy loading on images

CLI equivalent:

```bash
npx docusaurus-to-pdf --baseUrl https://docusaurus.io --entryPoint https://docusaurus.io/docs --forceImages
```

```json
{
  "baseUrl": "https://docusaurus.io",
  "entryPoint": "https://docusaurus.io/docs",
  "forceImages": true
}
```
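For reference, the configuration keys from the examples above can presumably be combined in a single `scraper.config.json`. This combined form is an assumption, since the original examples only show the keys separately, and the directory name `guides` and output path are placeholders:

```json
{
  "baseUrl": "https://docusaurus.io",
  "entryPoint": "https://docusaurus.io/docs",
  "requiredDirs": ["guides"],
  "outputDir": "./output/guides.pdf",
  "customStyles": "table { max-width: 3500px !important }",
  "forceImages": true
}
```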
## Contributing

We welcome contributions! If you'd like to contribute, please follow these steps:

- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Make your changes.
- Run the tests.
- Commit your changes (`git commit -am 'Add new feature'`).
- Push to the branch (`git push origin feature-branch`).
- Create a pull request.
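Put together, the flow looks roughly like this; `npm test` is an assumption based on the standard npm convention, so check the repository's `package.json` for the actual test script:

```bash
git checkout -b feature-branch
# ...make your changes...
npm test                           # assumption: standard "test" script in package.json
git commit -am 'Add new feature'
git push origin feature-branch
# then open a pull request against the main repository
```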