sasori-crawl

v1.0.0

Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

Downloads

87

Project Description:

Sasori is a dynamic web crawler built on Puppeteer. It automates the crawling of web applications, including those behind authentication, can route its traffic through security testing tools such as Zaproxy or Burp Suite, and is customized through a configuration file.

Features

  • Dynamic Crawling: Sasori excels at crawling dynamic web applications, handling AJAX-loaded content, and interacting with complex user interfaces.

  • Authentication Support: Spider applications behind authentication barriers by passing a Puppeteer recording of the login sequence.

  • Proxy Integration: Sasori provides the option to set up a proxy server, allowing you to route requests through tools like Zaproxy or Burp Suite for security testing.

  • State-Based Navigation: The project is designed around a state-based system, keeping track of URLs, DOM structures, and interactable elements for efficient crawling.

  • Efficient Endpoint Coverage: Utilizes efficient algorithms for intelligent crawling, ensuring coverage of more endpoints while maintaining speed.

  • Crawl Customization: Allows you to customize what elements to interact with to target specific use cases.
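
The state-based idea can be illustrated with a minimal sketch. This is not Sasori's actual implementation: it simply assumes that a state is summarized by its URL, some fingerprint of the DOM, and the CSS paths of its interactable elements. The names stateKey, StateStore and domFingerprint are hypothetical.

```javascript
// Hypothetical sketch: deduplicate crawler states by URL, DOM fingerprint,
// and interactable elements. None of these names come from Sasori itself.
function stateKey(state) {
  // Sort element paths so discovery order does not change the key.
  const elements = [...state.elements].sort();
  return JSON.stringify([state.url, state.domFingerprint, elements]);
}

class StateStore {
  constructor() {
    this.seen = new Map();
  }

  // Returns true if the state is new and was recorded, false if already seen.
  add(state) {
    const key = stateKey(state);
    if (this.seen.has(key)) return false;
    this.seen.set(key, state);
    return true;
  }

  get size() {
    return this.seen.size;
  }
}

const store = new StateStore();
const home = {
  url: "https://example.com/",
  domFingerprint: "abc",
  elements: ["a#login", "button.menu"],
};
const homeAgain = {
  url: "https://example.com/",
  domFingerprint: "abc",
  elements: ["button.menu", "a#login"],
};
console.log(store.add(home));      // true
console.log(store.add(homeAgain)); // false
```

Skipping states that have already been seen is what lets a crawler of this kind avoid clicking through the same screen repeatedly.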

Getting Started:

To get started with Sasori, follow these steps:

Recommended

  1. Install the package globally:

npm install -g sasori-crawl

  2. Create Sasori's configuration file:

sasori init

  3. Edit the configuration file. Check Configuration.

  4. Run Sasori:

sasori start --config /path/to/config.json

Alternative

  1. Clone the repository:

git clone https://github.com/karthikuj/sasori.git

  2. Install dependencies:

cd sasori
npm install

  3. Configure Sasori by editing the configuration file in the config directory. Check Configuration.

  4. Run Sasori:

node . start --config ./config/config.json

Configuration

The Sasori configuration consists of two main sections: browser and crawler. Each section contains specific settings to customize the behavior of the crawler and the browser used for crawling.

Browser Configuration

The browser section contains settings related to the browser used by the crawler.

  • headless: (boolean) Specifies whether the browser should run in headless mode. Default: false.
  • maximize: (boolean) Specifies whether the browser window should be maximized. Default: false.
  • proxy: (object) Configuration for proxy settings.
    • enabled: (boolean) Specifies whether the proxy is enabled.
    • host: (string) Hostname of the proxy server. Required if enabled is true.
    • port: (integer) Port of the proxy server. Required if enabled is true.

Example:

{
  "browser": {
    "headless": true,
    "maximize": false,
    "proxy": {
      "enabled": true,
      "host": "proxy.example.com",
      "port": 8080
    }
  }
}
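
As a rough illustration of what these settings could translate to, the sketch below maps the browser section onto an options object of the shape that puppeteer.launch() accepts. The helper name toLaunchOptions is hypothetical, and the mapping is an assumption rather than Sasori's own code; --proxy-server and --start-maximized are standard Chromium command-line switches.

```javascript
// Hypothetical helper: translate the "browser" section into an options object
// of the shape accepted by puppeteer.launch(). Not taken from Sasori's source.
function toLaunchOptions(browser) {
  const args = [];
  const proxy = browser.proxy || {};
  if (proxy.enabled) {
    // host and port are required whenever the proxy is enabled.
    if (!proxy.host || !Number.isInteger(proxy.port)) {
      throw new Error("proxy.host and proxy.port are required when proxy.enabled is true");
    }
    args.push(`--proxy-server=${proxy.host}:${proxy.port}`);
  }
  if (browser.maximize) {
    args.push("--start-maximized");
  }
  return { headless: Boolean(browser.headless), args };
}

const options = toLaunchOptions({
  headless: true,
  maximize: false,
  proxy: { enabled: true, host: "proxy.example.com", port: 8080 },
});
console.log(options);
```

To send traffic through Burp Suite or Zaproxy, host and port would point at the address the tool's proxy listener is bound to.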

Crawler Configuration

The crawler section contains settings related to the behavior of the crawler.

  • entryPoint: (string) URL of the entry point from where the crawling starts. Required.
  • eventTimeout: (integer) Timeout (in milliseconds) for waiting for events during crawling. Required.
  • navigationTimeout: (integer) Timeout (in milliseconds) for waiting for navigation to complete during crawling. Required.
  • eventWait: (integer) Time (in milliseconds) to wait between events during crawling. Required.
  • maxDuration: (integer) Maximum duration (in milliseconds) for the crawling process. 0 means crawl indefinitely. Required.
  • elements: (array of CSS paths) List of CSS paths of the HTML elements to click during crawling. Required.
  • maxChildren: (integer) Maximum number of child elements to crawl from each parent state. 0 means infinite children. Required.
  • maxDepth: (integer) Maximum depth of the crawling process. 0 means infinite depth. Required.
  • authentication: (object) Authentication settings for the crawler.
    • basicAuth: (object) Configuration for HTTP basic authentication.
      • enabled: (boolean) Specifies whether basic authentication is enabled.
      • username: (string) Username for basic authentication. Required if enabled is true.
      • password: (string) Password for basic authentication. Required if enabled is true.
    • recorderAuth: (object) Configuration for recorder-based authentication.
      • enabled: (boolean) Specifies whether recorder-based authentication is enabled.
      • pptrRecording: (string) Path to Puppeteer recording for authentication. Required if enabled is true.
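
The rules in this list can be checked before a crawl starts. The following is a minimal sketch of such a check, based only on the field list above; validateCrawler and limitOrInfinity are hypothetical names, and rejecting negative numbers is an assumption, since the list does not say how they are handled.

```javascript
// Hypothetical validator for the "crawler" section. Not part of Sasori.
const REQUIRED_INTEGERS = [
  "eventTimeout", "navigationTimeout", "eventWait",
  "maxDuration", "maxChildren", "maxDepth",
];

function validateCrawler(crawler) {
  const errors = [];
  if (typeof crawler.entryPoint !== "string" || crawler.entryPoint === "") {
    errors.push("entryPoint must be a non-empty string");
  }
  for (const name of REQUIRED_INTEGERS) {
    // Assumption: negative values are treated as invalid.
    if (!Number.isInteger(crawler[name]) || crawler[name] < 0) {
      errors.push(`${name} must be a non-negative integer`);
    }
  }
  if (!Array.isArray(crawler.elements)) {
    errors.push("elements must be an array of CSS paths");
  }
  const auth = crawler.authentication || {};
  const basic = auth.basicAuth || {};
  if (basic.enabled && (!basic.username || !basic.password)) {
    errors.push("basicAuth needs username and password when enabled");
  }
  const recorder = auth.recorderAuth || {};
  if (recorder.enabled && !recorder.pptrRecording) {
    errors.push("recorderAuth needs pptrRecording when enabled");
  }
  return errors;
}

// 0 means "no limit" for maxDuration, maxChildren and maxDepth.
function limitOrInfinity(value) {
  return value === 0 ? Infinity : value;
}

console.log(validateCrawler({ entryPoint: "https://example.com" }));
```

Returning a list of messages instead of throwing on the first problem lets a user fix the whole configuration file in one pass.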

Example:

{
  "crawler": {
    "entryPoint": "https://example.com",
    "eventTimeout": 10000,
    "navigationTimeout": 30000,
    "eventWait": 1000,
    "maxDuration": 600000,
    "elements": ["a", "button", "input[type=\"submit\"]"],
    "maxChildren": 10,
    "maxDepth": 5,
    "authentication": {
      "basicAuth": {
        "enabled": true,
        "username": "user",
        "password": "password"
      },
      "recorderAuth": {
        "enabled": false,
        "pptrRecording": "/path/to/pptr/recording.json"
      }
    },
    "includeRegexes": ["https?://example\\.com(?:/.*|)"],
    "excludeRegexes": ["^.*\\.pdf$", "https?://example\\.com/logout"]
  }
}
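
The example above also sets includeRegexes and excludeRegexes, which the field list does not describe. One plausible reading, shown here purely as an assumption, is that a URL is crawled only if it matches at least one include pattern and no exclude pattern. The helper makeScopeFilter is hypothetical.

```javascript
// ASSUMPTION: a URL is in scope when it matches at least one include pattern
// (or no include patterns are given) and matches no exclude pattern.
// This is a guess at the semantics, not Sasori's documented behavior.
function makeScopeFilter(includeRegexes = [], excludeRegexes = []) {
  const includes = includeRegexes.map((source) => new RegExp(source));
  const excludes = excludeRegexes.map((source) => new RegExp(source));
  return (url) =>
    (includes.length === 0 || includes.some((re) => re.test(url))) &&
    !excludes.some((re) => re.test(url));
}

// Same patterns as in the configuration example, written as JavaScript strings.
const inScope = makeScopeFilter(
  ["https?://example\\.com(?:/.*|)"],
  ["^.*\\.pdf$", "https?://example\\.com/logout"]
);
console.log(inScope("https://example.com/account")); // true
console.log(inScope("https://example.com/logout"));  // false
console.log(inScope("https://example.com/a.pdf"));   // false
console.log(inScope("https://other.example.org/"));  // false
```

Excluding the logout URL keeps an authenticated crawl from ending its own session. Note that RegExp.test matches substrings, so a pattern without ^ and $ anchors can match more URLs than intended; whether Sasori anchors patterns itself is not stated here.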

Creating a Puppeteer recording

  1. Open DevTools in Google Chrome and click the three-dot icon in the top-right corner.

  2. Go to More tools > Recorder.

  3. Click Create a new recording.

  4. Give the recording a name, then click Start recording.

  5. Perform the login sequence in the page, then click End recording.

  6. Export the recording by clicking the downward arrow and choosing JSON as the type.
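
Before pointing pptrRecording at an exported file, it can help to check what the recording contains. The sketch below assumes the usual shape of a Recorder JSON export (a title plus a steps array whose entries have a type, and optionally a url or a selectors list); summarizeRecording is a hypothetical helper and not part of Sasori.

```javascript
// Sketch: list the steps of a recording exported as JSON from the
// Chrome DevTools Recorder. Typed values are deliberately not printed.
function summarizeRecording(json) {
  const recording = JSON.parse(json);
  if (!Array.isArray(recording.steps)) {
    throw new Error("not a Recorder export: missing steps array");
  }
  return recording.steps.map((step, i) => {
    // Use the URL for navigation steps, otherwise the first selector if any.
    const firstSelector = step.selectors && step.selectors[0] && step.selectors[0][0];
    const target = step.url || firstSelector || "";
    return `${i + 1}. ${step.type}${target ? " " + target : ""}`;
  });
}

// Inline stand-in for the contents of an exported recording file.
const example = JSON.stringify({
  title: "login",
  steps: [
    { type: "navigate", url: "https://example.com/login" },
    { type: "change", selectors: [["#username"]], value: "user" },
    { type: "click", selectors: [["button[type=submit]"]] },
  ],
});
console.log(summarizeRecording(example).join("\n"));
// 1. navigate https://example.com/login
// 2. change #username
// 3. click button[type=submit]
```

Because a login recording stores the typed values, including the password, in plain text, the exported file should be handled like any other credential.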

Contributing:

Contributions to Sasori are welcome! If you encounter any bugs, have feature requests, or would like to contribute code improvements, please follow the guidelines in the CONTRIBUTING.md file.

License:

This project is licensed under the MIT License.