
interrobot-plugin

v0.14.0

Used in the creation of plugins (aka reports) that present data or visualizations for a website in aggregate, across site content. For use with the InterroBot application.

Your web crawler just got superpowers. InterroBot plugins transform your web crawler into a customizable data powerhouse, unleashing unlimited potential for data extraction and analysis.

InterroBot plugins are simple HTML/JS/CSS pages that transform raw web crawl data into profound insights, stunning visualizations, and interactive dashboards. With our flexible API, you can create custom plugins that analyze website content across entire domains, connecting with analytics, LLMs, or your favorite SaaS for deeper insights.

Our plugin ecosystem is designed for versatility. Whether you're building proprietary tools, developing plugins for clients, or contributing to the open-source community, InterroBot plugins adapt to your needs. Available for Windows 10/11, macOS, and Android, our platform ensures your data analysis can happen wherever you work.

How Does it Work?

InterroBot hosts your plugin page in an iframe and exposes an API from which you can pull crawl data down for analysis.

If you're familiar with vanilla TypeScript or JavaScript, creating a custom plugin script for InterroBot is remarkably straightforward. You start with a bare-bones HTML file and a script extending the Plugin base class.
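
The bare-bones HTML file can be little more than a page that loads your plugin script; InterroBot displays it in an iframe. A minimal sketch (the file name and script path here are assumptions, not requirements; see the examples directory for working layouts):

```html
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8"/>
    <title>Example Plugin</title>
</head>
<body>
    <!-- the plugin renders its UI into this page; InterroBot loads
         the page in an iframe -->
    <!-- script path is an assumption; point it at your plugin script -->
    <script type="module" src="./basic.js"></script>
</body>
</html>
```

The script it loads extends the Plugin base class: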

```typescript
// TypeScript vs. JavaScript, both are fine. See examples.
import { Plugin } from "./src/ts/core/plugin";

class BasicExamplePlugin extends Plugin {
    static meta = {
        "title": "Example Plugin",
        "category": "Example",
        "version": "1.0.0",
        "author": "InterroBot",
        "synopsis": `a basic plugin example`,
        "description": `This example is as simple as it gets.`,
    };
    constructor() {
        super();
        // index() has nothing to do with the crawl index, btw. it is
        // the plugin index (think index.html), a view that shows by
        // default, and would generally consist of a form or visualization.
        this.index();
    }
}

// configure to load when page is ready
Plugin.initialize(BasicExamplePlugin);
```

BasicExamplePlugin will not do much at this point, but it will load and run the default index() behavior. You can, of course, override the default index() behavior, rendering your page however you wish.

```typescript
protected async index() {
    // add your form and supporting HTML; include the button
    // the click handler below expects to find
    this.render(`<div><button>Process</button></div>`);
    // initialize the plugin within InterroBot, from within iframe
    await this.initData(BasicExamplePlugin.meta, {}, []);
    // add handlers to the form
    const button = document.querySelector("button");
    button?.addEventListener("click", async (ev) => {
        await this.process(); // where process() is a form handler
    });
}
```

The process() method called above is where you process data. Here a query is executed against the crawl index, and each result is run through exampleResultHandler.

```typescript
protected async process() {
    // gather title words and running counts with a result handler
    const titleWords: Map<string, number> = new Map<string, number>();
    const resultsMap: Map<number, SearchResult> = new Map<number, SearchResult>();
    const exampleResultHandler = async (result: SearchResult,
        titleWordsMap: Map<string, number>) => {
        const terms: string[] = result.name.trim().split(/[\s\-—]+/g);
        terms.forEach(term => titleWordsMap.set(term,
            (titleWordsMap.get(term) ?? 0) + 1));
    };
    // projectId comes for free as a member of Plugin
    const projectId: number = this.getProjectId();
    // anything you put into InterroBot search, field or fulltext, works
    // here we limit to HTML documents, which will have a <title> -> name
    const freeQueryString: string = "headers: text/html";
    // pipe-delimited fields you want retrieved. id and url come with
    // the base model, everything else must be requested explicitly
    const fields: string = "name";
    const internalHtmlPagesQuery = new SearchQuery(projectId,
        freeQueryString, fields, SearchQueryType.Any, false);
    // run each SearchResult through its handler, and we're done processing
    await Search.execute(internalHtmlPagesQuery, resultsMap, "Processing…",
        async (result: SearchResult) => {
            await exampleResultHandler(result, titleWords);
        }
    );
    // call for HTML presentation of titleWords with processing complete
    await this.report(titleWords);
}
```
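
The report() call at the end is your own presentation code; whether a default exists in the base class isn't shown here. A minimal sketch, assuming you define it yourself using only the render() helper already used in index():

```typescript
protected async report(titleWords: Map<string, number>): Promise<void> {
    // sort terms by descending count and render as a simple list;
    // render() is the same helper used in index()
    const rows = [...titleWords.entries()]
        .sort(([, a], [, b]) => b - a)
        .map(([term, count]) => `<li>${term}: ${count}</li>`)
        .join("");
    this.render(`<ul>${rows}</ul>`);
}
```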

The above snippets are pulled (and gently modified) from a plugin in the repository, basic.js. For more ideas on getting started, check out the examples directory.

What data is available via API?

InterroBot's robust API provides plugin developers with access to crawled data, enabling deep analysis and useful customizations. This data forms the foundation of your plugin, allowing you to create insightful visualizations, perform complex analysis, or build interactive tools. Whether you're tracking SEO metrics, analyzing content structures, or developing custom reporting tools, our API offers the flexibility and depth you need. Below is an overview of the key data points available, organized by API endpoint:

GetProjects

Retrieves a list of projects using the Plugin API.

Optional Fields

| Field | Description |
|-------|-------------|
| created | ISO 8601 date/time, project created |
| image | datauri of project icon |
| modified | ISO 8601 date/time, project modified |

GetResources

Retrieves a list of resources associated with a project using the Plugin API.

Optional Fields

| Field | Description |
|-------|-------------|
| assets | array of assets, HTML only |
| content | page/file contents |
| created | ISO 8601 date/time, crawled resource |
| headers | HTTP headers |
| links | array of outlinks, HTML only |
| modified | ISO 8601 date/time, resource modified |
| name | page/file name |
| norobots | crawler indexable |
| origin | forwarding URL, if applicable |
| size | size in bytes |
| status | HTTP status code |
| time | request time, in millis |
| type | resource type, html, pdf, image, etc. |

GetCrawls

Retrieves a list of crawls using the Plugin API.

Optional Fields

| Field | Description |
|-------|-------------|
| created | ISO 8601 date/time, crawl created |
| modified | ISO 8601 date/time, crawl modified |
| report | crawl details as JSON |
| time | crawl time in millis |
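
Optional fields are requested through the same pipe-delimited fields string shown in the process() example. A sketch reusing that SearchQuery/Search.execute interface (the fetchResources name is hypothetical, and treating requested fields as result properties is an assumption; confirm the mapping against the examples directory):

```typescript
protected async fetchResources() {
    // field names from the GetResources table above, pipe-delimited
    const fields: string = "name|status|size|type";
    const query = new SearchQuery(this.getProjectId(),
        "headers: text/html", fields, SearchQueryType.Any, false);
    await Search.execute(query, new Map<number, SearchResult>(), "Collecting…",
        async (result: SearchResult) => {
            // id and url come with the base model; explicitly requested
            // fields are assumed to surface on the result, as name does
            console.log(result.id, result.url, result.name);
        }
    );
}
```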

Licensing

MPL 2.0, with exceptions. This repo contains JavaScript to TypeScript ports and a Markdown library based on existing code, all contained within ./src/lib. Because these arrived under existing licenses, they remain under those licenses.

  • Typo.js: TypeScript port continues under the original Modified BSD License.
  • Snowball.js: TypeScript port continues under the original MPL 1.1 license.
  • HTML To Markdown Text: The Markdown library contains a modified version of an HTML to Markdown XSLT transformer by Michael Eichelsdoerfer. MIT license.

The InterroBot plugins and the Typo.js TypeScript port make use of a handful of unmodified Hunspell dictionaries, as found in wooorm's UTF-8 collection: dictionary-en, dictionary-en-gb, dictionary-es, dictionary-es-mx, dictionary-fr, and dictionary-ru.