Spider Cloud JavaScript SDK

The Spider Cloud JavaScript SDK offers a streamlined set of tools for web scraping and crawling, with data-extraction capabilities suited to feeding AI language models. The SDK makes it easy to interact programmatically with the Spider Cloud API from any JavaScript or Node.js application.

Installation

You can install the Spider Cloud JavaScript SDK via npm:

npm install @spider-cloud/spider-client

Or with yarn:

yarn add @spider-cloud/spider-client

Configuration

Before using the SDK, you need to provide it with your API key. Obtain one from spider.cloud, then either pass it directly to the constructor or set it in the SPIDER_API_KEY environment variable.
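
For example, both initialization styles below should work; the exact no-argument constructor form is an assumption here, inferred from the environment-variable fallback described above:

import { Spider } from "@spider-cloud/spider-client";

// Option 1: pass the key explicitly to the constructor
const spiderWithKey = new Spider({ apiKey: "YOUR_API_KEY" });

// Option 2: rely on the SPIDER_API_KEY environment variable
// (e.g. `export SPIDER_API_KEY=...` before starting your app);
// calling the constructor without arguments is an assumption
const spiderFromEnv = new Spider();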

Usage

Here's a basic example to demonstrate how to use the SDK:

import { Spider } from "@spider-cloud/spider-client";

// Initialize the SDK with your API key
const app = new Spider({ apiKey: "YOUR_API_KEY" });

// Scrape a URL
const url = "https://spider.cloud";
app
  .scrapeUrl(url)
  .then((data) => {
    console.log("Scraped Data:", data);
  })
  .catch((error) => {
    console.error("Scrape Error:", error);
  });

// Crawl a website
const crawlParams = {
  limit: 5, // maximum number of pages to crawl
  proxy_enabled: true, // route requests through Spider's proxies
  store_data: false, // don't persist the results in storage
  metadata: false, // skip metadata collection
  request: "http", // plain HTTP requests (as opposed to headless browser rendering)
};
app
  .crawlUrl(url, crawlParams)
  .then((result) => {
    console.log("Crawl Result:", result);
  })
  .catch((error) => {
    console.error("Crawl Error:", error);
  });

A real-world crawl example that streams the response:

import { Spider } from "@spider-cloud/spider-client";

// Initialize the SDK with your API key
const app = new Spider({ apiKey: "YOUR_API_KEY" });

// The target URL
const url = "https://spider.cloud";

// Crawl a website
const crawlParams = {
  limit: 5,
  store_data: false,
  metadata: true,
  request: "http",
};

const stream = true;

const streamCallback = (data) => {
  console.log(data["url"]);
};

app.crawlUrl(url, crawlParams, stream, streamCallback);

Data Operations

The Spider client can interact with specific data tables to create, retrieve, and delete data.

Retrieve Data from a Table

To fetch data from a specified table, use the getData method. Provide the table name and an object containing query parameters:

import { Spider } from "@spider-cloud/spider-client";

const spider = new Spider({ apiKey: "YOUR_API_KEY" });

const tableName = "pages";
const queryParams = { limit: 20 };
spider
  .getData(tableName, queryParams)
  .then((response) => console.log(response))
  .catch((error) => console.error(error));

This example retrieves data from the 'pages' table, limiting the results to 20 entries.

Delete Data from a Table

To delete data from a specified table based on certain conditions, use the deleteData method. Provide the table name and an object specifying the conditions for deletion:

const tableName = "websites";
const deleteParams = { domain: "www.example.com" };
spider
  .deleteData(tableName, deleteParams)
  .then((response) => console.log(response))
  .catch((error) => console.error(error));

Download Storage Data

To download stored data, such as raw HTML or markdown, use the createSignedUrl method. Provide the website name and an object containing query parameters:

const websiteName = "spider.cloud";
const queryParams = { limit: 20, page: 0 };
spider
  .createSignedUrl(websiteName, queryParams)
  .then((response) => console.log(response))
  .catch((error) => console.error(error));

Available Methods

  • scrapeUrl(url, params): Scrape data from a specified URL. Optional parameters can be passed to customize the scraping behavior.
  • crawlUrl(url, params, stream, callback): Begin crawling from a specific URL, with optional parameters, an optional streaming response, and an optional callback invoked for each result (see the streaming example above).
  • search(q, params): Perform a search and gather a list of websites to start crawling and collect resources (see the sketch after this list).
  • links(url, params): Retrieve all links from the specified URL with optional parameters.
  • screenshot(url, params): Take a screenshot of the specified URL.
  • transform(data, params): Perform a fast HTML transformation to markdown or text.
  • extractContacts(url, params): Extract contact information from the specified URL.
  • label(url, params): Apply labeling to data extracted from the specified URL.
  • getCrawlState(url, params): Check the website crawl state.
  • getCredits(): Retrieve the account's remaining credits.
  • getData(table, params): Retrieve data records from the DB.
  • deleteData(table, params): Delete records from the DB.
  • createSignedUrl(domain, params): Download the records from the DB.
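
As a quick illustration of two methods not covered in the sections above, here is a minimal sketch; the search_limit parameter name is an illustrative assumption, not a documented value:

import { Spider } from "@spider-cloud/spider-client";

const app = new Spider({ apiKey: "YOUR_API_KEY" });

// Search for websites to start crawling
// (the `search_limit` parameter name is an assumption)
app
  .search("web scraping best practices", { search_limit: 5 })
  .then((results) => console.log("Search Results:", results))
  .catch((error) => console.error("Search Error:", error));

// Check the account's remaining credits
app
  .getCredits()
  .then((credits) => console.log("Remaining Credits:", credits))
  .catch((error) => console.error("Credits Error:", error));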

Error Handling

The SDK provides robust error handling and rejects its promises when it encounters critical issues. Always use .catch() on promise chains, or try/catch with await, to handle these errors gracefully.
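
If you prefer async/await over promise chains, the same errors can be handled with try/catch; a minimal sketch:

import { Spider } from "@spider-cloud/spider-client";

const app = new Spider({ apiKey: "YOUR_API_KEY" });

async function main() {
  try {
    const data = await app.scrapeUrl("https://spider.cloud");
    console.log("Scraped Data:", data);
  } catch (error) {
    // Critical issues (e.g. an invalid API key or a network failure) land here
    console.error("Scrape Error:", error);
  }
}

main();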

Contributing

Contributions are always welcome! Feel free to open an issue or submit a pull request on our GitHub repository.

License

The Spider Cloud JavaScript SDK is open-source and released under the MIT License.