
declarative-scraper v0.1.1

Simple & Human-Friendly HTML Scraper with JSON-LD support

Downloads: 6

Dopamyn Scraper

Simple & Human-Friendly HTML Scraper with Proxy Rotator.


/!\ WARNING: This package is not mature enough to be used in production.

Installation

npm install --save declarative-scraper
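The scraper delegates HTTP requests to an adapter instead of bundling a client, and the usage example below imports got directly, so you'll also need it in your project (assuming it isn't installed already):

npm install --save got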

Usage Example

// Import dependencies
import Scraper, { gotAdapter, Action } from 'declarative-scraper';
import got from 'got';

// Configure your scraper
const scraper = new Scraper({

    // Use the got package to make our HTTP requests
    adapter: gotAdapter(got),
    // Show debug info
    debug: true,
    // If an error occurs while extracting an item's data, exclude that item from the results
    onItemError: Action.EXCLUDE,

});

// Scrape the cryptocurrencies list
const results = await scraper.scrape({

    // 1. Basic options
    id: 'cryptocurrencies', // Identifier shown in debug output
    url: 'https://coinmarketcap.com/', // URL address to scrape

    // 2. Extraction
    items: $ => $('table.cmc-table > tbody > tr'), // Items to iterate over
    extract: ($) => ({ // Data to extract for each item

        logo: $('> td:eq(2) img.coin-logo').attr('src'),

        // The current item will be excluded from the results if the name can't be extracted
        name: $('> td:eq(2) p[font-weight="semibold"]').text()?.trim() || Action.EXCLUDE,

        price: $('> td:eq(3)').text()

    }),

    // 3. Processing
    required: ['name', 'price'], // If name or price is missing, the item is treated as an extraction error (handled by onItemError above)
    process: async ({ logo, name, price }) => ({ // Normalize / format the extracted data

        logo,

        name: name.trim(),

        // Keep only digits and the decimal point, then parse to a number
        price: parseFloat( price.trim().replace(/[^\d\.]/g, '') )

    }),

});
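Note: the example above uses top-level await, which requires an ES module context (e.g. "type": "module" in your package.json). If that's not available in your setup, a plain async wrapper works too:

// Minimal wrapper if top-level await isn't available
(async () => {
    const results = await scraper.scrape({ /* ...options as above... */ });
    console.log(results);
})();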

Output:

[
    {
        "logo": "https://s2.coinmarketcap.com/static/img/coins/64x64/1.png",
        "name": "Bitcoin",
        "price": 48415.71
    },
    {
        "logo": "https://s2.coinmarketcap.com/static/img/coins/64x64/1027.png",
        "name": "Ethereum",
        "price": 3634.48
    },
    {
        "logo": "https://s2.coinmarketcap.com/static/img/coins/64x64/2010.png",
        "name": "Cardano",
        "price": 2.49
    },
    {
        "logo": "https://s2.coinmarketcap.com/static/img/coins/64x64/1839.png",
        "name": "Binance Coin",
        "price": 429.91
    },
    {
        "logo": "https://s2.coinmarketcap.com/static/img/coins/64x64/825.png",
        "name": "Tether",
        "price": 1
    },
    {
        "logo": "https://s2.coinmarketcap.com/static/img/coins/64x64/52.png",
        "name": "XRP",
        "price": 1.12
    },
    {
        "logo": "https://s2.coinmarketcap.com/static/img/coins/64x64/5426.png",
        "name": "Solana",
        "price": 161.09
    },
    {
        "logo": "https://s2.coinmarketcap.com/static/img/coins/64x64/6636.png",
        "name": "Polkadot",
        "price": 35.9
    },
    {
        "logo": "https://s2.coinmarketcap.com/static/img/coins/64x64/74.png",
        "name": "Dogecoin",
        "price": 0.2461
    },
    {
        "logo": "https://s2.coinmarketcap.com/static/img/coins/64x64/3408.png",
        "name": "USD Coin",
        "price": 1
    }
]
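The process step is where raw DOM text becomes typed data. For instance, assuming the page renders a price cell as "$48,415.71" (hypothetical raw text), the replace/parseFloat step above reduces it to a plain number:

// Hypothetical raw cell text; the regex keeps only digits and the decimal point
parseFloat('$48,415.71'.replace(/[^\d\.]/g, '')) // → 48415.71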

Proxy Rotator

A proxy can be useful if the website you want to scrape has protections against automated traffic. Since most scraping proxies limit the number of requests you can make, the included proxy rotator switches to another proxy once the limit on the current one is reached.

import Scraper, { ProxyRotator } from 'declarative-scraper';
import got from 'got';

const scraper = new Scraper({
    // ...other scraper options (adapter, debug, etc.)
    proxy: new ProxyRotator({
        zenscrape: {
            // Prefix prepended to the target URL to route requests through the proxy
            prefix: 'https://app.zenscrape.com/api/v1/get?apikey=<key>&url=',
            // How many requests remain on this provider's quota
            getRemaining: () => got('https://app.zenscrape.com/api/v1/status?apikey=<key>', {
                responseType: 'json'
            }).then(res => {
                console.log(`[proxy][getRemaining] zenscrape`, res.body);
                return res.body['remaining_requests'] as number;
            })
        },
        // ...other proxy providers
    })
});
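Each entry in the rotator maps a provider name to a prefix (prepended to the target URL, a common pattern for API-based scraping proxies) and a getRemaining callback, which the rotator can presumably poll to decide when the current provider's quota is exhausted and it should switch to the next one.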

TODO

  • Better doc
  • Strict type checking
  • Fix typings for extracted data
  • Tests