miniscraper

v0.3.2

Minimalist Node.js web scraper and crawler that uses JSDOM under the hood

:cloud: Installation

# Using npm
npm install --save miniscraper

# Using yarn
yarn add miniscraper

:question: FAQ

Here are some frequent questions and their answers.

1. How to parse a web page?

miniscraper provides a simple function for fetching a web page and parsing its document into DOM nodes: the buildDocument function from the builder package. Given the url of the web page as an argument, it returns a promise that resolves to the parsed DOM.
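When several pages need to be parsed, the same call can be issued concurrently. The helper below is a minimal sketch, not part of miniscraper: `loadPages` is a hypothetical name, and it assumes the function passed as `build` behaves like buildDocument (takes a URL, returns a promise, and rejects when the page cannot be fetched or parsed).

```javascript
// Load several pages concurrently with a document-building function.
// `build` is expected to behave like miniscraper's buildDocument:
// it takes a URL and returns a promise for the parsed DOM.
// Assumption: `build` rejects when a page cannot be fetched or parsed.
async function loadPages(build, urls) {
  // Wrapping in an async arrow turns synchronous throws into rejections too.
  const settled = await Promise.allSettled(urls.map(async (url) => build(url)));
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? { url: urls[i], document: result.value, error: null }
      : { url: urls[i], document: null, error: result.reason }
  );
}
```

With the package installed, a call would look something like `await loadPages(builder.buildDocument, ["example.com"])`. Each entry then carries either a document or the error that prevented it from loading.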

2. How to scrape specific data from a web page?

The objectSelector API provides an easy way to scrape data from a web page. You pass it a model object that describes the schema of the data to fetch and the selectors that identify it. See the Example section for an implementation of this functionality.
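To make the idea of a model object concrete, here is a rough sketch of how such a model could be evaluated against a document. It is illustrative only and is not miniscraper's implementation: `applyModel` and `readText` are hypothetical names, and the handling of a non-array `selector` is an assumption. The two model forms it supports (a plain selector string, and an object with an array-wrapped selector plus a transformer) are the ones used in this README's example.

```javascript
// Illustrative only: this is NOT miniscraper's implementation.
// `root` is any object exposing querySelector/querySelectorAll,
// such as a DOM document.
function readText(element) {
  return element ? element.textContent : null;
}

function applyModel(root, model) {
  const result = {};
  for (const [key, spec] of Object.entries(model)) {
    if (typeof spec === "string") {
      // Plain string: text content of the first matching element.
      result[key] = readText(root.querySelector(spec));
      continue;
    }
    const transform = spec.transformer ?? ((value) => value);
    if (Array.isArray(spec.selector)) {
      // Array-wrapped selector: text content of every matching element.
      result[key] = Array.from(root.querySelectorAll(spec.selector[0]), (el) =>
        transform(el.textContent)
      );
    } else {
      // Assumption: a non-array selector with options yields a single value.
      const text = readText(root.querySelector(spec.selector));
      result[key] = text === null ? null : transform(text);
    }
  }
  return result;
}
```

Keeping the schema in a plain object means the shape of the output mirrors the shape of the model, which makes the scraped result easy to predict.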

3. How to crawl a website?

miniscraper provides a simple function to fetch all links in the child nodes of a DOM element: the getLinks function from the selectors package. An optional string argument restricts the results to links that contain the given text.

Once you have these links, you can parse each of the linked pages in turn.
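The crawl loop itself is left to the caller. One possible shape is a breadth-first traversal with a page limit, sketched below. `crawl`, `linksOf`, and `maxPages` are hypothetical names, not part of miniscraper; `linksOf` stands for any async function that returns the links found on one page.

```javascript
// Breadth-first crawl sketch. `linksOf` is a hypothetical async callback that
// returns the links found on one page; with miniscraper it could wrap
// buildDocument and getLinks.
async function crawl(startUrl, linksOf, maxPages = 50) {
  const visited = new Set();
  const queue = [startUrl];
  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);
    let links = [];
    try {
      links = await linksOf(url);
    } catch {
      // Skip pages that fail to load; keep crawling the rest.
    }
    for (const link of links) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  // URLs in the order they were visited.
  return [...visited];
}
```

Assuming the API shown in the Example section, `linksOf` might be written as `async (url) => getLinks(await buildDocument(url), "link")`. The visited set prevents loops between pages that link to each other, and `maxPages` keeps the crawl bounded.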

:clipboard: Example

import { builder, selectors, formatters } from "miniscraper";

const { buildDocument } = builder;
const { objectSelector, getLinks } = selectors;

(async () => {
  // Promise to build DOM from a specified URL
  const document = await buildDocument("example.com");

  const model = {
    // Text content of the first element matching this selector
    title: ".title",
    // Array of the text content of all elements matching this selector,
    // each passed through the transformer callback
    names: { selector: [".name"], transformer: (name) => name.trim() },
  };

  const scrapingResults = objectSelector(document, model);
  console.log(scrapingResults);
  /*
    {
        title: 'Website title',
        names: [ 'John', 'Peter', 'James' ],
    }
    */

  // array of links which contain the 'link' string
  // easy to use feature to build a web crawler
  const links = getLinks(document, "link");
  console.log(links);
  /*
    [ 'http://example.com/link/154464', 'http://example.com/link/16516' ]
    */
})();

Versions 0.2.1 and above include an experimental Google Search scraper. Use the crawlGoogle function from the crawlers package to get the top search results for a search term. Here is an example:

import { crawlers } from "miniscraper";

(async () => {
  const searchResults = await crawlers.crawlGoogle("npm miniscraper");
  console.log(searchResults);
  // Expected results
  // [
  //   {
  //     url: 'https://libraries.io/npm/miniscraper',
  //     title: 'miniscraper 0.2.1 on npm - Libraries.io',
  //     description: '1 juin 2021 — miniscraper provides a simple function to fetch all links in child nodes of a DOM element. The function getLinks of the selectors package is ...'  },
  //   {
  //     url: 'https://www.workersandco.com/fr/accueil/9667-.html',
  //     title: 'AUTO R MINI SCRAPER',
  //     description: "Désignation unique AUTO R MINI SCRAPER 10590; Nom modèle AUTO R MINI SCRAPER; Référence modèle 10590; Marque SLICE; Pays d'origine CHINE ..."
  //   },
  //   {
  //     url: 'https://www.npmjs.com/search?q=jsdom&page=7',
  //     title: 'jsdom - npm search',
  //     description: 'AngularJS provided as a CommonJS module. Compiled with jsdom when running in Node. Useful for client-side apps built with Browserify and for testing AngularJS ...'
  //   },
  //   {
  //     url: 'https://www.darty.com/nav/achat/jeux_loisirs/jeux_de_societe/jeux_de_cartes/mammut_mini_scraper_silber_katze__MK1528090525.html',
  //     title: 'Jeux de cartes Mammut Mini - scraper silber - katze | Darty',
  //     description: 'COLIS LIVRE SOUS 5 JOURS OUVRES EN MOYENNE EN ENVOI SUIVI; Livraison en France Metropolitaine uniquement HORS Corse et DOM-TOM; Produit NEUF sous garantie ...'
  //   },
  //   ...
  // ]
})();
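As the sample output suggests, a search can return unrelated results, so some post-filtering may be useful. The helper below is a small sketch, not part of miniscraper: `filterByHost` is a hypothetical name, and it assumes each result has the `{ url, title, description }` shape shown in the example output.

```javascript
// Keep only results whose URL points at one of the given hostnames.
// Assumes each result has the { url, title, description } shape shown above.
function filterByHost(results, hostnames) {
  const allowed = new Set(hostnames);
  return results.filter((result) => {
    try {
      return allowed.has(new URL(result.url).hostname);
    } catch {
      return false; // Skip entries whose url is missing or malformed.
    }
  });
}
```

For instance, `filterByHost(searchResults, ["www.npmjs.com", "libraries.io"])` would keep only the package-related entries from the output above.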

:dizzy: Current development

This package is at a very early stage of development. New features are coming soon:

  • Improve Object Selector API
  • Automatic crawler
  • Customize and filter the DOM tree

:scroll: License

MIT