
node-iceberg

A lightweight, tree-based Node.js scraper/crawler. No more callbacks! Just async/await.

Installation

Download and install via the npm package manager (stable):

npm install node-iceberg --save

Or clone fresh code directly from git:

git clone https://github.com/manuasir/node-iceberg.git
cd node-iceberg
npm install

Usage

This package lets you extract filtered DOM elements from URLs through customized iterators. It works in two main modes:

  • Scraper mode: Get elements using customized selectors.
// Example: download all links from a Blogspot URL. Run this inside an 'async' function.
const Iceberg = require('node-iceberg')

const getThisDomains = ['mediafire', 'mega', 'adf.ly'] // only capture links to these domains
const conf = {
  // Iterator: the element that leads to the next URL to process in Blogspot
  iteratorElement: {
    element: 'a',
    cssClass: 'blog-pager-older-link' // optional
  },
  // Data to extract from the DOM. Example: download links
  selector: {
    element: 'a',
    attrib: 'href',
    values: getThisDomains // optional
  }
}
const maxLevelDepth = 10 // max depth to explore: max number of blog pages
const scraper = new Iceberg('http://someblog.blogspot.com')
const results = await scraper.start(maxLevelDepth, conf)

// Or load a predefined service filter instead of building conf by hand:
const blogScraper = new Iceberg('http://someblog.blogspot.com')
const blogConf = blogScraper.service('blogspot')
const blogResults = await blogScraper.start(maxLevelDepth, blogConf)
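Since the examples above use await, they must run inside an async function. A minimal sketch of one way to wrap them in CommonJS, using an async IIFE (the error handling here is an illustrative assumption, not part of the package):

const Iceberg = require('node-iceberg')

// 'await' is only valid inside an async function in CommonJS modules,
// so wrap the scraper call in an async IIFE.
;(async () => {
  const scraper = new Iceberg('http://someblog.blogspot.com')
  const conf = scraper.service('blogspot')
  const results = await scraper.start(10, conf)
  console.log(results)
})().catch((err) => {
  console.error(err) // assumption: just log failures and exit non-zero
  process.exit(1)
})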

Some websites allow pagination directly through URL query parameters, e.g. http://url?page=1,2,3... In that case, pass only the configuration object and set the maximum page inside it:

// Example: get insecure camera feeds from insecam.org
const Iceberg = require('node-iceberg')

const conf = {
  iteratorElement: {
    url: 'http://insecam.org', // base URL to paginate over
    iterator: '?page=',
    maxPage: 5 // maximum page to explore
  },
  selector: { // elements we want to extract
    element: 'img',
    attrib: 'src'
  }
}
const scraper = new Iceberg('http://insecam.org')
scraper.start(conf)
  .then((results) => { console.log(results) })
  .catch((err) => { throw err })
  • Crawler mode: Explores ALL links from a URL until the depth threshold is reached, and generates a tree from the crawled data. Already-explored paths are included only once.
// Warning! This can become a very costly task.
const Iceberg = require('node-iceberg')
const crawler = new Iceberg('http://reddit.com')
const conf = crawler.service('crawler')
const maxLevelDepth = 2
crawler.start(maxLevelDepth, conf)
  .then((results) => { console.log(results) })
  .catch((err) => { throw err })
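Crawler mode resolves with a tree of crawled pages. A hedged sketch of walking that result, assuming each node exposes roughly a url plus an array of children (these property names are assumptions; the exact node shape isn't documented here, so adjust them to the real structure):

const Iceberg = require('node-iceberg')

const crawler = new Iceberg('http://reddit.com')
const conf = crawler.service('crawler')

// Depth-first walk over the crawled tree.
// Assumption: nodes look roughly like { url, children: [...] }.
function walk (node, depth = 0) {
  if (!node) return
  console.log(' '.repeat(depth * 2) + node.url)
  for (const child of node.children || []) {
    walk(child, depth + 1)
  }
}

crawler.start(2, conf)
  .then((tree) => walk(tree))
  .catch((err) => { console.error(err) })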

Test

If you downloaded this package from npm, it has already been tested. Otherwise, you can run the tests like this:

npm test