@web-master/node-web-fetch

v0.10.0

Published

3 years ago

Fetch web data as easily as possible

Downloads

125

Vulnerabilities: 0 High, 0 Medium, 0 Low

saltyshiomix

fetch crawler scraper node nodejs node.js typescript web

Description

This package combines @web-master/node-web-crawler and @web-master/node-web-scraper.

It can:

FETCH
- SCRAPE
  - It scrapes a single target page
  - It gathers data from that page according to the ScrapeConfig
- CRAWL
  - It scrapes the target page and gathers its links
  - It follows those links and scrapes each linked page
  - It gathers data from each page according to the CrawlConfig
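The usage examples in this README pass three different shapes as `target`: a single URL string, an array of URLs, and an object with `url` and `iterator`. As a rough sketch of how those shapes map onto the scrape and crawl modes listed above, the following uses hypothetical types and a hypothetical `modeFor` helper. Neither is taken from the package's published typings.

```typescript
// Hypothetical types mirroring the option shapes used in this README's
// examples. They are assumptions, not the package's own typings.
interface DynamicTarget {
  url: string;
  iterator: { selector: string; convert?: (link: string) => string };
}

type Target = string | string[] | DynamicTarget;

type Mode = 'scrape' | 'crawl-known-urls' | 'crawl-dynamic';

// Decide which mode a given `target` value implies.
function modeFor(target: Target): Mode {
  if (typeof target === 'string') return 'scrape'; // one page
  if (Array.isArray(target)) return 'crawl-known-urls'; // fixed list of pages
  return 'crawl-dynamic'; // links discovered from a start page
}
```

The real package may distinguish these cases differently; the sketch only makes the three input shapes explicit.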

Installation

$ npm install --save @web-master/node-web-fetch

Usage

Single Page Scraping

Basic

import fetch from '@web-master/node-web-fetch';

const data = await fetch({
  target: 'http://example.com',
  fetch: {
    title: 'h1',
    info: {
      selector: 'p > a',
      attr: 'href',
    },
  },
});

console.log(data);
// {
//   title: 'Example Domain',
//   info: 'http://www.iana.org/domains/example'
// }
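In the example above, each key of `fetch` is either a bare selector string (`'h1'`, which yields the element's text) or an object with `selector` and `attr` (which yields that attribute's value). A minimal sketch of treating both forms uniformly could look like this. `FieldSpec`, `NormalizedField` and `normalizeField` are hypothetical names introduced for illustration, not part of the package.

```typescript
// Hypothetical shape of one entry in the `fetch` option, inferred from the
// example above rather than from the package's typings.
type FieldSpec = string | { selector: string; attr?: string };

interface NormalizedField {
  selector: string;
  // When absent, the element's text content is assumed to be used.
  attr?: string;
}

// Turn either form into a single object shape.
function normalizeField(spec: FieldSpec): NormalizedField {
  if (typeof spec === 'string') return { selector: spec };
  return spec.attr === undefined
    ? { selector: spec.selector }
    : { selector: spec.selector, attr: spec.attr };
}
```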

Waitable (by using `puppeteer`)

import fetch from '@web-master/node-web-fetch';

const data = await fetch({
  target: 'http://example.com',
  waitFor: 3 * 1000, // wait 3 seconds for the content to load (e.g. single page apps)
  fetch: {
    title: 'h1',
    info: {
      selector: 'p > a',
      attr: 'href',
    },
  },
});

console.log(data);
// {
//   title: 'Example Domain',
//   info: 'http://www.iana.org/domains/example'
// }

Multi-Page Crawling

You Already Know the Target URLs

import fetch from '@web-master/node-web-fetch';

const pages = await fetch({
  target: [
    'https://example1.com',
    'https://example2.com',
    'https://example3.com',
  ],
  fetch: () => ({
    title: 'h1',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   { title: '[Experimental] React SSR as a view template engine' }
// ]

You Don't Know the Target URLs and Want to Crawl Dynamically

import fetch from '@web-master/node-web-fetch';

const pages = await fetch({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]
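The `convert` callback above builds an absolute URL by string concatenation, which works when every matched link is a relative path without a leading slash. Assuming the value passed to `convert` is the matched link's `href`, a slightly more defensive sketch resolves it with the standard `URL` constructor instead. `toAbsolute` is a hypothetical helper, not part of the package.

```typescript
// Build a converter that resolves a scraped href against a base URL.
// Relative paths, root-relative paths and already-absolute URLs are all
// handled by the WHATWG URL resolution rules.
function toAbsolute(base: string): (href: string) => string {
  return (href) => new URL(href, base).href;
}

const convert = toAbsolute('https://news.ycombinator.com');

convert('item?id=1');             // 'https://news.ycombinator.com/item?id=1'
convert('/newest');               // 'https://news.ycombinator.com/newest'
convert('https://example.com/a'); // 'https://example.com/a'
```

The resulting function could be passed as `iterator.convert` in place of the template string.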

Waitable (by using `puppeteer`)

import fetch from '@web-master/node-web-fetch';

const pages = await fetch({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  waitFor: 3 * 1000, // wait 3 seconds for the content to load (e.g. single page apps)
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]

TypeScript Support

import fetch from '@web-master/node-web-fetch';

interface HackerNewsPage {
  title: string;
}

const pages: HackerNewsPage[] = await fetch({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]
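The annotation `HackerNewsPage[]` above is a compile-time claim only; nothing checks at runtime that a scraped page actually produced a string `title`. If that matters, a type guard can validate the result before use. The sketch below is independent of the package, and `isHackerNewsPage` and `assertPages` are hypothetical helper names.

```typescript
interface HackerNewsPage {
  title: string;
}

// Runtime check that a value has the shape of HackerNewsPage.
function isHackerNewsPage(value: unknown): value is HackerNewsPage {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { title?: unknown }).title === 'string'
  );
}

// Validate a whole result array, throwing if any entry is malformed.
function assertPages(values: unknown): HackerNewsPage[] {
  if (!Array.isArray(values) || !values.every(isHackerNewsPage)) {
    throw new TypeError('Unexpected result shape');
  }
  return values;
}
```

For example, `assertPages(pages)` would return the same array when every entry has a string `title` and throw otherwise.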
