
scraping-pipeline v0.0.4

An asynchronous pipeline package which helps to design various scraping tasks with less effort

Node.js Scraping Pipelines


Introduction

Scraping Pipeline is an asynchronous TypeScript module.

It helps to organize the code in pipeline applications.

It provides generic functionality to scrape, parse, process, modify, and send data.

It also lets you define custom modules when the generic functionality is not enough.

Quick Start

How to install

npm i scraping-pipeline

Here are some examples to help you understand the features.

Basic pipeline with custom modules

import { Pipeline, Modules } from 'scraping-pipeline';

const yourFunctionToGetSomeCsv = async (): Promise<string> => {
  // Fetch or read the CSV from your source; a literal is used here
  const someCsv = 'name,age\nAlice,30\nBob,25';
  return someCsv;
};

const yourFunctionToStoreData = async (data: any) => {
  // Persist the parsed rows (database, file, API, ...)
  console.log(data);
};

const getter = new Modules.General.Custom(yourFunctionToGetSomeCsv);
const parser = new Modules.General.CsvParser({ headers: true });
const saver = new Modules.General.Custom(yourFunctionToStoreData);

const pipeline = new Pipeline([getter, parser, saver]);

pipeline.run().then(() => { console.log('Done') });

Components and Types

Pipeline

Pipeline is the main component of the package. It is instantiated with an array of Modules.

Pipeline has a run method. Running the Pipeline executes the Modules in sequence, feeding the Data from one Module to the next.

The first Module receives no input Data.
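Conceptually, the run loop described above can be sketched as follows. This is a hypothetical simplification for illustration only, not the package's actual implementation (the real Pipeline also wraps results in Data<T> with the previous and old history); the SketchModule type and process method name are assumptions.

```typescript
// Simplified sketch of sequential Module execution: each Module's
// output becomes the next Module's input, and the first Module
// receives no input.
type SketchModule = { process: (input: any) => Promise<any> };

async function runSequential(modules: SketchModule[]): Promise<any> {
  let result: any = undefined; // first Module gets no input Data
  for (const mod of modules) {
    result = await mod.process(result);
  }
  return result;
}
```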

Modules

Modules are small components which usually perform a single task.

All Modules implement Modules.Base and extend Modules.Common<InputType, OutputType>.

There are some General Modules which are designed for standard tasks.

CsvParser

Modules.General.CsvParser is a module which parses CSV Data and returns structured output.

ArrayParser

Modules.General.ArrayParser is a generic module which helps to convert string arrays into a meaningful structure.

This module may be useful when you need to parse raw data from documents.

It takes a ParsingTemplate as a constructor argument, which tells the parser how to convert the array into structured data.

Custom

Modules.General.Custom<InputType, OutputType> uses a custom async function to solve custom problems.

It takes an async function as a processor which will perform the task.

The processor function receives 3 arguments:

  • data: InputType
  • previous: any
  • old: any[]

and returns a value of type OutputType.
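As a sketch, a processor with this signature might look like the following. The function itself is standalone and hypothetical; countRows and its row-counting behavior are not part of the package, only the (data, previous, old) argument shape comes from the description above.

```typescript
// A hypothetical processor for Modules.General.Custom<string[], number>.
// It uses only `data` (the current input); `previous` and `old` expose
// earlier pipeline outputs for processors that need history.
const countRows = async (
  data: string[],
  previous: any,
  old: any[]
): Promise<number> => {
  return data.length;
};
```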

Data

Data<T> is a generic type used to send Data between Modules. It contains the current, previous, and old data, storing everything passed across the Pipeline.

Usually you don't need to think about Data<T>; it is used at a lower level of the pipeline.
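A plausible shape for Data<T>, inferred only from the description above, is sketched below. The interface name DataSketch and its exact fields are assumptions; the actual definition inside scraping-pipeline may differ.

```typescript
// Hypothetical shape of Data<T>: current holds the latest output,
// previous the one before it, and old the full history.
interface DataSketch<T> {
  current: T;      // output of the most recent Module
  previous: any;   // output of the Module before that
  old: any[];      // every earlier output, in order
}

const example: DataSketch<string[]> = {
  current: ['Alice,30', 'Bob,25'],          // e.g. parsed CSV rows
  previous: 'name,age\nAlice,30\nBob,25',   // the raw CSV before parsing
  old: ['name,age\nAlice,30\nBob,25'],
};
```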

License

May be freely distributed under the MIT license.