html-scraper
v0.1.3
# HTML Scraper
The scraper has **three** components: `http` → `split` → `extract`, executed in the same order. You make an `http` request to fetch a page, `split` the page into different sections, and ultimately `extract` the data from each section using a custom parser.
The best part of this scraper is that you can chain together as many actions as you need.
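To make the three stages concrete, here is a toy sketch in plain JavaScript (the names, HTML, and regex-based parsing are illustrative assumptions, not the library's own code): each stage is just a function, and the output of one feeds the next.

```javascript
// Toy sketch of the three stages; not the library's implementation.
const page = '<div class="archive"><a href="/a">A</a><a href="/b">B</a></div>';

// "http": fetch the page (stubbed here with an in-memory string)
const http = () => page;

// "split": break the page into sections, one per anchor tag
const split = (html) => html.match(/<a [^>]*>.*?<\/a>/g) || [];

// "extract": run a custom parser over each section
const extract = (sections, parser) => sections.map(parser);

const result = extract(split(http()), (section) => ({
  href: /href="([^"]*)"/.exec(section)[1],
  text: />([^<]*)</.exec(section)[1],
}));

console.log(result); // [ { href: '/a', text: 'A' }, { href: '/b', text: 'B' } ]
```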
### Order of execution
`http` → `split` → `extract` → `http` → `split` and so on…
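The repeating cycle can be illustrated with the same toy functions (stubbed pages and names are assumptions, not the library's API): the hrefs extracted in one cycle seed the next round of requests.

```javascript
// Toy illustration of the repeating cycle; not the library's code.
const pages = {
  '/index': '<a href="/a">A</a>',
  '/a': '<a href="/detail">More</a>',
};

const http = (url) => pages[url]; // fetch, stubbed with a lookup
const split = (html) => html.match(/<a [^>]*>.*?<\/a>/g) || [];
const extract = (sections) => sections.map((s) => /href="([^"]*)"/.exec(s)[1]);

// Cycle 1: http → split → extract yields hrefs…
const hrefs = extract(split(http('/index'))); // ['/a']
// …which feed cycle 2: http → split → extract, and so on.
const next = hrefs.map((h) => extract(split(http(h))));
console.log(next); // [ [ '/detail' ] ]
```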
### Example

Say I want to extract all the information about students who got admitted to the University of Southern California, Los Angeles. I would do it as follows —
- Make an `http` request to this page — http://edulix.com/universityfinder/university_of_southern_california.
- The page consists of multiple anchor tags, each containing a link to an individual student's profile.
```coffee
# Standard require
Scraper = require 'html-scraper'

# Specify the key to read urls
Scraper().http 'url'
  .split '.archive a'
  .extract (doc) ->
    href: "http://tusharm.com" + doc.attr 'href'
    text: doc.html()
  .http 'href'
  .extract ($) ->
    http: $('a:nth-child(2)').attr 'href'
  # Launch with base params
  .$launch url: 'http://tusharm.com/projects.html'
  # Returns a promise
  .then (val) -> console.log val
  .done()
```