downode
v0.1.6
One Rule to scrape them all.
downode is an easy-to-use scraper for general use. Simple but powerful.
Installation
npm i -S downode
Features
- Composable: downode supports nested Rules, so you can reuse and compose your Page Rules and Rules arbitrarily.
- Concurrency control: control all network requests with simple config options.
- Reference mechanism: you can reference other scraped data easily and asynchronously.
Documentation
Examples
There is an example that scrapes the Douban Top Rated 250 Movies.
API
downode(entryURL, pageRule, globalOptions?)
Scrapes the page at the given URL using the given Page Rule.
NOTE: if you're using CommonJS modules, you'll need to use require('downode').default to get this main function.
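The note above can be sketched as a small interop helper. This is only an illustration: `interopDefault` and `fakeModule` are hypothetical names, and the stub stands in for the real package so the snippet runs without downode installed.

```javascript
// Stand-in for what require('downode') is described as returning under
// CommonJS: an object whose `default` property is the main function.
// (Hypothetical stub, not the real package.)
const fakeModule = {
  default: (entryURL, pageRule) => Promise.resolve({ entryURL, pageRule }),
};

// Hypothetical helper: return `mod.default` when it is a function,
// otherwise return `mod` itself.
function interopDefault(mod) {
  return mod && typeof mod.default === 'function' ? mod.default : mod;
}

// With the real package this would be: interopDefault(require('downode'))
const downode = interopDefault(fakeModule);
```

Writing `const downode = require('downode').default` directly is equivalent; the helper just keeps the same line working if the export shape differs.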
Params
- string entryURL - The target URL you want to start with.
- Object pageRule - The Page Rule for the entry page, a set of Rules.
  - Rule (Object|String|RefVarWaiter) - Specify what/how to scrape. See Rule's Options Guide.
- Object globalOptions - Global config options.
  - totalConcurrent (number? = 50) - Max concurrent number for the global task priority queue. See Concurrent Control.
  - mode ('default' | 'df' | 'bf') - Global task priority queue mode. See Concurrent Control.
  - entryCookie (string) - cookie for the entry request.
  - rate (number? = 0) - Default rate option for Rules.
  - concurrent (number? = 5) - Default concurrent option for Rules.
  - request (Object? = 0) - Default request option for Rules.
  - userAgents ((string[] | string)? = MOST_COMMON_USER_AGENTS) - Default userAgents option for Rules.
  - retry (number? = 3) - Default retry option for Rules.
  - retryTimeout (number? = 2000) - Default retryTimeout option for Rules.
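As a rough sketch of how a globalOptions object might be assembled, the snippet below overlays a few overrides on the numeric defaults listed above. `DOCUMENTED_DEFAULTS` and `withDocumentedDefaults` are hypothetical names and not part of downode; the library presumably applies its own defaults internally, so this only makes the effective values explicit.

```javascript
// Numeric defaults copied from the parameter list above.
const DOCUMENTED_DEFAULTS = {
  totalConcurrent: 50,
  rate: 0,
  concurrent: 5,
  retry: 3,
  retryTimeout: 2000,
};

// Hypothetical helper (not part of downode): overlay user-supplied
// options on top of the documented defaults.
function withDocumentedDefaults(options = {}) {
  return { ...DOCUMENTED_DEFAULTS, ...options };
}

const globalOptions = withDocumentedDefaults({
  totalConcurrent: 10, // lower the global cap
  mode: 'bf',          // one of the documented values: 'default' | 'df' | 'bf'
  retry: 5,
});

// globalOptions would then be passed as the third argument:
// downode(entryURL, pageRule, globalOptions)
```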
Return
- Promise - resolves to a result Object with the same structure as your Page Rule.
waitFor(...refPaths, callback)
Function overloading:
waitFor(refPathsArray, callback)
waitFor(refPathsObject, callback)
Creates a Reference Variable Waiter, which invokes the callback once all Reference Variables are available.
To learn more about the reference mechanism, please head to reference-mechanism.
Params
- string[] refPaths: Reference Paths passed one by one.
- or string[] refPathsArray: An array containing all Reference Paths.
- or object refPathsObject: An object mapping keys to Reference Paths.
- Function callback
Return
- any - Returns whatever the callback returns.
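To make the three call shapes concrete, here is a minimal sketch of how such overloads could be told apart. `normalizeWaitForArgs` is a hypothetical helper and not downode's implementation, and the path strings are placeholders, since the Reference Path syntax is defined in the reference-mechanism docs rather than here.

```javascript
// Hypothetical helper, NOT downode's implementation: it only classifies
// the arguments according to the three documented call shapes.
function normalizeWaitForArgs(...args) {
  const callback = args[args.length - 1];
  if (typeof callback !== 'function') {
    throw new TypeError('the last argument must be a callback');
  }
  const rest = args.slice(0, -1);
  const first = rest[0];

  // waitFor(refPathsArray, callback)
  if (rest.length === 1 && Array.isArray(first)) {
    return { kind: 'array', refPaths: first, callback };
  }
  // waitFor(refPathsObject, callback)
  if (rest.length === 1 && first !== null && typeof first === 'object') {
    return { kind: 'object', refPaths: first, callback };
  }
  // waitFor(...refPaths, callback)
  return { kind: 'variadic', refPaths: rest, callback };
}

const noop = () => {};
// 'a.b' and 'c.d' are placeholder Reference Paths.
const variadic = normalizeWaitForArgs('a.b', 'c.d', noop);
const fromArray = normalizeWaitForArgs(['a.b', 'c.d'], noop);
const fromObject = normalizeWaitForArgs({ first: 'a.b', second: 'c.d' }, noop);
```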
Debug
# set environment variable
export DEBUG=downode:*
# `downode:info` - basic information, such as requests and downloads.
# `downode:warn` - retried requests, useless rules
# `downode:error` - error information, including request errors, download errors, etc.
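The same variable can also be set from Node. The sketch below assumes downode's logging follows the convention of the widely used `debug` package, where DEBUG accepts a comma-separated list of namespaces and is read when the library loads; that is an assumption, not something this README confirms. `debugValue` is a hypothetical helper.

```javascript
// The three namespaces listed above.
const LEVELS = ['info', 'warn', 'error'];

// Hypothetical helper: build a DEBUG value for a subset of the levels.
function debugValue(levels) {
  for (const level of levels) {
    if (!LEVELS.includes(level)) {
      throw new RangeError(`unknown level: ${level}`);
    }
  }
  return levels.map((level) => `downode:${level}`).join(',');
}

// Assumption: this must run before downode is loaded to take effect.
process.env.DEBUG = debugValue(['warn', 'error']);
```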
Related
downode is inspired by these projects:
Roadmap
- [ ] Proxy Rule Option
- [ ] Post Rule Option
- [ ] Authorization/Cookie propagation
- [ ] CLI support
- [ ] Incremental scrape
- [ ] Support for scraping dynamically generated websites
License
MIT