parsz
v2.0.4
Parsing language and engine for the web
pársz
- A tool for parsing the web
Usage
Install globally from npm/yarn
$ npm install -g parsz
View options from help menu
$ parsz --help
Use a "parselet" as a recipe/filter to parse a website.
The structure of the parselet is JSON.
Here is an example of a parselet for grabbing business data from a Yelp page:
{
  "name": "h1|trim",
  "phone": ".biz-phone|trim",
  "address": "address|trim",
  "reviews(.review)": [{
    "date": "meta[itemprop=datePublished] @content",
    "name": ".user-name a",
    "comment": ".review-content p"
  }]
}
As a module
You can also use parsz as a module:
import parsz from 'parsz';

// parselet: your parselet JSON; url: the URL of the page to parse
parsz(parselet, url).then(data => {
  // Do something with the data
});
Tips
parsz is a flexible, general-purpose tool. Here are some tips for getting started.
Grabbing a list of data
Use a reference selector in the key and an Array as the value.
{
  "users(.user)": [{
    "name": ".name",
    "age": ".age"
  }]
}
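To make the key syntax concrete, here is a minimal sketch of how a key such as "users(.user)" could be split into a field name and a scoping selector, and how an Array value marks the field as a list. The function name describeEntry is hypothetical and this is not parsz's actual implementation, only an illustration of the convention described above.

```javascript
// Hypothetical helper (not part of parsz): interprets one parselet entry.
function describeEntry(key, value) {
  // Field name, then an optional "(selector)" suffix that scopes the value.
  const match = /^([^(]+)(?:\((.+)\))?$/.exec(key);
  if (!match) {
    throw new Error(`Unrecognized parselet key: ${key}`);
  }
  const [, name, scope] = match;
  return {
    name: name.trim(),
    scope: scope ?? null,          // null when the key has no "(selector)"
    isList: Array.isArray(value),  // an Array value means "one item per match"
  };
}

const parselet = { 'users(.user)': [{ name: '.name', age: '.age' }] };
for (const [key, value] of Object.entries(parselet)) {
  console.log(describeEntry(key, value));
}
// { name: 'users', scope: '.user', isList: true }
```

Under this reading, each element matching .user would produce one object in the users array, with .name and .age resolved inside that element.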
Use transformation functions on data
Add a pipe (|) and the transformation name after the data selector.
{
  "user": {
    "name": ".name|trim",
    "age": ".age|parseInt",
    "worth": ".age|parseFloat",
    "someNumber": ".age|floor"
  }
}
If you would like to see a helpful transformation function added, please open an issue.
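As an illustration of the pipe syntax, the sketch below splits a value such as ".age|parseInt" into its selector and transformation names, then applies the transformations in order to some already-extracted text. The transformation table and the function applyTransforms are assumptions made for this example; parsz's real transformations may behave differently.

```javascript
// Assumed transformation table, using the names from the example above.
const transforms = new Map([
  ['trim', (v) => String(v).trim()],
  ['parseInt', (v) => Number.parseInt(v, 10)],
  ['parseFloat', (v) => Number.parseFloat(v)],
  ['floor', (v) => Math.floor(Number(v))],
]);

// Hypothetical helper: rawText stands in for the text the selector matched.
function applyTransforms(spec, rawText) {
  const [selector, ...names] = spec.split('|').map((part) => part.trim());
  const value = names.reduce((acc, name) => {
    const fn = transforms.get(name);
    if (!fn) {
      throw new Error(`Unknown transformation: ${name}`);
    }
    return fn(acc);
  }, rawText);
  return { selector, value };
}

console.log(applyTransforms('.age|parseInt', ' 42 '));
// { selector: '.age', value: 42 }
```

Keeping the transformations in a lookup table means a new one can be added by registering a single function.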
Grabbing an attribute
Add an at sign (@) and the attribute name after the selector to read an attribute instead of the element's text.
{
  "user": {
    "name": ".name",
    "nickname": ".name@data-nickname"
  }
}
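The attribute syntax can be sketched as a split on the last @ in the value. The helper splitAttribute is hypothetical and simplified: it assumes any transformations have already been stripped and that the selector itself contains no @ after the attribute marker. It is not taken from parsz's source.

```javascript
// Hypothetical helper (not part of parsz): separates selector and attribute.
function splitAttribute(spec) {
  const at = spec.lastIndexOf('@');
  if (at === -1) {
    // No attribute requested: the element's text would be used instead.
    return { selector: spec.trim(), attribute: null };
  }
  return {
    selector: spec.slice(0, at).trim(),
    attribute: spec.slice(at + 1).trim(),
  };
}

console.log(splitAttribute('.name@data-nickname'));
// { selector: '.name', attribute: 'data-nickname' }
console.log(splitAttribute('meta[itemprop=datePublished] @content'));
// selector is 'meta[itemprop=datePublished]', attribute is 'content'
```

Trimming both halves lets the sketch accept the form with a space before @ as well as the form without one, since both appear in the examples in this README.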
Grabbing remote data
Add a tilde (~) and a link selector to the key to follow a link and parse the page it points to. The mapping (value) is then evaluated relative to that external page.
{
  "user": {
    "name": ".name",
    "company~(a.company)": {
      "name": ".company-name",
      "address": ".company-address"
    }
  }
}
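The sketch below shows one way a remote key such as "company~(a.company)" could be interpreted: split the key into a field name and a link selector, then resolve the link's href against the current page URL with the standard URL class. The helper resolveRemote, the sample href, and the example.com URLs are all assumptions for illustration; whether parsz resolves relative links this way is not confirmed by this README.

```javascript
// Hypothetical helper (not part of parsz): interprets a "name~(selector)" key.
// href stands in for the href attribute of the element the selector matched.
function resolveRemote(key, href, baseUrl) {
  const match = /^([^~(]+)~\((.+)\)$/.exec(key);
  if (!match) {
    return null; // not a remote key
  }
  const [, name, linkSelector] = match;
  return {
    name: name.trim(),
    linkSelector,
    // Resolve relative links against the page currently being parsed.
    url: new URL(href, baseUrl).href,
  };
}

const remote = resolveRemote(
  'company~(a.company)',
  '/companies/acme',
  'https://example.com/users/1'
);
console.log(remote.name, remote.linkSelector, remote.url);
// company a.company https://example.com/companies/acme
```

The nested mapping (.company-name, .company-address) would then be evaluated against the document fetched from the resolved URL.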
Have fun!