partsley
v3.0.1
Parsing language and engine for the web
- A tool for parsing the web
Usage
Install from npm/yarn
$ npm install partsley
Use a "parselet" as a recipe/filter to parse a website.
Parselets are just plain JS objects, so they can be serialized as e.g. YAML or JSON. The examples here are shown in YAML for brevity.
Here is an example of a parselet for grabbing business data from a Yelp page:
name: h1
phone: .biz-phone
address: address
reviews(.review):
  - date: meta[itemprop=datePublished] @content
    name: .user-name a
    comment: .review-content p
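Since parselets are plain JS objects, the same recipe can be written directly in JavaScript. The following is a minimal sketch of the Yelp parselet above as an object literal, with a JSON round trip to show that it serializes cleanly. The variable names are illustrative only, and no partsley call is made here.

```javascript
// The Yelp parselet from above, written as a plain JS object.
const parselet = {
  name: 'h1',
  phone: '.biz-phone',
  address: 'address',
  // The key carries the reference selector; the array holds one item template.
  'reviews(.review)': [
    {
      date: 'meta[itemprop=datePublished] @content',
      name: '.user-name a',
      comment: '.review-content p',
    },
  ],
};

// Being plain data, the parselet survives a JSON round trip unchanged.
const json = JSON.stringify(parselet, null, 2);
const restored = JSON.parse(json);
```

Note that a key containing parentheses, such as reviews(.review), has to be quoted in an object literal.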
As a module
You can also use partsley as a module:
import { partsley } from 'partsley';
const opts = {};
const data = partsley(html, parselet, opts);
Tips
partsley is a general-purpose, flexible tool. Here are some tips for getting started.
Grabbing a list of data
Put a reference selector in parentheses in the key and use an array as the value.
users(.user):
  - name: .name
    age: .age
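To make the key syntax concrete, here is a rough sketch of how a key like users(.user) could be split into a field name and a reference selector. parseKey is a hypothetical helper written for illustration; it is not part of partsley's API and may not match how the library parses keys internally.

```javascript
// Hypothetical helper: split a parselet key such as "users(.user)" into
// the output field name and the reference selector in parentheses.
function parseKey(key) {
  const match = /^([^()]+)\((.+)\)$/.exec(key.trim());
  if (!match) {
    // No parentheses: a plain key with no reference selector.
    return { name: key.trim(), scope: null };
  }
  return { name: match[1].trim(), scope: match[2].trim() };
}

const listKey = parseKey('users(.user)');
const plainKey = parseKey('name');
```

Under this reading, each element matching the reference selector would produce one entry in the resulting array, with the inner selectors evaluated relative to that element.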
Use transformation functions on data
Add a pipe (|) and the transformation name after the data selector.
user:
  name: .name
  age: .age|parseInt
  worth: .age|parseFloat
  someNumber: .age|Math.floor
By default, the functions in scope include any standard library functions. However, you're encouraged to bring your own functions into scope. Consider curried libraries such as Ramda or Lodash FP, which expose transforms like toLower and split(','):
import { partsley } from 'partsley';
import * as R from 'ramda';

// Expose every Ramda function as a transform name.
const opts = {
  transforms: R,
};
const data = partsley(html, parselet, opts);
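As an illustration of the pipe syntax and the idea of functions "in scope", the sketch below splits a value at the pipe and looks the transform name up, first in a user-supplied object and then among the globals. Both helpers, and the lookup order, are assumptions made for this example; they are not taken from partsley's source. Splitting at the last pipe is also a simplification, since it would misread selectors that themselves contain a pipe, such as [lang|=en].

```javascript
// Hypothetical helper: split "selector|transform" at the last pipe.
function splitTransform(value) {
  const i = value.lastIndexOf('|');
  if (i === -1) return { selector: value.trim(), transform: null };
  return {
    selector: value.slice(0, i).trim(),
    transform: value.slice(i + 1).trim(),
  };
}

// Hypothetical helper: resolve a possibly dotted name such as "Math.floor"
// against user-supplied transforms first, then against the globals.
function resolveTransform(name, transforms = {}) {
  for (const scope of [transforms, globalThis]) {
    let target = scope;
    for (const part of name.split('.')) {
      target = target == null ? undefined : target[part];
    }
    if (typeof target === 'function') return target;
  }
  throw new Error(`Unknown transform: ${name}`);
}

const { selector, transform } = splitTransform('.age|Math.floor');
const floor = resolveTransform(transform);
const toLower = resolveTransform('toLower', { toLower: (s) => s.toLowerCase() });
```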
Grabbing an attribute
Add an @ symbol and the attribute name after the selector to reference an attribute.
user:
  name: .name
  nickname: .name@data-nickname
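The attribute syntax can be pictured the same way. The sketch below splits a value at the last @ into a selector and an attribute name, tolerating the optional space seen in the Yelp example. splitAttribute is a hypothetical helper for illustration only and is not part of partsley.

```javascript
// Hypothetical helper: split "selector@attribute" at the last "@".
function splitAttribute(value) {
  const i = value.lastIndexOf('@');
  if (i === -1) return { selector: value.trim(), attribute: null };
  return {
    selector: value.slice(0, i).trim(),
    attribute: value.slice(i + 1).trim(),
  };
}

const nickname = splitAttribute('.name@data-nickname');
const published = splitAttribute('meta[itemprop=datePublished] @content');
const textOnly = splitAttribute('.name');
```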
Have fun!