scraperscript
v1.0.2
ScraperScript is a query language for web scraping.
Installation
The module is available through the npm registry and can be installed with the npm or yarn command-line tools.
# NPM
npm install scraperscript --global
# Or Using Yarn
yarn global add scraperscript
Documentation
Use the command scraperscript myfile or server.
Example file:
@https://helloword.site/list
!! A comment ...
- names: html >> body >> div >> h2 @> {number, text, bold} :array
- hasTitle: html >> head >> title == " my string " :boolean
- title: html >> head >> title :string
This returns JSON:
{
  "error": false,
  "errorsMsg": [],
  "names": [
    {
      "number": 0,
      "text": "Tiago"
    },
    {
      "number": 0,
      "text": "James"
    }
  ],
  "hasTitle": true,
  "title": "my string"
}
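As a rough illustration of consuming this result, the sketch below parses the example output and checks the error flag before using the data. It assumes the output has already been captured as a JSON string (how the CLI or server delivers it is not covered here); `read_result` is a hypothetical helper, not part of ScraperScript.

```python
import json

# Sample output copied from the example above (captured as a string; an assumption).
raw = """
{
  "error": false,
  "errorsMsg": [],
  "names": [
    {"number": 0, "text": "Tiago"},
    {"number": 0, "text": "James"}
  ],
  "hasTitle": true,
  "title": "my string"
}
"""


def read_result(text):
    """Parse a result document and raise if it reports an error (hypothetical helper)."""
    result = json.loads(text)
    if result.get("error"):
        raise RuntimeError("; ".join(map(str, result.get("errorsMsg", []))))
    # Everything except the two status fields is a key defined in the script.
    return {k: v for k, v in result.items() if k not in ("error", "errorsMsg")}


data = read_result(raw)
names = [item["text"] for item in data["names"]]
print(names)  # ['Tiago', 'James']
```

Separating the status fields from the user-defined keys keeps the rest of a program from having to know about error and errorsMsg.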
Syntax
Place the URL on the first line: @http://myurl.com
Every other line has the form: - key: query :type
Note: spaces are significant.
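To make the line layout concrete, here is a minimal sketch that splits a script into its URL and its key, query, and type parts. It is not the package's own parser: `parse_script` is a hypothetical helper, and it assumes the separators are exactly "- " before the key, ": " after it, and " :" before the type.

```python
def parse_script(source):
    """Split a script into (url, [(key, query, type), ...]). Illustrative only."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("@"):
        raise ValueError("the first line must be @<url>")
    url = lines[0][1:]

    entries = []
    for line in lines[1:]:
        if line.startswith("!!"):  # comment line, skipped
            continue
        if not line.startswith("- "):
            raise ValueError(f"expected '- key: query :type', got {line!r}")
        # The key ends at the first ': ', the type starts at the last ' :'.
        key, key_sep, rest = line[2:].partition(": ")
        query, type_sep, type_name = rest.rpartition(" :")
        if not key_sep or not type_sep:
            raise ValueError(f"expected '- key: query :type', got {line!r}")
        entries.append((key, query.strip(), type_name.strip()))
    return url, entries


script = """@https://helloword.site/list
!! A comment ...
- names: html >> body >> div >> h2 @> {number, text, bold} :array
- hasTitle: html >> head >> title == " my string " :boolean
- title: html >> head >> title :string
"""

url, entries = parse_script(script)
for key, query, type_name in entries:
    print(f"{key} ({type_name}): {query}")
```

Splitting the type from the right means a query may itself contain ": " without confusing the sketch.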
Key
Name
Rules:
- Place it at the beginning of the line
- Format: - key:
Example: - name:
Type
Return type
Rules:
- Place it at the end of the line
- Format: :type
Types:
- array
- object
- boolean
- string
- number
Example: :string
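One way to use the declared type is to check a returned value against it. The sketch below assumes each type name corresponds to the JSON type of the same name (array to a list, object to a dict, and so on); that mapping and the `matches_type` helper are illustrative assumptions, not documented behavior.

```python
# Assumed mapping from ScraperScript type names to Python's decoded JSON types.
JSON_TYPES = {
    "array": list,
    "object": dict,
    "boolean": bool,
    "string": str,
    "number": (int, float),
}


def matches_type(value, type_name):
    """Return True if a decoded JSON value fits the declared type (hypothetical helper)."""
    if type_name not in JSON_TYPES:
        raise ValueError(f"unknown type: {type_name!r}")
    # bool is a subclass of int in Python, so exclude it from "number" explicitly.
    if type_name == "number" and isinstance(value, bool):
        return False
    return isinstance(value, JSON_TYPES[type_name])


print(matches_type("my string", "string"))
print(matches_type(True, "number"))
```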
Query
String
" my string "
NOTE: "my string" (no space inside the quotes) is invalid.
Comment
!! my comment in ScraperScript
Elements
nameOfHtmlElementOne >> nameOfHtmlElementTwo
Map elements [String]
nameOfHtmlElementOne @> nameOfSubHtmlElement
Map elements [Array]
nameOfHtmlElementOne @> [nameOfSubHtmlElement]
Map elements [Object]
nameOfHtmlElementOne @> {nameOfIndex, nameOfData, nameOfSubHtmlElement}
Addition
nameOfHtmlElementOne ++ nameOfHtmlElementTwo
Replace
nameOfHtmlElementOne -- nameOfHtmlElementTwo
Equal or different comparison
nameOfHtmlElementOne == nameOfHtmlElementTwo
nameOfHtmlElementOne ~= nameOfHtmlElementTwo
OR
nameOfHtmlElementOne || nameOfHtmlElementTwo
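The operators above can be summarized in a small tokenizer sketch. It relies on the note that spaces are significant, so operators and operands are taken to be separated by whitespace, while quoted strings, {...} groups, and [...] groups are kept as single tokens. The `tokenize` helper and the operator labels are illustrative assumptions, not the package's implementation.

```python
import re

# Operators listed in this section, with labels paraphrased from the headings.
OPERATORS = {
    ">>": "elements",
    "@>": "map elements",
    "++": "addition",
    "--": "replace",
    "==": "equal comparison",
    "~=": "different comparison",
    "||": "or",
}

# A quoted string, a {...} group, a [...] group, or any run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\{[^}]*\}|\[[^\]]*\]|\S+')


def tokenize(query):
    """Split a query into (kind, text) pairs (hypothetical helper)."""
    tokens = []
    for text in TOKEN.findall(query):
        kind = "operator" if text in OPERATORS else "operand"
        tokens.append((kind, text))
    return tokens


for kind, text in tokenize('html >> head >> title == " my string "'):
    print(kind, text)
```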
Tests
To run the test suite, first install the dependencies, then run the test command:
# NPM
npm test
# Or Using Yarn
yarn test
Dependencies
- axios: Promise based HTTP client for the browser and node.js
- cheerio: Tiny, fast, and elegant implementation of core jQuery designed specifically for the server
Dev Dependencies
- body-parser: Node.js body parsing middleware
- express: Fast, unopinionated, minimalist web framework
- mocha: simple, flexible, fun test framework
- xo: JavaScript happiness style linter ❤️
Contributors
Pull requests and stars are always welcome. For bugs and feature requests, please create an issue. List of all contributors.