@easyscrape/core
v0.0.2
Published
EasyScrape is a Node.js module designed to be integrated into your web scraping project. It makes it easier to get information from the web: you describe what you want as a JSON object and receive organized data, much as a REST API would return it.
EasyScrapeCore
PROJECT IN DEVELOPMENT!
DO NOT USE IT IN PRODUCTION.
EasyScrape is a mediator that makes web scraping with JavaScript and TypeScript easier. With EasyScrape you can extract data from a website as if it were an API. This package is the core of all middlewares based on the EasyScrape scraping method. Here you can read how to build your own middleware on top of EasyScrape. If you only want to use EasyScrape, this is not the documentation you need; instead, read the documentation of the EasyScrape implementation in the following list that matches your requirements.
EasyScrape For:
- Cheerio: It scrapes from HTML documents.
- Puppeteer: It scrapes from, or controls, a browser such as Chromium or any other browser supported by Puppeteer (coming soon)
Documentation Links
- How can EasyScrape help you?
- Installation
- How could I use it?
- How can I create my own EasyScrape Middleware?
How can EasyScrape help you?
EasyScrape scrapes a page and returns the information you want in exactly the shape you need.
Installation
Use one of these commands to install the EasyScrape module in your project.
# if you use npm
npm install @easyscrape/core
# or yarn
yarn add @easyscrape/core
How could I use it?
It's very easy! Just import the Node.js module that implements EasyScrapeCore for your favorite scraping library, like this:
// Example using EasyScrape for Cheerio
const EasyScrape = require('@easyscrape/cheerio');
Then load your HTML with the module you chose for your project. Suppose you have HTML like this:
<nav id="ShoppingList">
    <ul id="fruits">
        <li class="apple">Apple</li>
        <li class="orange">Orange</li>
        <li class="pear">Pear</li>
    </ul>
    <ul id="meats">
        <li class="pork">Pork meat</li>
        <li class="beef">Beef</li>
        <li class="chicken">Chicken</li>
    </ul>
</nav>
If you are using Cheerio, you can do this.
let $ = EasyScrape.load('<nav id="ShoppingList">...</nav>');
let data = $({
    fruits: {
        _each_: '#fruits li', // Select every "li" inside the element with id "fruits"; for each one, do the following
        _text: true // Get its inner text
    },
    meats: {
        _each_: '#meats li', // Select every "li" inside the element with id "meats"; for each one, do the following
        _text: true // Get its inner text
    }
});
The variable "data" contains:
{
    fruits: [
        'Apple',
        'Orange',
        'Pear'
    ],
    meats: [
        'Pork meat',
        'Beef',
        'Chicken'
    ]
}
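To make the mapping from query to result more concrete, here is a minimal, self-contained sketch of how a query object of this shape could be turned into data. It is not EasyScrape's implementation: runQuery, TextQuery, SelectTexts and fakeDocument are hypothetical names, and a hard-coded lookup table stands in for the real HTML document.

```typescript
// Hypothetical types: only the two query keys used in the example above are modeled.
type TextQuery = { _each_: string; _text: true };
type QueryObject = Record<string, TextQuery>;

// Assumption: the scraping library is reduced to a single callback that returns
// the inner text of every element matching a CSS selector.
type SelectTexts = (selector: string) => string[];

function runQuery(query: QueryObject, selectTexts: SelectTexts): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const [key, rule] of Object.entries(query)) {
    // One output key per query key; the value is the text of each matched element.
    result[key] = selectTexts(rule._each_);
  }
  return result;
}

// Stand-in for the shopping-list document: selector -> texts of the matching elements.
const fakeDocument: Record<string, string[]> = {
  '#fruits li': ['Apple', 'Orange', 'Pear'],
  '#meats li': ['Pork meat', 'Beef', 'Chicken'],
};

const sketchData = runQuery(
  {
    fruits: { _each_: '#fruits li', _text: true },
    meats: { _each_: '#meats li', _text: true },
  },
  (selector) => fakeDocument[selector] ?? [],
);

console.log(sketchData);
```

The point of the sketch is the shape: each key of the query becomes a key of the result, and the selector decides which elements feed that key.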
How can I create my own EasyScrape Middleware?
It's very easy to create your own implementation. Follow these steps.
Step 0: Preparations
THE DOCUMENTATION IS UNDER DEVELOPMENT RIGHT NOW!
- Remember that the documentation website (coming soon) is your friend! If you don't know what a method is for, or you need an example, the technical documentation is the place to look. This is only a quickstart guide.
- If you use Visual Studio Code, you can see the technical documentation for each method by hovering over the method name.
- You can read the JSDoc comments on each method or class.
- Another way to help yourself is to read the Cheerio implementation.
Let's start!
Step 1: Installation
Install EasyScrapeCore in your project, as described in the Installation section above.
Step 2: Main File
Create a main file for your implementation using the following structure.
// File: ./MyFirstMiddlewareEasyScrape.ts
import MyMiddlewareESQueriesManager from './MyMiddlewareESQueriesManager';
import {AbstractEasyScrapeMiddleware, IESObject, IESQuery} from '@easyscrape/core';

class MyFirstMiddlewareEasyScrape extends AbstractEasyScrapeMiddleware {
    /**
     * Middleware information
     */
    SupportFor = {
        LibraryName: 'Cheerio', // Name of the library your middleware uses
        PackageName: 'cheerio'  // npm package name
    };

    /**
     * Your middleware's queries manager
     */
    protected QueriesManager: MyMiddlewareESQueriesManager = new MyMiddlewareESQueriesManager(this);

    /**
     * Tells EasyScrape whether this middleware can handle the given data
     */
    canICollect($: any): boolean {
        // Replace this placeholder with code that returns true if the middleware can scrape the current node, false otherwise
        throw new Error('canICollect is not implemented yet');
    }

    /**
     * Your middleware's load method.
     * It is expected to return a function that takes one parameter of the accepted types.
     */
    load($: any) {
        return (query: IESObject|IESQuery|string) => this.collect($, query);
    }
}

// The next line is very important: it exports a single shared instance, which avoids creating the same middleware more than once.
export default new MyFirstMiddlewareEasyScrape();
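As one illustration of what canICollect might check, the sketch below uses duck typing to decide whether a value looks like a loaded document. It assumes the underlying library hands back a callable object with an html method, which is the case for the object returned by Cheerio's load. The helper looksLikeLoadedDocument and the fake values are hypothetical and are not part of EasyScrapeCore.

```typescript
// Hypothetical helper: returns true when the value is a callable that also
// exposes an html() method. Assumption: this is enough to recognise a loaded document.
function looksLikeLoadedDocument($: any): boolean {
  return typeof $ === 'function' && typeof $.html === 'function';
}

// Fake value used only to exercise the check: a function with an html() method attached.
const fakeLoaded = Object.assign((selector: string) => selector, {
  html: () => '<ul></ul>',
});

console.log(looksLikeLoadedDocument(fakeLoaded));  // true
console.log(looksLikeLoadedDocument('<ul></ul>')); // false
console.log(looksLikeLoadedDocument(null));        // false
```

A check like this keeps canICollect cheap and free of side effects, which matters if EasyScrape calls it for every node it visits.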
Step 3: Middleware's Queries Manager
The queries manager is a class that contains the instructions for handling every query the user can write. The interface "IESQueriesManager" gives you the basic queries and their documentation, but you can create as many more as you need, as long as you follow these requirements:
- All query names must start with the prefix "_".
- Use "$" as a wildcard in the query name to let the user customize the query.
- You can use as many wildcards as you need.
For example: "_select$" handles statements such as "_selectFood", "_selectAllListsElements" or "_select".
// File: ./MyMiddlewareESQueriesManager.ts
import {
    AbstractESQueriesManager,
    ESQueriesManagerUtils,
    IESQueriesManager,
    ESFilterHandle
} from '@easyscrape/core';

class MyMiddlewareESQueriesManager
    extends AbstractESQueriesManager // defines the default EasyScrape methods; override them if you need to
    implements IESQueriesManager // tells you which methods you need to create
{
    // Your methods here
}

export default MyMiddlewareESQueriesManager;
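The naming rules above ("_" prefix, "$" as a wildcard) can be sketched as a small pattern matcher. This only illustrates the rule; it is not how EasyScrapeCore resolves query names. The helper queryPatternToRegExp and the pattern "_get$From$" are hypothetical, and the sketch assumes each "$" stands for zero or more letters, digits or underscores.

```typescript
// Hypothetical helper: turn a query-name pattern such as "_select$" into a RegExp.
// Assumption: each "$" matches zero or more letters, digits or underscores.
function queryPatternToRegExp(pattern: string): RegExp {
  if (!pattern.startsWith('_')) {
    throw new Error(`Query name "${pattern}" must start with "_"`);
  }
  const source = pattern
    .split('$')
    // Escape regex metacharacters in the literal parts between wildcards.
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('\\w*');
  return new RegExp(`^${source}$`);
}

const selectPattern = queryPatternToRegExp('_select$');

console.log(selectPattern.test('_selectFood'));             // true
console.log(selectPattern.test('_selectAllListsElements')); // true
console.log(selectPattern.test('_select'));                 // true
console.log(selectPattern.test('_text'));                   // false

// A hypothetical pattern with two wildcards.
const getFromPattern = queryPatternToRegExp('_get$From$');
console.log(getFromPattern.test('_getTitleFromHeader'));    // true
```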
Step 4: Build and Share your Middleware
Export your package by adding the following to your package.json. Please name your package using @easyscrape/ followed by your middleware name, like this:
{
    "name": "@easyscrape/mymiddleware",
    "version": "1.0.0",
    "main": "./MyFirstMiddlewareEasyScrape.js",
    // ...
}
Remember to add your middleware's name to the list in this repository so that everyone can find it and use it.