event_scraper
v0.0.2
Event based asynchronous proxied scraper.
Event Scraper
Asynchronous scraping as fast as possible!
Installing:
npm install event_scraper
Starting example:
import { Event, EventListener, EventHandler, IContext } from 'event_scraper';

class AsyncScrapeEvent extends Event {
    context: IContext = {
        callerEvent: this,
        proxied: false,
        numberOfTry: 1
    };

    public async event(proxy?: string) {
        // do something asynchronous and return its result...
        const result = await Promise.resolve('scraped data'); // placeholder for the real work
        return result;
    }
}

const event = new AsyncScrapeEvent();

class MyEventListener extends EventListener {
    eventToListenTo = AsyncScrapeEvent.name;

    onSuccess(context: IContext, result: any): void {
        // do something with result
        // start new events, where newEvent is any other Event instance:
        // EventHandler.scheduleEvent(newEvent);
    }
}

const eventListener = new MyEventListener();

// set up everything:
const eventHandler = new EventHandler(5); // try each event at most 5 times
const proxies = []; /** if you have proxies... */
eventHandler.setEventListener(eventListener);
eventHandler.setProxies(proxies);
eventHandler.startEvent(event);
Description
This module uses a basic EventEmitter under the hood. The goal of the project is to let you create events, each of which can do anything asynchronous, while the relationships between them are managed by the event listeners you set. What makes it more than a regular event emitter is that you can feed in a set of proxies, and every time an event requires a proxy, one is assigned to it at random. There are always as many proxied asynchronous events running as proxies you feed in.
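The proxy assignment described above can be pictured with a small pool. This is only a sketch under assumptions: `ProxyPool`, `acquire`, `release` and the proxy addresses are hypothetical names and values chosen for illustration, not part of event_scraper, and the library's internals may work differently.

```typescript
// Hypothetical sketch of random proxy assignment; not the library's implementation.
class ProxyPool {
  private free: string[];

  constructor(proxies: string[]) {
    this.free = [...proxies]; // copy so the caller's array is not mutated
  }

  /** Removes and returns a random free proxy, or undefined if all are in use. */
  acquire(): string | undefined {
    if (this.free.length === 0) return undefined;
    const index = Math.floor(Math.random() * this.free.length);
    return this.free.splice(index, 1)[0];
  }

  /** Hands a proxy back to the pool once its event has finished. */
  release(proxy: string): void {
    this.free.push(proxy);
  }

  /** Number of proxies that are currently not assigned to any event. */
  get available(): number {
    return this.free.length;
  }
}

// Two proxies mean at most two proxied events can hold a proxy at the same time.
const pool = new ProxyPool(['http://10.0.0.1:8080', 'http://10.0.0.2:8080']);
const firstProxy = pool.acquire();
const secondProxy = pool.acquire();
const thirdProxy = pool.acquire(); // undefined: both proxies are in use
```

Leasing each proxy to one event at a time is one way to get the "as many proxied events as proxies" behaviour; a third proxied event would have to wait until a proxy is released.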
The other extra is that you can set how many times each event is tried. If an event fails, it is automatically rescheduled until that limit is reached.
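The rescheduling behaviour can be sketched as a plain retry loop. The helper below is an illustration only: `runWithRetries`, `maxTries` and the `onReschedule` callback are hypothetical names, and the library schedules retries through its own event pool rather than through a loop like this.

```typescript
// Hypothetical sketch of "try an event up to N times"; not the library's implementation.
async function runWithRetries<T>(
  task: () => Promise<T>,
  maxTries: number,
  onReschedule: (error: unknown, numberOfTry: number) => void = () => {},
): Promise<T> {
  if (maxTries < 1) throw new RangeError('maxTries must be at least 1');
  let lastError: unknown;
  for (let numberOfTry = 1; numberOfTry <= maxTries; numberOfTry++) {
    try {
      return await task(); // success ends the loop immediately
    } catch (error) {
      lastError = error;
      // Only report a reschedule if another attempt will follow.
      if (numberOfTry < maxTries) onReschedule(error, numberOfTry);
    }
  }
  throw lastError; // every attempt failed
}

// A task that fails twice and then succeeds.
let calls = 0;
const flakyTask = async (): Promise<string> => {
  calls += 1;
  if (calls < 3) throw new Error(`attempt ${calls} failed`);
  return 'ok';
};

const rescheduledTries: number[] = [];
const outcome = runWithRetries(flakyTask, 5, (_error, numberOfTry) => {
  rescheduledTries.push(numberOfTry);
});
```

Here `outcome` resolves on the third attempt, after two reschedules.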
Documentation
Event class
Every time you create a new event, you should extend the Event class. You are required to define a context property, which has three required fields:
| Parameter name | Type | Description |
| -------------- | ---- |:-----------:|
| callerEvent | Event | You should just bind this for this parameter. It is used to re-schedule the same event |
| proxied | boolean | Indicates whether the event should get a proxy or not |
| numberOfTry | number | You should always set this to 1 |
You should also define the asynchronous event function. It takes one optional parameter, the proxy as a string, which is passed in when you set the proxied field of the context to true.
If you get a proxy it'll be a string in the following format:
http://proxy.ipAddress:proxy.port // http://141.12.16.59:4400
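If an HTTP client needs the host and port separately, the proxy string can be split with the standard WHATWG `URL` class. A minimal sketch, where `parseProxy` is a hypothetical helper and not something the library exports:

```typescript
// Hypothetical helper: splits a proxy string of the form http://host:port.
function parseProxy(proxy: string): { host: string; port: number } {
  const url = new URL(proxy); // throws a TypeError if the string is not a valid URL
  return { host: url.hostname, port: Number(url.port) };
}

const parsedProxy = parseProxy('http://141.12.16.59:4400');
// parsedProxy.host is '141.12.16.59' and parsedProxy.port is 4400
```

Note that `URL` reports an empty port when the port is the scheme's default (80 for http), so this helper would return 0 in that case.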
You can put anything you like into the context. It can be used later in the event listeners, since every function you can override there receives your extended context. So if you'd like to transfer data from the event to the corresponding event listener, put it into the context.
EventListener class
Every event listener you'd like to set up has to extend the EventListener class. It has six methods you can override:
| Function name | Parameter types | Return types | Description |
| ------------- | --------------- | ------------ |:-----------:|
| onSuccess | context: IContext, result: any | void | This function is going to get executed when the appropriate event was successful. It'll also get the event's context and the result of the event |
| onFailure | context: IContext, result: any | void | This function gets executed when the appropriate event was unsuccessful. The result will contain the error, therefore the reason of failure. |
| onReschedule | context: IContext, result: any | void | This function gets executed on rescheduling the event. The result contains the error, therefore the reason of the reschedule |
| logSuccess | context: IContext | string | The return string is going to be logged on the console on successful execution in green. |
| logFailure | context: IContext | string | The return string is going to be logged on the console on unsuccessful execution in red. |
| logReschedule | context: IContext | string | The return string is going to be logged on the console on rescheduling of the event. |
When setting up the EventListener class, you have to define a string property called eventToListenTo. Its value should be the name property of the Event class to listen to. See the example above.
If the logger functions don't get overridden, nothing will be logged.
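Matching a listener to an event through eventToListenTo relies on the class name string that `SomeClass.name` returns. The sketch below shows one way such name-based routing could work. All names in it (`ListPageEvent`, `DetailPageEvent`, `SketchListener`, `registry`, `dispatchSuccess`) are hypothetical and the library's actual dispatch may differ.

```typescript
// Hypothetical sketch of name-based routing; not the library's implementation.
class ListPageEvent {}
class DetailPageEvent {}

interface SketchListener {
  eventToListenTo: string;
  onSuccess(result: unknown): void;
}

const received: string[] = [];

const detailListener: SketchListener = {
  eventToListenTo: DetailPageEvent.name, // the string "DetailPageEvent"
  onSuccess: (result) => {
    received.push(String(result));
  },
};

// Listeners indexed by the event class name they subscribed to.
const registry: Record<string, SketchListener> = {
  [detailListener.eventToListenTo]: detailListener,
};

/** Routes a result to the listener registered for the event's class name, if any. */
function dispatchSuccess(event: object, result: unknown): boolean {
  const listener = registry[event.constructor.name];
  if (listener === undefined) return false;
  listener.onSuccess(result);
  return true;
}

const handledDetail = dispatchSuccess(new DetailPageEvent(), 'detail html');
const handledList = dispatchSuccess(new ListPageEvent(), 'list html'); // no listener registered
```

Because the key is the class name, two event classes with the same name would share a listener, and a minifier that renames classes changes the value of `.name`.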
EventHandler class
This is the interface through which you set up the listeners, events and proxies. It also gives you a handy function to schedule new events into the pool. Its constructor takes one parameter: the total number of times each event will be tried in case of failures.
| Function name | Parameter types | Return types | Description |
| ------------- | --------------- | ------------ |:-----------:|
| setEventListener | eventListener: EventListener | void | Use this function to set up an event listener. |
| setEventListeners | eventListeners: EventListener[] | void | Use this function to set up multiple event listeners at once. |
| startEvent | event: Event | void | This function schedules one single event instantly. Use this to start the whole event-chain. |
| startEvents | events: Event[] | void | This function schedules an array of events instantly. Use this to start the whole event-chain if there are multiple starter events. |
| setProxies | proxies: IProxies[] | void | Use this function to set up available proxies for the events. |
There is also a static method, called on the EventHandler class itself as in the example above:
| Function name | Parameter types | Return types | Description |
| ------------- | --------------- | ------------ |:-----------:|
| scheduleEvent | event: Event | void | Use this function to schedule new events while the script is already running (like at the onSuccess method of a listener) |
IContext
This is the interface of an event's context. If you'd like to add anything to the context, extend this interface and then use your extended context wherever the type IContext is required.
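Extending the context could look like the sketch below. To keep it self-contained, it declares a local stand-in for IContext with only the three documented fields; in a real project you would import IContext from 'event_scraper' instead. `IPageContext`, its fields and `summarizeContext` are hypothetical names used for illustration.

```typescript
// Stand-in for the library's IContext, reduced to the three documented fields.
// In a real project: import { IContext } from 'event_scraper';
interface IContext {
  callerEvent: unknown; // the library types this as Event
  proxied: boolean;
  numberOfTry: number;
}

// Hypothetical extended context carrying data from the event to its listener.
interface IPageContext extends IContext {
  url: string;
  pageNumber: number;
}

// A listener method typed with the extended context can read the extra fields.
function summarizeContext(context: IPageContext): string {
  return `page ${context.pageNumber} of ${context.url} (try ${context.numberOfTry})`;
}

const pageContext: IPageContext = {
  callerEvent: null, // an event would set this to `this`
  proxied: false,
  numberOfTry: 1,
  url: 'https://example.com/list',
  pageNumber: 2,
};

const contextSummary = summarizeContext(pageContext);
```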
TODO: Get rid of the numberOfTry... Testing, testing, testing...
License
MIT