oe-master-job-executor
v2.2.0
Published
This module is an app-list module for oe-Cloud based applications. It provides the ability to run a function once in a randomly selected master app-instance (from a cluster). If the master goes down, a new master is elected and the function run again
Downloads
9
Maintainers
Readme
Table of Contents
Need
In a clustered environment, a boot script would run on all instances of the application, as the cluster comes up. However, irrespective of the number of application-instances in the cluster, sometimes there is a need to execute an arbitrary function just once, on a single application instance, at the time of application-cluster startup; and later, if the application instance that executed the one-time function goes down for any reason, another running instance needs to take over and run the function.
An example would be the case of a function that sends "application is up" heartbeat event at regular intervals. Such a function, if started from a regular boot script, would start multiple timers sending extra/unwanted heartbeat events depending on the number of app-instances in the cluster. This calls for a module that can run the function only once on a single app-instance of the cluster. If the function-runner instance goes down, the function should be run again on another active app-instance.
Another example: A clustered application may be running 4 application instances, but a job scheduler function within this app may need to run once on boot, and only on a single app instance. The app instance which is elected to run the job-scheduler function is the job-scheduler-master instance. If the job-scheduler-master instance goes down, then another instance of the application should become the job-scheduler-master and run the job scheduler function.
Implementation
This module provides the infrastructure for catering to the above need. It is implemented as an app-list module for oe-Cloud based applications. It provides the ability to automatically elect one app-instance from among the cluster-instances and run a specified function once in the elected app-instance, which we call the <FUNCTION_>MASTER instance, for e.g., JOB_SCHEDULER_MASTER instance. If the master for a function goes down, a new master is elected for that function and the function is run again.
To achieve the aforementioned functionality, a database query/update-based locking is used, along with updates of a lock heartbeat timestamp at regular intervals by the master app-instance. All app-instances keep checking for missed master-heartbeats and are ready to take over as master by deleting the lock from the database and creating their own lock.
Setup
To get the oe-master-job-executor feature in the application, the oe-master-job-executor node module needs to be added as a package.json dependency in the application.
Also, the module needs be added to the server/app-list.json
file in the app.
For e.g.,
package.json (only part of the file is shown here, with relevant section in bold):
server/app-list.json (Relevant section in bold):
Usage
The oe-master-job-executor module can be used to start any number of masters, for performing various one-time-run functions/operations. Each such usage creates one master. Creation of a master involves providing a lockName for the master and a masterJob object which has the one-time-run function to execute, as shown below:
function start() {
// Start the master function
}
function stop() {
// Stop/cleanup the master function
}
var masterJobExecutor = require('oe-master-job-executor');
var options = {
lockName: 'JOB-SCHEDULER', // Mandatory
masterJob: { start: start, stop: stop }, // Mandatory
checkMasterInterval: 60000 // Optional. See 'Configuration' section below for details.
};
masterJobExecutor.startMaster(options);
The above code can run, say, from a boot script in all app-instances of an application cluster, however, the function start() will be called only once on one of the app-instances (the master). If the master instance dies, start() will be called again once on another instance that is elected as master.
The stop() function is called whenever the master is stopped either due to manual disablement via HTTP API call (see Control section below), or self-stoppage due to heartbeat timestamp not getting updated for any reason.
Configuration
The oe-master-job-executor module can be configured via -
- Default values in code (no configuration)
- server/config.json
- environment variables
- startMaster options (see Usage section above)
with the following priority: 4 > 3 > 2 > 1
Priority is applicable on a per-parameter basis.
The following are the configuration parameters:
An example of oe-master-job-executor configuration via server/config.json is shown below:
{
. . .
. . .
"masterJobExecutor": {
"initDelay": 5000,
"checkMasterInterval": 20000
"heartbeatInterval": 15000,
"maxHeartbeatRetryCount": 10
},
. . .
. . .
}
Control
The master can be stopped and started manually via HTTP API. Disabling the master causes the current master to call the stop() function of the masterJob and it will also prevent other app-instances from becoming a master.
The API to call for stopping/disabling the master is as follows:
POST /api/mastercontrols/disable
e.g., payload:
{
"lockName": "JOB-SCHEDULER",
"reason": "testing"
}
The API for restarting/enabling the master is as follows:
POST /api/mastercontrols/enable
e.g., payload:
{
"lockName": "JOB-SCHEDULER"
}
Upon restarting/re-enabling a master, one of the running app-instances will get elected as master and the masterJob's start() function is called once.