@nqminds/nqm-databot-host
v0.3.31-alpha.1
# nqm-databot-host

## install
You can use local or global installation. If you intend to daemonize the host (e.g. using pm2 or forever) then a local install is advisable. Local installs also provide a more reliable update mechanism, and allow multiple versions of the host to be run on a given machine.
For local install, create a TDX platform folder if you don't already have one:
```
mkdir /tdx-platform
cd /tdx-platform
npm init
```

(use ENTER to accept all the `npm init` defaults)
Then install the latest version of the host:
```
npm install --save @nqminds/nqm-databot-host
```
Or install a given version:
npm install --save @nqminds/[email protected]
Global install:
```
npm install -g @nqminds/nqm-databot-host
```
## configure
Edit (or copy) the `config.json` file, found at e.g. `/tdx-platform/node_modules/@nqminds/nqm-databot-host`.

Verify that the `tdxServer` property points to your TDX. If your TDX uses a non-standard naming convention, you can also specify the individual service endpoints via `commandServer`, `databotServer`, `queryServer` and `sshServer`.

Enter your databot host id and secret (as created in the toolbox) into the `credentials` property of the config file.
Other optional configuration properties:

- `autoStart` - see the offline databots section below.
- `databotStorePath` - instructs the databot host where to store the databot packages it runs. This (along with `fileStorePath`) is useful for sandboxing databots within the local file system. If this property is not specified in the config file it defaults to the `node_modules/@nqminds/nqm-databot-store` path relative to the installed root of the host. Note this path must be an instance of nqm-databot-store.
- `fileStorePath` - instructs the databot host where to store temporary files created by databots. If not specified it defaults to the `nqm-databot-file-store` path relative to the installed root of the host.
- `sshTunnelPort` - optionally configures the host to establish an ssh tunnel to the TDX proxy service. Set this to `true` to use the default ssh server port, or override with an explicit port value, e.g. `4199` - contact your TDX administrator for full details.
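Putting these together, a minimal `config.json` might look like the following sketch (the server URL, credentials and paths are placeholders, not real values):

```json
{
  "tdxServer": "https://tdx.example.com",
  "credentials": "myHostId:myHostSecret",
  "databotStorePath": "/tdx-platform/databot-store",
  "fileStorePath": "/tdx-platform/file-store",
  "sshTunnelPort": true
}
```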
## run

### single instance mode

To run a single instance:
```
/tdx-platform/node_modules/.bin/nqm-databot-host --config ./config.json
```
### master mode

You can also run the host in 'master' mode, specifying the number of worker instances to run using the `poolSize` argument.
```
/tdx-platform/node_modules/.bin/nqm-databot-host --config ./config.json --master --poolSize 5
```
## daemonize

The recommended process manager is pm2.
```
pm2 start /tdx-platform/node_modules/.bin/nqm-databot-host -- --config ./config.json
```
You can name the daemonized instance using e.g.

```
pm2 start /tdx-platform/node_modules/.bin/nqm-databot-host --name worker-databot -- --config ./config.json
```
## debug

The databot host supports two modes of debugging your databots.

The first involves running the databot with your databot host with `"debugMode": true` in the `config.json` file. The databot is scheduled as usual through the toolbox (or via the API) and the databot host will display a message and pause before running the databot. At this point you can attach your debugger to the databot process and begin debugging.
The second mode involves running your databot in local mode, i.e. not from a databot host, without making any modifications to your databot code.
### debugging through databot host

This mode of debugging works out-of-the-box for Visual Studio Code, but should be trivial to support in other debuggers/IDEs.

Run the host using the `--debugBreak` option.

```
/tdx-platform/node_modules/.bin/nqm-databot-host --config ./my-config.json --debugBreak
```
The databot host will start and enter the idle state, waiting for a databot instance to be assigned. Using the toolbox, create a databot or select an existing databot, and make sure you grant permission for your host to execute it. Then run the databot using the toolbox.
After a short delay you should see the host receive the databot instance run request, and it will then proceed to install the databot. Once installed, the host will print a message to the console similar to the following, and then pause:
```
nqm-databot-host:DatabotHost piping input to child: ... +6ms
nqm-databot-host:CHILD-DEBUG ****************************** +40ms
nqm-databot-host:CHILD-DEBUG *                            * +0ms
nqm-databot-host:CHILD-DEBUG * Debugger listening on 5858 * +0ms
nqm-databot-host:CHILD-DEBUG *                            * +0ms
nqm-databot-host:CHILD-DEBUG ****************************** +0ms
nqm-databot-host:CHILD-DEBUG Sat, 22 Oct 2016 18:49:16 GMT nqm-databot reading input +195ms
nqm-databot-host:CHILD-DEBUG Sat, 22 Oct 2016 18:49:16 GMT nqm-databot received input
```
The host will now wait until a debugger attaches on port 5858.
#### Visual Studio Code

Run an instance of Visual Studio Code and choose the debug tab. Create a new 'Attach to process' configuration using the launch configuration drop-down. Click the run button, or hit F5. The debugger should start and immediately break at a `debugger` statement right before the databot entry point.
You can now step into your databot code.
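For reference, a `launch.json` attach configuration along the following lines should work with the legacy Node debugger (the port matches the `Debugger listening on 5858` message above; the configuration name is arbitrary):

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "node",
      "request": "attach",
      "name": "Attach to databot",
      "port": 5858
    }
  ]
}
```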
### debugging in local mode

To do this set the environment variable `DATABOT_DEBUG=1`.

The context for your databot instance will be read from a file named `debug-input.js` in the current working directory. If this file does not exist, an empty context will be used.
#### examples

The following examples assume your databot source is in a folder `/path/to/databot` with `index.js` as the main script file. The examples show running the databot using `node` on the command line, but of course you would probably run it through your IDE.
##### Running with no context

This scenario is probably not very useful as there is no way to send input to the databot.

```
/path/to/databot>DATABOT_DEBUG=1 node index.js
```
##### Running with context

Create a file in your databot folder called `debug-input.js` and enter some context data. The primary type of context data will be `inputs`, which is the dictionary of inputs that will be sent to the databot entry point.

```
/path/to/databot>nano debug-input.js
```
```js
module.exports = {
  definitionVersion: 1,
  inputs: {
    mode: "update-v1-4",
    projectsDataset: "Z1bmaGQ-prj",
    projectId: "SyIgZNDhM",
  },
  packageParams: {},
  tdxServer: "http://tdx.nqm-1.com",
  queryServer: "http://q.nqm-1.com",
  commandServer: "http://cmd.nqm-1.com",
  shareKeyId: "HJeo92d3hf",
  shareKeySecret: "letmein",
};
```
Then run the databot main script:

```
/path/to/databot>DATABOT_DEBUG=1 node index.js
```
Other useful context properties are:

```js
module.exports = {
  definitionVersion: 1, // omit if you want to use legacy nqm-api-tdx
  inputs: {}, // simulate inputs
  packageParams: {}, // simulate package parameters
  fileStorePath: "", // set the file store path
  tdxHost: "https://tdx.nqminds.com", // used to initialise tdxApi
  shareKeyId: "", // share key id for authentication
  shareKeySecret: "", // share key secret
};
```
The `fileStorePath` specifies the folder where databot output created via `output.getFileStorePath` or `output.generateFileStorePath` will be placed. By default this will be a folder named `debug-output` in the working directory.
## offline databots
It is possible to configure the host to start a databot on boot, even if the host is offline.
n.b. use of this feature is not recommended. It is intended for hosts that need to start a databot while offline. Do not use it for general purpose hosts as it breaks the intended databot architecture pattern. All standard hosts will continue to run a databot if the network connection is interrupted, and they will sync successfully on re-connection.
If you have a databot instance that needs to always be running, simply set the 'always running' flag when starting the instance via the toolbox and the TDX will schedule it accordingly and make sure it is always running on any eligible host.
To set up an offline databot requires:

1. The databot package needs to be installed in the databot library of the host. The best way to do this is to schedule the databot to run on the host via the TDX when the host is online. The databot package will then be cached in the databot library. Alternatively it's possible to manually copy the package to the databot library, or copy the package from a databot library of another host that has already cached it.
2. The instance definition needs to be specified in the databot host config file, under the `autoStart` section. Again, the best way to do this is to run the instance via the TDX and then copy the instance definition from the toolbox input modal (n.b. the instance definition must have a unique `id` property).
### Configuring autoStart databots

Databots that should start on boot are configured in the `autoStart` section of the host config file. This section is an array of definitions of instances that should be started when the host starts. Note that a slave host is required for each instance listed in the `autoStart` section; for example, if there are 3 instances defined then the host should be started in `master` mode with a `poolSize` of at least 3.
Below is an example of a config file containing a single auto-start databot instance:
```json
{
  "tdxServer": "https://tdx.nq-m.com",
  "credentials": "dkjfdJDK:letmein",
  "debugMode": false,
  "autoStart": [
    {
      "databotId": "rklWGtwsib",
      "databotVersion": "5",
      "id": "ryg47oR2hb",
      "inputs": {
        "message": "foobar!"
      },
      "name": "auto start example",
      "shareKeyId": "LOjkdjiD",
      "shareKeySecret": "letmein",
      "schedule": {
        "always": true,
        "cron": ""
      }
    }
  ]
}
```
The example above shows a single databot configured to auto-start on the host. The host will start this databot when it boots, and the `schedule.always` property indicates that the databot should always be running. This means that if the databot instance were to finish (without error), the host would start it again immediately. If you just need the databot to run once on boot, set `schedule.always` to `false`.

The `shareKeyId` and `shareKeySecret` must be valid credentials for a TDX share key. These credentials are used to sync the instance status with the TDX once a network connection is made.
### Databot library structure

The databot library is stored under the folder specified by the `databotStorePath` property in the host config (see above). For example, if your `databotStorePath` is specified as `/path/to/databotStore` then the databot library folder will be created at `/path/to/databotStore/databots`. Each databot that the host runs will be cached in a sub-folder with a name taken from the databot id. Within each databot folder, a series of sub-folders will be created matching the version number of the databot. For example, if a databot with id `rklWGtwsib` and version number `4` is run, the following illustrates the folder structure of the library:
```
/path/to/databotStore/databots
|
-- rklWGtwsib
   |
   -- 4
```
To manually install a databot package in the library, create the folder structure matching the databot id and version, and then install the package in that folder. You may need to create or tweak the `nqm.lib.json` file to reflect the library path.
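As a sketch of the manual route, using the databot from the example above (id `rklWGtwsib`, version `4`) and a placeholder store path:

```shell
# Create the library folder structure for databot rklWGtwsib, version 4.
# /tmp/databotStore is a placeholder for your configured databotStorePath.
mkdir -p /tmp/databotStore/databots/rklWGtwsib/4

# Copy the databot package files into the version folder (source path is
# illustrative).
# cp -r /path/to/databot-package/. /tmp/databotStore/databots/rklWGtwsib/4/

ls /tmp/databotStore/databots/rklWGtwsib
```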
## server databots

You can expose a web service from your databot and the TDX will set up a unique URL and proxy requests to your databot server. To accomplish this, you should notify the host of the port your server is listening on using the `output.setProxyPort` method.
The following example demonstrates how to set up a basic nodejs server.
```js
function databot(input, output, context) {
  const http = require("http");
  // Create the server.
  const server = http.createServer((req, res) => {
    // TODO - place your routing and responses here.
    res.statusCode = 200;
    res.setHeader("Content-Type", "text/html");
    res.end("<html><body style=\"background-color: lime\"><div>hello world</div></body></html>");
  });
  // Use an input-supplied value with a fallback default.
  let port = input.serverPort || 2323;
  // Get notification that the server is listening successfully.
  server.on("listening", () => {
    output.debug("setting proxy port to %d", server.address().port);
    output.setProxyPort(server.address().port);
  });
  // Intercept server errors and try a different port if it is already in use.
  server.on("error", (err) => {
    if (err.code === "EADDRINUSE") {
      server.close();
      // Increment the port number and try again.
      setTimeout(() => {
        port++;
        server.listen(port);
      }, 0);
    } else {
      output.abort("failed to start server [%s]", err.message);
    }
  });
  // Start listening.
  server.listen(port);
}
```
## host to TDX protocol

The databot host communicates with the TDX via the standard client API. The API must be authenticated using a TDX databot host ID and secret, which is usually specified in the `credentials` property of the configuration file.
In the current implementation, the databot host effectively pulls commands from the TDX rather than the TDX pushing commands to the host. There are several reasons for this approach, one of which is the plan to implement a browser version of the host. The command routing is implemented by the TDX response to the status update command (see below).
On startup a databot host must register with the TDX. This notifies the TDX where the host is running and that it is eligible to receive commands.
Once registered, the databot host periodically sends the TDX status information via the `updateDatabotHostStatus` API. The TDX will respond to this status update with any commands that are pending for the host. This is how the TDX to databot host communication is achieved.
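The pull cycle can be sketched as a single status "tick". Here `sendStatus` stands in for the real `updateDatabotHostStatus` call, and the response shape (`{commands: [...]}`) is an assumption for illustration, not the documented wire format:

```javascript
// One tick of the pull-based protocol: send a status update, then action
// any commands piggybacked on the TDX response.
async function statusTick(sendStatus, dispatch) {
  const response = await sendStatus({status: "idle"});
  // Assumed response shape - the TDX returns pending commands here.
  const commands = (response && response.commands) || [];
  commands.forEach(dispatch);
  return commands.length;
}

// Example with a stubbed transport: the TDX responds with one pending command.
const stubSend = async () => ({commands: [{commandId: "x1", command: "stopHost"}]});
statusTick(stubSend, (cmd) => console.log("dispatching", cmd.command));
```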
The status update interval is configurable via the `idleTickInterval` and `runningTickInterval` configuration options. This allows the host to send updates more frequently when it is running a databot, and revert to a less frequent update when idle (or vice versa). The default idle tick interval is 15 seconds; the default running tick interval is 5 seconds.
### host registration

Enables a host to register with the TDX, making it eligible to receive requests to run a databot instance.

This is available on the `registerDatabotHost` API.

The raw HTTP endpoint is a POST method to:

```
https://databot.acmeTDX.com/host/register
```
### host status update

Used by a host to notify the TDX of status. This also serves as the host command router, in that the response from the TDX is passed to the command processor which will action any commands accordingly (see TDX command format below).

This is available on the `updateDatabotHostStatus` API.

The raw HTTP endpoint is a POST method to:

```
https://databot.acmeTDX.com/host/status
```
### write instance output

This enables databot hosts to notify the TDX of databot output. When a databot writes output it is cached by the host and sent to the TDX when the databot completes.

This is available via the `writeDatabotHostInstanceOutput` API.

The raw HTTP endpoint is a POST method to:

```
https://databot.acmeTDX.com/host/output
```
### TDX command format

The databot host implements a simple command processor. This can support any transport; currently it is invoked via the response to a host status update.

The generic format of the command object is shown below.

```
{
  commandId: {string} - a unique id for this command
  command: {string} - the command name
  payload: {object} - the command payload
}
```
There are currently 4 supported commands: `runInstance`, `stopInstance`, `stopHost` and `updateHost`.
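A minimal sketch of such a command processor, with illustrative handlers for the four commands (the handler table and return values are not the host's actual implementation):

```javascript
// Dispatch table mapping command names to illustrative handlers.
const handlers = {
  runInstance: (payload) => `run instance ${payload.databotInstance.id}`,
  stopInstance: (payload) => `stop instance (mode: ${payload.mode})`,
  stopHost: () => "host exiting",
  updateHost: () => "updating host software",
};

// Process a single command object in the generic format shown above.
function processCommand(cmd) {
  const handler = handlers[cmd.command];
  if (!handler) {
    throw new Error(`unknown command: ${cmd.command}`);
  }
  return handler(cmd.payload || {});
}

console.log(processCommand({commandId: "RIkd34pz", command: "stopHost"}));
// → "host exiting"
```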
#### run instance command

This command is sent by the TDX to a databot host as a response to an `idle` status update. The format of the message is shown below.
```js
{
  commandId: "KD9dk-dZ",     // Unique id of the command.
  command: "runInstance",    // The command name.
  payload: {                 // The command payload.
    databotInstance: {       // Details about the instance to run.
      id: "iOF98d-",         // The id of this databot instance.
      inputs: {              // Any inputs specified when the instance was started.
        someInputParameter: 343,
        anotherInputParameter: {foo: "bar"}
      },
      chunks: 4,             // The number of 'chunks' to run for this instance.
      name: "my-app",        // The name given to the instance when it was started.
      shareKeyId: "accessGEO", // The id of a share key that the instance can use.
      shareKeySecret: "letmein", // The password for the share key.
      authTokenTTL: 3600,    // The TTL for the generated share key token.
      databotId: "IdkE83-",  // The id of the databot definition.
      databotVersion: "0.3.1", // The version of the databot definition.
      debugMode: false,      // Flag indicating debug mode.
    },
    instanceProcess: {       // Details about the chunk (process) to run.
      id: "KLKidII",         // The unique process id.
      chunk: 1               // The chunk number to run.
    }
  }
}
```
When a databot instance is started by the end-user they may indicate that it should be distributed across databot hosts, i.e. the processing should be split into 'chunks'. In this case the `command.payload.databotInstance.chunks` property will contain the total number of chunks specified and `command.payload.instanceProcess.chunk` will indicate the chunk number that this host should run. How this distribution information is interpreted is down to the databot itself.
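For example, a databot that processes a list of items might slice the work like this (`chunkRange` is an illustrative helper, not part of the databot API; chunk numbers are assumed to be 1-based, as in the example above):

```javascript
// Map (totalItems, chunks, chunk) to a half-open range [start, end) of
// item indices for this host to process.
function chunkRange(totalItems, chunks, chunk) {
  const perChunk = Math.ceil(totalItems / chunks);
  const start = (chunk - 1) * perChunk;
  const end = Math.min(start + perChunk, totalItems);
  return {start, end};
}

// With 10 items split across 4 chunks, chunk 1 covers indices [0, 3).
console.log(chunkRange(10, 4, 1)); // → { start: 0, end: 3 }
```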
#### stop instance command

This command can be sent by the TDX to a databot host in response to a `busy` status update, informing the host that it should terminate the instance it is currently running.
```js
{
  commandId: "RIkd34pz",
  command: "stopInstance",
  payload: {
    mode: "pause" | "resume" | "stop"
  }
}
```
#### stop host command

This command can be sent by the TDX to a databot host as a response to any status update, instructing the host to exit.

```js
{
  commandId: "RIkd34pz",
  command: "stopHost",
  payload: {
    mode: "stop"
  }
}
```
#### update host command

This command is sent by the TDX in response to any status update if the host software version is out of date with respect to that expected by the TDX.

```js
{
  commandId: "RIkd34pz",
  command: "updateHost",
}
```