@openactive/harvesting-utils
v0.1.3
harvesting-utils
Utils library for harvesting RPDE feeds.
Version 0.X.X
This library is currently in version 0.X.X, which means that the API will not be stable until 1.0.0.
Install
This library can be installed as an npm package using the following command:
$ npm install git://github.com/openactive/harvesting-utils.git
Usage
const { harvestRPDE } = require('@openactive/harvesting-utils');

harvestRPDE({
  baseUrl: '...',
  /* ...relevant parameters here */
});
Examples
A very simple example of harvestRPDE can be found in examples/simple-rpde-harvester.js. For more information on this script see here.
API Reference
harvestRPDE
Indefinitely harvests an RPDE feed, following the "expected consumer behaviour" described in the RPDE spec.
N.B. This function will run indefinitely, and will only return if a fatal error occurs. For this reason, you will generally not want to run await harvestRPDE(..).
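To make the "runs indefinitely" behaviour concrete, here is a minimal sketch of an RPDE polling loop. It is not the library's implementation. `pollRpdeFeed`, `fetchPage` and `maxFeedEnds` are hypothetical names introduced for illustration, and the sketch assumes that a page with an empty `items` array marks the end of the feed.

```javascript
// Illustrative sketch only: NOT how harvestRPDE is implemented.
// `fetchPage` and `maxFeedEnds` are hypothetical, demo-only parameters.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollRpdeFeed({
  baseUrl,
  fetchPage, // (url) => Promise<{ next: string, items: object[] }>
  processPage,
  onFeedEnd,
  howLongToSleepAtFeedEnd = () => 500,
  maxFeedEnds = Infinity, // escape hatch so the demo loop can terminate
}) {
  let url = baseUrl;
  let feedEnds = 0;
  while (feedEnds < maxFeedEnds) {
    const page = await fetchPage(url);
    if (page.items.length === 0) {
      // Assumed end-of-feed signal: notify, wait, then re-poll the same URL.
      feedEnds += 1;
      await onFeedEnd();
      await sleep(howLongToSleepAtFeedEnd());
    } else {
      // More data available: process it and follow the `next` link.
      await processPage(page);
      url = page.next;
    }
  }
  return feedEnds;
}
```

With the default `maxFeedEnds`, the returned promise never resolves, which is why a caller would typically start such a loop without `await` and attach a `.catch()` handler for fatal errors instead.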
Required Parameters
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| baseUrl | string | Feed URL to harvest |
| feedContextIdentifier | string | Unique identifier for feed within the dataset eg ScheduledSession |
| headers | () => Promise<Object.<string,string>> | Function that returns headers needed to make a request to the feed URL |
| processPage | (rpdePage: any, feedIdentifier: string, isInitialHarvestComplete: () => boolean) => Promise | Function that processes items in each page of the feed |
| onFeedEnd | () => Promise | Function that is called when the last page of the feed is reached. This function may be called multiple times if new items are added after the first time harvestRPDE() reaches the last page |
| onError | () => Promise | Function that is called if the harvest errors |
| isOrdersFeed | boolean | Is the feed an Orders feed? |
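As a rough illustration of what a `processPage` callback might look like, the sketch below keeps an in-memory copy of a feed. The `store` map and the key format are assumptions made for this example. The item fields (`state`, `id`, `data`) follow the RPDE item shape, and `isInitialHarvestComplete` is accepted but not used here.

```javascript
// Hypothetical in-memory store, keyed by feed identifier and item id.
const store = new Map();

// Matches the documented signature:
// (rpdePage, feedIdentifier, isInitialHarvestComplete) => Promise
async function processPage(rpdePage, feedIdentifier, isInitialHarvestComplete) {
  for (const item of rpdePage.items) {
    const key = `${feedIdentifier}:${item.id}`; // key format is an assumption
    if (item.state === 'deleted') {
      // Deleted items carry no data; drop any copy we hold.
      store.delete(key);
    } else {
      // 'updated' items replace whatever was stored before.
      store.set(key, item.data);
    }
  }
}
```

A callback like this would be passed as the `processPage` parameter. Because RPDE pages arrive in modification order, applying items in sequence leaves the store holding the latest version of each item.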
Optional Parameters
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| state | object | Existing state can be passed in and manipulated within harvestRPDE() |
| state.context | FeedContext | Context about the feed. Default: new FeedContext(feedContextIdentifier,baseUrl, multibar) |
| state.feedContextMap | Map<string, FeedContext> | Map containing FeedContexts about this and other feeds within the dataset. Default: new Map() |
| state.startTime | Date | Start time of the harvest. Default: new Date() |
| loggingFns | object | Logging functions for different cases |
| loggingFns.log | (message?: any, ...optionalParams: any[]) => void | Normal logging. Default: console.log |
| loggingFns.logError | (message?: any, ...optionalParams: any[]) => void | Error logging. Default: console.error |
| loggingFns.logErrorDuringHarvest | (message?: any, ...optionalParams: any[]) => void | Error logging during the harvest. Default: console.error |
| config | object | Configuration options |
| config.howLongToSleepAtFeedEnd | () => number | How long to wait, in milliseconds, before re-polling a feed after fetching the last page (RPDE spec). Default: () => 500 |
| config.WAIT_FOR_HARVEST | boolean | Whether to wait for harvest to complete and run onFeedEnd() function. Default: true |
| config.VALIDATE_ONLY | boolean | TODO. Default: false |
| config.VERBOSE | boolean | Verbose logging. Default: false |
| config.ORDER_PROPOSALS_FEED_IDENTIFIER | string | TODO. Default: null |
| config.REQUEST_LOGGING_ENABLED | boolean | Extra logging around the request. Default: false |
| options | object | Optional features |
| options.multibar | import('cli-progress').MultiBar | If using cli-progress.MultiBar, this can be supplied and harvesting updates will be provided to the multibar. Default: null |
| options.pauseResume | {waitIfPaused: () => Promise} | Function, if implemented, that can be used to pause harvesting. Default: null |
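The `options.pauseResume` parameter only specifies the `{ waitIfPaused }` shape, so how pausing is implemented is up to the caller. One possible approach, sketched below, is a promise-based gate. `createPauseResume`, `pause` and `resume` are hypothetical names and are not part of the library.

```javascript
// Hypothetical helper that builds an object with a `waitIfPaused` method.
function createPauseResume() {
  let pausedPromise = null; // pending promise while paused, otherwise null
  let release = null;       // resolver for `pausedPromise`

  return {
    pause() {
      if (!pausedPromise) {
        pausedPromise = new Promise((resolve) => { release = resolve; });
      }
    },
    resume() {
      if (pausedPromise) {
        release();
        pausedPromise = null;
        release = null;
      }
    },
    // Resolves immediately when running; waits for resume() when paused.
    waitIfPaused() {
      return pausedPromise || Promise.resolve();
    },
  };
}
```

The returned object could then be supplied as `options.pauseResume`, with `pause()` and `resume()` called from elsewhere in the application.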
createFeedContext
Function that creates a FeedContext object
Required Parameters
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| feedContextIdentifier | string | Unique identifier for feed within the dataset eg ScheduledSession |
| baseUrl | string | Feed URL to harvest |
Optional Parameters
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| multibar | import('cli-progress').MultiBar | If using cli-progress.MultiBar, this can be supplied and context values will be provided to the multibar. Default: null |
progressFromContext
Function that returns harvesting progress values from a FeedContext object
Required Parameters
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| context | FeedContext | FeedContext object to get progress values from |
harvestRpdeLossless
harvestRpdeLossless has the same function signature as harvestRPDE. However, it can handle modified values that are too large for JavaScript numbers to represent natively, i.e. greater than 2^53. It does this by storing them as strings in memory.
For more guidance on how to handle these values, see here.
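The precision problem can be seen with plain `JSON.parse`. The sketch below also shows one way to compare `modified` values once they are held as strings, using `BigInt`. `compareModified` is a hypothetical helper, not part of this library, and it assumes the `modified` values are integer strings.

```javascript
// 2^53 + 1 cannot be represented exactly as a JavaScript number.
const raw = '{"modified": 9007199254740993}';
const lossy = JSON.parse(raw).modified;
console.log(lossy === 2 ** 53); // true: the trailing 1 has been lost

// Hypothetical helper: orders two integer-valued `modified` strings.
// Plain string comparison is not safe here ('9' > '10' lexicographically).
function compareModified(a, b) {
  const x = BigInt(a);
  const y = BigInt(b);
  if (x < y) return -1;
  if (x > y) return 1;
  return 0;
}

console.log(compareModified('9007199254740993', '9007199254740992')); // 1
```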
Developing
TypeScript
The code is written in native JS, but uses TypeScript to check for type errors. TypeScript uses JSDoc annotations to determine types (See: Type Checking JavaScript Files) from our native .js files.
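As a small illustration of this approach, the snippet below is plain JavaScript whose types come entirely from JSDoc comments. `feedLabel` is a hypothetical function written for this example and does not exist in the library.

```javascript
// @ts-check

/**
 * Builds a display label for a feed (hypothetical helper, for illustration).
 * @param {string} feedContextIdentifier e.g. 'ScheduledSession'
 * @param {string} baseUrl Feed URL
 * @returns {string}
 */
function feedLabel(feedContextIdentifier, baseUrl) {
  return `${feedContextIdentifier} <${baseUrl}>`;
}

console.log(feedLabel('ScheduledSession', 'https://example.com/feed'));
// With type checking enabled, a call such as feedLabel(42, 'x') would be
// reported as a type error, even though this is an ordinary .js file.
```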
In order for these types to be used by other projects, they must be saved to TypeScript Declaration files. This is enabled by our tsconfig.json, which specifies that declaration files are to be generated and saved to built-types/.
(As an aside, the reason that the package's types must be saved to .d.ts files is that TypeScript does not automatically use JS-defined types from libraries. There is a good reason for this, and there are proposals to allow it to work at least for certain packages. See some of the discussion here: https://github.com/microsoft/TypeScript/issues/33136).
For this reason, TypeScript types should be generated after code changes to make sure that consumers of this library can use the new types. The openactive-test-suite project does this automatically in its pre-commit hook, which calls npm run gen-types.
TypeScript-related scripts:
- check-types: This uses the tsconfig.check.json config, which does not emit any TS declaration files - all it does is check that there are no type errors. This is used for code tests.
- gen-types: This uses the tsconfig.gen.json config, which emits TS declaration files into built-types/. Additionally, it copies programmer-created .d.ts files from our source code (e.g. src/types/Criteria.d.ts) into built-types/. This is because our code references these types, so they must be in the built-types/ directory so that the relative paths match (e.g. so that import('../types/Criteria').Criteria works).