su-downloader3-enhance
v1.2.1
HTTP segmented file downloader
su-downloader3 - A node.js HTTP downloader
Basic Usage
su-downloader3 has a very minimal API and is designed to simplify the task of downloading large files via HTTP.
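    const path = require('path')
    const { startDownload } = require('su-downloader3')

    // File to download and where to save it
    const url = 'http://ftp.iinet.net.au/pub/test/5meg.test1'
    const savePath = path.join(__dirname, '5meg.test1')
    const locations = { url, savePath }

    const options = {
      threads: 3, // number of segments to break the file into
      throttleRate: 100 // minimum interval between progress emissions, in milliseconds
    }

    // The download only begins once the observable is subscribed to
    startDownload(locations, options).subscribe({
      next: progressInfo => console.log(progressInfo),
      error: e => console.error(e),
      complete: () => console.log('download has completed!')
    })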
Overview
su-downloader3 is a node.js downloader library that facilitates the downloading of files via http/s requests. It is based around the request package and has inbuilt support for pausing/resuming (even after the process has exited), downloading using multiple concurrent requests (faster download speeds), and monitoring download progress.
There are two parts to the library: the downloader and the scheduler.
The downloader
The downloader is the core of su-downloader3. Its API consists only of the `startDownload` function, which returns an observable. The download does not begin until this observable is subscribed to. The observable emits the meta data for the download, followed by the download's progress info.
The scheduler
The scheduler acts as an interface between the user and the downloader to manage multiple download tasks. It implements a FIFO queue that lets the user queue tasks, have them start automatically, and limit the number of active downloads. The scheduler maintains a list of the download tasks, with each task identified by a unique key provided when the download is queued. It also maintains a list of the subscriptions for each active download, which removes the need for the user to subscribe to the observable returned by the downloader's `startDownload` function. Instead, the user provides an observer object to the scheduler when queueing a download, and the scheduler subscribes to the observable when the download is started.
The scheduler can be configured to use a default download options object which applies to all downloads made through it. Options provided as a parameter will override the default options.
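As a rough illustration of this override behaviour, the sketch below merges a per-download options object over scheduler-wide defaults. The `mergeOptions` helper and the sample values are hypothetical and are not part of the library; its actual merging logic may differ.

```javascript
// Hypothetical helper, not part of su-downloader3's API.
// Per-download options take precedence over the scheduler-wide defaults.
function mergeOptions (defaultOptions = {}, options = {}) {
  return { ...defaultOptions, ...options }
}

const schedulerDefaults = { threads: 4, throttleRate: 500 }
const merged = mergeOptions(schedulerDefaults, { threads: 2 })

// threads comes from the override, throttleRate from the defaults
console.log(merged)
```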
API
If you are using the scheduler, you should never need to call the standalone `startDownload` function. Instead, queue the download using the scheduler's `queueDownload` instance method, and then start it using the scheduler's `startDownload` instance method.
startDownload(locations, options) => Observable
| Param | Type | Default | Description |
| --- | :---: | :---: | --- |
| locations | `object` or `string` | | If string, `locations` should be either a download url or an existing `.sud` path. |
| locations.url | `string` | | url to download file from |
| locations.savePath | `string` | | Path (including filename) to save the file to |
| locations.saveDir | `string` | | Directory to save the file to |
| options | `object` | | Download options |
| options.threads | `integer` | `4` | The number of segments to break the file into |
| options.timeout | `integer` | `18000` | How long to wait without receiving data before aborting download, in milliseconds |
| options.headers | `object` | | Custom HTTP headers |
| options.throttleRate | `integer` | `500` | Minimum time interval between successive emissions of download progress info, in milliseconds |
Note: Only one of `locations.savePath` and `locations.saveDir` needs to be defined. If both are defined, the file is saved to the path given by `locations.savePath` and `locations.saveDir` is ignored.
If neither is provided, the file is saved to the current working directory with a filename determined by the url, e.g. `http://download.location/file.zip` is saved to `./file.zip`.
If only `locations.saveDir` is provided, the file is saved to that directory with the filename determined as described above.
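The rules above can be sketched as a small function. This is only an illustration of the documented behaviour: `resolveSavePath` is a hypothetical helper, and taking the last segment of the url path as the filename is an assumption about how the filename is "determined by the url".

```javascript
const path = require('path')

// Hypothetical helper, not part of su-downloader3's API.
function resolveSavePath ({ url, savePath, saveDir }) {
  // savePath wins outright; saveDir is ignored when both are given
  if (savePath) return savePath

  // Assumption: the filename is the last segment of the url's path
  const filename = path.basename(new URL(url).pathname)

  // Fall back to the current working directory when no saveDir is given
  return path.join(saveDir || process.cwd(), filename)
}

console.log(resolveSavePath({ url: 'http://download.location/file.zip' }))
```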
To pause a download started using the `startDownload` function, call the `unsubscribe` instance method on the subscription made to the observable returned by `startDownload`.
The returned observable emits the meta data of the download on its first emission. Subsequent emissions contain the download progress info which is an object that looks like:
    {
      time: {
        start, // timestamp
        elapsed, // milliseconds
        eta // seconds
      },
      total: {
        filesize, // bytes
        downloaded, // bytes
        percentage // real number between 0 and 100
      },
      instance: {
        downloaded, // bytes
        percentage // real number between 0 and 100
      },
      speed, // bytes per second
      avgSpeed, // bytes per second
      threadPositions // array of bytes
    }
`speed` is calculated by finding the change in `downloaded` with respect to a small change in `elapsed`. Because of this, if the `throttleRate` is particularly low, `speed` will likely fluctuate.
`instance` holds data pertaining to the particular session of the download. For example, if you start a download today and download 120MB/200MB, then resume the download tomorrow, by the time the download is finished, `total.downloaded` will be 200MB, but `instance.downloaded` will be 80MB.
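The two calculations described above can be sketched as follows. Both helpers are hypothetical and only illustrate the arithmetic; the sample objects carry just the fields of the progress info that the calculation needs.

```javascript
// Hypothetical helpers illustrating the description above; not the library's code.

// Speed in bytes per second between two successive progress samples:
// change in downloaded bytes divided by change in elapsed time.
function speedBetween (previous, current) {
  const deltaBytes = current.total.downloaded - previous.total.downloaded
  const deltaSeconds = (current.time.elapsed - previous.time.elapsed) / 1000
  return deltaSeconds > 0 ? deltaBytes / deltaSeconds : 0
}

// Bytes downloaded in the current session only.
function instanceDownloaded (totalDownloaded, downloadedBeforeSession) {
  return totalDownloaded - downloadedBeforeSession
}

const previousSample = { time: { elapsed: 1000 }, total: { downloaded: 1000000 } }
const currentSample = { time: { elapsed: 1500 }, total: { downloaded: 1250000 } }

console.log(speedBetween(previousSample, currentSample)) // bytes per second
console.log(instanceDownloaded(200, 120)) // MB downloaded in the resumed session
```

A shorter sampling interval makes `deltaSeconds` smaller, so a single slow or fast chunk has a larger effect on the result, which is why `speed` fluctuates more at low `throttleRate` values.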
SuDScheduler(schedulerOptions) => SuDScheduler
Instantiates a new SuDScheduler instance with the given options.
| Param | Type | Default | Description |
| --- | :---: | :---: | --- |
| schedulerOptions | `object` | | Scheduler options |
| schedulerOptions.autoStart | `boolean` | `true` | Whether or not to automatically start downloading queued download tasks |
| schedulerOptions.maxConcurrentDownloads | `integer` | `4` | Maximum number of downloads at any single time. Set this to `0` for unlimited concurrent downloads |
| schedulerOptions.downloadOptions | `object` | | Default download options to be used for downloads if not provided (see `startDownload`) |
A SuDScheduler instance's options can be read and changed at any time through the properties of the `SuDScheduler.option` object.
SuDScheduler.queueDownload(key, locations, [options], userObserver) => object or true
Adds a new download task to the end of the queue.
| Param | Type | Default | Description |
| --- | :---: | :---: | --- |
| key | `string` | | Unique identifier for each download task. Required |
| locations | `object` | | Same as the `locations` parameter for `startDownload` |
| options | `object` | | Same as the `options` parameter for `startDownload` |
| userObserver | `object` | | Can be the 3rd or 4th positional argument. The `userObserver` object must have `next`, `error` and `complete` fields, each of which is a function. Required |
If the scheduler has `autoStart` set to true, `true` is returned when a download task is successfully queued.
If the scheduler has `autoStart` set to false, a convenience object containing a `start` method is returned, so that the user can chain `.start()` directly onto the call or start the download whenever they need.
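Because `userObserver` may be passed as either the 3rd or 4th argument, the scheduler has to work out which form was used. The sketch below shows one way this could be done; `isObserver` and `normalizeQueueArgs` are hypothetical names, and the library's actual argument handling and error behaviour are not shown in this document.

```javascript
// Hypothetical sketch of argument handling; not su-downloader3's implementation.

// An observer must expose next, error and complete as functions.
function isObserver (candidate) {
  return Boolean(candidate) &&
    ['next', 'error', 'complete'].every(name => typeof candidate[name] === 'function')
}

function normalizeQueueArgs (key, locations, options, userObserver) {
  // Three-argument form: the third argument is the observer, options are omitted
  if (userObserver === undefined && isObserver(options)) {
    userObserver = options
    options = {}
  }
  if (typeof key !== 'string') {
    throw new TypeError('key must be a string')
  }
  if (!isObserver(userObserver)) {
    throw new TypeError('userObserver needs next, error and complete functions')
  }
  return { key, locations, options: options || {}, userObserver }
}

const observer = {
  next: progressInfo => console.log(progressInfo),
  error: e => console.error(e),
  complete: () => console.log('done')
}

const threeArgs = normalizeQueueArgs('task-1', { url: 'http://download.location/file.zip' }, observer)
const fourArgs = normalizeQueueArgs('task-2', { url: 'http://download.location/file.zip' }, { threads: 2 }, observer)
```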
SuDScheduler.startDownload(key) => boolean
Starts a new download, or resumes an active download. Starting a new download task using this method ignores the `maxConcurrentDownloads` limit.
Returns false if a download task with the provided key doesn't exist, or true otherwise.
| Param | Type | Default | Description |
| --- | :---: | :---: | --- |
| key | `string` | | Unique identifier for an existing download task |
SuDScheduler.pauseDownload(key, stop) => boolean
Pauses or stops a download task. Returns false if a download task with the provided key doesn't exist, or if the download task is not currently downloading. Returns true otherwise.
| Param | Type | Default | Description |
| --- | :---: | :---: | --- |
| key | `string` | | Unique identifier for a download task |
| stop | `boolean` | `false` | Whether to stop or just pause the download |
A paused download is still considered active.
Thus, if the download is paused, i.e. `stop` is `false`, new downloads will not be started automatically if the max concurrency limit has already been reached.
If the download is stopped, i.e. `stop` is `true`, the next queued download will start (provided that `autoStart` is set to true).
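The difference between pausing and stopping can be modelled with a toy FIFO scheduler. This is a simplified model of the behaviour described above, not the library's code: the `ToyScheduler` class, its method names and the status strings are all assumptions made for illustration, and no downloading takes place.

```javascript
// Toy model only; class name, method names and status strings are hypothetical.
class ToyScheduler {
  constructor ({ maxConcurrentDownloads = 4, autoStart = true } = {}) {
    this.maxConcurrentDownloads = maxConcurrentDownloads
    this.autoStart = autoStart
    this.tasks = [] // kept in FIFO order: [{ key, status }]
  }

  // Paused tasks still occupy a slot; stopped and queued tasks do not
  get activeCount () {
    return this.tasks.filter(t => t.status === 'downloading' || t.status === 'paused').length
  }

  queue (key) {
    this.tasks.push({ key, status: 'queued' })
    if (this.autoStart) this.startQueue()
  }

  // Start queued tasks in order until the limit is reached (0 means unlimited)
  startQueue () {
    for (const task of this.tasks) {
      const limitReached = this.maxConcurrentDownloads !== 0 &&
        this.activeCount >= this.maxConcurrentDownloads
      if (limitReached) break
      if (task.status === 'queued') task.status = 'downloading'
    }
  }

  pause (key, stop = false) {
    const task = this.tasks.find(t => t.key === key)
    if (!task || task.status !== 'downloading') return false
    task.status = stop ? 'stopped' : 'paused'
    // Only a stopped task frees a slot for the next queued task
    if (stop && this.autoStart) this.startQueue()
    return true
  }

  statusOf (key) {
    const task = this.tasks.find(t => t.key === key)
    return task ? task.status : false
  }
}

const pausing = new ToyScheduler({ maxConcurrentDownloads: 1 })
pausing.queue('a')
pausing.queue('b')
pausing.pause('a') // 'a' keeps its slot, so 'b' stays queued

const stopping = new ToyScheduler({ maxConcurrentDownloads: 1 })
stopping.queue('a')
stopping.queue('b')
stopping.pause('a', true) // 'a' releases its slot, so 'b' starts
```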
SuDScheduler.killDownload(key) => boolean
Stops a download (if active) and removes the associated `.sud` and `.PARTIAL` files, or removes a queued download task from the queue.
Returns false if a download task with the provided key doesn't exist, or true otherwise.
| Param | Type | Default | Description |
| --- | :---: | :---: | --- |
| key | `string` | | Unique identifier for a download task |
Note: This method is synchronous.
SuDScheduler.startQueue() => undefined
Starts as many download tasks as possible, limited by the maxConcurrentDownloads option.
SuDScheduler.pauseAll(stop) => undefined
Pauses or stops all download tasks.
| Param | Type | Default | Description |
| --- | :---: | :---: | --- |
| stop | `boolean` | `false` | Whether to stop or just pause the downloads |
SuDScheduler.getStatus(key) => TaskQueueItem (object) or false
Returns the status of a download, or false if the download task with the provided key doesn't exist.
| Param | Type | Default | Description |
| --- | :---: | :---: | --- |
| key | `string` | | Unique identifier for a download task |
Read only properties
SuDScheduler.taskQueue => array
The queue of download tasks.
SuDScheduler.queuedCount => integer
The number of queued tasks.
SuDScheduler.activeCount => integer
The number of active tasks.
SuDScheduler.stoppedCount => integer
The number of stopped tasks.
SuDScheduler.taskCount => integer
The total number of tasks.
The following utility functions are also exposed for convenience:
killFiles(sudPath) => boolean
Synchronously removes all `.sud` and `.PARTIAL` files associated with the `sudPath`.
| Param | Type | Default | Description |
| --- | :---: | :---: | --- |
| sudPath | `string` | | Path to an existing `.sud` file |
If `sudPath` does not exist or is invalid, `false` is returned.
If the files are successfully deleted, `true` is returned.
sudPath(savePath) => string
Returns the path of the `.sud` file associated with the given save path.
| Param | Type | Default | Description |
| --- | :---: | :---: | --- |
| savePath | `string` | | Save path of a download |
This function simply appends `.sud` to `savePath`.
Design
su-downloader3 uses RxJS to handle streams of data.
There are 4 stages involved with the download process. In each stage, the observable is transformed in some way.
Stage 1: A HEAD request is made to get the file size, which is used to create and write the meta data.
Stage 2: The GET requests are made based on the meta data.
Stage 3: The data from the GET requests is written to the `.PARTIAL` files and the position of each thread is updated. The rebuilding of files is set up.
Stage 4: The download progress info is created based on the thread positions and meta data.
Stage 1
The observable returned by `startDownload` is an observable chain that begins with a HEAD request to the url to get the file size.
The file size (along with the other values provided, such as the url, save path and threads) is then transformed into the meta object, which is written to the `.sud` file as a side effect. Since the meta object is itself emitted through the observable, the `.sud` file is not read again during the download, so it doesn't matter that it is written as a side effect rather than within the observable chain; its only purpose is to allow previous downloads to be resumed.
Stage 2
GET requests will then be made based on the meta data. In particular, the `Range` header is set so that only a particular range of bytes is downloaded for each request. These ranges are calculated based on the file size and the number of threads being used. The meta data is mapped to an object holding an array of these request observables and the plain meta data object.
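As a minimal sketch of how such ranges might be computed, the function below splits a file into contiguous, inclusive byte ranges and formats each as a `Range` header value. The `calculateRanges` and `toRangeHeader` helpers are hypothetical, and giving the remainder to the last thread is an assumption; the library's exact splitting may differ.

```javascript
// Hypothetical helpers; not su-downloader3's actual range calculation.

// Split [0, filesize - 1] into `threads` contiguous inclusive ranges.
// Assumption: any remainder bytes are assigned to the last range.
function calculateRanges (filesize, threads) {
  const segmentSize = Math.floor(filesize / threads)
  const ranges = []
  for (let i = 0; i < threads; i++) {
    const start = i * segmentSize
    const end = i === threads - 1 ? filesize - 1 : start + segmentSize - 1
    ranges.push([start, end])
  }
  return ranges
}

// HTTP byte ranges are inclusive at both ends
const toRangeHeader = ([start, end]) => `bytes=${start}-${end}`

const exampleRanges = calculateRanges(10, 3)
const exampleHeaders = exampleRanges.map(toRangeHeader)
console.log(exampleHeaders)
```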
Stage 3
The request observables and meta data object are then mapped to an observable (using `concatMap`) which emits the meta data on its first emission and then the position of a single thread on each emission thereafter. Within this `concatMap`, the data is written to disk. This has to be done within the observable chain rather than as a side effect, because delays in I/O can cause corrupted files or cause the rebuilding process to fail. For the same reason, the rebuilding process is also done within the observable chain, after all the request observables have finished (this is done by using `concat` to concatenate the rebuild observable with a flattened requests higher-order observable). Note that although the rebuilding process is set up during this stage, the rebuild observable is only subscribed to once the requests higher-order observable has finished.
Stage 4
The final step is to generate the download progress info based on the thread positions. The observable up to this point emits a SINGLE thread position on each emission, e.g. `1995135`, `4331582`, `1996443`... The goal is to track the positions of ALL threads after each emission. To do this, the meta data is used to calculate which thread a certain thread position belongs to (e.g. a thread position of 135 must belong to a thread with a range of 0 to 200), and the thread position for that thread is updated. These thread positions are stored as an array indexed the same as the ranges. The `scan` operator is used to keep track of the download progress info.
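The accumulation step can be illustrated without RxJS. In the sketch below, `updatePositions` plays the role of the accumulator function that would be passed to `scan`, and `Array.prototype.reduce` stands in for `scan` (which behaves like a running reduce that emits every intermediate value). The function name, the sample ranges and the sample positions are hypothetical.

```javascript
// Hypothetical accumulator; not su-downloader3's actual implementation.

// Find which range a single emitted position falls into and update that slot.
function updatePositions (ranges, positions, position) {
  const index = ranges.findIndex(([start, end]) => position >= start && position <= end)
  if (index === -1) return positions // position outside every known range
  const next = positions.slice() // copy so earlier accumulations are not mutated
  next[index] = position
  return next
}

const ranges = [[0, 200], [201, 400], [401, 600]]
const initialPositions = ranges.map(([start]) => start)

// Single-thread positions arriving in arbitrary order
const emissions = [135, 350, 150, 599]

const finalPositions = emissions.reduce(
  (positions, position) => updatePositions(ranges, positions, position),
  initialPositions
)
console.log(finalPositions)
```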
The process for resuming a download from an existing `.sud` file is similar. The only difference is in stage 1: instead of making a HEAD request and creating a meta data object, the meta data object is read from the file and used in the subsequent stages.