s3-getobject-accelerator
v2.12.0
Published
Get large objects from S3 by using parallel byte-range fetches/parts to improve performance.
Downloads
67
Readme
S3 GetObject Accelerator
Get large objects from S3 by using parallel byte-range fetches/parts without the AWS SDK to improve performance.
We measured a troughoput of 6.5 Gbit/s on an m5zn.6xlarge in eu-west-1 using this lib with this settings:
{concurrency: 64}
.
Installation
npm i s3-getobject-accelerator
Examples
Compact
const {createWriteStream} = require('node:fs');
const {pipeline} = require('node:stream');
const {download} = require('s3-getobject-accelerator');
pipeline(
download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4}).readStream(),
createWriteStream('/tmp/test'),
(err) => {
if (err) {
console.error('something went wrong', err);
} else {
console.log('done');
}
}
);
More verbose
Get insights into the part downloads and write to file directly without stream if it is smaller than 1 TiB:
const {createWriteStream} = require('node:fs');
const {download} = require('s3-getobject-accelerator');
const d = download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4});
d.on('part:downloading', ({partNo}) => {
console.log('start downloading part', partNo);
});
d.on('part:downloaded', ({partNo}) => {
console.log('part downloaded, write to disk next in correct order', partNo);
});
d.on('part:writing', ({partNo}) => {
console.log('start writing part to disk', partNo);
});
d.on('part:done', ({partNo}) => {
console.log('part written to disk', partNo);
});
d.meta((err, metadata) => {
if (err) {
console.error('something went wrong', err);
} else {
if (metadata.lengthInBytes > 1024 * 1024 * 1024 * 1024) {
console.error('file is larger than 1 TiB');
} else {
d.file('/tmp/test', (err) => {
if (err) {
console.error('something went wrong', err);
} else {
console.log('done');
}
});
}
}
});
API
download(s3source, options)
s3source
<Object>
bucket
<string>
key
<string>
version
<string>
(optional)
options
<Object>
partSizeInMegabytes
<number>
(optional, defaults to uploaded part size)concurrency
<number>
connectionTimeoutInMilliseconds
<number>
(optional, defaults to 3000)v2AwsSdkCredentials
<AWS.Credentials>
(optional)endpointHostname
<string>
(optional, defaults to s3.${region}.amazonaws.com)agent
<https.Agent>
(optional)
- Returns:
meta(cb)
<Function>
Get meta-data before starting the download (downloads the first part and keeps the body in memory until download starts)cb(err, metadata)
<Function>
err
<Error>
metadata
<Object>
lengthInBytes
<number>
parts
<number>
Number of parts available (optional)
readStream()
<Function>
Start download- Returns: ReadStream
file(path, cb)
<Function>
Start downloadpath
<string>
cb(err)
<Function>
err
<Error>
abort([err])
<Function>
Abort downloaderr
<Error>
partsDownloading()
<Function>
Number of parts downloading at the moment- Returns
<number>
- Returns
addListener(eventName, listener)
See https://nodejs.org/api/events.html#emitteraddlistenereventname-listeneroff(eventName, listener)
See https://nodejs.org/api/events.html#emitteroffeventname-listeneron(eventName, listener)
See https://nodejs.org/api/events.html#emitteroneventname-listeneronce(eventName, listener)
See https://nodejs.org/api/events.html#emitteronceeventname-listenerremoveListener(eventName, listener)
See https://nodejs.org/api/events.html#emitterremovelistenereventname-listener
AWS credentials & region
AWS credentials are fetched in the following order:
options.v2AwsSdkCredentials
- Environment variables
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
(optional)
- IMDSv2
AWS region is fetched in the following order:
- Environment variable
AWS_REGION
- IMDSv2
Considerations
- Typical sizes
partSizeInMegabytes
are 8 MB or 16 MB. If objects are uploaded using a multipart upload, it’s a good practice to download them in the same part sizes ( do not specifypartSizeInMegabytes
), or at least aligned to part boundaries, for best performance (see https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html). - Keep in mind that you pay per GET request to Amazon S3. The smaller the parts, the more expensive a download is.