s3-upload-resume-v3
v0.0.1
Amazon S3 Client with multipart resume for AWS-SDK V3
High Level Amazon S3 Client
Installation
npm install s3-upload-resume-v3 --save
Features
- Automatically retry a configurable number of times when S3 returns an error.
- Includes logic to make multiple requests when there is a 1000 object limit.
- Ability to set a limit on the maximum parallelization of S3 requests. Retries get pushed to the end of the parallelization queue.
- Ability to sync a dir to and from S3.
- Progress reporting.
- Supports files of any size (up to S3's maximum 5 TB object size limit).
- Uploads large files quickly using parallel multipart uploads.
- Checks to see if it can resume an unfinished multipart upload.
- Checks the MD5 of each part of an unfinished multipart upload to see if it can skip the upload of that part, reuploads part if MD5 does not match.
- Uses heuristics to compute multipart ETags client-side to avoid uploading or downloading files unnecessarily.
- Automatically provide Content-Type for uploads based on file extension.
- Support third-party S3-compatible platform services like Ceph
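The client-side multipart ETag computation mentioned in the feature list can be illustrated with the scheme S3 is commonly observed to use for multipart uploads: take the MD5 of each part, concatenate the binary digests, take the MD5 of that, and append the part count. This is a minimal sketch under the assumptions that all parts share one fixed size and that the object is not encrypted in a way that changes its ETag; multipartETag is a hypothetical helper name, not part of this module's API.

```javascript
const crypto = require('crypto');

// Hypothetical helper, not part of this module's API.
// data: Buffer with the whole file; partSize: part size in bytes.
function multipartETag(data, partSize) {
  const partDigests = [];
  for (let offset = 0; offset < data.length; offset += partSize) {
    const part = data.subarray(offset, offset + partSize);
    partDigests.push(crypto.createHash('md5').update(part).digest());
  }
  // MD5 over the concatenated binary part digests, then "-<part count>".
  const combined = crypto
    .createHash('md5')
    .update(Buffer.concat(partDigests))
    .digest('hex');
  return `${combined}-${partDigests.length}`;
}

const data = Buffer.alloc(12 * 1024 * 1024, 'a'); // 12 MiB of 'a'
console.log(multipartETag(data, 5 * 1024 * 1024)); // three parts, so it ends with "-3"
```

Because the result depends on the part size, a client can only guess the remote ETag if it knows (or can infer) the part size that was used for the original upload, which is presumably why the feature is described as a heuristic.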
Synopsis
Create a client
const { createClient } = require('s3-upload-resume-v3');
const client = createClient({
  maxAsyncS3: 20, // this is the default
  s3RetryCount: 3, // this is the default
  s3RetryDelay: 1000, // this is the default
  multipartUploadThreshold: 20971520, // this is the default (20 MB)
  multipartUploadSize: 15728640, // this is the default (15 MB)
  s3Options: {
    accessKeyId: 'your s3 key',
    secretAccessKey: 'your s3 secret',
    region: 'your region'
    // endpoint: 's3.yourdomain.com',
    // sslEnabled: false
    // any other options are passed to new AWS.S3()
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#constructor-property
  }
});
Create a client from an existing S3Client object
const { S3Client } = require('@aws-sdk/client-s3');
const { createClient } = require('s3-upload-resume-v3');
// s3Options is your own S3Client configuration object, e.g. { region: 'your region' }
const awsS3Client = new S3Client(s3Options);
const options = {
  maxAsyncS3: 20,
  s3RetryCount: 3,
  s3RetryDelay: 1000,
  multipartUploadThreshold: 10 * 1024 * 1024, // 10MB
  multipartUploadSize: 5 * 1024 * 1024, // 5MB
  s3Client: awsS3Client
  // more options available. See API docs below.
};
const client = createClient(options);
Upload a file to S3
const params = {
  localFile: 'some/local/file',
  s3Params: {
    Bucket: 's3 bucket name',
    Key: 'some/remote/file'
    // other options supported by putObject, except Body and ContentLength.
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
  }
};
let uploader = client.uploadFile(params);
uploader.on('error', function(err) {
  console.error('unable to upload:', err.stack);
});
uploader.on('progress', function() {
  console.log('progress', uploader.progressMd5Amount, uploader.progressAmount, uploader.progressTotal);
});
uploader.on('end', function() {
  console.log('done uploading');
});
uploader.on('uploading', function(data) {
  switch (data) {
    case 'putting':
      console.log(`Putting ${uploader.localFile} to Bucket: '${uploader.s3Bucket}', Key: '${uploader.s3Key}'`);
      break;
    case 'starting':
      console.log(`Starting new multipart upload from scratch for ${uploader.localFile} to Bucket: '${uploader.s3Bucket}', Key: '${uploader.s3Key}'`);
      break;
    case 'resuming':
      console.log(`Resuming a multipart upload for ${uploader.localFile} to Bucket: '${uploader.s3Bucket}', Key: '${uploader.s3Key}'`);
      break;
    default:
      break;
  }
});
Download a file from S3
const params = {
  localFile: 'some/local/file',
  s3Params: {
    Bucket: 's3 bucket name',
    Key: 'some/remote/file'
    // other options supported by getObject
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property
  }
};
let downloader = client.downloadFile(params);
downloader.on('error', function(err) {
  console.error('unable to download:', err.stack);
});
downloader.on('progress', function() {
  console.log('progress', downloader.progressAmount, downloader.progressTotal);
});
downloader.on('end', function() {
  console.log('done downloading');
});
Sync a directory to S3
const params = {
  localDir: 'some/local/dir',
  deleteRemoved: true, // default false, whether to remove s3 objects
                       // that have no corresponding local file.
  s3Params: {
    Bucket: 's3 bucket name',
    Prefix: 'some/remote/dir/'
    // other options supported by putObject, except Body and ContentLength.
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
  }
};
let uploader = client.uploadDir(params);
uploader.on('error', function(err) {
  console.error('unable to sync:', err.stack);
});
uploader.on('progress', function() {
  console.log('progress', uploader.progressAmount, uploader.progressTotal);
});
uploader.on('end', function() {
  console.log('done uploading');
});
Tips
Consider increasing the socket pool size in the http and https global agents. This will improve bandwidth when using the uploadDir and downloadDir functions. For example:
http.globalAgent.maxSockets = https.globalAgent.maxSockets = 20;
API Documentation
s3.AWS
This contains a reference to the aws-sdk module. It is a valid use case to use both this module and the lower level aws-sdk module in tandem.
s3.createClient(options)
Creates an S3 client.
options:
- s3Client - optional, an instance of AWS.S3. Leave blank if you provide s3Options.
- s3Options - optional. Leave blank if you provide s3Client.
  - See AWS SDK documentation for available options which are passed to new AWS.S3(): http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#constructor-property
- maxAsyncS3 - maximum number of simultaneous requests this client will ever have open to S3. Defaults to 20.
- s3RetryCount - how many times to try an S3 operation before giving up. Default 3.
- s3RetryDelay - how many milliseconds to wait before retrying an S3 operation. Default 1000.
- multipartUploadThreshold - if a file is this many bytes or greater, it will be uploaded via a multipart request. Default is 20MB. Minimum is 5MB. Maximum is 5GB.
- multipartUploadSize - when uploading via multipart, this is the part size. The minimum size is 5MB. The maximum size is 5GB. Default is 15MB. Note that S3 has a maximum of 10000 parts for a multipart upload, so if this value is too small, it will be ignored in favor of the minimum necessary value required to upload the file.
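The interaction between multipartUploadSize and the 10000-part limit can be made concrete with a little arithmetic. The sketch below is an illustration of the rule as described above, not the module's internal code; effectivePartSize and the constant names are hypothetical.

```javascript
const MIN_PART_SIZE = 5 * 1024 * 1024; // 5 MB minimum part size stated above
const MAX_PARTS = 10000;               // S3 limit on parts per multipart upload

// Hypothetical helper: returns the part size that would actually be needed
// so that the file fits into at most MAX_PARTS parts.
function effectivePartSize(fileSize, multipartUploadSize) {
  const smallestAllowed = Math.ceil(fileSize / MAX_PARTS);
  return Math.max(multipartUploadSize, smallestAllowed, MIN_PART_SIZE);
}

const fileSize = 100 * 1024 * 1024 * 1024; // 100 GiB
const partSize = effectivePartSize(fileSize, 5 * 1024 * 1024);
console.log(partSize);                       // 10737419
console.log(Math.ceil(fileSize / partSize)); // 10000
```

With a 5 MB part size a 100 GiB file would need 20480 parts, so the configured value cannot be honored and a larger part size has to be used instead.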
s3.getPublicUrl(bucket, key, [bucketLocation])
- bucket - S3 bucket
- key - S3 key
- bucketLocation - string, one of these:
  - "" (default) - US Standard
  - "eu-west-1"
  - "us-west-1"
  - "us-west-2"
  - "ap-southeast-1"
  - "ap-southeast-2"
  - "ap-northeast-1"
  - "sa-east-1"
You can find out your bucket location programmatically by using this API: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getBucketLocation-property
Returns a string which looks like this:
https://s3.amazonaws.com/bucket/key
or like this if you are not in US Standard:
https://s3-eu-west-1.amazonaws.com/bucket/key
s3.getPublicUrlHttp(bucket, key)
- bucket - S3 bucket
- key - S3 key
Works for any region, and returns a string which looks like this:
http://bucket.s3.amazonaws.com/key
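To make the two URL shapes concrete, here is a small stand-alone sketch that reproduces the documented output formats. It is a re-implementation for illustration only, not the module's source; buildPublicUrl, buildPublicUrlHttp and encodeKey are hypothetical names, and the per-segment percent-encoding of the key is an assumption.

```javascript
// Hypothetical helpers mirroring the documented return values.
function buildPublicUrl(bucket, key, bucketLocation) {
  // An empty or missing location means US Standard.
  const host = bucketLocation
    ? `s3-${bucketLocation}.amazonaws.com`
    : 's3.amazonaws.com';
  return `https://${host}/${bucket}/${encodeKey(key)}`;
}

function buildPublicUrlHttp(bucket, key) {
  return `http://${bucket}.s3.amazonaws.com/${encodeKey(key)}`;
}

// Assumption: percent-encode each path segment but keep "/" separators.
function encodeKey(key) {
  return key.split('/').map(encodeURIComponent).join('/');
}

console.log(buildPublicUrl('bucket', 'key'));              // https://s3.amazonaws.com/bucket/key
console.log(buildPublicUrl('bucket', 'key', 'eu-west-1')); // https://s3-eu-west-1.amazonaws.com/bucket/key
console.log(buildPublicUrlHttp('bucket', 'key'));          // http://bucket.s3.amazonaws.com/key
```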
client.uploadFile(params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
params:
- s3Params: params to pass to AWS SDK putObject.
- localFile: path to the file on disk you want to upload to S3.
- (optional) defaultContentType: Unless you explicitly set the ContentType parameter in s3Params, it will be automatically set for you based on the file extension of localFile. If the extension is unrecognized, defaultContentType will be used instead. Defaults to application/octet-stream.
The difference between using AWS SDK putObject and this one:
- This works with files, not streams or buffers.
- If the reported MD5 upon upload completion does not match, it retries.
- If the file size is large enough, uses multipart upload to upload parts in parallel.
- Retry based on the client's retry settings.
- Progress reporting.
- Sets the ContentType based on file extension if you do not provide it.
Returns an EventEmitter with these properties:
- progressMd5Amount
- progressAmount
- progressTotal
And these events:
- 'error' (err)
- 'end' (data) - emitted when the file is uploaded successfully
  - data is the same object that you get from putObject in AWS SDK
- 'progress' - emitted when progressMd5Amount, progressAmount, and progressTotal properties change. Note that it is possible for progress to go backwards when an upload fails and must be retried.
- 'fileOpened' (fdSlicer) - emitted when localFile has been opened. The file is opened with the fd-slicer module because we might need to read from multiple locations in the file at the same time.
  - fdSlicer is an object for which you can call createReadStream(options). See the fd-slicer README for more information.
- 'fileClosed' - emitted when localFile has been closed.
- 'uploading' - emitted to tell how it is uploading the file.
  - putting is when it is uploading a file smaller than multipartUploadThreshold.
  - starting is when it is starting a multipart upload from scratch.
  - resuming is when it found an already started multipart upload on S3 for the same bucket and key. It resumes that upload, skipping any part whose MD5 in the cloud matches the MD5 of the part to be uploaded.
And these methods:
- abort() - call this to stop the upload.
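Because progress can go backwards when a failed part is retried, code that displays progress should not assume the value only increases. The following sketch turns the progressAmount and progressTotal properties into whole-number percentages; reportPercent is a hypothetical helper, and a plain EventEmitter stands in for the object returned by client.uploadFile(params) so the snippet runs on its own.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: calls onPercent whenever the whole-number percentage changes.
function reportPercent(uploader, onPercent) {
  let last = -1;
  uploader.on('progress', () => {
    if (!uploader.progressTotal) return; // total not known yet
    const percent = Math.floor((uploader.progressAmount / uploader.progressTotal) * 100);
    if (percent !== last) {
      last = percent;
      onPercent(percent);
    }
  });
}

// Stand-in for the emitter returned by client.uploadFile(params).
const fakeUploader = new EventEmitter();
const seen = [];
reportPercent(fakeUploader, (p) => seen.push(p));
fakeUploader.progressTotal = 200;
for (const amount of [50, 100, 60, 200]) { // 60 simulates a part being retried
  fakeUploader.progressAmount = amount;
  fakeUploader.emit('progress');
}
console.log(seen); // [ 25, 50, 30, 100 ]
```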
client.downloadFile(params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property
params:
- localFile - the destination path on disk to write the s3 object into
- s3Params: params to pass to AWS SDK getObject.
The difference between using AWS SDK getObject and this one:
- This works with a destination file, not a stream or a buffer.
- If the reported MD5 upon download completion does not match, it retries.
- Retry based on the client's retry settings.
- Progress reporting.
Returns an EventEmitter with these properties:
- progressAmount
- progressTotal
And these events:
- 'error' (err)
- 'end' - emitted when the file is downloaded successfully
- 'progress' - emitted when progressAmount and progressTotal properties change.
client.downloadBuffer(s3Params)
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property
s3Params: params to pass to AWS SDK getObject.
The difference between using AWS SDK getObject and this one:
- This works with a buffer only.
- If the reported MD5 upon download completion does not match, it retries.
- Retry based on the client's retry settings.
- Progress reporting.
Returns an EventEmitter with these properties:
- progressAmount
- progressTotal
And these events:
- 'error' (err)
- 'end' (buffer) - emitted when the file is downloaded successfully. buffer is a Buffer containing the object data.
- 'progress' - emitted when progressAmount and progressTotal properties change.
client.downloadStream(s3Params)
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property
s3Params: params to pass to AWS SDK getObject.
The difference between using AWS SDK getObject and this one:
- This works with a stream only.
If you want retries, progress, or MD5 checking, you must code it yourself.
Returns a ReadableStream with these additional events:
- 'httpHeaders' (statusCode, headers) - contains the HTTP response headers and status code.
client.listObjects(params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjects-property
params:
- s3Params - params to pass to AWS SDK listObjects.
- (optional) recursive - true or false, whether or not you want to recurse into directories. Default false.
Note that if you set Delimiter in s3Params then you will get a list of objects and folders in the directory you specify. You probably do not want to set recursive to true at the same time as specifying a Delimiter because this will cause a request per directory. If you want all objects that share a prefix, leave the Delimiter option null or undefined.
Be sure that s3Params.Prefix ends with a trailing slash (/) unless you are requesting the top-level listing, in which case s3Params.Prefix should be an empty string.
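The trailing-slash rule for s3Params.Prefix is easy to get wrong when the prefix comes from user input. A minimal sketch of a guard, where normalizePrefix is a hypothetical helper and not part of this module:

```javascript
// Hypothetical helper: empty (or missing) prefix means the top-level listing;
// anything else gets exactly one trailing slash appended if it lacks one.
function normalizePrefix(prefix) {
  if (!prefix) return '';
  return prefix.endsWith('/') ? prefix : `${prefix}/`;
}

const s3Params = {
  Bucket: 's3 bucket name',
  Prefix: normalizePrefix('some/remote/dir'),
  Delimiter: '/', // omit to list every object that shares the prefix
};
console.log(s3Params.Prefix); // some/remote/dir/
```

The resulting s3Params object could then be passed as params.s3Params to client.listObjects.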
The difference between using AWS SDK listObjects and this one:
- Retries based on the client's retry settings.
- Supports recursive directory listing.
- Makes multiple requests if the number of objects to list is greater than 1000.
Returns an EventEmitter with these properties:
- progressAmount
- objectsFound
- dirsFound
And these events:
- 'error' (err)
- 'end' - emitted when done listing and no more 'data' events will be emitted.
- 'data' (data) - emitted when a batch of objects are found. This is the same as the data object in AWS SDK.
- 'progress' - emitted when progressAmount, objectsFound, and dirsFound properties change.
And these methods:
- abort() - call this to stop the find operation.
client.deleteObjects(s3Params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#deleteObjects-property
s3Params are the same.
The difference between using AWS SDK deleteObjects and this one:
- Retry based on the client's retry settings.
- Make multiple requests if the number of objects you want to delete is greater than 1000.
Returns an EventEmitter with these properties:
- progressAmount
- progressTotal
And these events:
- 'error' (err)
- 'end' - emitted when all objects are deleted.
- 'progress' - emitted when the progressAmount or progressTotal properties change.
- 'data' (data) - emitted when a request completes. There may be more.
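The "multiple requests" behavior follows from the S3 limit of 1000 keys per deleteObjects request. The sketch below shows one way such batching can be done; it illustrates the idea only, the module's internals may differ, and splitDeleteParams is a hypothetical name.

```javascript
const MAX_KEYS_PER_REQUEST = 1000; // S3 limit on keys per deleteObjects request

// Hypothetical helper: splits one large deleteObjects params object
// into several params objects of at most 1000 keys each.
function splitDeleteParams(s3Params) {
  const objects = s3Params.Delete.Objects;
  const batches = [];
  for (let i = 0; i < objects.length; i += MAX_KEYS_PER_REQUEST) {
    batches.push({
      ...s3Params,
      Delete: {
        ...s3Params.Delete,
        Objects: objects.slice(i, i + MAX_KEYS_PER_REQUEST),
      },
    });
  }
  return batches;
}

const objects = Array.from({ length: 2500 }, (_, i) => ({ Key: `logs/${i}.txt` }));
const batches = splitDeleteParams({ Bucket: 's3 bucket name', Delete: { Objects: objects } });
console.log(batches.map((b) => b.Delete.Objects.length)); // [ 1000, 1000, 500 ]
```

Each batch corresponds to one request, which matches the note that a 'data' event is emitted per completed request and that there may be more than one.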
client.uploadDir(params)
Syncs an entire directory to S3.
params:
- localDir - source path on local file system to sync to S3
- s3Params
  - Prefix (required)
  - Bucket (required)
- (optional) deleteRemoved - delete s3 objects with no corresponding local file. Default false.
- (optional) getS3Params - function which will be called for every file that needs to be uploaded. You can use this to skip some files. See below.
- (optional) defaultContentType: Unless you explicitly set the ContentType parameter in s3Params, it will be automatically set for you based on the file extension of localFile. If the extension is unrecognized, defaultContentType will be used instead. Defaults to application/octet-stream.
- (optional) followSymlinks - Set this to false to ignore symlinks. Defaults to true.
function getS3Params(localFile, stat, callback) {
  // call callback like this:
  const err = new Error(...); // only if there is an error
  const s3Params = { // if there is no error
    ContentType: getMimeType(localFile), // just an example
  };
  // pass `null` for `s3Params` if you want to skip uploading this file.
  callback(err, s3Params);
}
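As a concrete variant of the template above, the following sketch skips dotfiles and sets a cache header on everything else. CacheControl is a standard putObject parameter; the skip rule and the one-hour policy are only illustrative choices, and how the returned object is combined with the top-level s3Params is up to the module.

```javascript
const path = require('path');

// Skips dotfiles (e.g. .DS_Store) and sets CacheControl on all other files.
function getS3Params(localFile, stat, callback) {
  if (path.basename(localFile).startsWith('.')) {
    callback(null, null); // null s3Params: skip uploading this file
    return;
  }
  callback(null, { CacheControl: 'max-age=3600' });
}

// Params object in the shape expected by client.uploadDir(params).
const params = {
  localDir: 'some/local/dir',
  getS3Params,
  s3Params: {
    Bucket: 's3 bucket name',
    Prefix: 'some/remote/dir/',
  },
};
```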
Returns an EventEmitter with these properties:
- progressAmount
- progressTotal
- progressMd5Amount
- progressMd5Total
- deleteAmount
- deleteTotal
- filesFound
- objectsFound
- doneFindingFiles
- doneFindingObjects
- doneMd5
And these events:
- 'error' (err)
- 'end' - emitted when all files are uploaded
- 'progress' - emitted when any of the above progress properties change.
- 'fileUploadStart' (localFilePath, s3Key) - emitted when a file begins uploading.
- 'fileUploadEnd' (localFilePath, s3Key) - emitted when a file successfully finishes uploading.
uploadDir works like this:
- Start listing all S3 objects for the target Prefix. S3 guarantees returned objects to be in sorted order.
- Meanwhile, recursively find all files in localDir.
- Once all local files are found, we sort them (the same way that S3 sorts).
- Next we iterate over the sorted local file list one at a time, computing MD5 sums.
- Now S3 object listing and MD5 sum computing are happening in parallel. As each operation progresses we compare both sorted lists side-by-side, iterating over them one at a time, uploading files whose MD5 sums don't match the remote object (or the remote object is missing), and, if deleteRemoved is set, deleting remote objects whose corresponding local files are missing.
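The side-by-side comparison of the two sorted lists is essentially a merge walk. The sketch below works on complete in-memory lists for clarity, whereas the module is described as doing this incrementally while listing and hashing are still running. planSync and compareKeys are hypothetical names, and comparing keys by UTF-8 byte order is an assumption about how S3 sorts its listings.

```javascript
// Compare keys by UTF-8 byte order (assumed to match S3's listing order).
function compareKeys(a, b) {
  return Buffer.compare(Buffer.from(a), Buffer.from(b));
}

// Hypothetical sketch: both inputs are arrays of { key, md5 } sorted by key.
function planSync(localFiles, remoteObjects, deleteRemoved) {
  const plan = { upload: [], remove: [] };
  let i = 0;
  let j = 0;
  while (i < localFiles.length || j < remoteObjects.length) {
    const local = localFiles[i];
    const remote = remoteObjects[j];
    const order = !local ? 1 : !remote ? -1 : compareKeys(local.key, remote.key);
    if (order < 0) {
      plan.upload.push(local.key); // remote object is missing
      i++;
    } else if (order > 0) {
      if (deleteRemoved) plan.remove.push(remote.key); // local file is missing
      j++;
    } else {
      if (local.md5 !== remote.md5) plan.upload.push(local.key); // content differs
      i++;
      j++;
    }
  }
  return plan;
}

const plan = planSync(
  [{ key: 'a.txt', md5: '1' }, { key: 'b.txt', md5: '2' }, { key: 'd.txt', md5: '4' }],
  [{ key: 'a.txt', md5: '1' }, { key: 'b.txt', md5: 'x' }, { key: 'c.txt', md5: '3' }],
  true
);
console.log(plan); // { upload: [ 'b.txt', 'd.txt' ], remove: [ 'c.txt' ] }
```

Because both lists are sorted the same way, each entry is visited once and no lookup table of all keys is needed, which is what allows the comparison to proceed while the lists are still being produced.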
client.downloadDir(params)
Syncs an entire directory from S3.
params:
- localDir - destination directory on local file system to sync to
- s3Params
  - Prefix (required)
  - Bucket (required)
- (optional) deleteRemoved - delete local files with no corresponding s3 object. Default false.
- (optional) getS3Params - function which will be called for every object that needs to be downloaded. You can use this to skip downloading some objects. See below.
- (optional) followSymlinks - Set this to false to ignore symlinks. Defaults to true.
function getS3Params(localFile, s3Object, callback) {
  // localFile is the destination path where the object will be written to
  // s3Object is same as one element in the `Contents` array from here:
  // http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjects-property
  // call callback like this:
  const err = new Error(...); // only if there is an error
  const s3Params = { // if there is no error
    VersionId: "abcd", // just an example
  };
  // pass `null` for `s3Params` if you want to skip downloading this object.
  callback(err, s3Params);
}
Returns an EventEmitter with these properties:
- progressAmount
- progressTotal
- progressMd5Amount
- progressMd5Total
- deleteAmount
- deleteTotal
- filesFound
- objectsFound
- doneFindingFiles
- doneFindingObjects
- doneMd5
And these events:
- 'error' (err)
- 'end' - emitted when all files are downloaded
- 'progress' - emitted when any of the progress properties above change.
- 'fileDownloadStart' (localFilePath, s3Key) - emitted when a file begins downloading.
- 'fileDownloadEnd' (localFilePath, s3Key) - emitted when a file successfully finishes downloading.
downloadDir works like this:
- Start listing all S3 objects for the target Prefix. S3 guarantees returned objects to be in sorted order.
- Meanwhile, recursively find all files in localDir.
- Once all local files are found, we sort them (the same way that S3 sorts).
- Next we iterate over the sorted local file list one at a time, computing MD5 sums.
- Now S3 object listing and MD5 sum computing are happening in parallel. As each operation progresses we compare both sorted lists side-by-side, iterating over them one at a time, downloading objects whose MD5 sums don't match the local file (or the local file is missing), and, if deleteRemoved is set, deleting local files whose corresponding objects are missing.
client.deleteDir(s3Params)
Deletes an entire directory on S3.
s3Params:
- Bucket
- Prefix
- (optional) MFA
Returns an EventEmitter with these properties:
- progressAmount
- progressTotal
And these events:
- 'error' (err)
- 'end' - emitted when all objects are deleted.
- 'progress' - emitted when the progressAmount or progressTotal properties change.
deleteDir works like this:
- Start listing all objects in a bucket recursively. S3 returns 1000 objects per response.
- For each response that comes back with a list of objects in the bucket, immediately send a delete request for all of them.
client.copyObject(s3Params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#copyObject-property
s3Params are the same. Don't forget that CopySource must contain the source bucket name as well as the source key name.
The difference between using AWS SDK copyObject and this one:
- Retry based on the client's retry settings.
Returns an EventEmitter with these events:
- 'error' (err)
- 'end' (data)
client.moveObject(s3Params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#copyObject-property
s3Params are the same. Don't forget that CopySource must contain the source bucket name as well as the source key name.
Under the hood, this uses copyObject and then deleteObjects only if the copy succeeded.
Returns an EventEmitter with these events:
- 'error' (err)
- 'copySuccess' (data)
- 'end' (data)
Examples
Check if a file exists in S3
Using the AWS SDK, you can send a HEAD request, which will tell you if a file exists at Key.
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#headObject-property
const { HeadObjectCommand } = require('@aws-sdk/client-s3');
const client = require('s3-upload-resume-v3').createClient({
  /* options */
});
client.s3.send(
  new HeadObjectCommand({
    Bucket: 's3 bucket name',
    Key: 'some/remote/file'
  }),
  function(err, data) {
    if (err) {
      // file does not exist if the status is 404
      // (AWS SDK v3 reports it as err.$metadata.httpStatusCode)
      return;
    }
    // file exists
  }
);
Testing
AWS_ACCESS_KEY_ID=<valid_AWS_ACCESS_KEY_ID> AWS_SECRET_ACCESS_KEY=<valid_AWS_SECRET_ACCESS_KEY> S3_BUCKET=<valid_s3_bucket> npm run test
Tests upload and download large amounts of data to and from S3. The test timeout is set to 40 seconds.
> s3-upload-resume-v3@0.0.1 test
> mocha
MultipartETag
√ returns unmodified digest
s3
√ get public URL
√ uploads (402ms)
√ downloads (316ms)
√ downloadBuffer (250ms)
√ downloadStream (286ms)
√ lists objects (237ms)
√ copies an object (217ms)
√ moves an object (398ms)
√ deletes an object (254ms)
√ uploads a folder (488ms)
√ downloads a folder (456ms)
√ uploadDir with deleteRemoved (352ms)
√ lists objects (272ms)
√ downloadDir with deleteRemoved (258ms)
√ upload folder with delete removed handles updates correctly (888ms)
√ uploads folder with lots of files (934ms)
√ multipart upload (4556ms)
√ download file with multipart etag (795ms)
√ deletes a folder (567ms)
20 passing (12s)