DataSync S3 Transfer
Automate S3 object transfers between buckets via DataSync!
Use Case
- Transfer S3 objects between AWS S3 buckets via DataSync, without having to click around the AWS Console.
- The necessary DataSync resources are generated for you, so you don't have to create them yourself.
- S3 transfers can be between buckets in the same AWS account, or across different accounts.
- For cross-account transfers, you can specify whether the source or destination AWS account does the transfer. (See issue #1.)
- Create automation scripts using this tool to initiate batch S3 object transfers between many buckets.
Usage
Install this package.
npm install @c-collamar/datasync-s3-transfer
Prepare inputs.
// source aws profile for the aws account where your source bucket lives
const srcAwsProfile = {
  region: 'ap-northeast-1',
  credentials: {
    accessKeyId: 'ABCDEFGHIJKLMNOPQRST',
    secretAccessKey: 'WhHGQwvmvDaTne9LnMHV72A4cUkPkZWv2q6ieFtX'
  }
};

// if source and destination buckets belong to the same aws account, then...
const destAwsProfile = srcAwsProfile;

const dataSyncOptions = {
  /**
   * Whether it is the "source" or "destination" AWS account that does the
   * copying of S3 objects, i.e., the initiating account. For cross-account
   * transfers, this dictates which AWS account the necessary DataSync
   * resources will be created under. For same-account transfers, either
   * value will do, since both buckets belong to the same account.
   */
  initiatingAccount: 'source',

  // source or destination aws account id, whichever is the initiating account
  dataSyncPrincipal: '123456789012',

  /**
   * Existing IAM role from the initiating AWS account, with access to the
   * source and destination buckets. (See notes on permissions.)
   */
  dataSyncRole: 'arn:aws:iam::123456789012:role/MyExistingRole',

  // existing cloudwatch log group under the initiating account, where datasync will record logs
  cloudWatchLogGroup: 'arn:aws:logs:ap-northeast-1:123456789012:log-group:/aws/datasync:*'
};
Notes:
- If you want to do cross-account S3 object transfers, i.e., where the destination bucket is owned by a different AWS account than the source bucket, then set a different profile for destAwsProfile, similar to srcAwsProfile. (A sketch follows this list.)
- The IAM user behind the AWS config provided in srcAwsProfile or destAwsProfile (depending on whether initiatingAccount equals source or destination, respectively) must have DataSync-related permissions set up first. See requirements on IAM User Permissions.
- The IAM role specified in dataSyncOptions.dataSyncRole must have the necessary permissions for DataSync to assume. See IAM Role Permissions on how the role must be set up.
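For example, a cross-account setup might replace the same-account assignment above with a separate profile for the destination account. The region and credentials below are placeholder values for illustration:

// destination aws profile, for the different aws account that owns the destination bucket
const destAwsProfile = {
  region: 'us-east-1', // placeholder: the destination bucket's region
  credentials: {
    accessKeyId: 'AKIAXXXXXXXXXXXXXXXX', // placeholder destination-account credentials
    secretAccessKey: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
  }
};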
Start the transfer. That's it!
import { initDataSyncS3Transfer } from "@c-collamar/datasync-s3-transfer";

// configure transfer settings
const transfer = initDataSyncS3Transfer(srcAwsProfile, destAwsProfile, dataSyncOptions);

// start the transfer
await transfer(
  'source-bucket',              // existing source bucket name
  'destination-bucket',         // existing destination bucket name
  'Transfer to my other bucket' // the name of this transfer
);
After the transfer() call is made, source S3 objects will be copied to the destination bucket after some time. See under the hood on How a Transfer is Made.

Multiple transfers? Yes we can!
// initialize once
const transfer = initDataSyncS3Transfer(srcAwsProfile, destAwsProfile, dataSyncOptions);

const toTransfer = [
  { from: 'source-bucket-1', to: 'destination-bucket-1', name: 'Transfer task 1' },
  { from: 'source-bucket-2', to: 'destination-bucket-2', name: 'Transfer task 2' },
  // ...etc
];

// transfer multiple times
for (const item of toTransfer) {
  await transfer(item.from, item.to, item.name);
}

// or if you want to initiate the transfers concurrently...
await Promise.all(
  toTransfer.map((item) => transfer(item.from, item.to, item.name))
);
How to properly handle errors? See Error Handling and Retries for details.
How a Transfer is Made
Every time a transfer call is made, the following happens under the hood within the initiating AWS account:
1. A DataSync source S3 location is created, pointing to the source S3 bucket.
   - If the initiating AWS account does not own the source bucket, then the source bucket policy is first updated in order to permit the initiating account to create the DataSync location.
2. A DataSync destination S3 location is created, pointing to the destination S3 bucket.
   - If the initiating AWS account does not own the destination bucket, then the destination bucket policy is first updated in order to permit the initiating account to create the DataSync location.
3. A DataSync transfer task is created.
4. The DataSync task is then executed, which creates a DataSync task execution resource and initiates the transfer of S3 objects between the source and destination DataSync locations.
Information on these created resources is then returned from the transfer call, into the result variable as seen below.
// start the transfer
const { result, error } = await transfer('source-bucket', 'destination-bucket', 'Transfer to my other bucket');
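The exact shape of result is defined by the package; a quick way to see what came back is to dump it, making no assumptions about specific property names:

// inspect the information on the created DataSync resources
// (see the package's type definitions for the exact shape of result)
if (!error) {
  console.info(JSON.stringify(result, null, 2));
}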
Error Handling and Retries
Errors can happen at any point during the transfer process, from network issues all the way up to misconfigurations on your part. In any case, you can check for errors via the error property returned by the transfer call.
const { result, error } = await transfer('source-bucket', 'destination-bucket', 'Transfer to my other bucket');
if (error) {
console.error(error);
}
else {
console.info('Transfer successful!');
}
Retrying a Failed Transfer
Understanding How a Transfer is Made, it is possible for an unexpected error (e.g., a network or system error) to arise during any of the AWS resource-creation steps. This can leave the transfer incomplete, where some of the necessary AWS resources have already been created while the rest have not.
In this case, you can retry the transfer as follows:
let transferState = await transfer('source-bucket', 'destination-bucket', 'Transfer to my other bucket');
// retry on error
while (transferState.error) {
console.error(transferState.error);
console.info('Retrying...');
// you can also make the retry optional by prompting for confirmation before executing the retry statement below
transferState = await transfer('source-bucket', 'destination-bucket', 'Transfer to my other bucket', transferState.result);
}
// you can also retry across script terminations by exporting transferState.result, e.g., to a file
This way, successfully created AWS resources will be reused when retrying the transfer. Without supplying the transferState.result argument in the example above, calling transfer() multiple times will create a new set of resources, which will likely cause AWS to complain about resource duplication.
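For instance, a minimal sketch of persisting the transfer state across runs might look like the following. The state file path and the resume-from-file flow are assumptions for illustration, not part of the package's documented API:

import { existsSync, readFileSync, writeFileSync } from "node:fs";

// hypothetical state file; adjust the path to your needs
const STATE_FILE = "./transfer-state.json";

// resume from a previous run's state, if any
const previousResult = existsSync(STATE_FILE)
  ? JSON.parse(readFileSync(STATE_FILE, "utf8"))
  : undefined;

const transferState = await transfer('source-bucket', 'destination-bucket', 'Transfer to my other bucket', previousResult);

// save the state so a later run can reuse already-created resources
writeFileSync(STATE_FILE, JSON.stringify(transferState.result));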
Permissions
IAM User Permissions
In order to create the necessary DataSync resources on the initiating AWS account, this script assumes the identity of the IAM user behind the provided AWS config of the initiating account. In other words, using the code snippet from the Usage section as context: if initiatingAccount equals source, then srcAwsProfile is used to create said resources; else, if initiatingAccount equals destination, then destAwsProfile is used instead.
This implies that the IAM user assumed by this script must be permitted certain actions in order to set up the DataSync-S3 transfer. These actions are:
datasync:CancelTaskExecution
datasync:CreateLocationS3
datasync:CreateTask
datasync:DescribeLocation*
datasync:DescribeTask
datasync:DescribeTaskExecution
datasync:ListLocations
datasync:ListTasks
datasync:ListTaskExecutions
datasync:StartTaskExecution
iam:AttachRolePolicy
iam:CreateRole
iam:CreatePolicy
iam:ListRoles
iam:PassRole
s3:GetBucketLocation
s3:ListAllMyBuckets
s3:ListBucket
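For reference, a minimal sketch of an IAM policy document granting these actions might look like the following. You may want to scope Resource more tightly than the wildcard shown here:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "datasync:CancelTaskExecution",
        "datasync:CreateLocationS3",
        "datasync:CreateTask",
        "datasync:DescribeLocation*",
        "datasync:DescribeTask",
        "datasync:DescribeTaskExecution",
        "datasync:ListLocations",
        "datasync:ListTasks",
        "datasync:ListTaskExecutions",
        "datasync:StartTaskExecution",
        "iam:AttachRolePolicy",
        "iam:CreateRole",
        "iam:CreatePolicy",
        "iam:ListRoles",
        "iam:PassRole",
        "s3:GetBucketLocation",
        "s3:ListAllMyBuckets",
        "s3:ListBucket"
      ],
      "Resource": "*"
    }
  ]
}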
IAM Role Permissions
The IAM role, whose ARN is passed in the dataSyncRole DataSync initialization option, must have the following permissions set up, as mentioned in Creating an IAM role for DataSync to access your S3 bucket.
Role trust relationship:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "datasync.amazonaws.com"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"aws:SourceAccount": "ACCOUNT_ID"
},
"StringLike": {
"aws:SourceArn": "arn:aws:datasync:AWS_REGION:ACCOUNT_ID:*"
}
}
}
]
}
Replace ACCOUNT_ID with the AWS account ID that owns the role, and AWS_REGION with the only region you want DataSync to work in. Restricting the trust relationship this way is recommended to prevent the cross-service confused deputy problem.
Role permission policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:ListBucketMultipartUploads"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::*"
},
{
"Action": [
"s3:AbortMultipartUpload",
"s3:DeleteObject",
"s3:GetObject",
"s3:ListMultipartUploadParts",
"s3:GetObjectTagging",
"s3:PutObjectTagging",
"s3:PutObject"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::*/*"
}
]
}
For the role permission policy, the idea is to allow certain actions on S3 objects that belong to the source and destination buckets. By scoping the Resource properties as such, you can initiate S3 object transfers from any bucket in the source AWS account to any bucket in the destination AWS account.
That being said, you can limit the scope of the Resource policy elements by explicitly listing only the source and destination buckets, as sketched below. Just keep in mind that S3 object transfers will not work if either the source or destination bucket is not included in this scope.
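For example, assuming hypothetical bucket names source-bucket and destination-bucket, the scoped-down version of the permission policy above might look like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::source-bucket",
        "arn:aws:s3:::destination-bucket"
      ]
    },
    {
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListMultipartUploadParts",
        "s3:GetObjectTagging",
        "s3:PutObjectTagging",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::source-bucket/*",
        "arn:aws:s3:::destination-bucket/*"
      ]
    }
  ]
}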
High-Level System Design
flowchart TD
start(((Start)))
in[/Source and destination buckets,\nAWS creds, DataSync options, etc./]
policy[[Update source or destination\nbucket policy as necessary]]
locs[[Create DataSync source and\ndestination S3 locations]]
task[[Create DataSync\ntransfer task]]
exec[[Execute\ntransfer task]]
out[/DataSync source S3 location,\nDataSync destination S3 location,\nDataSync task,\nDataSync task execution/]
stop(((Stop)))
start -- input --> in
in --> policy
policy --> locs
locs --> task
task --> exec
exec -- output --> out
out --> stop
License
This project is licensed under the terms of the 3-Clause BSD license.