@tokenizer/s3

v1.0.0

Published

9 days ago

Amazon S3 tokenizer

Downloads

20,924

Vulnerabilities: 0 high, 0 medium, 0 low

borewit

audio S3 AWS chunk range Amazon cloud

@tokenizer/s3

The tokenizer-s3 module enables seamless integration with Amazon Web Services (AWS) S3, allowing you to read and tokenize data from S3 objects in a streaming fashion. This module extends the functionality of the strtok3 tokenizer by providing support for chunked S3 data access.

Features

- Streaming Support: Efficiently read and tokenize data from Amazon S3 objects using streaming, which is ideal for handling large files without loading them entirely into memory.
- Integration with strtok3: Works seamlessly with the strtok3 tokenizer to process S3 data streams, making it easy to handle various tokenization tasks.
- Flexible Access: Provides options to configure S3 access, allowing for customized tokenization workflows based on your specific needs.
- Promise-Based API: Utilizes a promise-based API for easy integration into modern asynchronous workflows.

Installation

npm install @tokenizer/s3

Sponsor

If you appreciate my work and want to support the development of open-source projects like music-metadata, file-type, and listFix(), consider becoming a sponsor or making a small contribution. Your support helps sustain ongoing development and improvements. Become a sponsor to Borewit

API Documentation

`makeChunkedTokenizerFromS3`

Initialize a tokenizer, with the option for random access, from an Amazon S3 client for use in extracting metadata from media files.

Function Signature

function makeChunkedTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest): Promise<IRandomAccessTokenizer>

Reads the S3 object in chunks, which allows random access.

Parameters

s3 (S3Client):
The S3 client used to make requests to Amazon S3.
Note: To configure AWS client authentication, see Configuration and credential file settings.
objRequest (GetObjectRequest):
The S3 object request containing details about the S3 object to fetch. This includes properties like the bucket name and object key.
options (IS3Options, optional):

Returns

Promise<IRandomAccessTokenizer>:
A Promise that resolves to an instance of IRandomAccessTokenizer. This tokenizer can be used to extract metadata from the specified media file in the S3 object. It supports random access reads.
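
Random access against S3 is possible because a `GetObject` request accepts a byte `Range`. The sketch below illustrates how a read at a given position could be mapped onto such a range. `toByteRange` and `chunkAlignedRange` are hypothetical helpers written for this illustration; they are not part of `@tokenizer/s3`, and the chunking strategy the library actually uses may differ.

```typescript
// Hypothetical helpers for illustration only, not part of @tokenizer/s3.

/** Format an inclusive HTTP byte range, as accepted by the `Range` field of an S3 GetObject request. */
function toByteRange(offset: number, length: number): string {
  if (!Number.isInteger(offset) || !Number.isInteger(length) || offset < 0 || length <= 0) {
    throw new RangeError('offset must be an integer >= 0 and length an integer > 0');
  }
  // HTTP ranges are inclusive on both ends, hence the "- 1".
  return `bytes=${offset}-${offset + length - 1}`;
}

/** Widen a requested read to whole chunks, so that neighbouring reads can share one request. */
function chunkAlignedRange(position: number, length: number, chunkSize: number): string {
  const start = Math.floor(position / chunkSize) * chunkSize;
  const end = Math.ceil((position + length) / chunkSize) * chunkSize; // exclusive
  return toByteRange(start, end - start);
}

console.log(toByteRange(0, 1024));               // bytes=0-1023
console.log(chunkAlignedRange(1500, 100, 1024)); // bytes=1024-2047
```

A range string like this could be passed as the `Range` property of a `GetObject` request alongside `Bucket` and `Key`.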

`makeStreamingTokenizerFromS3`

Initialize a tokenizer from an Amazon S3 client for use in extracting metadata from media files.

Function Signature

function makeStreamingTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest): Promise<ITokenizer>

Reads the S3 object as a stream.

Parameters

s3 (S3Client):
The S3 client used to make requests to Amazon S3.
Note: To configure AWS client authentication, see Configuration and credential file settings.
objRequest (GetObjectRequest):
The S3 object request containing details about the S3 object to fetch. This includes properties like the bucket name and object key.
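
Both functions take the object location as a `Bucket` and `Key` pair. If your application stores locations as `s3://` URIs, a small conversion step is needed first. The sketch below shows one way to do that; `parseS3Uri` and `ObjectLocation` are hypothetical names introduced here and are not part of `@tokenizer/s3` or the AWS SDK.

```typescript
// Hypothetical helper for illustration only.
interface ObjectLocation {
  Bucket: string;
  Key: string;
}

/** Split an `s3://bucket/key` URI into the `Bucket` and `Key` fields of an object request. */
function parseS3Uri(uri: string): ObjectLocation {
  // The bucket is everything up to the first slash; the key is the remainder (it may contain slashes).
  const match = /^s3:\/\/([^/]+)\/(.+)$/.exec(uri);
  if (!match) {
    throw new Error(`Not a valid s3:// URI: ${uri}`);
  }
  return { Bucket: match[1], Key: match[2] };
}

const location = parseS3Uri('s3://standing0media/01 Where The Highway Takes Me.mp3');
console.log(location.Bucket); // standing0media
console.log(location.Key);    // 01 Where The Highway Takes Me.mp3
```

The resulting object has the same shape as the `objRequest` literals used in the examples further down.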

Returns

Promise<ITokenizer>:
A Promise that resolves to an instance of ITokenizer. This tokenizer can be used to extract metadata from the specified media file in the S3 object.
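
The practical difference between the chunked and the streaming tokenizer is the access pattern: random access can jump to the part of the file that holds the metadata, while a forward-only stream has to pass over everything in front of it. The sketch below illustrates this with an in-memory stand-in for a remote object and an ID3v1 tag, which occupies the last 128 bytes of a file and starts with `TAG`. `CountingSource` and `hasId3v1` are hypothetical and do not mirror the strtok3 or `@tokenizer/s3` interfaces.

```typescript
// Hypothetical in-memory stand-in for a remote object; it counts the bytes "transferred".
class CountingSource {
  bytesFetched = 0;
  data: Uint8Array;

  constructor(data: Uint8Array) {
    this.data = data;
  }

  get size(): number {
    return this.data.length;
  }

  /** Fetch only the requested slice and record how many bytes that cost. */
  readAt(position: number, length: number): Uint8Array {
    const slice = this.data.subarray(position, position + length);
    this.bytesFetched += slice.length;
    return slice;
  }
}

const ID3V1_SIZE = 128; // an ID3v1 tag is the final 128 bytes of the file

function hasId3v1(tail: Uint8Array): boolean {
  return tail.length === ID3V1_SIZE &&
    String.fromCharCode(tail[0], tail[1], tail[2]) === 'TAG';
}

// Build a fake 1 MiB file with an ID3v1 marker at the end.
const file = new Uint8Array(1024 * 1024);
file.set([0x54, 0x41, 0x47], file.length - ID3V1_SIZE); // "TAG"

// Random access: jump straight to the trailer.
const chunked = new CountingSource(file);
const foundChunked = hasId3v1(chunked.readAt(chunked.size - ID3V1_SIZE, ID3V1_SIZE));

// Forward-only: everything before the trailer has to be read (and discarded) first.
const streaming = new CountingSource(file);
streaming.readAt(0, streaming.size - ID3V1_SIZE);
const foundStreaming = hasId3v1(streaming.readAt(streaming.size - ID3V1_SIZE, ID3V1_SIZE));

console.log(foundChunked, chunked.bytesFetched);     // true 128
console.log(foundStreaming, streaming.bytesFetched); // true 1048576
```

For formats that keep their metadata at the start of the file, the two approaches transfer a similar amount of data; the gap shows up when the parser needs to look at the end or seek around.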

Compatibility

Module: version 0.3.0 migrated from CommonJS to pure ECMAScript Module (ESM). The distributed JavaScript codebase is compliant with the ECMAScript 2020 (11th Edition) standard.

This module requires a Node.js ≥ 16 engine. It can also be used in a browser environment when bundled with a module bundler.

For backward compatibility with TypeScript projects that compile to CommonJS, you can use load-esm.

Examples

Determine S3 file type

Determine the file type (based on its content) of a file stored in Amazon S3:

import { fileTypeFromTokenizer } from 'file-type';
import { fromEnv } from '@aws-sdk/credential-providers';
import { S3Client } from '@aws-sdk/client-s3';
import { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';

(async () => {

  // Initialize S3 client
  const s3 = new S3Client({
    region: 'eu-west-2',
    credentials: fromEnv(),
  });

  // Initialize S3 tokenizer
  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, {
    Bucket: 'affectlab',
    Key: '1min_35sec.mp4'
  });

  // Figure out what kind of file it is
  const fileType = await fileTypeFromTokenizer(s3Tokenizer);
  console.log(fileType);
})();

Reading audio metadata from Amazon S3

Retrieve audio metadata from an S3 object using music-metadata:

import { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';
import { S3Client } from '@aws-sdk/client-s3';
import { parseFromTokenizer } from 'music-metadata';

/**
 * Retrieve metadata from Amazon S3 object
 * @param s3 S3 client
 * @param objRequest S3 object request
 * @param options `music-metadata` parse options, passed on to `parseFromTokenizer`
 * @return Metadata
 */
async function parseS3Object(s3, objRequest, options) {
  // Chunked access lets the parser seek within the object instead of reading it front to back
  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, objRequest);
  return parseFromTokenizer(s3Tokenizer, options);
}

(async () => {
  const s3 = new S3Client({});

  const metadata = await parseS3Object(s3, {
    Bucket: 'standing0media',
    Key: '01 Where The Highway Takes Me.mp3'
  });

  console.log(metadata);
})();

A module implementation of this example can be found in @music-metadata/s3.

