@wholebuzz/fs
v1.3.0
File system abstraction with implementations for GCP GCS, AWS S3, Azure, SMB, HTTP, and Local file systems. Provides atomic primitives enabling multiple readers and writers.
- LocalFileSystem employs content hashing to approximate GCS Object Versioning.
- GoogleCloudFileSystem provides consistent parallel access patterns.
- S3FileSystem provides basic file system primitives.
- SMBFileSystem provides basic file system primitives.
- HTTPFileSystem provides a basic HTTP file system.
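The content-hashing idea mentioned for LocalFileSystem can be illustrated with a minimal sketch. It assumes an MD5 digest of the file bytes stands in for the version token; the hash function and encoding actually used by the library are not stated here, and `contentVersion` is a hypothetical helper, not part of the package.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical helper: derive a version token from file contents.
// Assumption: MD5 in hex; the library's actual choice may differ.
function contentVersion(data: string | Buffer): string {
  return createHash('md5').update(data).digest('hex')
}

// A writer that remembers the version it read can detect that the
// contents changed in the meantime, because the token no longer matches.
const before = contentVersion('{"foo":"bar"}')
const after = contentVersion('{"foo":"baz"}')
console.log(before === after) // false
```

A token derived from the bytes plays roughly the role that an object generation number plays on GCS: it changes whenever the contents change.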
Provides file format implementations for:
- Lines
- CSV (via csv)
- JSON, ND-JSON / JSONL (via JSONStream and ndjson)
- Parquet including streamingParquet codec and parquetjs.
- TFRecord including tfrecord-stream.
Additionally provides sharding & merging utilities.
Dependencies
The FileSystem implementations require peer dependencies:
- AnyFileSystem: None. URL resolution as a FileSystem. Files have URLs and HTTP is a file system.
- AzureBlobStorageFileSystem: @azure/storage-blob and @azure/identity
- AzureFileShareFileSystem: @azure/storage-file-share
- GoogleCloudFileSystem: @google-cloud/storage
- HTTPFileSystem: axios
- LocalFileSystem: fs-ext, glob, and glob-stream
- S3FileSystem: aws-sdk, s3-stream-upload, and athena-express
- SMBFileSystem: @marsaud/smb2
Credits
Built with the tree-stream primitives ReadableStreamTree and WritableStreamTree.
Project history
The project started to support @wholebuzz/archive, a terabyte-scale archive for GCS. The focus has since expanded to include powering dbcp and @wholebuzz/mapreduce with a collection of file system implementations under a common interface. The atomic primitives are only available for Google Cloud Storage and local.
Example
import { AnyFileSystem } from '@wholebuzz/fs/lib/fs'
import { GoogleCloudFileSystem } from '@wholebuzz/fs/lib/gcp'
import { HTTPFileSystem } from '@wholebuzz/fs/lib/http'
import { LocalFileSystem } from '@wholebuzz/fs/lib/local'
import { S3FileSystem } from '@wholebuzz/fs/lib/s3'
import { readJSON, writeJSON } from '@wholebuzz/fs/lib/json'

const httpFileSystem = new HTTPFileSystem()
const fs = new AnyFileSystem([
  { urlPrefix: 'gs://', fs: new GoogleCloudFileSystem() },
  { urlPrefix: 's3://', fs: new S3FileSystem() },
  { urlPrefix: 'http://', fs: httpFileSystem },
  { urlPrefix: 'https://', fs: httpFileSystem },
  { urlPrefix: '', fs: new LocalFileSystem() },
])

await writeJSON(fs, 's3://bucket/file', { foo: 'bar' })
const foobar = await readJSON(fs, 's3://bucket/file')
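How a URL ends up at one of the registered file systems can be pictured with a small, self-contained sketch. It assumes the first entry whose urlPrefix matches the URL wins, which would explain why the catch-all '' prefix is listed last; the library's actual matching rule is not documented here. `Route` and `resolveRoute` are hypothetical names, and plain strings stand in for file system instances.

```typescript
// Hypothetical stand-in for an entry like those passed to AnyFileSystem.
interface Route<T> {
  urlPrefix: string
  fs: T
}

// Assumption: the first route whose prefix matches the URL wins,
// so the catch-all '' prefix has to come last.
function resolveRoute<T>(routes: Route<T>[], url: string): T {
  const route = routes.find((r) => url.startsWith(r.urlPrefix))
  if (!route) throw new Error(`No file system registered for ${url}`)
  return route.fs
}

const routes: Route<string>[] = [
  { urlPrefix: 'gs://', fs: 'gcs' },
  { urlPrefix: 's3://', fs: 's3' },
  { urlPrefix: '', fs: 'local' },
]

console.log(resolveRoute(routes, 's3://bucket/file')) // s3
console.log(resolveRoute(routes, '/tmp/file.json')) // local
```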
CLI
node lib/cli.js ls .
node lib/cli.js --help
API Reference
Modules
Methods
- appendToFile
- copyFile
- createFile
- ensureDirectory
- fileExists
- getFileStatus
- moveFile
- openReadableFile
- openWritableFile
- queueRemoveFile
- readDirectory
- readDirectoryStream
- removeDirectory
- removeFile
- replaceFile
Constructors
constructor
+ new FileSystem(): FileSystem
Returns: FileSystem
Methods
appendToFile
▸ Abstract appendToFile(urlText: string, writeCallback: (stream: WritableStreamTree) => Promise<boolean>, createCallback?: (stream: WritableStreamTree) => Promise<boolean>, createOptions?: CreateOptions, appendOptions?: AppendOptions): Promise<null | FileStatus>
Appends to the file, safely. Either writeCallback or createCallback is called.
For simple appends, the same parameter can be supplied for both writeCallback and createCallback.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to append to. |
| writeCallback | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for appending to the file. |
| createCallback? | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for initializing the file, if necessary. |
| createOptions? | CreateOptions | Initial metadata for initializing the file, if necessary. |
| appendOptions? | AppendOptions | - |
Returns: Promise<null | FileStatus>
Defined in: src/fs.ts:209
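The "either writeCallback or createCallback is called" contract can be modeled in memory. The sketch below is not the library's implementation: it is synchronous, uses string arrays instead of WritableStreamTree, returns a boolean rather than a FileStatus, and treats a true return from a callback as success (an assumption). `appendToFileModel`, `Writer`, and the `mem://` URLs are hypothetical, and the model does not show the concurrency safety the real method provides.

```typescript
// In-memory model of the append contract; names here are hypothetical.
type Writer = (chunks: string[]) => boolean

const files = new Map<string, string[]>()

// Calls createCallback when the file is missing, otherwise writeCallback.
// Exactly one of the two callbacks runs per call.
function appendToFileModel(
  url: string,
  writeCallback: Writer,
  createCallback: Writer = writeCallback
): boolean {
  const existing = files.get(url)
  if (existing === undefined) {
    const created: string[] = []
    const ok = createCallback(created)
    if (ok) files.set(url, created)
    return ok
  }
  return writeCallback(existing)
}

// Simple append: the same callback serves both roles.
const addLine: Writer = (chunks) => {
  chunks.push('line\n')
  return true
}
appendToFileModel('mem://log', addLine) // file is created
appendToFileModel('mem://log', addLine) // file is appended to
console.log(files.get('mem://log')?.length) // 2
```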
copyFile
▸ Abstract copyFile(sourceUrlText: string, destUrlText: string): Promise<boolean>
Copies the file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| sourceUrlText | string | The URL of the source file to copy. |
| destUrlText | string | The destination URL to copy the file to. |
Returns: Promise<boolean>
Defined in: src/fs.ts:178
createFile
▸ Abstract createFile(urlText: string, createCallback?: (stream: WritableStreamTree) => Promise<boolean>, options?: CreateOptions): Promise<boolean>
Creates file, failing if the file already exists.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to create. |
| createCallback? | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for initializing the file. |
| options? | CreateOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:155
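On a local disk, create-only-if-absent semantics can be obtained from Node's exclusive-write flag. The following is a sketch of that general technique, not a description of how LocalFileSystem implements createFile. `createFileExclusive` is a hypothetical helper, it is synchronous for brevity, and mapping "already exists" to a false return value (rather than an exception) is an assumption.

```typescript
import { mkdtempSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical helper: create a file only if it does not exist yet.
// The 'wx' flag makes the write fail with EEXIST when the path is taken.
function createFileExclusive(path: string, contents: string): boolean {
  try {
    writeFileSync(path, contents, { flag: 'wx' })
    return true
  } catch (err) {
    if ((err as { code?: string }).code === 'EEXIST') return false
    throw err
  }
}

const dir = mkdtempSync(join(tmpdir(), 'fs-sketch-'))
const target = join(dir, 'state.json')
console.log(createFileExclusive(target, '{}')) // true
console.log(createFileExclusive(target, '{}')) // false
```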
ensureDirectory
▸ Abstract ensureDirectory(urlText: string, options?: EnsureDirectoryOptions): Promise<boolean>
Ensures the directory exists.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the directory. |
| options? | EnsureDirectoryOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:109
fileExists
▸ Abstract fileExists(urlText: string): Promise<boolean>
Returns true if the file exists.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file whose existence is checked. |
Returns: Promise<boolean>
Defined in: src/fs.ts:121
getFileStatus
▸ Abstract getFileStatus(urlText: string, options?: GetFileStatusOptions): Promise<FileStatus>
Determines the file status. The file version is used to implement atomic mutations.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to retrieve the status for. |
| options? | GetFileStatusOptions | - |
Returns: Promise<FileStatus>
Defined in: src/fs.ts:127
moveFile
▸ Abstract moveFile(sourceUrlText: string, destUrlText: string): Promise<boolean>
Moves the file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| sourceUrlText | string | The URL of the source file to move. |
| destUrlText | string | The destination URL to move the file to. |
Returns: Promise<boolean>
Defined in: src/fs.ts:185
openReadableFile
▸ Abstract openReadableFile(url: string, options?: OpenReadableFileOptions): Promise<ReadableStreamTree>
Opens a file for reading.
Optional version: fails if the version doesn't match, for GCS URLs.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| url | string | The URL of the file to read from. |
| options? | OpenReadableFileOptions | - |
Returns: Promise<ReadableStreamTree>
Defined in: src/fs.ts:134
openWritableFile
▸ Abstract openWritableFile(url: string, options?: OpenWritableFileOptions): Promise<WritableStreamTree>
Opens a file for writing.
Optional version: fails if the version doesn't match, for GCS URLs.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| url | string | The URL of the file to write to. |
| options? | OpenWritableFileOptions | - |
Returns: Promise<WritableStreamTree>
Defined in: src/fs.ts:144
queueRemoveFile
▸ Abstract queueRemoveFile(urlText: string): Promise<boolean>
Queues deletion, e.g. after DaysSinceCustomTime.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to remove. |
Returns: Promise<boolean>
Defined in: src/fs.ts:171
readDirectory
▸ Abstract readDirectory(urlText: string, options?: ReadDirectoryOptions): Promise<DirectoryEntry[]>
Returns the URLs of the files in a directory.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the directory to list files in. |
| options? | ReadDirectoryOptions | - |
Returns: Promise<DirectoryEntry[]>
Defined in: src/fs.ts:94
readDirectoryStream
▸ Abstract readDirectoryStream(urlText: string, options?: ReadDirectoryOptions): Promise<ReadableStreamTree>
Returns a stream of the URLs of the files in a directory.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the directory to list files in. |
| options? | ReadDirectoryOptions | - |
Returns: Promise<ReadableStreamTree>
Defined in: src/fs.ts:100
removeDirectory
▸ Abstract removeDirectory(urlText: string, options?: RemoveDirectoryOptions): Promise<boolean>
Removes the directory.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the directory. |
| options? | RemoveDirectoryOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:115
removeFile
▸ Abstract removeFile(urlText: string): Promise<boolean>
Deletes the file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to remove. |
Returns: Promise<boolean>
Defined in: src/fs.ts:165
replaceFile
▸ Abstract replaceFile(urlText: string, writeCallback: (stream: WritableStreamTree) => Promise<boolean>, options?: ReplaceFileOptions): Promise<boolean>
Replaces the file, failing if the file version doesn't match.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| urlText | string | The URL of the file to replace. |
| writeCallback | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for replacing the file. |
| options? | ReplaceFileOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:194
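Version-checked replacement is a form of optimistic concurrency: a writer reads the current version, prepares new contents, and the replace succeeds only if nobody else changed the file in between. The in-memory model below illustrates that idea only. `Entry`, `replaceFileModel`, and the `expectedVersion` argument are hypothetical (the fields of ReplaceFileOptions are not shown in this reference), and an integer counter stands in for whatever version token a backend provides.

```typescript
// In-memory model of version-checked replacement; not the library's code.
interface Entry {
  contents: string
  version: number
}

const store = new Map<string, Entry>()

// Replaces the contents only if the caller's version is still current.
function replaceFileModel(url: string, contents: string, expectedVersion: number): boolean {
  const current = store.get(url)
  if (!current || current.version !== expectedVersion) return false
  store.set(url, { contents, version: current.version + 1 })
  return true
}

store.set('mem://counter', { contents: '0', version: 1 })

// Two writers read the same version; only the first replacement wins.
const seenVersion = store.get('mem://counter')!.version
console.log(replaceFileModel('mem://counter', '1', seenVersion)) // true
console.log(replaceFileModel('mem://counter', '2', seenVersion)) // false
```

The losing writer would typically re-read the status and retry with the fresh version.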
Module: json
Table of contents
Variables
Functions
- newJSONLinesFormatter
- newJSONLinesParser
- parseJSON
- parseJSONLines
- pipeJSONFormatter
- pipeJSONLinesFormatter
- pipeJSONLinesParser
- pipeJSONParser
- readJSON
- readJSONHashed
- readJSONLines
- serializeJSON
- serializeJSONLines
- writeJSON
- writeJSONLines
- writeShardedJSONLines
Variables
JSONStream
• Const JSONStream: any
Defined in: src/json.ts:11
Functions
newJSONLinesFormatter
▸ Const newJSONLinesFormatter(): Transform
Returns: Transform
Defined in: src/json.ts:146
newJSONLinesParser
▸ Const newJSONLinesParser(): ThroughStream
Returns: ThroughStream
Defined in: src/json.ts:147
parseJSON
▸ parseJSON(stream: ReadableStreamTree): Promise<unknown>
Parses JSON object from stream. Used to implement readJSON.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| stream | ReadableStreamTree | The stream to read a JSON object from. |
Returns: Promise<unknown>
Defined in: src/json.ts:72
parseJSONLines
▸ parseJSONLines(stream: ReadableStreamTree): Promise<unknown[]>
Parses JSON lines from stream. Used to implement readJSONLines.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| stream | ReadableStreamTree | The stream to read JSON lines from. |
Returns: Promise<unknown[]>
Defined in: src/json.ts:80
pipeJSONFormatter
▸ pipeJSONFormatter(stream: WritableStreamTree, isArray: boolean): WritableStreamTree
Create JSON formatter stream.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| stream | WritableStreamTree | - |
| isArray | boolean | Accept array objects or property tuples. |
Returns: WritableStreamTree
Defined in: src/json.ts:127
pipeJSONLinesFormatter
▸ pipeJSONLinesFormatter(stream: WritableStreamTree): WritableStreamTree
Create JSON-lines formatter stream.
Parameters
| Name | Type |
| :------ | :------ |
| stream | WritableStreamTree |
Returns: WritableStreamTree
Defined in: src/json.ts:142
pipeJSONLinesParser
▸ pipeJSONLinesParser(stream: ReadableStreamTree): ReadableStreamTree
Create JSON-lines parser stream.
Parameters
| Name | Type |
| :------ | :------ |
| stream | ReadableStreamTree |
Returns: ReadableStreamTree
Defined in: src/json.ts:119
pipeJSONParser
▸ pipeJSONParser(stream: ReadableStreamTree, isArray: boolean): ReadableStreamTree
Create JSON parser stream.
Parameters
| Name | Type |
| :------ | :------ |
| stream | ReadableStreamTree |
| isArray | boolean |
Returns: ReadableStreamTree
Defined in: src/json.ts:110
readJSON
▸ readJSON(fileSystem: FileSystem, url: string): Promise<unknown>
Reads a serialized JSON object or array from a file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON object or array from. |
Returns: Promise<unknown>
Defined in: src/json.ts:17
readJSONHashed
▸ readJSONHashed(fileSystem: FileSystem, url: string): Promise<[unknown, null | string]>
Reads a serialized JSON object from a file, and also hashes the file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON object from. |
Returns: Promise<[unknown, null | string]>
Defined in: src/json.ts:25
readJSONLines
▸ readJSONLines(fileSystem: FileSystem, url: string): Promise<unknown[]>
Reads a serialized JSON-lines array from a file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON-lines array from. |
Returns: Promise<unknown[]>
Defined in: src/json.ts:35
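The JSON Lines (ND-JSON) layout these functions read and write is one JSON value per line. The string-based helpers below are a minimal sketch of that layout only; they are hypothetical, operate on in-memory strings rather than streams or file systems, and say nothing about how the library's stream-based parser handles edge cases such as blank lines (skipping them here is an assumption).

```typescript
// Hypothetical helpers illustrating the JSON Lines layout:
// one JSON value per line, each terminated by a newline.
function toJSONLines(values: unknown[]): string {
  return values.map((v) => JSON.stringify(v) + '\n').join('')
}

function fromJSONLines(text: string): unknown[] {
  return text
    .split('\n')
    .filter((line) => line.trim().length > 0) // skip blank lines
    .map((line) => JSON.parse(line))
}

const jsonLinesText = toJSONLines([{ id: 1 }, { id: 2 }])
console.log(fromJSONLines(jsonLinesText).length) // 2
```

Because every record sits on its own line, a file in this layout can be appended to or split into shards without re-parsing the whole document.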
serializeJSON
▸ serializeJSON(stream: WritableStreamTree, obj: object | any[]): Promise<boolean>
Serializes JSON object to stream. Used to implement writeJSON.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| stream | WritableStreamTree | The stream to write a JSON object to. |
| obj | object \| any[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:88
serializeJSONLines
▸ serializeJSONLines(stream: WritableStreamTree, obj: any[]): Promise<boolean>
Serializes array as JSON lines to stream. Used to implement writeJSONLines.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| stream | WritableStreamTree | The stream to write JSON lines to. |
| obj | any[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:103
writeJSON
▸ writeJSON(fileSystem: FileSystem, url: string, value: object | any[]): Promise<boolean>
Serializes object or array to a JSON file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| fileSystem | FileSystem | - |
| url | string | The URL of the file to serialize a JSON object or array to. |
| value | object \| any[] | The object or array to serialize. |
Returns: Promise<boolean>
Defined in: src/json.ts:44
writeJSONLines
▸ writeJSONLines(fileSystem: FileSystem, url: string, obj: object[]): Promise<boolean>
Serializes array to a JSON Lines file.
Parameters
| Name | Type | Description |
| :------ | :------ | :------ |
| fileSystem | FileSystem | - |
| url | string | The URL of the file to serialize a JSON array to. |
| obj | object[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:53
writeShardedJSONLines
▸ writeShardedJSONLines(fileSystem: FileSystem, url: string, obj: object[], shards: number, shardFunction?: (x: object, modulus: number) => number): Promise<boolean>
Parameters
| Name | Type |
| :------ | :------ |
| fileSystem | FileSystem |
| url | string |
| obj | object[] |
| shards | number |
| shardFunction? | (x: object, modulus: number) => number |
Returns: Promise<boolean>
Defined in: src/json.ts:57
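writeShardedJSONLines carries no description here, so the following is inferred from its signature alone: shardFunction appears to map each record to a shard index in the range [0, modulus). The sketch below shows only that partitioning step. `partition` and `hashShard` are hypothetical, the MD5-based default is an assumption (the library's default shard function is not documented here), and nothing is claimed about how shard file names are derived from url.

```typescript
import { createHash } from 'node:crypto'

// Same shape as the shardFunction parameter: maps a record to a shard index.
type ShardFunction = (x: object, modulus: number) => number

// Assumed default for illustration: hash the serialized record.
const hashShard: ShardFunction = (x, modulus) =>
  createHash('md5').update(JSON.stringify(x)).digest().readUInt32BE(0) % modulus

// Groups records into `shards` buckets; each bucket would become one output file.
function partition(
  records: object[],
  shards: number,
  shardFunction: ShardFunction = hashShard
): object[][] {
  const buckets: object[][] = Array.from({ length: shards }, () => [])
  for (const record of records) {
    buckets[shardFunction(record, shards)]!.push(record)
  }
  return buckets
}

// A custom shard function that keeps records with the same id parity together.
const byId: ShardFunction = (x, modulus) => (x as { id: number }).id % modulus
const buckets = partition([{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }], 2, byId)
console.log(buckets.map((b) => b.length)) // [ 2, 2 ]
```

A deterministic shard function means the same record always lands in the same shard, which is what makes sharded outputs mergeable later.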