

crawfishcloud


A Streaming S3 Bucket Glob Crawler

[Logo: AI-generated psychedelic style-transfer art of a crawfish painted with sunset clouds]

crawfishcloud makes it VERY easy to get started pulling globbed files from S3.

Install

npm i crawfishcloud -S

Setup

// supports BOTH import and require 
import {crawler, asVfile} from 'crawfishcloud'

// Setup AWS-S3 with your credentials
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile: 'default'})

// crawfish uses your configured S3 Client to get data from S3
const crawfish = crawler({s3c: new S3({credentials})})
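
Since the package supports require as well (per the comment above), the same setup in CommonJS style would look roughly like this - a minimal sketch:

const {crawler} = require('crawfishcloud')
const {S3, SharedIniFileCredentials} = require('aws-sdk')

// same configured-client pattern as above, just with require
const credentials = new SharedIniFileCredentials({profile: 'default'})
const crawfish = crawler({s3c: new S3({credentials})})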

Usage

Async Generator

for await (const vf of crawler({s3c}).vfileIter('s3://Bucket/path/*.jpg')){
  console.log({vf})
}

Promise<Array<Vfile | Vinyl>>

const allJpgs = await crawler({s3c}).vinylArray('s3://Bucket/path/*.jpg')

Stream<Vfile | Vinyl>

crawler({s3c}).vfileStream('/prefix/**/*.jpg').pipe(destination())

Why use crawfishcloud?

Ever had a set of files in S3 and thought, "Why can't I use a glob pattern, like I would in a unix command or in gulp, to pull all of those files out together?"

Now you can.

Features

crawfishcloud supports 3 different processing patterns to handle data from your buckets (compared side by side in the sketch after this list).

  • Promised Arrays
    • While this structure is admittedly the most straightforward, it can also blow through your RAM, because collapsing an S3 stream into one array can easily take more space than is commercially available as RAM. But maybe you are thinking "I know my data, and I just need the 5 files loaded together from this S3 prefix, and I know it will fit" - then the array pattern is just the ticket.
  • Node Streams
    • Node streams are incredible if you are familiar with them. The .stream() pattern lets you stream a set of objects out to your downstream processing.
  • AsyncGenerators
    • Although async generators are a newer addition to the language, for many people they strike a sweet spot of "ease of use" while still being able to process terribly large amounts of data, since it is pulled from the network on demand.
  • Uses modern syntax: async/await
  • All in about 230 lines of JS code spread over about 3 files.
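
To make the trade-offs concrete, here is a minimal sketch of the same glob handled by each of the three patterns (it reuses the s3c client from Setup; the bucket name and destination() are illustrative):

const crab = crawler({s3c})

// 1. Promised Array - simplest, but holds every matched file in memory at once
const all = await crab.vfileArray('s3://my-bucket/photos/*.jpg')

// 2. Node Stream - pipe objects along to downstream transforms/writables
crab.vfileStream('s3://my-bucket/photos/*.jpg').pipe(destination())

// 3. AsyncGenerator - pull one item at a time, on demand
for await (const vf of crab.vfileIter('s3://my-bucket/photos/*.jpg')) {
  console.log(vf.path)
}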

Contributions

Pull requests welcome. Drop me a line in the Discussion area - or submit an issue.

Inspired By

I was about to use one of those - except, "before marrying an old lady," I thought I would see if I really understood these things before taking on the care of an old lady, and by the time I understood, I had opinions, which are now roughly manifest throughout the package.

License

MIT © Eric D Moore

API Reference

crawler()

the default export function (aka crawler)

  • params

    • s3c : S3
    • body : boolean
    • maxkeys : number
    • ...filters : string[]
  • returns

    • crawfishcloud
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
    
    const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
    const arr = await crab.all({body:true, using: asVfile})

> Base Returns

iter()

get an AsyncGenerator<T> ready to use with a for await (){} loop, where each element's type T is determined by the using function

  • params

    • body : boolean
    • using : UsingFunc : (i: S3Item) => T
    • NextContinuationToken? : string | undefined
    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win (see the sketch after the example below)
  • returns

    • AsyncGenerator<T>, where T is the return type of the using function
  •  import {crawler, asVfile} from 'crawfishcloud'
     import {S3, SharedIniFileCredentials} from 'aws-sdk'
     const credentials = new SharedIniFileCredentials({profile:'default'})
     const s3c = new S3({credentials, region:'us-west-2'})
       
     const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
     for await (const vf of crab.iter({body:true, using: asVfile}) ){
         console.log(vf)
     }
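
Because the last filters passed win, a glob handed to iter() replaces whatever filters the crawler was constructed with. A minimal sketch, assuming filters are passed as trailing arguments after the options object (per the ...filters param above):

const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')

// the *.png filter passed here overrides the *.jpg filter from construction
for await (const vf of crab.iter({body: true, using: asVfile}, 's3://ericdmoore.com-images/*.png')){
    console.log(vf)
}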

stream()

get a Readable Node Stream ready to pipe to a transform or writable stream

  • params

    • body : boolean
    • using : UsingFunc : (i: S3Item) => T
    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • Readable
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
      
    const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
    crab.stream({body:true, using: asVfile})
        .pipe(rehypePipe())
        .pipe(destinationFolder())

all()

load everything matching the S3 url into an array, where the returned promise resolves once all of the elements have been populated into the array.

  • params

    • body : boolean
    • using : UsingFunc : (i: S3Item) => T
    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • Promise<T[]>
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
    
    const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
    const arr = await crab.all({body:true, using: asVfile})

reduce()

Reduce the files represented in the glob into a new type. The process batches sets of 1000 elements into memory and reduces them (a richer sketch follows the example below).

  • params

    • init : OutputType - starting value for the reducer
    • using : UsingFunc : (i: S3Item) => ElementType
    • reducer : (prior: OutputType, current: ElementType, i: number) => OutputType
    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • Promise<OutputType>
  • import {crawler, asS3} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
    
    const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
    const count = await crab.reduce(0, {using: asS3, reducer: (p) => p + 1})
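
As a slightly richer sketch - assuming the S3Item carries the Size field that S3's listObjects response includes - reduce() can total the bytes behind a glob:

const totalBytes = await crab.reduce(0, {
  using: asS3,
  // prior: the running total so far; s3i: the current S3Item
  reducer: (prior, s3i) => prior + (s3i.Size || 0),
})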

> Streams

vfileStream()

a stream of vfiles

  • params

    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • Readable
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
      
    const crab = crawler({s3c})
    crab.vfileStream('s3://ericdmoore.com-images/*.jpg')
        .pipe(jpgOptim())
        .pipe(destinationFolder())

vinylStream()

a stream of vinyls

  • params

    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • Readable
  •  import {crawler, asVfile} from 'crawfishcloud'
     import {S3, SharedIniFileCredentials} from 'aws-sdk'
     const credentials = new SharedIniFileCredentials({profile:'default'})
     const s3c = new S3({credentials, region:'us-west-2'})
       
     const crab = crawler({s3c})
     crab.vinylStream('s3://ericdmoore.com-images/*.jpg')
         .pipe(jpgOptim())
         .pipe(destinationFolder())

s3Stream()

a stream of S3Items, where the S3 listObject keys are mixed in with the getObject keys - this merged shape is called an S3Item (sketched after the example below)

  • params

    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • Readable
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
      
    const crab = crawler({s3c})
    crab.s3Stream('s3://ericdmoore.com-images/*.jpg')
        .pipe(S3ImageOptim())
        .pipe(destinationFolder())
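
For orientation, a rough sketch of the merged shape - the field names below come from the AWS SDK's listObjectsV2 and getObject responses, and the exact mix is an assumption for illustration, not the package's published type:

// hypothetical S3Item: listObjectsV2 keys plus getObject fields
const exampleS3Item = {
  Key: 'path/photo.jpg',      // from listObjectsV2
  LastModified: new Date(),   // from listObjectsV2
  ETag: '"abc123"',           // from listObjectsV2
  Size: 1024,                 // from listObjectsV2
  Body: Buffer.from('...'),   // from getObject, present when body: true
}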

> AsyncGenerators

vfileIter()

get an AsyncGenerator that is ready to run through a set of VFiles

  • params

    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • AsyncGenerator<VFile, void, undefined>
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
      
    const crab = crawler({s3c})
    for await (const vf of crab.vfileIter('s3://ericdmoore.com-images/*.jpg') ){
        console.log(vf)
    }    

vinylIter()

get an AsyncGenerator that is ready to run through a set of Vinyls

  • params

    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • AsyncGenerator<Vinyl, void, undefined>
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
      
    const crab = crawler({s3c})
     for await (const v of crab.vinylIter('s3://ericdmoore.com-images/*.jpg') ){
         console.log(v)
     }

s3Iter()

get an AsyncGenerator that is ready to run through a set of S3Items

  • params

    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • AsyncGenerator<S3Item, void, undefined>
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
      
    const crab = crawler({s3c})
    for await (const s3i of crab.s3Iter('s3://ericdmoore.com-images/*.jpg') ){
        console.log(s3i)
    }

> Promised Arrays

vfileArray()

get an array of vfiles all loaded into a variable

  • params

    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • Promise<Vfile[]>
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
    
    const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
    const vfArr = await crab.vfileArray()

vinylArray()

get an array of vinyls all loaded into a variable

  • params

    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • Promise<Vinyl[]>
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
    
    const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
    const vArr = await crab.vinylArray()

s3Array()

get an array of S3Items all loaded into a variable

  • params

    • ...filters : string[] - overwrites any filters already configured on the crawfish; the last filters passed win
  • returns

    • Promise<S3Item[]>
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
    
    const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
    const arr = await crab.s3Array()

> Exporting Functions

asVfile()

turn an S3 object into a vfile

  • params

    • s3i : S3Item
    • i : number
  • returns

    • Vfile
  • import {crawler, asVfile} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
      
    const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
    for await (const vf of crab.iter({body:true, using: asVfile}) ){
        console.log(vf)
    }

asVinyl()

turn an S3 object into a vinyl

  • params

    • s3i : S3Item
    • i : number
  • returns

    • Vinyl
  • import {crawler, asVinyl} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
      
    const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
    for await (const vf of crab.iter({body:true, using: asVinyl}) ){
        console.log(vf)
    }

asS3()

Just pass the S3 object structure along

  • params

    • s3i : S3Item
    • i : number
  • returns

    • S3Item
  • import {crawler, asS3} from 'crawfishcloud'
    import {S3, SharedIniFileCredentials} from 'aws-sdk'
    const credentials = new SharedIniFileCredentials({profile:'default'})
    const s3c = new S3({credentials, region:'us-west-2'})
      
    const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
    for await (const vf of crab.iter({body:true, using: asS3}) ){
        console.log(vf)
    }
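
All three exported using functions share the (s3i: S3Item, i: number) => T shape, so a custom one can be passed wherever using is accepted. A hedged sketch, with a hypothetical asKey that keeps only the object key:

// hypothetical custom using function - same (s3i, i) => T shape as asVfile/asVinyl/asS3
const asKey = (s3i, i) => s3i.Key

const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
for await (const key of crab.iter({using: asKey}) ){
    console.log(key)
}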

namesake

crawfish cloud, because why not, and because regular crawfish are delightful, and they crawl around in a bucket for a time. So clearly crawfishcloud is a crawler of cloud buckets.

Logo credit: deepart.io