dipe (data-pipe)
Let your data flow.
data-pipe is a small library that lets you write data-processor functions for reading and manipulating data.
It helps you lift all the boilerplate needed for processing static or server data out of domain-specific contexts and into pre-configured (or custom) processor callbacks.
Why would I need this?
data-pipe allows you to define data-flow processors in a structured and (hopefully) readable way.
It gives you a way to configure data sources and to describe how that data flows (whether it comes from local or remote sources) and gets manipulated (e.g. normalization, aggregation, filtering, sorting or whatever other process) before it reaches its output destination.
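Conceptually, each processor is one step in a chain: the output of one step becomes the input of the next. Here is a minimal sketch of that idea using plain functions, independent of dipe's actual API:

// three tiny processors: normalize, filter, sort
const normalize = (data) => data.map((s) => s.trim().toLowerCase());
const filterEmpty = (data) => data.filter((s) => s.length > 0);
const sortAlpha = (data) => [...data].sort();

// the pipeline is just function composition over the data
const output = [normalize, filterEmpty, sortAlpha].reduce(
  (acc, step) => step(acc),
  ["  Banana", "apple ", ""]
);
console.log(output); // [ 'apple', 'banana' ]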
Writing one massive block of logic doesn't organize well. Asynchronous data-fetching frameworks such as react-query and apollo already let you write data queries and mutations closer to the component context. Why can't static/server data fetchers be written in a similar way?
Here are some use cases where I found myself using the package:
- Next.js static/server data fetching and manipulation
- Node.js Twitter bot: this lib is used for setting up the whole data flow, from article sources to multichannel posting. Every step of the pipeline is configured in a config.js file (which can then use .env variables to access sensitive data such as API tokens); see the sketch after this list.
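To make the second use case concrete, here is a minimal sketch of such a config file. It only assumes the createTask API shown below under Usage; the TwitterPostTask name, its options and the dotenv setup are hypothetical illustrations, not part of dipe.

// config.js (hypothetical pipeline configuration)
require("dotenv").config(); // load .env so secrets never live in the config itself
const { createTask } = require("dipe");

// placeholder task: a real implementation would post `data` to Twitter
const TwitterPostTask = (data, options) => data;

module.exports = {
  tweets: {
    processors: [
      createTask(TwitterPostTask, {
        options: { apiToken: process.env.TWITTER_API_TOKEN },
      }),
    ],
  },
};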
Installation 🔧
npm install -S dipe
or
yarn add dipe
Usage 💡
You can imagine something like this to be a valid configuration object:
// example.config.js
const { createTask } = require("dipe");
const { LocalDataParser, LocalDataPostProcessor } = require("dipe-processors");

const LocalDataTask = (data, options) => {}; // a task is just a (data, options) function

const config = {
  articles: {
    processors: [
      LocalDataTask, // use a simple function
    ],
  },
  posts: {
    processors: [
      createTask(LocalDataParser, {
        options: {
          extraOption: "./extra-option",
          source: "/",
        },
      }), // use a task wrapped with createTask to pass additional options
      LocalDataPostProcessor,
    ],
  },
};

module.exports = config;
And then the actual implementation will look similar to:
const { runSync } = require("dipe"); // assuming runSync is exported from dipe, like createTask
const { articles, posts } = require("./example.config.js");
const options = {};

let { data, errors } = runSync(articles.processors, options);

// or simply pass the processor functions inline
({ data, errors } = runSync([LocalDataTask, (data, options) => {}], options));
See the example in this repo for some ideas on how to organize your data using preconfigured processors.
Read data (runSync)
let { data, errors } = runSync(articles.processors);
Async read data (runAsync)
let { data, errors } = await runAsync(articles.processors);
// or
runAsync(articles.processors)
  .then(({ data }) => console.log(data))
  .catch((errors) => console.log(errors));
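This pairs naturally with the Next.js use case mentioned above. A minimal sketch, assuming runAsync is exported from dipe and that the example.config.js from the Usage section is available one directory up (the page itself is hypothetical):

// pages/articles.js (hypothetical Next.js page)
import { runAsync } from "dipe";
import config from "../example.config";

export async function getStaticProps() {
  // run the pre-configured processors at build time
  const { data, errors } = await runAsync(config.articles.processors, {});
  if (errors) console.error(errors);
  return { props: { articles: data || [] } };
}

export default function Articles({ articles }) {
  return <pre>{JSON.stringify(articles, null, 2)}</pre>;
}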
Lazy read data (runLazy)
let [{ data, errors }, getArticles] = runLazy(articles.processors);
console.log(data); // null
// execute
getArticles();
console.log(data); // []
Read data as stream (readStreamData) WIP
Write data (writeData) WIP
How to alter Tasks or create a custom one?
Tasks are just functions. You can add your own custom processor, alter an existing one by creating a function that wraps the old implementation, or create one from scratch. Let's use a simple FilterTask as an example:
const FilterTask = (data: any, options?: any) => {
  const filteredData = Object.fromEntries(
    Object.entries(data).filter(options.filterBy)
  );
  return Object.values(filteredData);
};
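Wrapping an existing task works the same way, since a wrapper is just another function with the (data, options) signature. A minimal sketch, assuming the built-in LocalDataParser behaves like the synchronous tasks above (the wrapper name and logging are illustrative):

const { LocalDataParser } = require("dipe-processors");

// hypothetical wrapper: reuse the original implementation and add behaviour around it
const VerboseLocalDataParser = (data, options) => {
  console.log("parsing from:", options && options.source);
  const result = LocalDataParser(data, options);
  console.log("done parsing");
  return result;
};

VerboseLocalDataParser can then be dropped into any processors array in place of the original.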
If you want to provide additional options specific to the processor, you can use createTask as follows.
const SimpleTask = (data, options) => {
  console.log("value: " + options.customOption);
};
const config = {
  data_one: {
    processors: [
      createTask(SimpleTask, { options: { customOption: "initial" } }),
    ],
  },
  data_two: {
    processors: [SimpleTask],
  },
};

let { data, errors } = runSync(config.data_one.processors, {
  customOption: "custom",
}); // outputs `value: custom`

({ data, errors } = runSync(config.data_one.processors)); // outputs `value: initial`
Built in processors
This lib comes with some built-in Tasks, available as the sub-module dipe-processors (data-pipe-processors); however, you can just ignore them and use your own implementations.
LocalDataParser
Parses local files hosted in the directory defined by config.source and uses gray-matter to parse their content.
const config = {
  processors: [LocalDataParser],
  source: "./posts",
};
const { data: posts, errors } = runSync(config);
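For context, gray-matter splits each file into YAML front matter and body content. A quick, self-contained illustration of that step (the exact shape of LocalDataParser's output is not documented here):

const matter = require("gray-matter");

// what a file such as ./posts/hello-world.md might contain
const raw = ["---", 'title: "Hello world"', "---", "", "Post body goes here."].join("\n");

const { data, content } = matter(raw);
console.log(data);    // { title: 'Hello world' }
console.log(content); // "\nPost body goes here."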
LocalDataPostProcessor (WIP)
Filters and sorts data.
const config = {
  processors: [LocalDataParser, LocalDataPostProcessor],
  source: "./posts",
};
const { data: posts, errors } = runSync(config);
LocalDataStream (WIP)
RemoteDataStream (WIP)
❗ Issues
If you think any part of data-pipe can be improved, please open a PR with your updates or submit an issue. I will also continue to improve this, so you might want to watch/star this repository to revisit.
💪 Contribution
We'd love to have your helping hand on data-pipe: fork the repository and send a pull request!
Your contributions are heartily ♡ welcome, recognized and appreciated.
How to contribute:
- Open a pull request with improvements
- Discuss ideas in issues
- Spread the word
- Reach out with any feedback