inceptum-etl

v0.11.3

A framework for ETL processes

Downloads

11

Inceptum ETL

Inceptum ETL is a tool designed to facilitate the creation and management of Extract, Transform, Load (ETL) scripts.

Inceptum ETL is what we use at hipages for our internal projects.

Benefits

This project covers the basics needed for an ETL project:

  • Language: TypeScript
  • Base: typescript-base
  • Supported technologies: MySQL, PostgreSQL, Redis, and Elasticsearch
  • Easy to extend
  • Easy to upgrade

One of the most valuable features of Inceptum-based projects is the ability to upgrade to the newest Inceptum version with a few simple git commands. As we continue to refine our standard, your projects benefit as well.

Dependencies

How it works

Inceptum-etl has been designed to follow the Extract, Transform, Load paradigm:

  • To extract data we create "sources"
  • To transform data we create "transformers"
  • To load data we create "destinations"

Now the extra parts:

  • "savepoints" add fault tolerance to the ETL and manage the starting point
  • "configuration" puts all the pieces together
  • "runner" runs the ETL
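
To make the roles above concrete, here is a minimal sketch of how a source, a transformer, a destination and a savepoint could fit together. The interfaces and the `runEtl` function are hypothetical names chosen for illustration; they are not the inceptum-etl API, and the real runner is configured through plugins and the config file.

```typescript
// Hypothetical interfaces for illustration only; NOT the inceptum-etl API.
interface Source<T> { extract(from: number): T[]; }
interface Transformer<T, U> { transform(record: T): U; }
interface Destination<U> { load(records: U[]): void; }
interface Savepoint { get(): number; set(value: number): void; }

// Runs one pass: read from the savepoint, extract, transform, load,
// then move the savepoint forward so a rerun does not repeat the work.
function runEtl<T, U>(
  source: Source<T>,
  transformer: Transformer<T, U>,
  destination: Destination<U>,
  savepoint: Savepoint,
): number {
  const start = savepoint.get();
  const extracted = source.extract(start);
  const transformed = extracted.map((record) => transformer.transform(record));
  destination.load(transformed);
  savepoint.set(start + extracted.length);
  return extracted.length;
}

// In-memory stand-ins for a real source, destination and savepoint.
const inputRows = ['a', 'b', 'c', 'd'];
const loadedRows: string[] = [];
let savepointPosition = 1; // pretend the first row was handled by an earlier run

const etlCount = runEtl<string, string>(
  { extract: (from) => inputRows.slice(from) },
  { transform: (record) => record.toUpperCase() },
  { load: (records) => { loadedRows.push(...records); } },
  { get: () => savepointPosition, set: (value) => { savepointPosition = value; } },
);
```

Because the savepoint is only advanced after the load succeeds, a failed run would start again from the same position. That is one simple way to get the fault tolerance described above.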

Available sources

  • AdWords reports
  • AdWords report historical data
  • Google Analytics transactions
  • Google Analytics landing pages
  • MySQL data

Available transformers

  • Simple copy
  • Split AdWords campaign
  • Field mapping
  • Smart field mapping

Available destinations

  • CSV file
  • JSON file
  • Amazon S3
  • Redshift
  • Elasticsearch

Available savepoints

  • MySQL table
  • Static value

Every part of the ETL is set up via the config file (default.yml).

Use one config file per environment: default.yml, development.yml and production.yml

app:
  name: Inceptum Etl
  validEtls:
    - ETL_UNIQUE_NAME
    - ETL_UNIQUE_NAME_2
   
generalConfig:
  source:
    maxRetries: 3
    timeoutMillis: 5000
  transformer:
    minSuccessPercentage: 1
    timeoutMillis: 5000
  destination:
    maxRetries: 3
    timeoutMillis: 5000
    batchSize: 1

etls:
    ETL_UNIQUE_NAME:
      source:
        type: source_name
        source_parameters
      transformer:
        type: transformer_name
        transformer_parameters
      destination:
        type: destination_name
        destination_parameters
      savepoint:
        type: savepoint_name
        savepoint_parameters

    ETL_UNIQUE_NAME_2:
      source:
        type: source_name
        source_parameters
      transformer:
        type: transformer_name
        transformer_parameters
      destination:
        type: destination_name
        destination_parameters
      savepoint:
        type: savepoint_name
        savepoint_parameters

# DATABASES
postgres:
  DATABASE_CLIENT_NAME:
    master:
      database_login_parameters
    slave:
      database_login_parameters

mysql:
  DATABASE_CLIENT_NAME:
    master:
      database_login_parameters
    slave:
      database_login_parameters
# LOG settings
logging:
  streams:
    console:
      type: console
    myredis:
      type: redis
    mainLogFile:
      type: file
      path: main.log
  loggers:
    - name: ROOT
      streams:
        console: debug
    - name: ioc/
      streams:
        console: debug
    - name: mysql/
      streams:
        console: debug

Use a default.yml config file and a separate config file for each ETL

The default values are in: default.yml, development.yml and production.yml

app:
  name: Inceptum Etl
  validEtls:
    - ETL_UNIQUE_NAME
    - ETL_UNIQUE_NAME_2
    
generalConfig:
  source:
    maxRetries: 3
    timeoutMillis: 5000
  transformer:
    minSuccessPercentage: 1
    timeoutMillis: 5000
  destination:
    maxRetries: 3
    timeoutMillis: 5000
    batchSize: 1

# General values
sources:
  source_name:
    source_parameters
  source_name_2:
    source_parameters

transformers:
  transformer_name:
    transformer_parameters
  transformer_name_2:
    transformer_parameters

destinations:
  destination_name:
    destination_parameters
  destination_name_2:
    destination_parameters

savepoints:
  savepoint_name:
    savepoint_parameters

# DATABASES
postgres:
  DATABASE_CLIENT_NAME:
    master:
      database_login_parameters
    slave:
      database_login_parameters

mysql:
  DATABASE_CLIENT_NAME:
    master:
      database_login_parameters
    slave:
      database_login_parameters

# LOG settings
logging:
  streams:
    console:
      type: console
    myredis:
      type: redis
    mainLogFile:
      type: file
      path: main.log
  loggers:
    - name: ROOT
      streams:
        console: debug
    - name: ioc/
      streams:
        console: debug
    - name: mysql/
      streams:
        console: debug
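
The `batchSize` setting under `generalConfig.destination` suggests that records are loaded in groups. As a rough illustration, assuming `batchSize` is the maximum number of records per load call (the README does not define it), splitting records into batches could look like this. `toBatches` is a hypothetical helper, not part of inceptum-etl.

```typescript
// Hypothetical helper for illustration; not part of inceptum-etl.
// Splits records into consecutive groups of at most batchSize items.
function toBatches<T>(records: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < records.length; i += batchSize) {
    batches.push(records.slice(i, i + batchSize));
  }
  return batches;
}

const exampleBatches = toBatches([1, 2, 3, 4, 5], 2);
// exampleBatches is [[1, 2], [3, 4], [5]]
```

With `batchSize: 1`, as in the config above, every record would end up in its own batch.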

Set the environment variable NODE_APP_INSTANCE to the name of the ETL. The ETL-specific values then go in:

development-{etl_name}.yml, production-{etl_name}.yml

generalConfig:
  source:
    type: source_name
  transformer:
    type: transformer_name_2
  destination:
    type: destination_name_2
  savepoint:
    type: savepoint_name

# Overwrite the required source, transformer, destination or savepoint as required
sources:
  source_name:
    etl_source_parameters

destinations:
  destination_name_2:
    etl_destination_parameters
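
The sketch below illustrates how an ETL-specific file could be layered on top of the defaults. It assumes the files are deep-merged and that the `type` under `generalConfig` selects the matching entry under `sources`; the README implies both but states neither. The parameter names `query` and `pageSize`, and the helpers `deepMerge` and `resolveSource`, are hypothetical.

```typescript
// Hypothetical helpers and parameter names for illustration only.
type Config = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Config {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Recursively merges override into base; override wins on conflicts.
function deepMerge(base: Config, override: Config): Config {
  const merged: Config = { ...base };
  for (const key of Object.keys(override)) {
    const existing = merged[key];
    const value = override[key];
    merged[key] = isPlainObject(existing) && isPlainObject(value)
      ? deepMerge(existing, value)
      : value;
  }
  return merged;
}

// Stand-in for default.yml
const defaultConfig: Config = {
  generalConfig: { source: { maxRetries: 3, timeoutMillis: 5000 } },
  sources: { source_name: { query: 'SELECT 1', pageSize: 100 } },
};

// Stand-in for development-{etl_name}.yml
const etlConfig: Config = {
  generalConfig: { source: { type: 'source_name' } },
  sources: { source_name: { pageSize: 500 } },
};

// Looks up the selected source type and combines it with the general settings.
function resolveSource(config: Config): Config {
  const general = (config.generalConfig as Config).source as Config;
  const params = (config.sources as Config)[general.type as string] as Config;
  return { ...general, ...params };
}

const resolvedSource = resolveSource(deepMerge(defaultConfig, etlConfig));
// resolvedSource combines maxRetries and timeoutMillis from the defaults,
// the type from the ETL file, and pageSize 500 overriding the default 100.
```

Keeping the shared parameters in default.yml means each ETL file only has to state what differs.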

How to set up - Empty project

Starting a new project from scratch is easy!

$ mkdir project-name
$ cd project-name
$ git init
$ git remote add typescript-base git@github.com:hipages/typescript-base.git
$ git pull typescript-base master
$ vi package.json  # Edit the necessary elements of the project definition
$ yarn install # Or npm install... whatever you prefer... I prefer yarn
$ yarn add inceptum-etl # or: npm install inceptum-etl
$ vi config/default.yml # set up your etl here
$ vi index.ts # the following code will run any etl

Example

import { LogManager, InceptumApp, Context } from 'inceptum';
import * as program from 'commander';
import { SourcePlugin,
  TransformerPlugin,
  DestinationPlugin,
  ConfigPlugin,
  RunnerPlugin,
  SavepointPlugin,
} from 'inceptum-etl';

program.version('0.1.0')
  .usage('[options] <etlName>')
  .option('-v', 'verbose')
  .parse(process.argv);

if (program.args.length === 0) {
  // tslint:disable-next-line:no-console
  console.log('Please specify an etl to execute');
  // tslint:disable-next-line:no-console
  console.log(program.usage());
  process.exit(1);
}

const etlName = program.args[0];

const app = new InceptumApp();

const logger = LogManager.getLogger(__filename);

const validEtls = app.getConfig('app.validEtls', []);

if (validEtls.indexOf(etlName) < 0) {
  // tslint:disable-next-line:no-console
  console.log(`Unknown etl name: ${etlName}. Valid etls: ${validEtls.join(', ')}`);
  // tslint:disable-next-line:no-console
  console.log(program.usage());
  process.exit(1);
}

logger.info(`Starting execution of ETL: ${etlName}`);

// const etlPlugin = new EtlPlugin(etlName);
// app.use(etlPlugin);
const context = app.getContext();
app.use(new SavepointPlugin(etlName),
        new DestinationPlugin(etlName),
        new TransformerPlugin(etlName),
        new SourcePlugin(etlName),
        new ConfigPlugin(etlName),
        new RunnerPlugin(etlName),
      );

const run = async () => {
  await app.start();

  // Run the ETL
  const etlRunner = await context.getObjectByName('EtlRunner');
  try {
    await etlRunner.executeEtl();
    logger.info('Finished all good');
  } catch (err) {
    logger.fatal(err, `Finished with error: ${err.message}`);
  } finally {
    // Stop the app whether the ETL succeeded or failed
    await app.stop();
  }
};

run().catch((err) => {
  logger.fatal(err, `ETL finished before starting: ${err.message}`);
});