npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

automizer-data

v0.2.1

Published

Store and browse aggregated data using Prisma database tool

Downloads

44

Readme

Parse, tag and query xlsx-data

This package is used to parse structured tabular data from CSV or XLSX. Stored data can be browsed and transformed into a desired two-dimensional result table.

Its primary purpose is to deliver data for pptx-automizer. This project is Work in progress. Nevertheless, you might already use automizer-data to handle table files coming from statistical analytics software. The example-xlsx in __test__/data-folder is based on GESStabs.

Storage and querying is done with Prisma ORM tools.

Installation

As a package

If you are working on an existing project, you can add automizer-data to it using npm or yarn. Run

$ yarn add automizer-data

or

$ npm install automizer-data

in the root folder of your project. This will download and install the most recent version into your existing project.

As a cloned repository

If you want to see how it works and you like to run own tests, you should clone this repository and install the dependencies:

$ git clone [email protected]:singerla/automizer-data.git my-project
$ cd my-project
$ yarn install
$ yarn prisma generate

Prisma studio

You can open prisma studio and take a look at the data:

$ yarn prisma studio

A lot of good stuff can be found at prisma.io.

Import Data

According to parser's configuration, parsed data will sliced, tagged and separated into two-dimensional tables.

The Database contains:

  • Categories: Generic nouns to describe the basic structure of your project
  • Tags: Values of a certain category
  • Sheets: Two-dimensional tables and their additional info

Each Sheet will contain:

  • a collection of rows
  • a collection of columns
  • the two-dimensional table body
  • a collection of tags
  • a collection of metadata that came along with the sheet

Example usage

import { PrismaClient } from '@prisma/client'
import { Parser, Store } from '../src/index';
import { ParserOptions, Tagger, RawResultInfo } from "../src/types";

const store = new Store(
  new PrismaClient()
)

const config = <ParserOptions> {
  // This string separates tables if found in Column A
  separator: 'Table Separator',
  // A row that fits to any of the strings below will be
  // separated into "meta"-field if found in Col A
  metaMap: {
    base: ['BASE'],
    topBox: ['Top-2-Box (1-2)'],
    bottomBox: ['Bottom-2-Box (4-5)'],
    mean: ['Mean Value']
  },
  // Rows that equal to one of the labels below will be skipped.
  skipRows: [
    'company',
    '* Annotation',
    '(Sum of answers)'
  ],
  // A callback function to be applied to every body row
  renderRow: (row: string[]): (number|null)[] => {
    return row.map(cell => {
      if(cell === ' ') return null
      else return Number(cell)
    })
  },
  // The info array of each sub-table will be passed to this
  // callback. Tagging can be fine tuned here.
  renderTags: (info: RawResultInfo[], tags: Tagger): void => {
    info.forEach((info, level) => {
      let cat
      // info.key contains the info string's original section
      // could be 'body' or 'info'
      if(info.key === 'body') {
        cat = 'vartitle'
      } else if(info.value.indexOf('- ') === 0) {
        cat = 'measure'
      } else if(level === 0) {
        // We strip the table separator and pass CountryName
        info.value = info.value.replace('Table Separator – ', '')
        cat = 'country'
      } else if(level === 1) {
        cat = 'variable'
      } else if(level === 2) {
        cat = 'questionText'
      }
      if(cat) {
        tags.push(cat, info.value)
      }
    })
  },
}

const parse = new Gesstabs(config)
const file = `${__dirname}/data/test-data.xlsx`
const datasheets = await parse.fromXlsx(filename)
const summary = await store.run(datasheets)

Intermediate JSON

Xlsx-Parser will tranform tabular data into an intermediate JSON object. The closer your input data comes to this format, the easier it will be to implement a new parser type.

{
  "tags": [
    {
      "category": "country",
      "value": "Norway"
    },
    {
      "category": "variable",
      "value": "Q12"
    },
    {
      "category": "category",
      "value": "Bar soap"
    },
    {
      "category": "subgroup",
      "value": "Age"
    },
    {
      "category": "measure",
      "value": "nominal"
    }
  ],
  "columns": ["Total", "19-29", "30-39", "40-69"],
  "rows": ["answer 1", "answer 2", "answer 3"],
  "data": [
    [29,18,36,12],
    [39,19,24,11],
    [19,28,46,10]
  ],
  "meta": {
    "significance": [
      [null,null,"h",null],
      [null,"h","l",null],
      [null,null,"h","l"]
    ]
  }
}

Query Data

As all the Sheets are tagged, our queries will use tags to find the desired datasets.

import { getData, Store } from '../src';
import { all } from '../src/filter';
import { value } from '../src/cell';

// A selector is an array of tags.
const selector = [
    {
      category: 'country',
      value: 'Norway'
    },
    {
      category: "variable",
      value: "Q12"
    }
]

// The grid will define rows, cols and a callback
// to run inside a target cell.
const grid = {
    rows: all('row'),
    columns: all('column'),
    cell: value
}

const result = await getData(selector, grid)

// automizer-data will convert the result directly into
// a pptx-automizer-object. 
const chartData = result.toSeriesCategories()