automizer-data
v0.2.1
Published
Store and browse aggregated data using Prisma database tool
Downloads
44
Readme
Parse, tag and query xlsx-data
This package is used to parse structured tabular data from CSV or XLSX. Stored data can be browsed and transformed into a desired two-dimensional result table.
Its primary purpose is to deliver data for pptx-automizer.
This project is Work in progress.
Nevertheless, you might already use automizer-data
to handle table files coming from statistical analytics software.
The example-xlsx in __test__/data
-folder is based on GESStabs.
Storage and querying is done with Prisma ORM tools.
Installation
As a package
If you are working on an existing project, you can add automizer-data to it using npm or yarn. Run
$ yarn add automizer-data
or
$ npm install automizer-data
in the root folder of your project. This will download and install the most recent version into your existing project.
As a cloned repository
If you want to see how it works and you like to run own tests, you should clone this repository and install the dependencies:
$ git clone [email protected]:singerla/automizer-data.git my-project
$ cd my-project
$ yarn install
$ yarn prisma generate
Prisma studio
You can open prisma studio and take a look at the data:
$ yarn prisma studio
A lot of good stuff can be found at prisma.io.
Import Data
According to parser's configuration, parsed data will sliced, tagged and separated into two-dimensional tables.
The Database contains:
- Categories: Generic nouns to describe the basic structure of your project
- Tags: Values of a certain category
- Sheets: Two-dimensional tables and their additional info
Each Sheet will contain:
- a collection of rows
- a collection of columns
- the two-dimensional table body
- a collection of tags
- a collection of metadata that came along with the sheet
Example usage
import { PrismaClient } from '@prisma/client'
import { Parser, Store } from '../src/index';
import { ParserOptions, Tagger, RawResultInfo } from "../src/types";
const store = new Store(
new PrismaClient()
)
const config = <ParserOptions> {
// This string separates tables if found in Column A
separator: 'Table Separator',
// A row that fits to any of the strings below will be
// separated into "meta"-field if found in Col A
metaMap: {
base: ['BASE'],
topBox: ['Top-2-Box (1-2)'],
bottomBox: ['Bottom-2-Box (4-5)'],
mean: ['Mean Value']
},
// Rows that equal to one of the labels below will be skipped.
skipRows: [
'company',
'* Annotation',
'(Sum of answers)'
],
// A callback function to be applied to every body row
renderRow: (row: string[]): (number|null)[] => {
return row.map(cell => {
if(cell === ' ') return null
else return Number(cell)
})
},
// The info array of each sub-table will be passed to this
// callback. Tagging can be fine tuned here.
renderTags: (info: RawResultInfo[], tags: Tagger): void => {
info.forEach((info, level) => {
let cat
// info.key contains the info string's original section
// could be 'body' or 'info'
if(info.key === 'body') {
cat = 'vartitle'
} else if(info.value.indexOf('- ') === 0) {
cat = 'measure'
} else if(level === 0) {
// We strip the table separator and pass CountryName
info.value = info.value.replace('Table Separator – ', '')
cat = 'country'
} else if(level === 1) {
cat = 'variable'
} else if(level === 2) {
cat = 'questionText'
}
if(cat) {
tags.push(cat, info.value)
}
})
},
}
const parse = new Gesstabs(config)
const file = `${__dirname}/data/test-data.xlsx`
const datasheets = await parse.fromXlsx(filename)
const summary = await store.run(datasheets)
Intermediate JSON
Xlsx-Parser will tranform tabular data into an intermediate JSON object. The closer your input data comes to this format, the easier it will be to implement a new parser type.
{
"tags": [
{
"category": "country",
"value": "Norway"
},
{
"category": "variable",
"value": "Q12"
},
{
"category": "category",
"value": "Bar soap"
},
{
"category": "subgroup",
"value": "Age"
},
{
"category": "measure",
"value": "nominal"
}
],
"columns": ["Total", "19-29", "30-39", "40-69"],
"rows": ["answer 1", "answer 2", "answer 3"],
"data": [
[29,18,36,12],
[39,19,24,11],
[19,28,46,10]
],
"meta": {
"significance": [
[null,null,"h",null],
[null,"h","l",null],
[null,null,"h","l"]
]
}
}
Query Data
As all the Sheets are tagged, our queries will use tags to find the desired datasets.
import { getData, Store } from '../src';
import { all } from '../src/filter';
import { value } from '../src/cell';
// A selector is an array of tags.
const selector = [
{
category: 'country',
value: 'Norway'
},
{
category: "variable",
value: "Q12"
}
]
// The grid will define rows, cols and a callback
// to run inside a target cell.
const grid = {
rows: all('row'),
columns: all('column'),
cell: value
}
const result = await getData(selector, grid)
// automizer-data will convert the result directly into
// a pptx-automizer-object.
const chartData = result.toSeriesCategories()