typedsv

v0.1.0

Published

2 years ago

Parse and map delimiter-separated values (csv, tsv, etc.) to TypeScript/ES6+ classes.

Downloads

0High
0Medium
0Low

dmcneil

dsv dsv-parser csv csv-parser tsv tsv-parser delimiter parser mapper typescript

Installation

NOTE TypeScript 3.2+ is required.

Install the package:

npm install typedsv --save

You may also need to install the reflect-metadata library and import it at high level (typically your application entrypoint):

npm install reflect-metadata --save

import 'reflect-metadata'

Enable decorator and metadata support in tsconfig.json:

{
    "compilerOptions": {
        "lib": [
          "es6",
          ...
        ],
        "target": "es5",
        "experimentalDecorators": true,
        "emitDecoratorMetadata": true,
        ...
    }
}

Getting Started

Given delimiter-separated data (csv, tsv, etc.):

# example.csv

"Foo","123","foo"
"Bar","321","bar"
"Baz","456","baz"

And a class such as:

// Example.ts

import { Parsed } from 'typedsv'

export default class Example {
  @Parsed(0)
  one: string

  @Parsed(1)
  two: number

  @Parsed(2)
  three: string
}

Create a Parser for the type and pass the file to the parse method:

// main.ts

import { createReadStream } from 'fs'
import { Parser } from 'typedsv'
import Example from './Example'

const parser = new Parser(Example)

parser.parse(createReadStream('./example.csv')).then((examples: Example[]) => console.log(examples))

Output:

Example[
  Example{one: 'Foo', two: 123, three: 'foo'},
  Example{one: 'Bar', two: 321, three: 'bar'},
  Example{one: 'Baz', two: 456, three: 'baz'}
]

Mapping Properties with @Parsed

The @Parsed decorator dictates how the Parser should maps values to properties within a class.

Index

Pass a number or { index: number } to specify which column to map based on its index:

"foo","bar"

class ExampleWithIndex {
  @Parsed(0)
  first: string

  @Parsed({ index: 1 })
  second: string
}

const parser = new Parser(ExampleWithIndex)
parser.parse(...)

ExampleWithIndex[
  ExampleWithIndex{first: 'foo', second: 'bar'}
]

Headers

Use a string or { header: string } to specify which column to map based on its header:

NOTE It is required that the headers option is set when calling Parser#parse.

"A","B"
"foo","bar"

class ExampleWithHeader {
  @Parsed('A')
  first: string

  @Parsed({ header: 'B' })
  second: string
}

const parser = new Parser(ExampleWithHeader)
parser.parse(input { header: true })

ExampleWithHeader[
  ExampleWithHeader{first: 'foo', second: 'bar'}
]

Both the { index: number } and { header: string } options can be used together. TypeDSV will first try to map a property using the declared header then fallback to the index.

"A","B","C"
"foo","bar","baz"

class ExampleWithHeaderAndIndex {
  @Parsed('A')
  first: string

  @Parsed({ header: 'B', index: 1 })
  second: string

  @Parsed(2)
  third: string
}

const parser = new Parser(ExampleWithHeaderAndIndex)
parser.parse(input, { header: true })

ExampleWithHeaderAndIndex[
  ExampleWithHeaderAndIndex{first: 'foo', second: 'bar', third: 'baz'}
]

A Note on Property Types

While values are first parsed as a string, the target property's type is honored so long as the conversion is straightforward. To map something beyond a few primitive types, see the map option:

`number`

"123","3.14","ABC"

@Parsed(0)
a: number // OK: 123

@Parsed(1)
b: number // OK: 3.14

@Parsed(2)
c: number // ERROR Cannot set c: String is not assignable to Number

`boolean`

Valid Values (case insensitive)

TRUE, Y, YES, T, 1
FALSE, N, NO, F, 0

"true","0","y","F","NONE"

@Parsed(0)
a: boolean // OK: true

@Parsed(1)
b: boolean // OK: false

@Parsed(2)
c: boolean // OK: true

@Parsed(3)
d: boolean // OK: false

@Parsed(4)
e: boolean // ERROR Cannot set e: String is not assignable to Boolean

Options

NOTE The below options require that the { index: number | header: string } argument form detailed above is used.

`map`

Type: (input: string) => any

Modify the input value before it is mapped to the property:

"foo","F,O,O",1
"bar","B,A,R",2
"baz","B,A,Z",3

class ExampleWithMap {
  @Parsed({
    index: 0,
    map: (input: string) => input.toUpperCase()
  })
  first: string

  @Parsed({
    index: 1,
    map: (input: string) => input.split(',')
  })
  second: string[]

  @Parsed({
    index: 2,
    map: (input: string) => {
      const n = parseInt(input)
      return n * n
    }
  })
  third: number
}

const parser = new Parser(ExampleWithMap)
parser.parse(...)

ExampleWithMap[
  ExampleWithMap{first: 'FOO', second: ['F', 'O', 'O'], third: 1},
  ExampleWithMap{first: 'BAR', second: ['B', 'A', 'R'], third: 4},
  ExampleWithMap{first: 'BAZ', second: ['B', 'A', 'Z'], third: 9}
]

While the function return type is any, an error will be thrown if the type is not the same - or cannot be parsed - as the property type as detailed in A Note on Property Types:

"foo","B,A,R"

class ExampleWithBadMap {
  @Parsed({
    index: 0,
    map: (input: string) => `${input.length}`
  })
  first: number

  @Parsed({
    index: 1,
    map: (input: string) => input.split(',')
  })
  second: string
}

ERROR Cannot set ExampleWithBadMap.second: Array is not assignable to String

`validate`

Type: (input: any) => boolean | { function: (input: any) => boolean; message?: string } | [...]

Validation to be performed before the property is set.

The option accepts a few different value types but the main idea is that the function(s) take the form (input: any) => bool where a return value of true means the value is valid.

NOTE The validate functions are called after the optional map function.

0,"John","Doe"

class ExampleWithValidation {
  @Parsed({
    index: 0,
    validate: (id: number) => id > 0
  })
  id: number
}

ERROR Validation failed for property id: ['validate.0']

The default error message just takes the form of validate.${index} where index is the position of the validation function that failed. To provide a custom message use the object form with the message option:

0,"John","Doe"

class ExampleWithValidationMessage {
  @Parsed({
    index: 0,
    validate: {
      message: 'id must be > 0',
      function: (id: number) => id > 0
    }
  })
  id: number
}

ERROR Validation failed for property id: ['id must be > 0']

Multiple objects/functions can also be passed as in an array. They are executed in order until either all pass or there is an error:

1,"John","Doe"

class ExampleWithMultipleValidations {
  @Parsed({
    index: 0,
    validate: [
      (id: number) => id > 0,
      (id: number) => id > 50,
      { message: 'id cannot be 1', function: (id: number) => id !== 1 } // will not run
    ]
  })
  id: number
}

ERROR Validation failed for property id: ['validate.1']

In case you want to collect all validation errors, use the object form with the aggregate and functions options:

1,"John","Doe"

class ExampleWithAggregatedValidationErrors {
  @Parsed({
    index: 0,
    validate: {
      aggregate: true,
      functions: [
        (id: number) => id > 50,
        { message: 'id must be > 100', function: (id: number) => id > 100 },
        (id: number) => id !== 0 // will still run even though the validation has failed
      ]
    }
  })
  id: number
}

ERROR Validation failed for property id: ['validate.0', 'id must be > 100']

Parser

The Parser constructor expects a type/class that has at least one property with decorated with @Parsed:

export default class Example {
  @Parsed(0)
  one: string
}

const parser = new Parser(Example)

NOTE An error will be thrown when attempting to create a Parser with a type that does not have any decorated properties.

Once a Parser is created, simply call the parse function with an input and options using any of the following methods:

Promise

parser.parse(input, { ... })
  .then((examples: Example[]) => ...)

`async`/`await`

Wait for all rows to be parsed...

(async () => {
  const examples: Example[] = await parser.parse(input, { ... })
})()

`onObject`

Call a function each time a row is parsed using the onObject option:

parser.parse(input, {
  onObject: (row: Example, line: number) => ...)
})

Input

The input is assumed to be formatted where each line is considered a single record. A line is then separated by a delimiter to represent a column/field value. Most of the examples in this document make use of the common CSV (comma-separated value) format:

"1","John","Doe"
"2","Jane","Doe"
"3","Matt","Smith"
...

Lines that begin with the comment character (default: #) are skipped:

"ID","FirstName","LastName"
"1","John","Doe"
# this comment will be skipped
"2","Jane","Doe"
"3","Matt","Smith"

If a line ends with an inline comment, the line is parsed up until the comment:

"ID","FirstName","LastName"
"1","John","Doe"
"2","Jane","Doe" # this line is parsed up to this comment
"3","Matt","Smith"

Reading the Input

The input can be passed as the first argument to parse in a few different forms:

`string`

A complete, delimited input.

const input = `
"1","John","Doe"
"2","Jane","Doe"
"3","Matt","Smith"
`

`Readable`

The most common method is reading a file using createReadStream:

import { createReadStream } from 'fs'

const input = createReadStream('/tmp/data.csv')

Options

While TypeDSV implements RFC4180, Parser#parse accepts a variety of options to accomodate data that may not follow that of a typical CSV.

`delimiter`

Type: string
Default: , (comma)

The character that separates values in a row.

# default
"1","John","Doe"

# delimiter: |
"1"|"John"|"Doe"

# delimiter: \t
"1" "John"  "Doe"

`quote`

Type: string
Default: " (double quote)

# default
"1","John","Doe"

# quote: ~
~1~,~John~,~Doe~

Values do not have to be wrapped in quote characters although there are some exceptions as listed below:

"1","John","Doe"          # OK
2,Jane,Doe                # OK
3,"Matt",Smith            # OK

Values that contain a carriage return (default: \r), new line (default: \n), the delimiter (default: ,), or comment (default: #) must be wrapped in the quote character:

# OK
1,John,"Do\re"
2,Jane,"Do\ne"
3,Matt,"Smi,th"
4,Megan,"Smi#th"

# NOT OK
1,John,Do\re
2,Jane,Do\ne
3,Matt,Smi,th
4,Megan,Smi#th

If a quoted value contains the quote character (default: ") then it must be escaped by a preceeding quote character:

1,John,"said ""Hi!"""     # OK
2,Jane,"said "Hi!""       # NOT OK

Non-quoted values can contain the quote character without the escaping:

1,John,said "Hi!"         # OK

`headers`

Type: boolean | string[]
Default: false

This option enables the ability to map properties by the headers instead of by index as described in Mapping Properties with @Parsed.

If the first line of the input declares the headers:

"ID","FirstName","LastName"
"1","John","Doe"

Then they will be parsed and used for mapping if this option is set to true.

Alternatively, a string[] can be used to define headers before parsing the input. In this case, the first line of the input does not need to be header declarations. Of course, the supplied headers will be honored when property mapping:

NOTE If the input does contain header declarations on the first line, it is recommended to use the range option to skip the first line.

"1","John","Doe"
"2","Jane","Doe"
"3","Matt","Smith"
"4","Megan","Smith"

class Person {
  @Parsed('id')
  id: number

  @Parsed('firstName')
  firstName: string

  @Parsed('lastName')
  lastName: string
}

const parser = new Parser(Person)
... = parser.parse(input, { headers: ['id', 'firstName', 'lastName'] })

`mapHeaders`

Type: (headers: string[]) => string[]

Given an input with headers on the first line:

"iD","fIrSTnAMe","LAsTNAme"
"1","John","Doe"

And a class to map using the headers formatted differently:

class Person {
  @Parsed('id')
  id: number

  @Parsed('firstname')
  firstName: string

  @Parsed('lastname')
  lastName: string
}

Then this option can be used to reformat the headers so the values will map correctly:

parser.parse(input, {
  mapHeaders: (headers: string[]) => {
    return headers.map((header: string) => header.toLowerCase())
  }
})

`range`

Type: [number?, number?] | { start?: number, end?: number }
Default: { start: 1 }

Given the following input:

1,"John","Doe"
2,"Jane","Doe"
3,"Matt","Smith"
4,"Megan","Smith"

Setting the start line:

parser.parse(input, { range: [2] }) // array form
parser.parse(input, { range: { start: 2 } }) // object form

[
  ['2', 'Jane', 'Doe'],
  ['3', 'Matt', 'Smith'],
  ['4', 'Megan', 'Smith']
]

Setting the ending line:

NOTE The ending line argument is exclusive.

parser.parse(input, { range: [, 3] }) // array form, remember to include the comma!
parser.parse(input, { range: { end: 3 } }) // object form

[
  ['1', 'John', 'Doe'],
  ['2', 'Jane', 'Doe']
]

Parse a range of lines:

parser.parse(input, { range: [2, 4] }) // array form
parser.parse(input, { range: { start: 2, end: 4 } }) // object form

[
  ['2', 'Jane', 'Doe'],
  ['3', 'Matt', 'Smith']
]

`onObject`

Type: (o: T, line: number) => void

The provided function will be called each time a row has been parsed and mapped:

parser.parse(input, {
  onObject: (row: Example, line: number) => { ... }
})

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

Table of Contents

Installation

Getting Started

Mapping Properties with @Parsed

Index

Headers

A Note on Property Types

number

boolean

Options

map

validate

Parser

Promise

async/await

onObject

Input

Reading the Input

string

Readable

Options

delimiter

quote

headers

mapHeaders

range

onObject

`number`

`boolean`

`map`

`validate`

`async`/`await`

`onObject`

`string`

`Readable`

`delimiter`

`quote`

`headers`

`mapHeaders`

`range`

`onObject`