junk-parser

v1.0.4

Published

3 years ago

[![Build Status](https://travis-ci.org/justsml/junk-parser.svg?branch=master)](https://travis-ci.org/justsml/junk-parser)


justsml

csv parser text delimited

junk-parser - Best-fit, low-RAM CSV Parser

I know, you're thinking "not another CSV/?SV parser!?"

"None of them even work exactly like Excel 20xx anyway!?!?"

Well, this is a different kind of parser. And a fun experiment. #DealWithIt.

It's optimized around a few assumptions - based on observed common errors.

It does localized adjustments to best-fit rows to the column count. It can also adjust columns intelligently based on detected data types.

This technique is biased towards data with more columns & more column types. Even better if the types are in a mix. Errors in Tuple- or Key-Value-Pair-shaped files (with only 2-3 columns) will probably not be handled desirably.
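To show why a mix of column types helps, here is a minimal sketch of type-guided repair. It is not junk-parser's actual code or API: `detectType`, `scoreRow`, and `foldSurplus` are hypothetical helpers, and the four-column sample row is made up for illustration. When a row has too many fields (say, a stray delimiter inside an address), the sketch tries each way of folding the surplus back into one column and keeps the layout whose types best match the expected ones.

```javascript
// Illustrative sketch only: NOT junk-parser's implementation or API.
// All function names and the sample row are hypothetical.

// Classify a raw field into a very coarse type.
function detectType(value) {
  const v = value.trim();
  if (v === '') return 'empty';
  if (/^-?\d+(\.\d+)?$/.test(v)) return 'number';
  return 'text';
}

// Count how many fields have the type expected for their column.
function scoreRow(fields, expectedTypes) {
  return fields.filter((f, i) => detectType(f) === expectedTypes[i]).length;
}

// If a row has surplus fields, merge one run of adjacent fields back into a
// single column, choosing the run that yields the best type match.
function foldSurplus(fields, expectedTypes, delimiter = ',') {
  const surplus = fields.length - expectedTypes.length;
  if (surplus <= 0) return fields;

  let best = null;
  let bestScore = -1;
  for (let start = 0; start < expectedTypes.length; start += 1) {
    const merged = fields.slice(start, start + surplus + 1).join(delimiter);
    const candidate = [
      ...fields.slice(0, start),
      merged,
      ...fields.slice(start + surplus + 1),
    ];
    const score = scoreRow(candidate, expectedTypes);
    // Ties keep the earliest candidate, which is an arbitrary choice.
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}

// Hypothetical columns: id (number), age (number), addr (text), zip (number).
const expectedTypes = ['number', 'number', 'text', 'number'];
const broken = '7,42,12 Elm St, Apt 4,80123'.split(',');
const fixed = foldSurplus(broken, expectedTypes);
console.log(fixed);
```

With numbers on both sides of the text column, only one fold matches every expected type. If all columns were text, several folds would tie and the choice would be a guess, which is the weakness with narrow or single-type files described above.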

Example data: a header row plus 5 records (IDs 100-104) spread over 14 lines:

id,first,last,addr,job
100,John,Doe,666 Heck Hwy,Cat Herder
101,John,Doe,123 Main St.
Denver CO 80123,Cat Whisperer
102,John,Doe,Attn: Delivery
    123 Main St.
    Denver CO 80123,Cat Whisperer
103,John,Doe,Attn: Delivery
123 Main St., Denver, Co
80122
,Cat Whisperer
104,John,Doe,Attn: Delivery
123 Main St., Denver, Co
80122

,Cat Whisperer

Currently Parses Broken

handles 2-line row, extra line-break
handles 2-line quoted row, extra line-break
handles 2 extra line-breaks
handles 1 row on 4 lines, w/ "empty" line
handles 2 rows, 4 lines quoted w/ trailing delimiter
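The simplest of these cases, stray line-breaks inside a row, can be sketched by gluing lines together until the row reaches the header's column count. This is only a rough illustration of the idea, not junk-parser's code or API: `fitRows` is a hypothetical helper, and it deliberately ignores quoting and stray delimiters inside a field (so it would not cope with records 103 and 104 from the example data). The sample reuses records 100-102.

```javascript
// Illustrative sketch only: NOT junk-parser's implementation or API.
// fitRows is a hypothetical helper; it ignores quotes and embedded delimiters.
function fitRows(text, delimiter = ',') {
  // Blank lines carry no fields, so drop them up front.
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return { header: [], rows: [] };

  const header = lines[0].split(delimiter);
  const expected = header.length;
  const rows = [];
  let pending = '';

  for (const line of lines.slice(1)) {
    // Glue a continuation line onto the unfinished row with a single space.
    pending = pending === '' ? line : `${pending} ${line.trim()}`;
    const fields = pending.split(delimiter);
    if (fields.length >= expected) {
      rows.push(fields.map((f) => f.trim()));
      pending = '';
    }
  }

  // Leftover text never reached the expected width; keep it rather than drop it.
  if (pending !== '') {
    rows.push(pending.split(delimiter).map((f) => f.trim()));
  }
  return { header, rows };
}

const sample = [
  'id,first,last,addr,job',
  '100,John,Doe,666 Heck Hwy,Cat Herder',
  '101,John,Doe,123 Main St.',
  'Denver CO 80123,Cat Whisperer',
  '102,John,Doe,Attn: Delivery',
  '    123 Main St.',
  '    Denver CO 80123,Cat Whisperer',
].join('\n');

const result = fitRows(sample);
console.log(result.rows);
```

A row is emitted as soon as it is wide enough, so a line-break inside the last column would be misread as the start of the next row. Handling that, along with quoted fields, needs more context than a plain column count.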
