gedcom555-token
v555.0.2
Tokenizer for Gedcom 5.5.5
Gedcom 5.5.5 Token
A tokenizer for 'Gedcom 5.5.5'.
Install
npm i gedcom555-token
Usage
import {tokenizeFromString} from "gedcom555-token";
const tokenized = tokenizeFromString(`0 HEAD
1 GEDC
2 VERS 5.5.5
2 FORM LINEAGE-LINKED
3 VERS 5.5.5
1 CHAR UTF-8
1 SOUR gedcom.org
0 @U@ SUBM
1 NAME gedcom.org
0 TRLR`);
/*
[
{
level: 0,
tag: `HEAD`,
},
{
level: 1,
tag: `GEDC`,
},
{
level: 2,
tag: `VERS`,
lineItem: `5.5.5`,
},
{
level: 2,
tag: `FORM`,
lineItem: `LINEAGE-LINKED`,
},
{
level: 3,
tag: `VERS`,
lineItem: `5.5.5`,
},
{
level: 1,
tag: `CHAR`,
lineItem: `UTF-8`,
},
{
level: 1,
tag: `SOUR`,
lineItem: `gedcom.org`,
},
{
level: 0,
tag: `SUBM`,
xrefId: `@U@`,
},
{
level: 1,
tag: `NAME`,
lineItem: `gedcom.org`,
},
{
level: 0,
tag: `TRLR`,
},
]
*/
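The token list is flat, so a consumer usually has to rebuild the hierarchy from the `level` values. The following is a minimal sketch of one way to do that. The `Token` interface is inferred from the example output above and may not match the type the package actually exports; `TokenNode` and `nestTokens` are hypothetical names that are not part of this package.

```typescript
// Token shape inferred from the example output; the real exported type may differ.
interface Token {
  level: number;
  tag: string;
  xrefId?: string;
  lineItem?: string;
}

// Hypothetical tree node: a token plus the tokens nested under it.
interface TokenNode extends Token {
  children: TokenNode[];
}

// Nest a flat token list into a tree using each token's level.
function nestTokens(tokens: Token[]): TokenNode[] {
  const roots: TokenNode[] = [];
  const stack: TokenNode[] = []; // stack[i] is the currently open node at level i
  for (const token of tokens) {
    if (token.level > stack.length) {
      throw new Error(
        `Level ${token.level} has no parent at level ${token.level - 1}`
      );
    }
    const node: TokenNode = { ...token, children: [] };
    stack.length = token.level; // close every open node at this level or deeper
    if (token.level === 0) {
      roots.push(node);
    } else {
      stack[token.level - 1].children.push(node);
    }
    stack.push(node);
  }
  return roots;
}
```

Passing the array returned by `tokenizeFromString` to `nestTokens` should give one root node per level 0 record (`HEAD`, `SUBM` and `TRLR` in the example above), provided the inferred token shape is right.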
Line by line:
When required, the tokenizer can be called for a single line.
import {tokenize} from "gedcom555-token/dist/token";
const tokenized = tokenize(`0 head`);
/*
{
level: 0,
tag: `HEAD`
}
*/
Notes
- Does not check encoding. The input string is assumed to be Unicode.
- Checks for line terminator consistency.
- Checks tags against a list of known tags. To do (low priority): allow the tag list to be extended.
- Checks the line item for single "@" signs.
- Does not check other grammar rules. These are left for the parser to implement.
- Gedcom 5.5.5 tags are case insensitive, so tokenize converts them to upper case.
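To illustrate what a line terminator consistency check can look like, here is a minimal sketch. `detectTerminator` is a hypothetical helper, not a function exported by this package. Treating CRLF, CR and LF as the candidate terminators is also an assumption; the set the tokenizer actually accepts may be narrower.

```typescript
// Hypothetical helper; not part of gedcom555-token's API.
// Returns the one terminator used throughout `text`, or undefined if the
// text contains no terminator. Throws when different terminators are mixed.
function detectTerminator(text: string): string | undefined {
  // "\r\n" is listed first so that CRLF is matched as a single terminator.
  const terminators = text.match(/\r\n|\r|\n/g) || [];
  const first = terminators[0];
  for (const terminator of terminators) {
    if (terminator !== first) {
      throw new Error("Inconsistent line terminators");
    }
  }
  return first;
}
```

A caller could use the returned terminator to split the text into lines before handing each one to a single-line tokenizer.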
License
MIT
Issues / FAQ
- Empty CONT. As per the gedcom line definition, a CONT tag can appear without a line value. If so, the line terminator MUST come directly after the tag. A trailing space or delimiter between the tag and the terminator will cause an error.
"2 CONT" : is legal : +1 CONT[terminator]
"2 CONT " (one trailing space) : is illegal : +1 CONT[delim space][terminator]
"2 CONT  " (two trailing spaces) : is legal : +1 CONT[delim space][line value space][terminator]