vcard4-ts
v0.4.1
A vCard v4 parser with type safety first
vcard4-ts — A vCard 4.0 library with type safety first
`vcard4-ts` was designed with the following goals:
- Compliant with RFC 6350 and its extensions
- TypeScript (and type safety) from the ground up
- Avoid mistakes, DRY (Don't Repeat Yourself)
  - The data structure definition, created from RFC 6350, contains instructions for the parser
- The returned data structure is easy to use
  - The decisions to be made by the calling code should be as few and as simple as possible. Everything that can be delegated to the IDE (while writing your code) and TypeScript compile time should be handled there. E.g., no need to check whether there is a single or multiple values: if something can occur multiple times, the item is always in an array.
In addition to RFC6350, the following RFCs are implemented:
- RFC 6474: `BIRTHPLACE`, `DEATHPLACE`, and `DEATHDATE` properties
- RFC 6715: `EXPERTISE`, `INTEREST`, `HOBBY`, `ORG-DIRECTORY` properties and `LEVEL`, `INDEX` parameters
- RFC 8605: `CONTACT-URI` property and `CC` (two-letter country code) parameter
`vcard4-ts` is compatible with the following RFCs, as it does not impose any
limitation on string-valued parameters and values:
Installation
`yarn add vcard4-ts` or `npm i vcard4-ts`. No dependencies (except
devDependencies). Only about 10 kB (compressed) will end up in your code;
the rest is tests, alternatives, debugging information, …
Usage
Simple example
Basic usage is straightforward:
import { parseVCards } from 'vcard4-ts';
import { readFileSync } from 'fs';

const vcf = readFileSync('example.vcf').toString();
const cards = parseVCards(vcf);
if (cards.vCards) {
  for (const card of cards.vCards) {
    console.log('Card found for ' + card.FN[0].value[0]);
  }
} else {
  console.error('No valid vCards in file');
}
We can see two basic principles in action:
- The types are always clear, no expensive run-time testing whether there is just a single value or there are multiple values. (This is the prime directive.)
- There are no `null` or `undefined` (aka nullish) values; and any arrays will always have at least one element. This is the secondary directive.
As a result of these principles, the following rules apply:
- Mandatory properties (`BEGIN`, `END`, `VERSION`, and `FN`) always exist and are never `null` or `undefined` ("nullish").
- Optional properties (all the others defined in RFC 6350) only exist if they appear in the file. I.e., if they exist, they also have a value and are never nullish. (However, the strings may still be empty.)
- To match the prime directive, any property, whether mandatory or optional, that may appear more than once, is always an array of values.
These rules make software development more predictable and thus faster and less error-prone:
- TypeScript can verify type correctness.
- Autocompletion and type inference in IDEs such as VSCode/VSCodium work and are very helpful.
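To make these rules concrete, here is a minimal sketch of how such guarantees can be written down as TypeScript types. The names `NonEmptyArray`, `SketchProperty`, `SketchCard`, and `describeCard` are hypothetical and heavily simplified; they are not the types exported by `vcard4-ts`.

```typescript
// Hypothetical, simplified types; NOT the actual vcard4-ts definitions.
type NonEmptyArray<T> = [T, ...T[]];

interface SketchProperty {
  value: string;
}

interface SketchCard {
  // Mandatory, exactly once: always present, never nullish.
  VERSION: SketchProperty;
  // Mandatory, may repeat: always a non-empty array.
  FN: NonEmptyArray<SketchProperty>;
  // Optional, at most once: absent or a single value.
  UID?: SketchProperty;
  // Optional, may repeat: absent or a non-empty array, never a bare value.
  EMAIL?: NonEmptyArray<SketchProperty>;
}

function describeCard(c: SketchCard): string {
  // Index 0 of a non-empty array type-checks without any guard.
  const who = c.FN[0].value;
  // One existence check is enough; no "single or multiple?" test is needed.
  const mails = c.EMAIL ? c.EMAIL.map((e) => e.value).join(', ') : 'none';
  return `${who} (version ${c.VERSION.value}, email: ${mails})`;
}

const sketchCard: SketchCard = {
  VERSION: { value: '4.0' },
  FN: [{ value: 'Jane Doe' }],
  EMAIL: [{ value: 'jane@example.com' }, { value: 'jd@example.org' }],
};

console.log(describeCard(sketchCard));
```

With types of this kind, an empty `FN: []` or a bare `EMAIL: { value: '…' }` is rejected by the compiler rather than discovered at run time.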
More elaborate example
This example demonstrates access to parsing errors and warnings, to structured information, and to non-RFC properties. Explanations are in the design and reference sections below.
if (cards.nags) {
  // There were global problems, e.g., because the file seemed to contain invalid vCards.
  // Those cards can be obtained by passing `true` as the `keepDefective` argument of `parseVCards()`.
  for (const nag of cards.nags) {
    // Global nags carry no `attributes`, so only key and description are shown.
    if (nag.isError) {
      console.error(`Global ${nag.key} (${nag.description})`);
    } else {
      console.warn(`Global ${nag.key} (${nag.description})`);
    }
  }
}
// `cards.vCards` is absent if no valid vCard was found, hence the fallback.
for (const card of cards.vCards ?? []) {
  // If you would like element 0 to correspond to the most PREFerred item
  // (sortByPREF must be imported from 'vcard4-ts' as well):
  sortByPREF(card);

  // You're guaranteed to have all these (required) properties,
  // no need to check their existence first. Also, the editor will
  // auto-complete and know the type.
  console.log('Found vCard with version ' + card.VERSION.value);
  console.log('Full name: ' + card.FN[0].value[0]);

  // Maybe some optional (any-cardinality) RFC6350 property is present?
  if (card.EMAIL) {
    // There might be multiple EMAIL property lines, but as the EMAIL field
    // is present, we're guaranteed to have at least one value. See
    // https://netfuture.ch/2021/11/array-thickening-more-can-be-less/
    console.log('Emailable at: ' + card.EMAIL[0].value);
    // Is it known whether it is a work or home address?
    if (card.EMAIL[0].parameters?.TYPE) {
      console.log('It is of type: ' + card.EMAIL[0].parameters.TYPE[0]);
    }
  }

  // The same with a structured any-cardinality property
  if (card.ADR) {
    // All elements of the address, including the locality, can have multiple
    // values. And we still could have multiple addresses (e.g., work and
    // home). We'll just print the first.
    console.log('Living in: ' + card.ADR[0].value.locality[0]);
  }

  // Any property not in the standard (and its extension RFCs)?
  // (Their name should be prefixed with `X-`)
  if (card.x) {
    for (const [k, v] of Object.entries(card.x)) {
      console.log('Non-RFC6350 property ' + k + ', with ' + JSON.stringify(v));
    }
  }

  // Any problems found while parsing the vCard?
  if (card.nags) {
    console.log(
      'While parsing this card, the following was noticed ' +
        '(and either the problematic part dropped or ignored)',
    );
    for (const nag of card.nags) {
      // Local nags describe where the problem occurred in `attributes`.
      const where = JSON.stringify(nag.attributes);
      if (nag.isError) {
        console.error(`${nag.key} (${nag.description}): ${where}`);
      } else {
        console.warn(`${nag.key} (${nag.description}): ${where}`);
      }
    }
    // Some of these problems might be unparseable lines. They are archived
    // here.
    if (card.unparseable) {
      console.log('The following unparseable lines were encountered:');
      for (const line of card.unparseable) {
        console.log(line);
      }
    }
  }
}
Design
The prime design goal is to avoid mistakes in the code and enable calling code to avoid mistakes as well. Designing for (type) safety is achieved by Don't Repeat Yourself, Parse, don't validate, and Array thickening.
DRY
Don't Repeat Yourself was a basic design principle while developing the module. The description of the data structure is centralized. The goal was to have only a single authoritative source of type information, from which both compile-time type information and runtime parsing instructions would be derived. As TypeScript transpilation output no longer contains the type information, it was necessary to jump through hoops. (Luckily, Colin McDonnell's Zod was a great resource for educating about hoop-jumping.)
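The single-source idea can be sketched as follows. This is an illustration of the general pattern only, under the assumption of a tiny four-property schema; `sketchSchema`, `DerivedCard`, `isRepeatable`, and `addProperty` are hypothetical names and do not reflect the actual `vcard4-ts` internals.

```typescript
// Hypothetical single source of truth: one cardinality marker per property.
const sketchSchema = {
  VERSION: 'one',
  FN: 'oneOrMore',
  UID: 'optional',
  EMAIL: 'any',
} as const;

type SketchSchema = typeof sketchSchema;
type PropName = keyof SketchSchema;
type Prop = { value: string };
type NonEmpty<T> = [T, ...T[]];

// Compile time: the card type is derived from the schema, not written twice.
type Shape<C> = C extends 'one' | 'optional' ? Prop : NonEmpty<Prop>;
type RequiredNames = {
  [K in PropName]: SketchSchema[K] extends 'one' | 'oneOrMore' ? K : never;
}[PropName];
type DerivedCard = { [K in RequiredNames]: Shape<SketchSchema[K]> } & {
  [K in Exclude<PropName, RequiredNames>]?: Shape<SketchSchema[K]>;
};

// Run time: the same object tells the parser whether to collect into an array.
function isRepeatable(prop: PropName): boolean {
  const cardinality: string = sketchSchema[prop];
  return cardinality === 'oneOrMore' || cardinality === 'any';
}

function addProperty(
  target: Record<string, Prop | Prop[]>,
  prop: PropName,
  value: string,
): void {
  const existing = target[prop];
  if (!isRepeatable(prop)) {
    target[prop] = { value };
  } else if (Array.isArray(existing)) {
    existing.push({ value });
  } else {
    target[prop] = [{ value }];
  }
}

const draft: Record<string, Prop | Prop[]> = {};
addProperty(draft, 'VERSION', '4.0');
addProperty(draft, 'FN', 'Jane Doe');
addProperty(draft, 'EMAIL', 'jane@example.com');
addProperty(draft, 'EMAIL', 'jd@example.org');

// The cast stands in for a final "are all required properties present?" step.
const derived = draft as unknown as DerivedCard;
console.log(derived.FN[0].value, derived.EMAIL?.length);
```

Because `sketchSchema` is a plain object that survives transpilation, it remains available at run time, while `typeof sketchSchema` feeds the compile-time types.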
Parse, don't validate
The idea of parsing instead of validation was introduced by Alexis King, for the Haskell ecosystem. The gist of it: Directly parse the source data into the required (type-safe) format, instead of first parsing it into an (essentially) untyped format and then validating it to be of the right type. This assures that type safety starts earlier and is guaranteed to be consistent throughout the entire codebase.
In `vcard4-ts`, data structures are created and filled type-safely from the start.
Because properties are added on a line-by-line basis, required properties
cannot be guaranteed to exist from the start. Therefore, as an exception to this
rule, the existence of required fields is only ensured at the end.
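A minimal sketch of such a final step is shown here, assuming a draft card whose required fields are still optional while lines are being read. The `DraftCard`, `FinishedCard`, and `finishCard` names are hypothetical; the defaults (`4.0` for `VERSION`, the empty string for `FN`) and the `VCARD_MISSING_PROP` key are the ones documented for the parser in the nag list of this README.

```typescript
// Hypothetical names; a sketch of the idea, not the vcard4-ts internals.
type SketchValue = { value: string };
type SketchNag = { key: string; isError: boolean };

interface DraftCard {
  VERSION?: SketchValue;
  FN?: SketchValue[];
  nags?: SketchNag[];
}

interface FinishedCard {
  VERSION: SketchValue;
  FN: [SketchValue, ...SketchValue[]];
  nags?: [SketchNag, ...SketchNag[]];
}

function finishCard(draftCard: DraftCard): FinishedCard {
  const nags: SketchNag[] = draftCard.nags ? [...draftCard.nags] : [];
  const missing: SketchNag = { key: 'VCARD_MISSING_PROP', isError: true };

  // VERSION: fall back to the documented default if it was never seen.
  let version = draftCard.VERSION;
  if (!version) {
    version = { value: '4.0' };
    nags.push(missing);
  }

  // FN: must end up as a non-empty array.
  const seen = draftCard.FN ?? [];
  const first = seen[0];
  let fullNames: FinishedCard['FN'];
  if (first) {
    fullNames = [first, ...seen.slice(1)];
  } else {
    fullNames = [{ value: '' }];
    nags.push(missing);
  }

  const finished: FinishedCard = { VERSION: version, FN: fullNames };
  // Arrays are never empty: only attach `nags` if there is something to report.
  const firstNag = nags[0];
  if (firstNag) {
    finished.nags = [firstNag, ...nags.slice(1)];
  }
  return finished;
}

console.log(finishCard({ FN: [{ value: 'Jane Doe' }] }));
```

From the return value onward, callers only ever see `FinishedCard`, so the temporary looseness of `DraftCard` never leaks out of the parser.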
Array thickening
The advantage of always having an array IMHO greatly outweighs the
disadvantages. Calling code can always assume that the contents are an array.
I.e., arrays with just a single value are never flattened (therefore the
name). If you are only interested in one value, just use the one at index 0,
which will always exist. If you want to deal with multiple values, use array
methods such as `map()` and `join()`, which you can always use, because it is
always an array. Yes, this results in more time and space spent during the
creation of the data structure.
More importantly, this relieves calling code from performing case distinctions on every single access. Instead, the existence of the property can be asserted once and every reference to it later already knows how to deal with it. It is even possible to combine assertion and access with optional chaining.
Array thickening results in less code for the caller. The case distinctions it replaces often also lead to lower code coverage, because the uncommon case is not tested. In other words, array thickening turns the general case (whether common or uncommon) into the only case.
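The difference for calling code can be sketched like this. Both shapes are hypothetical and chosen for illustration only: `FlattenedEmail` models a library that flattens single-element arrays, `ThickenedEmail` models the always-an-array approach.

```typescript
// Hypothetical shapes for illustration; not taken from any particular library.
type FlattenedEmail = string | string[]; // single values were flattened
type ThickenedEmail = [string, ...string[]]; // always an array, never empty

// With flattening, every access needs a case distinction, and the
// less common branch is easy to leave untested.
function firstFlattened(email: FlattenedEmail | undefined): string | undefined {
  if (email === undefined) {
    return undefined;
  }
  return Array.isArray(email) ? email[0] : email;
}

// With thickening, optional chaining combines the existence check and the access.
function firstThickened(contact: { EMAIL?: ThickenedEmail }): string | undefined {
  return contact.EMAIL?.[0];
}

console.log(firstFlattened('jane@example.com'));
console.log(firstThickened({ EMAIL: ['jane@example.com', 'jd@example.org'] }));
```

The thickened variant has a single code path, so any test that touches it exercises the same code for one value and for many.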
API
- `parseVCards(vcf: string, keepDefective?: boolean = false): ParsedVCards`: Parse a string into possibly multiple vCards. Details below.
- `sortByPREF<T extends Partial<VCard4>>(vcard: T)`: Sort properties which exist multiple times by their preference parameter (1…100; the ones without `PREF` are sorted last).
- `groupVCard<T extends Partial<VCard4>>(vcard: T): GroupedVCard`: Group properties with group labels into their named group (all non-lowercase names). Anything without an explicit group label will end up in `top`. (`GroupedVCard` is `Record<Uppercase<string> | 'top', Partial<VCard4>>`).
Sorting and grouping are separate functions, not methods of an object, to ensure that their code will only be included if you call them.
If you need sorting and grouping, use the following sequence:
const cards = parseVCards(vcf);
if (cards.vCards) {
  for (const card of cards.vCards) {
    sortByPREF(card);
    const grouped = groupVCard(card);
    // Process the PREF-sorted groups here
  }
}
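To illustrate the ordering described for `sortByPREF()` (lower `PREF` first, items without `PREF` last), here is a stand-alone comparator sketch. `PrefItem`, `prefRank`, and `comparePref` are hypothetical names, and ranking non-numeric (`NaN`) values together with missing ones is an assumption of this sketch, not documented library behavior.

```typescript
// Hypothetical item shape; only the PREF parameter matters here.
interface PrefItem {
  value: string;
  parameters?: { PREF?: number };
}

// Missing or non-numeric (NaN) PREF values rank after every valid preference.
function prefRank(item: PrefItem): number {
  const pref = item.parameters?.PREF;
  return pref !== undefined && Number.isFinite(pref) ? pref : Infinity;
}

// Explicit comparisons avoid `Infinity - Infinity`, which would yield NaN.
function comparePref(a: PrefItem, b: PrefItem): number {
  const ra = prefRank(a);
  const rb = prefRank(b);
  if (ra === rb) {
    return 0;
  }
  return ra < rb ? -1 : 1;
}

const phones: PrefItem[] = [
  { value: 'home' },
  { value: 'work', parameters: { PREF: 50 } },
  { value: 'mobile', parameters: { PREF: 1 } },
  { value: 'fax', parameters: { PREF: NaN } },
];

// Copy before sorting so the input order stays available.
const ordered = [...phones].sort(comparePref);
console.log(ordered.map((p) => p.value));
```

In RFC 6350, a lower `PREF` value means a higher preference, which is why `mobile` (PREF 1) ends up at index 0.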
Reference
Property/parameter names
All vCard properties and parameters in the data structures are uppercase and
dashes have been converted to underscores. This makes them clearly visible and
easily accessible as JavaScript/TypeScript properties, avoiding the
harder-to-type hash/array notation (i.e., `card.SORT_AS` instead of
`card['SORT-AS']`).
Lowercase JavaScript/TypeScript properties are maintained by the parser.
Property cardinality
- `BEGIN`, `END`, and `VERSION` exist exactly once (cardinality `1` in RFC6350; required value in TypeScript)
- `FN` (full name) exists at least once (`1*` in RFC6350; required array in TypeScript)
- `PRODID`, `UID`, `REV`, `KIND`, `N` (name), `BDAY`, `BIRTHPLACE`, `DEATHDATE`, `DEATHPLACE`, `ANNIVERSARY`, and `GENDER` are optional (`*1` in RFC6350; optional value in TypeScript)
- All others can occur any number of times (`*` in RFC6350; optional array in TypeScript)
Property value type
- `N` is an object with the following properties: `familyNames`, `givenNames`, `additionalNames`, `honorificPrefixes`, `honorificSuffixes`; each a required `string[]`. Remember that arrays are guaranteed to always have at least one element, i.e., an empty `honorificPrefixes` property will be encoded as an array consisting of an empty string `['']`.
- `ADR` is similar to `N`, but with the following string array fields: `postOfficeBox`, `extendedAddress`, `streetAddress`, `locality` (city), `region`, `postalCode`, and `countryName`.
- `GENDER` consists of two `string`s, a required `sex` and an optional explanatory `text`. `sex` is required by RFC6350 to be one of `M`, `F`, `O`, `N`, `U`, or the empty string. However, this is not checked by `vcard4-ts`.
- `CLIENTPIDMAP` consists of `pidRef`, a `number`, and a `uri`, a `string`.
- All other properties' values are mapped to a single `string`, even if they are defined as more structured types, such as dates or URIs.
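Since empty components of `N` are encoded as `['']`, code that builds a display string will usually want to filter out empty strings. A small sketch, with the hypothetical names `SketchName` and `displayName`, and assuming a Western name order purely for illustration:

```typescript
// Mirrors the field names documented for `N`; the interface name is hypothetical.
interface SketchName {
  familyNames: string[];
  givenNames: string[];
  additionalNames: string[];
  honorificPrefixes: string[];
  honorificSuffixes: string[];
}

// Assumes a Western name order; adjust for other conventions.
function displayName(n: SketchName): string {
  return [
    ...n.honorificPrefixes,
    ...n.givenNames,
    ...n.additionalNames,
    ...n.familyNames,
    ...n.honorificSuffixes,
  ]
    .filter((part) => part !== '') // drop the [''] placeholders
    .join(' ');
}

const sketchName: SketchName = {
  familyNames: ['Doe'],
  givenNames: ['Jane'],
  additionalNames: [''],
  honorificPrefixes: ['Dr.'],
  honorificSuffixes: [''],
};

console.log(displayName(sketchName));
```

No existence or length checks are needed before spreading, because every component is guaranteed to be an array.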
Property parameters
Properties can have (mostly optional) parameters:
- `PREF` is a `number`. It is not asserted whether it is in the range [1…100] required by the RFC; non-numeric values are returned as `NaN`.
- `INDEX` is a `number`. It is not asserted whether it is a strictly positive integer as mandated by RFC6715; non-numeric values are returned as `NaN`.
- `PID`, `TYPE`, and `SORT_AS` (`SORT-AS` in the VCF) are `string[]`s, again with a guaranteed minimum array length of 1. (Please note that the example in the RFC quotes the enumeration of `TYPE`s, which seems inconsistent with the `TYPE` definition, so you may want to apply `split(',')` to all `TYPE` values first.)
- All others are single `string`s.
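Because the range of `PREF` is not checked and quoted `TYPE` lists may arrive unsplit, calling code that cares can normalize both itself. The helpers `splitTypes` and `isValidPref` are hypothetical and not part of `vcard4-ts`; this is only a sketch of the post-processing suggested above.

```typescript
// Split every TYPE entry at commas, so that ['work,voice', 'home']
// becomes ['work', 'voice', 'home'].
function splitTypes(types: string[]): string[] {
  const result: string[] = [];
  for (const entry of types) {
    for (const part of entry.split(',')) {
      const trimmed = part.trim();
      if (trimmed !== '') {
        result.push(trimmed);
      }
    }
  }
  return result;
}

// RFC 6350 requires PREF to be an integer between 1 and 100.
// A missing PREF is fine; NaN (non-numeric input) is not.
function isValidPref(pref: number | undefined): boolean {
  if (pref === undefined) {
    return true;
  }
  return Number.isInteger(pref) && pref >= 1 && pref <= 100;
}

console.log(splitTypes(['work,voice', 'home']));
console.log(isValidPref(1), isValidPref(NaN), isValidPref(101));
```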
Non-RFC properties and parameters
Any property or parameter whose type is not explicitly given in RFC6350 and the
RFCs that extend it, including those prefixed by `X-`, is not included at the
same level as the rest of the properties. One reason is that
TypeScript does not really allow default types on object properties
and therefore, nested index signatures are recommended for this.
Instead, non-RFC properties and parameters are put into an `x` object property.
The actual value will be a plain, unprocessed `string`. If it has more
structure, you need to extract it yourself, e.g. using
- `scan1DValue()`, which unescapes and splits at the specified `splitChar` (`,`, as used for `PID` or `TYPE` parameters; or `;`, as used for the `GENDER` value); or
- `scan2DValue()`, which splits into a `string[][]` at `;` and `,` (used for `ADR` and `N` values).
For example, the `string` value of an `X-ABUID` property in card `card` would be
available as `card.x.X_ABUID.value`.
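The exact signatures of `scan1DValue()` and `scan2DValue()` are not spelled out here, so the following sketch does not call them. Instead, it shows the underlying idea with a hypothetical stand-alone helper, `splitEscaped`, which splits a raw value at an unescaped separator and resolves backslash escapes as defined for vCard values (`\,`, `\;`, `\\`, and `\n`/`\N` for a newline).

```typescript
// Hypothetical helper, NOT the library's scan1DValue(); one-dimensional only.
function splitEscaped(raw: string, splitChar: ',' | ';'): string[] {
  const parts: string[] = [];
  let current = '';
  for (let i = 0; i < raw.length; i++) {
    const ch = raw.charAt(i);
    if (ch === '\\' && i + 1 < raw.length) {
      // Resolve the escape: \n and \N become a newline, anything else is literal.
      const next = raw.charAt(i + 1);
      current += next === 'n' || next === 'N' ? '\n' : next;
      i++; // skip the escaped character
    } else if (ch === splitChar) {
      parts.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  // The last part is always pushed, so the result is never an empty array.
  parts.push(current);
  return parts;
}

console.log(splitEscaped('red\\, dark,green', ','));
console.log(splitEscaped('O;it\\;s complicated', ';'));
```

Note that this unescapes while splitting, so it must not be applied twice to the same string; a two-dimensional split would need to defer unescaping until both separators have been handled.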
Handling errors
Your application can just ignore the errors, if it does not want to bother.
One of the design goals, so obvious that it was not specifically mentioned above,
is that `vcard4-ts` should be as easy to use as possible. Anyone who ever had to
deal with user-specified input can tell horror stories about what can go wrong.
Last but not least, ensuring
user-specified input fulfills certain requirements is also a matter of security.
Therefore, `parseVCards()` returns the information in a format as consistent as
possible, minimizing doubt and variability. In general, any line that cannot be
parsed is ignored, and any vCard which does not fulfill minimum criteria is
discarded.
This process is documented in the `nags` property of the returned object(s). The
`nags` property is an array of warnings and errors that occurred during the
processing.
Warnings and errors
A warning indicates that even though the input does not fulfill an RFC6350 criterion, the parser believes that it could safely correct the problem and that the data returned is probably exactly what its originator meant it to be.
An error, on the other hand, indicates that some information was dropped, or, alternatively, that some required information was added. The resulting parsed data is not the same as originally provided, but it is the best the parser could do to achieve RFC6350 conformance.
If at least one actual error (not just warnings) is included in the nags,
`hasErrors` is set to `true`. Depending on the policy of the calling code,
- data can be accepted as returned by the parser (most lenient),
- data can be refused if `hasErrors` is `true` (it always exists, but hopefully is `false`), or
- data can be refused if `nags` exists (i.e., any errors or warnings occurred; the most strict policy).
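These three policies can be sketched as a small decision function. The `SketchParsed` shape is a simplified assumption (an object carrying `hasErrors` and an optional `nags` array), and `AcceptancePolicy` and `isAcceptable` are hypothetical names.

```typescript
// Simplified, assumed shape of a parsed object carrying nag information.
interface SketchParsed {
  hasErrors: boolean;
  nags?: { key: string; isError: boolean }[];
}

type AcceptancePolicy = 'lenient' | 'noErrors' | 'strict';

function isAcceptable(parsed: SketchParsed, policy: AcceptancePolicy): boolean {
  if (policy === 'lenient') {
    // Accept whatever the parser managed to produce.
    return true;
  }
  if (policy === 'noErrors') {
    // Warnings are tolerated, errors are not.
    return !parsed.hasErrors;
  }
  // 'strict': refuse as soon as anything at all was noticed.
  return parsed.nags === undefined;
}

const withWarning: SketchParsed = {
  hasErrors: false,
  nags: [{ key: 'FILE_CRLF', isError: false }],
};

console.log(isAcceptable(withWarning, 'noErrors'), isAcceptable(withWarning, 'strict'));
```

Since `nags` is either absent or non-empty, the strict policy needs no length check.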
Global, local, and mixed nags
Local nags are specific to a vCard and are stored there, alongside the properties.
Local nags have the following type:
{
  key: string; // A short string to match against in the code
  description: string; // A longer English-language description to display to the user
  isError: boolean; // Error or warning?
  attributes: {
    property: string; // The property it occurred at (or '', if there was a property name parsing problem)
    parameter?: string; // If the problem occurred while parsing a parameter, this is its name
    line?: string; // The first few characters of the line on which this error occurred
  }
}
Global nags are set at the top level of the returned structure, alongside the
`vCards` field, if it exists. They indicate problems not related to a vCard, or
related to a vCard which was not included because it was considered too bad to
be returned.
Global nags use the same type as local nags above, but without the `attributes`.
Mixed nags are used to indicate errors affecting an entire vCard (there are no
mixed warnings). If `parseVCards()` detects a major problem with a vCard
(`VCARD_BAD_TYPE` or `VCARD_NOT_BEGIN`), then, by default, this vCard is dropped
and the error, which cannot be stored in the vCard itself, is bubbled up to the
global level. However, if `keepDefective=true` is passed as an optional
argument, these vCards are not dropped and the error is stored in the vCard
itself.
The nags
- `FILE_EMPTY`: A global error.
- `FILE_CRLF`: A global warning that lines did not end in carriage return+line feed as specified in RFC6350, but just with line feeds. (This only checks the first line end and is therefore subject to false negatives, if line ends are not consistent.)
- `VCARD_BAD_TYPE`: A mixed error resulting in a defective card. The `BEGIN` or `END` property does not have the required `VCARD` value.
- `VCARD_NOT_BEGIN`: A mixed error resulting in a defective card. The first property of the vCard is not a `BEGIN` property.
- `VCARD_MISSING_PROP`: A local error. A required property is missing and has been added with a default value. The default for `VERSION` is `4.0`; for `FN`, the empty string.
- `PROP_NAME_EMPTY`: A local error. The property has an empty name.
- `PROP_NAME_EOL`: A local error. The property name is terminated by the end of line, i.e., colon and value are missing.
- `PROP_DUPLICATE`: A local error. A property which may not appear more than once has been seen a second time.
- `PARAM_UNCLOSED_QUOTE`: A local error. A parameter had a quoted value, but the quote was unbalanced.
- `PARAM_MISSING_EQUALS`: A local error. A parameter name was not terminated by an equals sign.
- `PARAM_INVALID_NUMBER`: A local error. The parameter value should have been a number but wasn't.
- `PARAM_DUPLICATE`: A local error. A parameter that can only have a single value was specified more than once.
- `PARAM_UNESCAPED_COMMA`: A local warning. A parameter accepting only a single value contained an unescaped comma. This may indicate incomplete character escaping or trying to provide multiple values where they are not allowed.
- `PARAM_BAD_BACKSLASH`: A local warning. In a double-quoted parameter value, a backslash was found. Escaping in quoted parameter values should be according to RFC6868, using circumflexes (`^`). This indicates a possible problem in the input file; the backslash was not treated as a special character.
- `PARAM_BAD_CIRCUMFLEX`: A local warning. In a double-quoted parameter value, a circumflex (`^`) was found, which was not part of an escape sequence. This indicates a possible problem in the input file; that circumflex was not treated as a special character.
- `VALUE_INVALID`: A local error. A property with a required value had a different value.
- `VALUE_UNESCAPED_COMMA`: A local warning. A property accepting only a single value contained an unescaped comma. This may indicate an old-style (vCard3) value, e.g. for `PHOTO`, which is considered incomplete character escaping in vCard4.
Unparseable lines
If any lines in the current vCard left the parser speechless, they are stored
essentially unmodified in the `unparseable` array. The only modification is that
wrapped lines have been unwrapped, as this happens before parsing. You most
likely want to ignore those lines, unless you want to re-export the vCard as
faithfully as possible, even if that violates the standard (and might cause
errors for other parsers).
Related work
Searching for `vcard` on NPM results in mostly vCard generators or converters to/from other formats. Notable exceptions:
- vcard4 is a vCard 4.0 generator which also includes parsing capabilities. Trying to create type annotations for `vcard4` turned out to be hard. The resulting types for the parser would be so lax as not to help when writing a program processing it further, requiring runtime type verification in the application. Also, their design decision to transform arrays with a single member into a plain value requires every access to verify the field's structure. Furthermore, it has some minor issues with its RFC 6350 compliance (lack of proper property group support or incomplete unescaping rules) and the IETF's general Robustness principle (i.e., not accepting bare newlines).
- vdata-parser is a generic vCard/vCalendar parser, handling multiple cards in a single file. Similar to `vcard4` above, it does not seem amenable to reasonably tight types and mixes elements and arrays. Furthermore, it is unaware of the expected parameter/property structure and does not handle escaped data.
The runtime type introspection required for DRY is modeled after Zod.
Zod was even used for an early prototype. However, an ultra-lightweight, tailored alternative to Zod was created (clocking in at under 200 bytes minified/gzipped). Zod would have created overhead (additional dependencies, bundle size, but especially the amount of code needed to define and query the schema, while having to touch Zod internals which might change in the future), while providing little benefit. For example, Zod's `transform` seemed to be impossible to apply to parsing directly. So, Zod would just have been used to duplicate work that had already been performed