codepoints

v1.3.0

Published

2 years ago

A parser for files in the Unicode database


devongovett

unicode ucd codepoint

codepoints

A parser for files in the Unicode database. Produces a giant array of codepoint objects for every character represented by Unicode, with many properties derived from files in the Unicode database.

BUILD SCRIPTS ONLY: Use in production is not recommended. The parsers are not optimized for speed, the text files are huge, and the resulting array uses a large amount of memory. To access this data in real-world applications, use a module that has precompiled the data into a compressed form:

unicode-properties
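
As a rough illustration of what "precompiling" might look like in a build script, the sketch below collapses runs of consecutive code points that share a category into compact ranges. It is not how unicode-properties stores its data. The function name buildCategoryRanges and the tiny sampleForRanges array are hypothetical stand-ins, so the snippet runs without loading the full database; in a real build script the input would be the array returned by require('codepoints').

```javascript
// Hypothetical stand-in for require('codepoints'): a sparse array indexed by
// code point. Only the `code` and `category` properties are used here.
const sampleForRanges = [];
sampleForRanges[0x41] = { code: 0x41, category: 'Lu' };
sampleForRanges[0x42] = { code: 0x42, category: 'Lu' };
sampleForRanges[0x43] = { code: 0x43, category: 'Lu' };
sampleForRanges[0x61] = { code: 0x61, category: 'Ll' };
sampleForRanges[0x62] = { code: 0x62, category: 'Ll' };

// Collapse consecutive code points with the same category into
// [start, end, category] triples. Unassigned (undefined) entries end a run.
function buildCategoryRanges(codepoints) {
  const ranges = [];
  let current = null;
  for (let i = 0; i < codepoints.length; i++) {
    const cp = codepoints[i];
    if (cp && current && current[2] === cp.category && current[1] === i - 1) {
      current[1] = i; // extend the current run
    } else if (cp) {
      current = [i, i, cp.category]; // start a new run
      ranges.push(current);
    } else {
      current = null; // gap: unassigned code point
    }
  }
  return ranges;
}

console.log(JSON.stringify(buildCategoryRanges(sampleForRanges)));
// prints [[65,67,"Lu"],[97,98,"Ll"]]
```

A build script could write the resulting ranges to a JSON file once, so the application never has to parse the UCD text files at runtime.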

Installation

Install using npm:

npm install codepoints

Usage

Basic usage:

const codepoints = require('codepoints');

The parser generates data by reading the text files contained in the Unicode Character Database (UCD). By default, it uses the database bundled with this package. To use a custom version of the UCD, use codepoints/parser instead, which accepts an optional path to a directory containing the uncompressed UCD data:

const parser = require('codepoints/parser');
const codepoints = parser('/path/to/UCD');
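
Since the result is an array indexed by code point, a lookup is a plain array access. The sketch below shows one way to wrap that access so unassigned code points (which are undefined in the array) do not cause errors. The describe helper and the one-entry sampleForLookup array are hypothetical; the stand-in lets the snippet run without the package installed, and the real array would be passed in its place.

```javascript
// Hypothetical stand-in for the array returned by require('codepoints').
const sampleForLookup = [];
sampleForLookup[0x41] = { code: 0x41, name: 'LATIN CAPITAL LETTER A', category: 'Lu' };

// Hypothetical helper: format a short description of a code point,
// tolerating entries that are undefined (unassigned code points).
function describe(codepoints, code) {
  const cp = codepoints[code];
  const hex = code.toString(16).toUpperCase().padStart(4, '0');
  if (cp === undefined) {
    return `U+${hex} <unassigned>`;
  }
  return `U+${hex} ${cp.name} (${cp.category})`;
}

console.log(describe(sampleForLookup, 0x41));  // prints U+0041 LATIN CAPITAL LETTER A (Lu)
console.log(describe(sampleForLookup, 0x378)); // prints U+0378 <unassigned>
```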

Codepoint data

Each element in the generated array is either undefined (for unassigned code points), or an object containing the following properties:

code - the code point index
name - character name
unicode1Name - legacy name used by Unicode 1
category - Unicode category
block - the block name this character is a part of
script - the script this character belongs to
eastAsianWidth - the East Asian width for this character
combiningClass - numeric combining class value
combiningClassName - a string name for the combining class
bidiClass - class for the Unicode bidirectional algorithm
bidiMirrored - whether the character is mirrored in the bidi algorithm
numeric - the numeric value for this character
uppercase - an array of code points mapping this character to upper case, if any
lowercase - an array of code points mapping this character to lower case, if any
titlecase - an array of code points mapping this character to title case, if any
folded - an array of code points mapping this character to a folded equivalent, if any
caseConditions - conditions used during case mapping for this character
decomposition - an array of code points that this character decomposes into. Used by the Unicode normalization algorithm.
compositions - a dictionary mapping of compositions for this character
isCompat - whether the decomposition is a compatibility one
isExcluded - whether the character is excluded from composition
NFC_QC - quickcheck value for NFC (0 = YES, 1 = NO, 2 = MAYBE)
NFKC_QC - quickcheck value for NFKC (0 = YES, 1 = NO, 2 = MAYBE)
NFD_QC - quickcheck value for NFD (0 = YES, 1 = NO)
NFKD_QC - quickcheck value for NFKD (0 = YES, 1 = NO)
joiningType - Arabic joining type
joiningGroup - Arabic joining group
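
To make a couple of the properties above concrete, here is a minimal sketch that reads the uppercase mapping and decodes the numeric NFC_QC value. The entries in sampleForProps are hand-written, illustrative stand-ins shaped like the properties listed above, not output copied from the parser. The helper names are hypothetical, and the code assumes a missing or empty uppercase array means "no mapping".

```javascript
// Hypothetical stand-in entries, shaped like the properties listed above.
const sampleForProps = [];
sampleForProps[0x61] = { code: 0x61, name: 'LATIN SMALL LETTER A', uppercase: [0x41], NFC_QC: 0 };
sampleForProps[0xDF] = { code: 0xDF, name: 'LATIN SMALL LETTER SHARP S', uppercase: [0x53, 0x53], NFC_QC: 0 };
sampleForProps[0x301] = { code: 0x301, name: 'COMBINING ACUTE ACCENT', NFC_QC: 2 };

// Quickcheck values as documented: 0 = YES, 1 = NO, 2 = MAYBE.
const QUICK_CHECK = ['YES', 'NO', 'MAYBE'];

// Map a code point to its upper case string, falling back to the code point
// itself when there is no entry or no mapping (an assumption of this sketch).
function toUpperString(codepoints, code) {
  const cp = codepoints[code];
  const mapped = cp && cp.uppercase && cp.uppercase.length ? cp.uppercase : [code];
  return String.fromCodePoint(...mapped);
}

// Decode the numeric NFC_QC value into its label; undefined if unassigned.
function nfcQuickCheck(codepoints, code) {
  const cp = codepoints[code];
  return cp ? QUICK_CHECK[cp.NFC_QC] : undefined;
}

console.log(toUpperString(sampleForProps, 0x61));  // prints A
console.log(toUpperString(sampleForProps, 0xDF));  // prints SS
console.log(toUpperString(sampleForProps, 0x31));  // prints 1 (no entry in the sample)
console.log(nfcQuickCheck(sampleForProps, 0x301)); // prints MAYBE
```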

License

MIT
