codepoints
v1.3.0
A parser for files in the Unicode database
codepoints
A parser for files in the Unicode database. Produces a giant array of codepoint objects for every character represented by Unicode, with many properties derived from files in the Unicode database.
BUILD SCRIPTS ONLY: Use in production is not recommended, because the parsers are not optimized for speed, the text files are huge, and the resulting array consumes a large amount of memory. To access this data in real-world applications, use modules that have precompiled the data into a compressed form.
Installation
Install using npm:
npm install codepoints
Usage
Basic usage:
const codepoints = require('codepoints');
The parser generates data by reading the text files contained in the Unicode Character Database (UCD). By default, it uses the database bundled with this package. To use a custom version of the UCD, use codepoints/parser instead, which accepts an optional path to a directory containing the uncompressed UCD data:
const parser = require('codepoints/parser');
const codepoints = parser('/path/to/UCD');
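Because the package is aimed at build scripts, a typical use is to walk the generated array once and emit a compact table. The following is a minimal sketch of that idea, not part of the package: `categoryRanges` is a hypothetical helper, and `sample` is a small hand-written stand-in for the real array so the snippet runs without loading the UCD. The category values in `sample` are illustrative only.

```javascript
// Stand-in for `require('codepoints')`: a sparse array indexed by code point.
// The entries below are illustrative sample data, not parser output.
const sample = [];
sample[0x41] = { code: 0x41, category: 'Lu' };
sample[0x42] = { code: 0x42, category: 'Lu' };
sample[0x61] = { code: 0x61, category: 'Ll' };
sample[0x63] = { code: 0x63, category: 'Ll' };

// Collapse consecutive code points that share a category into
// [start, end, category] triples. An unassigned (undefined) entry ends a run.
function categoryRanges(codepoints) {
  const ranges = [];
  let current = null;
  for (let i = 0; i < codepoints.length; i++) {
    const cp = codepoints[i];
    if (cp === undefined) {
      current = null;
      continue;
    }
    if (current !== null && current[2] === cp.category) {
      current[1] = i; // extend the current run
    } else {
      current = [i, i, cp.category];
      ranges.push(current);
    }
  }
  return ranges;
}

console.log(JSON.stringify(categoryRanges(sample)));
// [[65,66,"Lu"],[97,97,"Ll"],[99,99,"Ll"]]
```

In a real build script, `sample` would be replaced by the array returned from `require('codepoints')` or from the parser, and the resulting ranges could be written to a JSON file for use at runtime.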
Codepoint data
Each element in the generated array is either undefined (for unassigned code points), or an object containing the following properties:
code - the code point index
name - character name
unicode1Name - legacy name used by Unicode 1
category - Unicode category
block - the block name this character is a part of
script - the script this character belongs to
eastAsianWidth - the East Asian width for this character
combiningClass - numeric combining class value
combiningClassName - a string name for the combining class
bidiClass - class for the Unicode bidirectional algorithm
bidiMirrored - whether the character is mirrored in the bidi algorithm
numeric - the numeric value for this character
uppercase - an array of code points mapping this character to upper case, if any
lowercase - an array of code points mapping this character to lower case, if any
titlecase - an array of code points mapping this character to title case, if any
folded - an array of code points mapping this character to a folded equivalent, if any
caseConditions - conditions used during case mapping for this character
decomposition - an array of code points that this character decomposes into. Used by the Unicode normalization algorithm.
compositions - a dictionary mapping of compositions for this character
isCompat - whether the decomposition is a compatibility one
isExcluded - whether the character is excluded from composition
NFC_QC - quickcheck value for NFC (0 = YES, 1 = NO, 2 = MAYBE)
NFKC_QC - quickcheck value for NFKC (0 = YES, 1 = NO, 2 = MAYBE)
NFD_QC - quickcheck value for NFD (0 = YES, 1 = NO)
NFKD_QC - quickcheck value for NFKD (0 = YES, 1 = NO)
joiningType - Arabic joining type
joiningGroup - Arabic joining group
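As a rough illustration of how these properties might be consumed, the sketch below turns a case-mapping array into a string and decodes a quickcheck value into its label. `applyMapping` and `QUICK_CHECK` are hypothetical names introduced here, and `sharpS` is a hand-written example object rather than real parser output. The sketch also assumes that a missing or empty mapping means the character maps to itself.

```javascript
// Labels for the quickcheck values listed above (0 = YES, 1 = NO, 2 = MAYBE).
const QUICK_CHECK = ['YES', 'NO', 'MAYBE'];

// Convert a case-mapping array (e.g. `uppercase`) into a string.
// Assumption: when no mapping is present, the character maps to itself.
function applyMapping(cp, property) {
  const mapping = cp[property];
  if (!Array.isArray(mapping) || mapping.length === 0) {
    return String.fromCodePoint(cp.code);
  }
  return String.fromCodePoint(...mapping);
}

// Hypothetical entry for U+00DF, for illustration only.
const sharpS = {
  code: 0xdf,
  name: 'LATIN SMALL LETTER SHARP S',
  uppercase: [0x53, 0x53],
  NFC_QC: 0
};

console.log(applyMapping(sharpS, 'uppercase')); // SS
console.log(applyMapping(sharpS, 'lowercase')); // ß
console.log(QUICK_CHECK[sharpS.NFC_QC]);        // YES
```

The mappings are arrays rather than single values because one character can map to several code points, so spreading the array into `String.fromCodePoint` handles both the one-to-one and one-to-many cases.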
License
MIT