@pdf-lib/unicode-properties
v0.0.1
Provides fast access to Unicode character properties
NOTE: All credit for this code belongs to the developers of https://github.com/devongovett/unicode-properties
Purpose of this Fork
This fork was created for use in https://github.com/Hopding/pdf-lib.
The original repository serialized and loaded the trie data using a binary file. This worked fine in Node, because fs.readFileSync was being called to load the serialized data into a Buffer object. In order to support use in the browser, Browserify and brfs were used to inline the binary data in the index.js file, and thereby remove the call to fs.readFileSync.
This works fine if you are in a Node environment, or are using Browserify to bundle your code for use in the browser. But it doesn't work so well if you aren't doing either of those things. E.g. I was writing an app built with create-react-app, and the binary data was not being inlined for this dependency.
I resolved this by simply serializing the trie data to a JSON file, which allows it to be loaded into the index.js file without using Browserify. Of course, this means that the trie data is not stored as efficiently, but that is not a concern for me.
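The trade-off between binary and JSON storage can be illustrated with a minimal sketch. This is not the code used in this fork: the helper names bytesToJson and jsonToBytes are hypothetical, and the sketch assumes the serialized data is available as a Uint8Array.

```javascript
// Hypothetical helpers, not part of this package.

// Serialize raw bytes to a JSON string (a plain array of numbers).
function bytesToJson(bytes) {
  return JSON.stringify(Array.from(bytes));
}

// Rebuild the bytes from the JSON string; no fs or Buffer is needed.
function jsonToBytes(json) {
  return Uint8Array.from(JSON.parse(json));
}

var original = new Uint8Array([0, 17, 255, 128]);
var json = bytesToJson(original);
var restored = jsonToBytes(json);

console.log(json);                          // [0,17,255,128]
console.log(json.length > original.length); // true: the JSON text is larger than the raw bytes
```

Because JSON is plain text, most bundlers can load it without a special transform, at the cost of a larger file.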
unicode-properties
Provides fast access to Unicode character properties. Uses unicode-trie to compress the properties for all code points into just 12KB.
Usage
npm install unicode-properties
var unicode = require('unicode-properties');
unicode.getCategory('2'.charCodeAt()) //=> 'Nd'
unicode.getNumericValue('2'.charCodeAt()) //=> 2
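Note that charCodeAt returns a UTF-16 code unit, which equals the code point only for characters in the Basic Multilingual Plane. For other characters, the standard codePointAt method returns the full code point. The following is a small sketch of collecting code points from a string before passing them to the functions below; the helper name codePointsOf is hypothetical and not part of this package.

```javascript
// Hypothetical helper: collect the code points of a string.
function codePointsOf(str) {
  // Array.from walks a string by code point, not by UTF-16 code unit.
  return Array.from(str, function (ch) {
    return ch.codePointAt(0);
  });
}

// U+1D7D0 (MATHEMATICAL BOLD DIGIT TWO) lies outside the Basic Multilingual Plane.
var points = codePointsOf('2a\u{1D7D0}');
console.log(points); // [ 50, 97, 120784 ]

// charCodeAt only sees the first half of the surrogate pair.
console.log('\u{1D7D0}'.charCodeAt(0)); // 55349
```

Each element of points could then be passed to a function such as getCategory.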
API
getCategory(codePoint)
Returns the Unicode general category for the given code point.
getScript(codePoint)
Returns the script for the given code point.
getCombiningClass(codePoint)
Returns the canonical combining class for the given code point.
getEastAsianWidth(codePoint)
Returns the East Asian width for the given code point.
getNumericValue(codePoint)
Returns the numeric value for the given code point, or null if there is no numeric value for that code point.
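Since the result may be null, callers should check for it before doing arithmetic. Here is a minimal sketch of one way to do that. The helper sumNumericValues is hypothetical, and the lookup is passed in as a parameter so the sketch runs without the package installed; stubGetNumericValue is a stand-in that only knows ASCII digits.

```javascript
// Hypothetical helper: sum the numeric values of the characters in a string,
// skipping characters for which the lookup returns null.
function sumNumericValues(str, getNumericValue) {
  var total = 0;
  Array.from(str).forEach(function (ch) {
    var value = getNumericValue(ch.codePointAt(0));
    if (value !== null) total += value;
  });
  return total;
}

// Stand-in lookup covering only ASCII digits (an assumption for this sketch).
function stubGetNumericValue(codePoint) {
  return codePoint >= 48 && codePoint <= 57 ? codePoint - 48 : null;
}

console.log(sumNumericValues('a1b22', stubGetNumericValue)); // 5
```

With the package installed, unicode.getNumericValue could be passed in place of the stand-in.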
isAlphabetic(codePoint)
Returns whether the code point is an alphabetic character.
isDigit(codePoint)
Returns whether the code point is a digit.
isPunctuation(codePoint)
Returns whether the code point is a punctuation character.
isLowerCase(codePoint)
Returns whether the code point is lower case.
isUpperCase(codePoint)
Returns whether the code point is upper case.
isTitleCase(codePoint)
Returns whether the code point is title case.
isWhiteSpace(codePoint)
Returns whether the code point is whitespace: specifically, whether the category is one of Zs, Zl, or Zp.
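The category rule above can be sketched as follows. This is an illustration of the stated rule, not the package's actual implementation: makeIsWhiteSpace and stubGetCategory are hypothetical names, and the stand-in lookup contains only a few hand-entered categories.

```javascript
// Categories named in the description above.
var SPACE_CATEGORIES = ['Zs', 'Zl', 'Zp'];

// Hypothetical factory: builds a whitespace test from any getCategory-style lookup.
function makeIsWhiteSpace(getCategory) {
  return function (codePoint) {
    return SPACE_CATEGORIES.indexOf(getCategory(codePoint)) !== -1;
  };
}

// Stand-in lookup with a few hand-entered categories (an assumption for this sketch).
function stubGetCategory(codePoint) {
  if (codePoint === 0x20) return 'Zs';   // SPACE
  if (codePoint === 0x2028) return 'Zl'; // LINE SEPARATOR
  if (codePoint === 0x2029) return 'Zp'; // PARAGRAPH SEPARATOR
  if (codePoint === 0x09) return 'Cc';   // CHARACTER TABULATION
  return 'Cn';
}

var isWhiteSpaceSketch = makeIsWhiteSpace(stubGetCategory);
console.log(isWhiteSpaceSketch(0x20)); // true
console.log(isWhiteSpaceSketch(0x09)); // false: tab is category Cc, not a Z* category
```

One consequence of a category-based rule is that control characters such as tab and line feed, which have category Cc, do not count as whitespace under it.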
isBaseForm(codePoint)
Returns whether the code point is a base form. A code point of base form does not graphically combine with preceding characters.
isMark(codePoint)
Returns whether the code point is a mark character (e.g. accent).
License
MIT