bidi-js
v1.0.3
Published
A JavaScript implementation of the Unicode Bidirectional Algorithm
Downloads
3,867,549
Readme
bidi-js
This is a pure JavaScript implementation of the Unicode Bidirectional Algorithm version 13.0.0. Its goals, in no particular order, are to be:
- Correct
- Small
- Fast
Conformance
This implementation currently conforms to section UAX-C1 of the bidi spec, as verified by running all the provided conformance tests.
Compatibility
It has no external dependencies and therefore should run just fine in any relatively capable web browser, Node.js, etc. The provided distribution .js
files are valid ES5.
Usage
Install it from npm:
npm install bidi-js
Import and initialize:
import bidiFactory from 'bidi-js'
// or: const bidiFactory = require('bidi-js')
const bidi = bidiFactory()
The bidi-js
package's only export is a factory function which you must invoke to return a bidi
object; that object exposes the methods for bidi processing.
(Why a factory function? The main reason is to ensure the entire module's code is wrapped within a single self-contained function with no closure dependencies. This enables that function to be stringified and passed into a web worker, for example.)
Now that you have the bidi
object, you can:
Calculate bidi embedding levels
const embeddingLevels = bidi.getEmbeddingLevels(
text, //the input string containing mixed-direction text
explicitDirection //"ltr" or "rtl" if you don't want to auto-detect it
)
const { levels, paragraphs } = embeddingLevels
The result object embeddingLevels
will usually be passed to other functions described below. Its contents, should you need to inspect them individually, are:
levels
is aUint8Array
holding the calculated bidi embedding levels for each character in the string. The most important thing to know about these levels is that any given character is in a right-to-left scope if its embedding level is an odd number, and left-to-right if it's an even number.paragraphs
is an array of{start, end, level}
objects, one for each paragraph in the text (paragraphs are separated by explicit breaking characters, not soft line wrapping). Thestart
andend
indices are inclusive, andlevel
is the resolved base embedding level of that paragraph.
Calculate character reorderings
const flips = bidi.getReorderSegments(
text, //the full input string
embeddingLevels //the full result object from getEmbeddingLevels
)
// Process all reversal sequences, in order:
flips.forEach(range => {
const [start, end] = range
// Reverse this sequence of characters from start to end, inclusive
for (let i = start; i <= end; i++) {
//...
}
})
Each "flip" is a range that should be reversed in place; they must all be applied in order.
Sometimes you don't want to process the whole string at once, but just a particular substring. A common example would be if you've applied line wrapping, in which case you need to process each line individually (in particular this does some special handling for trailing whitespace for each line). For this you can pass the extra start
and end
parameters:
yourWrappedLines.forEach(([lineStart, lineEnd]) => {
const flips = bidi.getReorderSegments(
text,
embeddingLevels,
lineStart,
lineEnd //inclusive
)
// ...process flips for this line
})
Handle right-to-left mirrored characters
Some characters that resolve to right-to-left need to be swapped with their "mirrored" characters. Examples of this are opening/closing parentheses. You can determine all the characters that need to be mirrored like so:
const mirrored = bidi.getMirroredCharactersMap(
text,
embeddingLevels
)
This returns a Map
of numeric character indices to replacement characters.
You can also process just a substring with extra start
and end
parameters:
const mirrored = bidi.getMirroredCharactersMap(
text,
embeddingLevels,
start,
end //inclusive
)
If you'd rather process mirrored characters individually, you can use the single getMirroredCharacter
function, just make sure you only do it for right-to-left characters (those whose embedding level is an odd number.) It will return null
if the character doesn't support mirroring.
const mirroredChar = (embeddingLevels.levels[charIndex] & 1) //odd number means RTL
? bidi.getMirroredCharacter(text[charIndex])
: null
Get a character's bidi type
This is used internally, but you can also ask for the "bidi character type" of any character, should you need it:
const bidiType = bidi.getBidiCharTypeName(string[charIndex])
// e.g. "L", "R", "AL", "NSM", ...