tibetan-ewts-converter

v2.0.0

Published

a month ago

Tibetan transliteration (Wylie, EWTS) and approximate phonetics


rogerespel

tibetan wylie ewts unicode transliteration converter phonetics

Tibetan Phonetics and Transliteration

This JavaScript package implements two things:

conversion between Unicode Tibetan text and Extended Wylie transliteration (EWTS)
approximate Tibetan phonetics according to THL and other systems.

Installation

npm install tibetan-ewts-converter

As of version 2, this is a pure ES module.

Usage

Wylie/EWTS conversion:

import { EwtsConverter } from 'tibetan-ewts-converter/EwtsConverter';
const ewts = new EwtsConverter();
console.log(ewts.to_unicode("sangs rgyas"));
console.log(ewts.to_ewts("སངས་རྒྱས"));

Approximate phonetics:

import { get_phonetics } from 'tibetan-ewts-converter';
const pho = get_phonetics({ style: "lotsawahouse", lang: "en" });
console.log(pho.phonetics("sangs rgyas", { autosplit: true }));

EwtsConverter options

The constructor accepts an optional object with named options:

check: generate warnings for illegal consonant sequences and the like; default is true.
check_strict: stricter checking, examine the whole stack; default is true.
fix_spacing: remove spaces after newlines, collapse multiple tseks into one, fix case, etc; default is true.
sloppy: silently fix a number of common Wylie mistakes when converting to Unicode; default is false.
leave_dubious: when converting to Unicode, leave dubious syllables unprocessed, between [brackets], instead of making a best attempt; default is false.
pass_through: when converting to EWTS, pass through non-Tibetan characters instead of converting them to [comments]; default is false.

let ewts = new EwtsConverter({ check_strict: false, leave_dubious: true, sloppy: true });
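
One way to see how a given set of options affects your text is a round-trip check: convert EWTS to Unicode and back, then compare. The sketch below is only an illustration. The helper name roundTrip is hypothetical and not part of the package. It relies solely on the to_unicode and to_ewts methods shown above, and takes the converter as an argument. An unstable round trip is not necessarily an error, since options such as fix_spacing and sloppy deliberately normalize the input.

```javascript
// Hypothetical helper (not part of the package): round-trip a string of EWTS
// through any object exposing to_unicode() and to_ewts(), such as an
// EwtsConverter instance.
function roundTrip(converter, ewtsText) {
  const unicode = converter.to_unicode(ewtsText); // EWTS -> Tibetan Unicode
  const back = converter.to_ewts(unicode);        // Tibetan Unicode -> EWTS
  // `stable` is true only when the text survives the round trip unchanged.
  return { unicode, back, stable: back === ewtsText };
}
```

With the package installed, pass an instance as the first argument, for example roundTrip(new EwtsConverter({ sloppy: true }), "sangs rgyas").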

TibetanPhonetics options

get_phonetics accepts an optional object with named options:

style: one of 'thl', 'lotsawahouse', 'rigpa', 'lhasey', 'padmakara'
lang: 2-letter language code, for styles that have language variants (e.g. 'en', 'es')

The phonetics method takes a string (Tibetan Unicode or EWTS), and an optional options object.

Unless you're using a better external tokenizer, always pass the option { autosplit: true }.
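
To compare how the different styles render the same text, a small wrapper can loop over the style names. This is a sketch under stated assumptions: the helper name comparePhoneticStyles is hypothetical, and it expects the package's get_phonetics function to be passed in as its first argument. It uses only the style and lang options and the phonetics method documented above, and always passes { autosplit: true }.

```javascript
// Hypothetical helper (not part of the package): run the same text through
// several phonetic styles. `getPhonetics` is expected to be the package's
// get_phonetics function, passed in by the caller.
function comparePhoneticStyles(getPhonetics, text, styles, lang = "en") {
  const result = {};
  for (const style of styles) {
    const pho = getPhonetics({ style, lang });
    // autosplit is always enabled, as recommended above.
    result[style] = pho.phonetics(text, { autosplit: true });
  }
  return result;
}
```

With the package installed, a call might look like comparePhoneticStyles(get_phonetics, "sangs rgyas", ["thl", "rigpa"]), which returns an object keyed by style name.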

See the code for lots of other options allowing fine control of phonetics generation. You can also directly import and use the classes TibetanPhonetics, TibetanPhoneticsRigpa, TibetanPhoneticsLhasey and TibetanPhoneticsPadmakara.

Code and history

The first version of this code was written in Perl around 2008. In 2010 the EWTS/Unicode converter was ported to Java at the request of TBRC, now BDRC.

The Java code was then ported to other languages by different groups:

Python port by Esukhia
C# port by radiantspace
Another Python port by radiantspace
JavaScript ports from BDRC, Ksana Forge and Karmapa Digital Toolbox
This JavaScript port of 2021, which goes back to the original Perl code but incorporates some of the improvements made by the various groups.

Phonetics generation was added to this project in 2025, also ported from the original Perl code with the help of AI.

License

Apache 2.0.
