my-diacritic-sort
v1.0.1
Adjust Myanmar text to proper diacritic order
I received some PDFs that use Myanmar Unicode characters but also contain empty codepoints standing in for various Myanmar diacritics and other combining characters. Converting this text to standard Unicode order by hand is painstaking, and we need the conversion in several applications, so I am putting it into a module.
Sample text
We received a PDF where the name "Mohnyin Township" appears like this:
မိုးညှင်းမြို့နယ်အတွင်းရှိ
but when you copy and paste the actual characters, you get this:
မိးညင်းမိုန့ယ်အတွင်းရှိ
Here are its issues:
The first syllable, မိုး, is missing the ု because the PDF uses an empty codepoint in its place. This separates out the next diacritic, း.
မြို့န is written မိုန့: the ြ diacritic is an empty codepoint placed before the character that it modifies, and the ့ diacritic is placed after the character န instead of after the character that it modifies.
In other text samples, there are multiple diacritics in a nonstandard order.
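The last issue, several diacritics stored in a nonstandard order, can be illustrated with a minimal sketch. This is not the module's actual implementation: the `MARK_RANK` table and the `reorderMarkRuns` function are hypothetical names, and the table covers only a handful of marks in an assumed order (medials, then vowel signs, then anusvara, dot below, and visarga) that should be checked against the Unicode Myanmar encoding order before it is relied on. It does not restore missing diacritics or move a diacritic from one consonant to another.

```javascript
// Hypothetical rank table (an assumption, not taken from my-diacritic-sort).
// A lower rank means the mark is stored earlier within its run.
const MARK_RANK = new Map([
  [0x103b, 1], // medial ya
  [0x103c, 2], // medial ra
  [0x103d, 3], // medial wa
  [0x103e, 4], // medial ha
  [0x1031, 5], // vowel sign e
  [0x102d, 6], [0x102e, 6], [0x1032, 6], // upper vowel signs
  [0x102f, 7], [0x1030, 7], // lower vowel signs
  [0x102b, 8], [0x102c, 8], // vowel signs aa
  [0x1036, 9], // anusvara
  [0x1037, 11], // dot below
  [0x1038, 12], // visarga
]);

// Sort each run of consecutive ranked marks; every other character
// (consonants, asat, punctuation, Latin text) is copied through unchanged.
function reorderMarkRuns(text) {
  const out = [];
  let run = [];
  const flush = () => {
    run.sort(
      (a, b) => MARK_RANK.get(a.codePointAt(0)) - MARK_RANK.get(b.codePointAt(0))
    );
    out.push(...run);
    run = [];
  };
  for (const ch of text) {
    if (MARK_RANK.has(ch.codePointAt(0))) {
      run.push(ch);
    } else {
      flush();
      out.push(ch);
    }
  }
  flush();
  return out.join("");
}
```

For example, `reorderMarkRuns("\u1019\u102F\u102D\u103C")` moves the medial ra ahead of the two vowel signs and returns `"\u1019\u103C\u102D\u102F"`.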
On the web
Include the my-diacritic.js file, then pass some text to sortDiacritics:
sortDiacritics("မိးညင်းမိုန့ယ်အတွင်းရှိ");
> "မိုးညှင်းမြို့နယ်အတွင်းရှိ"
The conversion is one-way: it doesn't convert corrected text back to the original order, and this won't be fixed.
In Node.js
npm install my-diacritic-sort
var sortDiacritics = require("my-diacritic-sort");
sortDiacritics("မိးညင်းမိုန့ယ်အတွင်းရှိ");
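Text copied out of a PDF usually arrives as many lines rather than one string. The sketch below shows one way to apply the fix line by line, assuming only that `sortDiacritics` takes a string and returns a string, as in the call above. The helper name `sortDiacriticsByLine` is hypothetical and not part of the module; the sorting function is passed in as an argument so the helper does not depend on how the module is loaded.

```javascript
// Hypothetical helper (not part of my-diacritic-sort).
// sortFn is the per-string fixer, for example the function returned by
// require("my-diacritic-sort").
function sortDiacriticsByLine(text, sortFn) {
  return text
    .split(/\r?\n/) // accept both Unix and Windows line endings
    .map((line) => (line.trim() === "" ? line : sortFn(line))) // skip blank lines
    .join("\n"); // output always uses "\n"
}
```

In Node.js this could be called as `sortDiacriticsByLine(pastedText, sortDiacritics)`, where `pastedText` is whatever string was copied from the PDF.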
License
Open source under the MIT license.