to-jyutping

v3.1.1

粵語拼音自動標註工具 Cantonese Pronunciation Automatic Labeling Tool

Downloads

282

Installation

npm install to-jyutping

Via CDN

<script src="https://unpkg.com/to-jyutping@3.1.1" defer></script>

Usage

For the 8 basic functions, examples are worth a thousand words:

import ToJyutping from "to-jyutping";

> ToJyutping.getJyutpingList("咁啱老世要求佢等陣要開會,剩低嘅嘢我會搞掂㗎喇。");
[["咁", "gam3"], ["啱", "ngaam1"], ["老", "lou5"], ["世", "sai3"], ["要", "jiu1"], ["求", "kau4"], ["佢", "keoi5"], ["等", "dang2"], ["陣", "zan6"], ["要", "jiu3"], ["開", "hoi1"], ["會", "wui2"], [",", null], ["剩", "zing6"], ["低", "dai1"], ["嘅", "ge3"], ["嘢", "je5"], ["我", "ngo5"], ["會", "wui5"], ["搞", "gaau2"], ["掂", "dim6"], ["㗎", "gaa3"], ["喇", "laa3"], ["。", null]]

> ToJyutping.getJyutping("咁啱老世要求佢等陣要開會,剩低嘅嘢我會搞掂㗎喇。");
"咁(gam3)啱(ngaam1)老(lou5)世(sai3)要(jiu1)求(kau4)佢(keoi5)等(dang2)陣(zan6)要(jiu3)開(hoi1)會(wui2),剩(zing6)低(dai1)嘅(ge3)嘢(je5)我(ngo5)會(wui5)搞(gaau2)掂(dim6)㗎(gaa3)喇(laa3)。"

> ToJyutping.getJyutpingText("咁啱老世要求佢等陣要開會,剩低嘅嘢我會搞掂㗎喇。");
"gam3 ngaam1 lou5 sai3 jiu1 kau4 keoi5 dang2 zan6 jiu3 hoi1 wui2 zing6 dai1 ge3 je5 ngo5 wui5 gaau2 dim6 gaa3 laa3"

> ToJyutping.getJyutpingCandidates("咁啱老世要求佢等陣要開會,剩低嘅嘢我會搞掂㗎喇。");
[["咁", ["gam3", "gam2", "gam1", "gam4"]], ["啱", ["ngaam1", "aam1", "am1", "ngam1"]], ["老", ["lou5", "lou2"]], ["世", ["sai3", "sai2"]], ["要", ["jiu1", "jiu3", "jiu2"]], ["求", ["kau4"]], ["佢", ["keoi5", "heoi5"]], ["等", ["dang2"]], ["陣", ["zan6", "zan2"]], ["要", ["jiu3", "jiu2", "jiu1"]], ["開", ["hoi1"]], ["會", ["wui2", "wui5", "wui6", "wui3", "kui2", "kui3", "kwui2"]], [",", []], ["剩", ["zing6", "sing6"]], ["低", ["dai1"]], ["嘅", ["ge3", "ge2", "koi2", "koi3"]], ["嘢", ["je5", "e5"]], ["我", ["ngo5", "o5"]], ["會", ["wui5", "wui6", "wui2", "wui3", "kui2", "kui3", "kwui2"]], ["搞", ["gaau2"]], ["掂", ["dim6", "dim3", "dim1"]], ["㗎", ["gaa3", "ga3", "gaa2", "gaa1", "gaa4"]], ["喇", ["laa3", "laa1", "laak3", "laa5", "laat3"]], ["。", []]]

> ToJyutping.getIPAList("咁啱老世要求佢等陣要開會,剩低嘅嘢我會搞掂㗎喇。");
[["咁", "kɐm˧"], ["啱", "ŋaːm˥"], ["老", "lou̯˩˧"], ["世", "sɐi̯˧"], ["要", "jiːu̯˥"], ["求", "kʰɐu̯˨˩"], ["佢", "kʰɵy̑˩˧"], ["等", "tɐŋ˧˥"], ["陣", "t͡sɐn˨"], ["要", "jiːu̯˧"], ["開", "hɔːi̯˥"], ["會", "wuːi̯˧˥"], [",", null], ["剩", "t͡seŋ˨"], ["低", "tɐi̯˥"], ["嘅", "kɛː˧"], ["嘢", "jɛː˩˧"], ["我", "ŋɔː˩˧"], ["會", "wuːi̯˩˧"], ["搞", "kaːu̯˧˥"], ["掂", "tiːm˨"], ["㗎", "kaː˧"], ["喇", "laː˧"], ["。", null]]

> ToJyutping.getIPA("咁啱老世要求佢等陣要開會,剩低嘅嘢我會搞掂㗎喇。");
"咁[kɐm˧]啱[ŋaːm˥]老[lou̯˩˧]世[sɐi̯˧]要[jiːu̯˥]求[kʰɐu̯˨˩]佢[kʰɵy̑˩˧]等[tɐŋ˧˥]陣[t͡sɐn˨]要[jiːu̯˧]開[hɔːi̯˥]會[wuːi̯˧˥],剩[t͡seŋ˨]低[tɐi̯˥]嘅[kɛː˧]嘢[jɛː˩˧]我[ŋɔː˩˧]會[wuːi̯˩˧]搞[kaːu̯˧˥]掂[tiːm˨]㗎[kaː˧]喇[laː˧]。"

> ToJyutping.getIPAText("咁啱老世要求佢等陣要開會,剩低嘅嘢我會搞掂㗎喇。");
"kɐm˧.ŋaːm˥.lou̯˩˧.sɐi̯˧.jiːu̯˥.kʰɐu̯˨˩.kʰɵy̑˩˧.tɐŋ˧˥.t͡sɐn˨.jiːu̯˧.hɔːi̯˥.wuːi̯˧˥.t͡seŋ˨.tɐi̯˥.kɛː˧.jɛː˩˧.ŋɔː˩˧.wuːi̯˩˧.kaːu̯˧˥.tiːm˨.kaː˧.laː˧"

> ToJyutping.getIPACandidates("咁啱老世要求佢等陣要開會,剩低嘅嘢我會搞掂㗎喇。");
[["咁", ["kɐm˧", "kɐm˧˥", "kɐm˥", "kɐm˨˩"]], ["啱", ["ŋaːm˥", "aːm˥", "ɐm˥", "ŋɐm˥"]], ["老", ["lou̯˩˧", "lou̯˧˥"]], ["世", ["sɐi̯˧", "sɐi̯˧˥"]], ["要", ["jiːu̯˥", "jiːu̯˧", "jiːu̯˧˥"]], ["求", ["kʰɐu̯˨˩"]], ["佢", ["kʰɵy̑˩˧", "hɵy̑˩˧"]], ["等", ["tɐŋ˧˥"]], ["陣", ["t͡sɐn˨", "t͡sɐn˧˥"]], ["要", ["jiːu̯˧", "jiːu̯˧˥", "jiːu̯˥"]], ["開", ["hɔːi̯˥"]], ["會", ["wuːi̯˧˥", "wuːi̯˩˧", "wuːi̯˨", "wuːi̯˧", "kʰuːi̯˧˥", "kʰuːi̯˧", "kʷʰuːi̯˧˥"]], [",", []], ["剩", ["t͡seŋ˨", "seŋ˨"]], ["低", ["tɐi̯˥"]], ["嘅", ["kɛː˧", "kɛː˧˥", "kʰɔːi̯˧˥", "kʰɔːi̯˧"]], ["嘢", ["jɛː˩˧", "ɛː˩˧"]], ["我", ["ŋɔː˩˧", "ɔː˩˧"]], ["會", ["wuːi̯˩˧", "wuːi̯˨", "wuːi̯˧˥", "wuːi̯˧", "kʰuːi̯˧˥", "kʰuːi̯˧", "kʷʰuːi̯˧˥"]], ["搞", ["kaːu̯˧˥"]], ["掂", ["tiːm˨", "tiːm˧", "tiːm˥"]], ["㗎", ["kaː˧", "kɐ˧", "kaː˧˥", "kaː˥", "kaː˨˩"]], ["喇", ["laː˧", "laː˥", "laːk̚˧", "laː˩˧", "laːt̚˧"]], ["。", []]]

For getJyutpingCandidates and getIPACandidates, pronunciations are sorted according to how likely they are to be correct in a sentence, with the first being the most likely.
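Because the most likely reading comes first, a caller that wants one best guess plus fallbacks can split each candidate list at index 0. The following is a minimal sketch and not part of the library: `pickMostLikely` and `alternatives` are hypothetical helpers, and the sample data is copied (truncated) from the output above.

```javascript
// Sample data copied (truncated) from the getJyutpingCandidates output above.
const candidates = [
  ["要", ["jiu1", "jiu3", "jiu2"]],
  ["求", ["kau4"]],
  [",", []],
  ["會", ["wui5", "wui6", "wui2", "wui3", "kui2", "kui3", "kwui2"]],
];

// Hypothetical helper: keep only the most likely reading, or null when there is none.
function pickMostLikely(list) {
  return list.map(([char, readings]) => [char, readings.length > 0 ? readings[0] : null]);
}

// Hypothetical helper: keep the remaining readings as less likely alternatives.
function alternatives(list) {
  return list.map(([char, readings]) => [char, readings.slice(1)]);
}

const top = pickMostLikely(candidates);
const rest = alternatives(candidates);
console.log(top);
console.log(rest);
```

Characters without any reading, such as punctuation, come back with an empty list, so the sketch maps them to null to mirror the shape of the list output shown earlier.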

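Methods may also be imported individually:

> import { getJyutpingList } from "to-jyutping";
> getJyutpingList("咁啱老世要求佢等陣要開會,剩低嘅嘢我會搞掂㗎喇。");
[["咁", "gam3"], ["啱", "ngaam1"], ["老", "lou5"], ["世", "sai3"], ["要", "jiu1"], ["求", "kau4"], ["佢", "keoi5"], ["等", "dang2"], ["陣", "zan6"], ["要", "jiu3"], ["開", "hoi1"], ["會", "wui2"], [",", null], ["剩", "zing6"], ["低", "dai1"], ["嘅", "ge3"], ["嘢", "je5"], ["我", "ngo5"], ["會", "wui5"], ["搞", "gaau2"], ["掂", "dim6"], ["㗎", "gaa3"], ["喇", "laa3"], ["。", null]]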

In rare cases, the pronunciation of a single character can contain more than one syllable:

> ToJyutping.getJyutpingList("一瓩");
[["一", "jat1"], ["瓩", "cin1 ngaa5"]]
> ToJyutping.getIPAList("一瓩");
[["一", "jɐt̚˥"], ["瓩", "t͡sʰiːn˥.ŋaː˩˧"]]

They are mostly dated ligature characters (合字) coined to represent units with SI prefixes.
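Code that consumes the list output should therefore not assume one syllable per character. As a small sketch, the hypothetical `countSyllables` helper below relies only on the separators visible in the examples above (a space in Jyutping, a full stop in IPA); the sample pairs are copied from those examples.

```javascript
// Sample pairs copied from the getJyutpingList and getIPAList examples above.
const jyutpingPairs = [["一", "jat1"], ["瓩", "cin1 ngaa5"]];
const ipaPairs = [["一", "jɐt̚˥"], ["瓩", "t͡sʰiːn˥.ŋaː˩˧"]];

// Hypothetical helper: count the syllables in one reading.
// A null reading (e.g. punctuation) counts as zero syllables.
function countSyllables(reading, separator) {
  return reading === null ? 0 : reading.split(separator).length;
}

const jyutpingCounts = jyutpingPairs.map(([char, r]) => [char, countSyllables(r, " ")]);
const ipaCounts = ipaPairs.map(([char, r]) => [char, countSyllables(r, ".")]);
console.log(jyutpingCounts); // [["一", 1], ["瓩", 2]]
console.log(ipaCounts);      // [["一", 1], ["瓩", 2]]
```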

Custom Entries & Existing Entries Overriding or Exclusion

With an accuracy rate of 99%, adjustments are rarely needed. However, Cantonese, like other varieties of Chinese, is mostly written in logographs, so homographs (同形詞) that are indistinguishable out of context can occur. Consider the following sentence:

上堂終於講到分數

In the above sentence, 上, 到 and 分 each have more than one possible pronunciation, and the meaning of the sentence depends on how they are actually pronounced:

| Pronunciation | Meaning |
| --- | --- |
| soeng5 tong4 zung1 jyu1 gong2 dou3 fan1 sou3 | Attending the lesson, it finally came to talk about scores. (Perhaps the scores weren’t available until today.) |
| soeng5 tong4 zung1 jyu1 gong2 dou3 fan6 sou3 | Attending the lesson, it finally came to talk about fractions. (Perhaps the progress of the math class was slow.) |
| soeng5 tong4 zung1 jyu1 gong2 dou2 fan1 sou3 | Attending the lesson, eventually it was able to talk about scores. (Perhaps the teacher wasn’t allowed to reveal the scores until today.) |
| soeng5 tong4 zung1 jyu1 gong2 dou2 fan6 sou3 | Attending the lesson, eventually it was able to talk about fractions. (Perhaps the introduction to fractions requires some other concepts to be taught.) |
| soeng6 tong4 zung1 jyu1 gong2 dou3 fan1 sou3 | The previous lesson finally came to talk about scores. (Perhaps the teacher just made the scores available right before the previous lesson.) |
| soeng6 tong4 zung1 jyu1 gong2 dou3 fan6 sou3 | The previous lesson finally came to talk about fractions. (Perhaps the students just managed to catch up with the progress in the math class.) |
| soeng6 tong4 zung1 jyu1 gong2 dou2 fan1 sou3 | Eventually, it was able to talk about scores in the previous lesson. (Perhaps the teacher was finally allowed to reveal the scores in the previous lesson.) |
| soeng6 tong4 zung1 jyu1 gong2 dou2 fan6 sou3 | Eventually, it was able to talk about fractions in the previous lesson. (Perhaps the teacher just finished teaching the other concepts required for learning fractions.) |
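The eight rows are simply every combination of the two readings each for 上, 到 and 分 (2 × 2 × 2). As an illustration only, the sketch below enumerates them with a Cartesian product; `enumerateReadings` is a hypothetical helper and the per-character options are typed in by hand from the table, not obtained from the library.

```javascript
// Per-character options, taken from the table above.
const options = [
  ["soeng5", "soeng6"], // 上
  ["tong4"],            // 堂
  ["zung1"],            // 終
  ["jyu1"],             // 於
  ["gong2"],            // 講
  ["dou3", "dou2"],     // 到
  ["fan1", "fan6"],     // 分
  ["sou3"],             // 數
];

// Hypothetical helper: Cartesian product of the options, joined into sentences.
function enumerateReadings(perChar) {
  return perChar
    .reduce(
      (prefixes, readings) =>
        prefixes.flatMap((prefix) => readings.map((reading) => [...prefix, reading])),
      [[]]
    )
    .map((syllables) => syllables.join(" "));
}

const readings = enumerateReadings(options);
console.log(readings.length); // 8
console.log(readings[0]);     // "soeng5 tong4 zung1 jyu1 gong2 dou3 fan1 sou3"
```

All eight results are valid Cantonese, so no automatic tool can choose among them without knowing the context.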

Thus, the library offers the ability to include custom entries and override or exclude built-in entries:

> ToJyutping.getJyutpingText("上堂終於講到分數");
"soeng5 tong4 zung1 jyu1 gong2 dou3 fan1 sou3"

> const converterLesson = ToJyutping.customize({ 上堂: null, 分數: "fan6 sou3" });
> converterLesson.getJyutpingText("上堂終於講到分數");
"soeng6 tong4 zung1 jyu1 gong2 dou3 fan6 sou3"

In the above example:

  • By default, the library special-cases the pronunciation of 上堂 as “soeng5 tong4”. Setting 上堂 to null removes the special case, so 上 and 堂 fall back to their default pronunciations, “soeng6” and “tong4” respectively.
  • By default, the library does not special-case 分數, so the pronunciations of the individual characters, “fan1” and “sou3”, are used. Once the entry 分數 is added with the value “fan6 sou3”, the converter outputs “fan6 sou3” whenever it encounters 分數.
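The two points above can be modelled with a toy longest-match lookup. This is a sketch under stated assumptions, not the library's implementation: the README only says that longer entries take priority, so the greedy strategy, the `convert` function and the five-entry `builtIn` dictionary (filled with readings quoted above) are all hypothetical, and the input is shortened to 上堂分數.

```javascript
// Toy dictionary using only readings quoted in this README; the real data set is far larger.
const builtIn = new Map([
  ["上堂", "soeng5 tong4"], // special-cased word entry
  ["上", "soeng6"],
  ["堂", "tong4"],
  ["分", "fan1"],
  ["數", "sou3"],
]);

// Hypothetical converter: greedy longest match over built-in plus override entries.
// An override of null hides the built-in entry; a string adds or replaces an entry.
function convert(text, overrides = {}) {
  const lookup = (word) =>
    Object.prototype.hasOwnProperty.call(overrides, word) ? overrides[word] : builtIn.get(word);
  const chars = Array.from(text);
  const syllables = [];
  let start = 0;
  while (start < chars.length) {
    let end = chars.length;
    let reading = null;
    // Try the longest remaining substring first, then shrink it.
    for (; end > start; end--) {
      reading = lookup(chars.slice(start, end).join(""));
      if (reading != null) break;
    }
    if (reading == null) {
      start += 1; // no entry at all: skip this character
    } else {
      syllables.push(reading);
      start = end;
    }
  }
  return syllables.join(" ");
}

const byDefault = convert("上堂分數");
const customized = convert("上堂分數", { 上堂: null, 分數: "fan6 sou3" });
console.log(byDefault);  // "soeng5 tong4 fan1 sou3"
console.log(customized); // "soeng6 tong4 fan6 sou3"
```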

In general, setting any built-in entry to null or undefined makes the converter fall back to shorter matches and, if there are none, to individual character pronunciations:

> ToJyutping.getJyutpingText("好學生");
"hou2 hok6 saang1"

> const converterStudious = ToJyutping.customize({ 好學生: null });
> converterStudious.getJyutpingText("好學生");
"hou3 hok6 saang1" // Using shorter matches 好學 and 生

> const converterGoodStudent = converterStudious.customize({ 好學: null });
> converterGoodStudent.getJyutpingText("好學生");
"hou2 hok6 saang1" // Using individual character pronunciations as it can’t be decomposed further

Converters can be chained without affecting each other:

> const converterDou2 = converterLesson.customize({ 到: "dou2" });
> const converterNull = converterLesson.customize({ 到: null });

> converterDou2.getJyutpingText("上堂終於講到分數");
"soeng6 tong4 zung1 jyu1 gong2 dou2 fan6 sou3"

> converterNull.getJyutpingText("上堂終於講到分數");
"soeng6 tong4 zung1 jyu1 gong2 […] fan6 sou3"

> ToJyutping.getJyutpingText("上堂終於講到分數");
"soeng5 tong4 zung1 jyu1 gong2 dou3 fan1 sou3" // Also not affected

[!WARNING]

  • This library only offers basic customization functionality. If there are longer built-in word entries, they aren’t overridden:

    > converterDou2.getJyutpingText("笑到轆地");
    "siu3 dou3 luk1 dei2"
    
    > converterNull.getJyutpingText("笑到轆地");
    "siu3 dou3 luk1 dei2"
    
    > const converterAnotherLesson = ToJyutping.customize({ 上: null, 分: "fan6" });
    > converterAnotherLesson.getJyutpingText("上堂終於講到分數");
    "soeng5 tong4 zung1 jyu1 gong2 dou3 fan6 sou3"

    In the second example, there isn’t an entry for 分數, so 分 is patched successfully. However, this is not the case for 上, since the longer built-in entry 上堂 takes priority.

  • When an entry is overridden, its original pronunciations are lost. If you are using getJyutpingCandidates or getIPACandidates, you will need to include the pronunciations manually:

    > const 到OriginalPronunciations = ToJyutping.getJyutpingCandidates("到");
    > 到OriginalPronunciations
    [["到", ["dou3", "dou2"]]]
    > const converterDou2Dou3 = converterLesson.customize({ 到: ["dou2", ...到OriginalPronunciations[0][1]] });
    > converterDou2Dou3.getJyutpingCandidates("到");
    [["到", ["dou2", "dou3"]]]

    Notice how the library automatically deduplicates the values for you.
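How the library deduplicates internally is not stated here. The sketch below only shows an equivalent, order-preserving effect in plain JavaScript, using the values from the example above; the variable names are illustrative.

```javascript
const original = ["dou3", "dou2"];    // candidates for 到, as shown above
const merged = ["dou2", ...original]; // ["dou2", "dou3", "dou2"]

// A Set keeps the first occurrence of each value in insertion order,
// so the preferred reading placed at the front stays at the front.
const deduplicated = [...new Set(merged)];
console.log(deduplicated); // ["dou2", "dou3"]
```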

Helper

> ToJyutping.jyutpingToIPA("jat1");
"jɐt̚˥"
> ToJyutping.jyutpingToIPA("cin1 ngaa5");
"t͡sʰiːn˥.ŋaː˩˧"

Note that autocorrection is intentionally not included in this helper, and an error is thrown if strings like jyt6 are passed into the function. Punctuation is ignored in the helper.
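Since invalid input throws, callers handling untrusted strings may want to catch the error. The sketch below is one possible pattern, not something the library provides: `tryConvert` is a hypothetical wrapper, and `stubJyutpingToIPA` is a stand-in for ToJyutping.jyutpingToIPA so that the snippet runs on its own. The stub knows only the two values shown above, and its error message is made up.

```javascript
// Hypothetical wrapper: returns null instead of throwing for invalid input.
function tryConvert(convert, jyutping) {
  try {
    return convert(jyutping);
  } catch (error) {
    return null;
  }
}

// Stub standing in for ToJyutping.jyutpingToIPA (assumption: the real helper
// throws some Error for invalid syllables; the type and message are not specified).
const known = new Map([
  ["jat1", "jɐt̚˥"],
  ["cin1 ngaa5", "t͡sʰiːn˥.ŋaː˩˧"],
]);
function stubJyutpingToIPA(input) {
  if (!known.has(input)) throw new Error(`Invalid Jyutping: ${input}`);
  return known.get(input);
}

const ok = tryConvert(stubJyutpingToIPA, "jat1");
const bad = tryConvert(stubJyutpingToIPA, "jyt6");
console.log(ok);  // the IPA string for "jat1"
console.log(bad); // null
```

In real code, the first argument would be the library's helper rather than the stub.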