string-to-unicode-variant
v1.0.9
JavaScript function to convert a string into different kinds of ⓤⓝⓘⓒⓞⓓⓔ variants.
𝗎҉ toUnicodeVariant
toUnicodeVariant is an attempt to use Unicode in a structured, organized and logical manner.
browser
<script src="path/to/toUnicodeVariant.js"></script>
nodejs
const toUnicodeVariant = require('path/to/toUnicodeVariant.js')
typescript
npm install string-to-unicode-variant
import {string_to_unicode_variant} from "string-to-unicode-variant";
Usage
Pass a string and the name of a variant (or its alias), and you get the unicoded string in return:
toUnicodeVariant(string, variant, combinings)
...
toUnicodeVariant('monospace', 'm') //like first row below
| Variant | Alias | Description | Example |
|:--------- |:-----:|:----------------------------- |:----------------- |
| monospace | m | Monospace | 𝚖𝚘𝚗𝚘𝚜𝚙𝚊𝚌𝚎 |
| bold | b | Bold text | 𝐛𝐨𝐥𝐝 |
| italic | i | Italic text | 𝑖𝑡𝑎𝑙𝑖𝑐 |
| bold italic | bi | Bold italic text | 𝒃𝒐𝒍𝒅 𝒊𝒕𝒂𝒍𝒊𝒄 |
| script | c | Handwriting style | 𝓈𝒸𝓇𝒾𝓅𝓉 |
| bold script | bc | Bolder handwriting | 𝓫𝓸𝓵𝓭 𝓼𝓬𝓻𝓲𝓹𝓽 |
| gothic | g | Gothic (fraktur) | 𝔤𝔬𝔱𝔥𝔦𝔠 |
| gothic bold | bg | Gothic in bold | 𝖌𝖔𝖙𝖍𝖎𝖈 𝖇𝖔𝖑𝖉 |
| doublestruck | d | Outlined text | 𝕕𝕠𝕦𝕓𝕝𝕖𝕤𝕥𝕣𝕦𝕔𝕜 |
| sans | s | Sans-serif style | 𝗌𝖺𝗇𝗌 |
| bold sans | bs | Bold sans-serif | 𝗯𝗼𝗹𝗱 𝘀𝗮𝗻𝘀 |
| italic sans | is | Italic sans-serif | 𝘪𝘵𝘢𝘭𝘪𝘤 𝘴𝘢𝘯𝘴 |
| bold italic sans | bis | Bold italic sans-serif | 𝙗𝙤𝙡𝙙 𝙞𝙩𝙖𝙡𝙞𝙘 𝙨𝙖𝙣𝙨 |
| circled | o | Letters within circles | ⓒⓘⓡⓒⓛⓔⓓ |
| circled negative | on | Letters within circles, negative | 🅒🅘🅡🅒🅛🅔🅓 |
| squared | q | Letters within squares | 🅂🅀🅄🄰🅁🄴🄳 |
| squared negative | qn | Letters within squares, negative | 🆂🆀🆄🅰🆁🅴🅳 |
| paranthesis | p | Letters within parentheses | ⒫⒜⒭⒠⒩⒯⒣⒠⒮⒤⒮ |
| fullwidth | w | Wider monospace font | ｆｕｌｌｗｉｄｔｈ |
| flags | f | Regional codes | 🇩🇰 🇺 🇳 🇮 🇨 🇴 🇩 🇪 |
| numbers dot | nd | Numbers with trailing dot | ⒈⒉⒊⒋ |
| numbers comma | nc | Numbers with trailing comma | 🄂🄃🄄🄅 |
| number double circled | ndc | Numbers within double circle | ⓵⓶⓷⓸ |
| roman | r | Roman numerals | Ⅰ, Ⅱ, ⅯⅯⅩⅩⅢ |
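To give a feel for what a variant conversion involves, here is a minimal sketch of the bold variant only. It is not the package's implementation: `boldSketch` and `BOLD_BASES` are hypothetical names, and the sketch assumes a plain code point offset into the Mathematical Alphanumeric Symbols block (U1D400.pdf in the references) is enough.

```javascript
// Hypothetical helper, not part of string-to-unicode-variant.
// Start code points of the Mathematical Bold ranges for A-Z, a-z and 0-9.
const BOLD_BASES = { upper: 0x1D400, lower: 0x1D41A, digit: 0x1D7CE };

function boldSketch(text) {
  let out = '';
  // for...of walks the string by code point, not by UTF-16 unit
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0x41 && cp <= 0x5A) {
      out += String.fromCodePoint(BOLD_BASES.upper + cp - 0x41);
    } else if (cp >= 0x61 && cp <= 0x7A) {
      out += String.fromCodePoint(BOLD_BASES.lower + cp - 0x61);
    } else if (cp >= 0x30 && cp <= 0x39) {
      out += String.fromCodePoint(BOLD_BASES.digit + cp - 0x30);
    } else {
      out += ch; // anything else passes through unchanged
    }
  }
  return out;
}

console.log(boldSketch('bold 123'));
```

A fixed offset works for bold because its ranges are contiguous. Other variants have gaps (the italic small h, for example, lives at U+210E instead of inside the block), so they need per-character lookup tables.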
Combining with underline, strike and other diacritical marks
The unicoded text can be combined with a broad range of diacritical marks:
toUnicodeVariant('underlined', 'bold italic', 'underline-double')//𝒖̳𝒏̳𝒅̳𝒆̳𝒓̳𝒍̳𝒊̳𝒏̳𝒆̳𝒅̳
You can control the space between each character by using space-combinings. In the above table, the halo- and enclose- samples are rendered together with a space-en to make them look nicer.
Combinings can be combined
You can use two, three or more combinings, either by passing a comma-separated string or by passing an array of strings:
toUnicodeVariant('The quick brown fox jumps ...', 'sans', 'underline, overline, strike')
toUnicodeVariant('The quick brown fox jumps ...', 'sans', ['underline', 'overline', 'strike'])
𝖳̶̲̅𝗁̶̲̅𝖾̶̲̅ ̶̲̅𝗊̶̲̅𝗎̶̲̅𝗂̶̲̅𝖼̶̲̅𝗄̶̲̅ ̶̲̅𝖻̶̲̅𝗋̶̲̅𝗈̶̲̅𝗐̶̲̅𝗇̶̲̅ ̶̲̅𝖿̶̲̅𝗈̶̲̅𝗑̶̲̅ ̶̲̅𝗃̶̲̅𝗎̶̲̅𝗆̶̲̅𝗉̶̲̅𝗌̶̲̅ ̶̲̅𝗈̶̲̅𝗏̶̲̅𝖾̶̲̅𝗋̶̲̅ ̶̲̅𝗍̶̲̅𝗁̶̲̅𝖾̶̲̅ ̶̲̅𝗅̶̲̅𝖺̶̲̅𝗓̶̲̅𝗒̶̲̅ ̶̲̅𝖽̶̲̅𝗈̶̲̅𝗀̶̲̅
You can use shorthand aliases or a mix, 'u,o,s', ['u','o','strike'] etc.
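The mechanics behind combinings can be sketched as appending one or more combining marks after every character. This is only an illustration under assumptions: `addCombiningsSketch` and the `MARKS` table are hypothetical, the code points are the standard Unicode combining low line, overline, long stroke overlay and double low line, and the shorthand aliases are not handled.

```javascript
// Hypothetical helper, not part of string-to-unicode-variant.
// Assumed mapping from combining names to Unicode combining marks.
const MARKS = {
  'underline': '\u0332',        // combining low line
  'overline': '\u0305',         // combining overline
  'strike': '\u0336',           // combining long stroke overlay
  'underline-double': '\u0333', // combining double low line
};

function addCombiningsSketch(text, combinings) {
  // Accept either a comma-separated string or an array of names
  const names = Array.isArray(combinings)
    ? combinings
    : String(combinings).split(',');
  // Unknown names are silently ignored in this sketch
  const marks = names.map((name) => MARKS[name.trim()] || '').join('');
  // Array.from splits by code point, so astral characters stay intact
  return Array.from(text, (ch) => ch + marks).join('');
}

console.log(addCombiningsSketch('fox', 'underline, overline, strike'));
console.log(addCombiningsSketch('fox', ['underline', 'overline', 'strike']));
```

Both calls print the same string, since the two argument forms are reduced to the same list of marks before anything is appended.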
Special chars
Language-specific special chars like ç, ò or ø are not supported by any Unicode "variant", and almost certainly never will be. The script and gothic fonts are in fact just various kinds of mathematical symbols (see references below). For many of the variants, converting a special char like ø will at best look odd, and will probably ruin the entire string (this varies by reader / browser).
But by using the base Latin character as a fallback and injecting diacritical marks, we can experimentally try to mimic some language-specific characters. Adding diacritics fails with the figurative variants, but it works okay with most of the rest.
toUnicodeVariant('üničode', 'bold italic') //𝒖̈𝒏𝒊𝒄̌𝒐𝒅𝒆
toUnicodeVariant('ÜNIĈODE', 'bold italic') //𝑼𝑵𝑰𝑪𝑶𝑫𝑬
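One way to picture the fallback described above is canonical decomposition: `String.prototype.normalize('NFD')` splits a character like ü into the base letter u plus a combining diaeresis, the base letter is converted, and the mark is left in place behind it. The sketch below assumes this approach and the bold italic ranges of the Mathematical Alphanumeric Symbols block. `boldItalicWithDiacriticsSketch` is a hypothetical name, and the package may well work differently.

```javascript
// Hypothetical helper, not part of string-to-unicode-variant.
function boldItalicWithDiacriticsSketch(text) {
  let out = '';
  // NFD turns e.g. 'ü' into 'u' followed by U+0308 (combining diaeresis)
  for (const ch of text.normalize('NFD')) {
    const cp = ch.codePointAt(0);
    if (cp >= 0x41 && cp <= 0x5A) {
      out += String.fromCodePoint(0x1D468 + cp - 0x41); // bold italic A-Z
    } else if (cp >= 0x61 && cp <= 0x7A) {
      out += String.fromCodePoint(0x1D482 + cp - 0x61); // bold italic a-z
    } else {
      out += ch; // combining marks and everything else are kept as-is
    }
  }
  return out;
}

console.log(boldItalicWithDiacriticsSketch('üničode'));
```

Characters without a canonical decomposition, such as ø, pass through this sketch untouched, which is the kind of case where the converted string starts to look odd.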
Additions, limitations
Besides the limitations you can see in the various compatibility tables above, some variants offer extra unique features, while other variants are reduced to one single feature alone.
Ⅻ roman, continued
If you pass a number (integer) instead of a string, that number is romanized automatically before it is converted to Unicode:
toUnicodeVariant(2023, 'roman') //ⅯⅯⅩⅩⅢ
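Romanizing can be sketched as a greedy subtraction over the usual value table, emitting characters from the Unicode Number Forms block (U2150.pdf in the references). `romanizeSketch` is a hypothetical name. The sketch assumes the final 1 to 9 is written with a single precomposed numeral, which matches the Ⅲ in the output above, but how the package treats other precomposed forms such as Ⅻ is not something this sketch tries to reproduce.

```javascript
// Hypothetical helper, not part of string-to-unicode-variant.
// Values of 10 and above, written with Unicode Roman numeral characters.
const ROMAN_STEPS = [
  [1000, '\u216F'],      // Ⅿ
  [900, '\u216D\u216F'], // ⅭⅯ
  [500, '\u216E'],       // Ⅾ
  [400, '\u216D\u216E'], // ⅭⅮ
  [100, '\u216D'],       // Ⅽ
  [90, '\u2169\u216D'],  // ⅩⅭ
  [50, '\u216C'],        // Ⅼ
  [40, '\u2169\u216C'],  // ⅩⅬ
  [10, '\u2169'],        // Ⅹ
];

function romanizeSketch(n) {
  if (!Number.isInteger(n) || n < 1 || n > 3999) {
    throw new RangeError('expected an integer between 1 and 3999');
  }
  let out = '';
  for (const [value, glyph] of ROMAN_STEPS) {
    while (n >= value) {
      out += glyph;
      n -= value;
    }
  }
  // U+2160..U+2168 are the precomposed numerals Ⅰ..Ⅸ
  if (n > 0) out += String.fromCodePoint(0x2160 + n - 1);
  return out;
}

console.log(romanizeSketch(2023));
```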
flags, f
a-z and A-Z only. Based on the highly special regional indicator symbols (see references below, U1F100.pdf). When using this variant, you need to pass a string with whitespace between each character (otherwise expect weird output; there is no fallback to monospace):
toUnicodeVariant('U N I C O D E', 'f') //🇺 🇳 🇮 🇨 🇴 🇩 🇪
However, if you pass a string that contains a country code, or even the name of some international organization, many readers will render the corresponding flag instead:
toUnicodeVariant('DK EU UN', 'flags') //🇩🇰 🇪🇺 🇺🇳
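The flag behaviour follows from how regional indicator symbols work: each letter maps to one symbol in the range U+1F1E6 to U+1F1FF, and a reader that sees two adjacent symbols forming a known region code may draw a flag. A minimal sketch of the letter mapping, using the hypothetical name `flagsSketch`:

```javascript
// Hypothetical helper, not part of string-to-unicode-variant.
const REGIONAL_INDICATOR_A = 0x1F1E6; // REGIONAL INDICATOR SYMBOL LETTER A

function flagsSketch(text) {
  let out = '';
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0x41 && cp <= 0x5A) {
      out += String.fromCodePoint(REGIONAL_INDICATOR_A + cp - 0x41);
    } else if (cp >= 0x61 && cp <= 0x7A) {
      // lowercase letters map to the same symbols as uppercase
      out += String.fromCodePoint(REGIONAL_INDICATOR_A + cp - 0x61);
    } else {
      out += ch; // whitespace and everything else pass through
    }
  }
  return out;
}

console.log(flagsSketch('U N I C O D E')); // separated letters stay single symbols
console.log(flagsSketch('DK EU UN'));      // adjacent pairs may render as flags
```

The whitespace matters because it keeps neighbouring symbols from being read as a pair.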
Reset a unicoded string
Use String.normalize()
See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
'𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟'.normalize('NFKC') //or NFKD
returns abcdefghijklmnopqrstuvwxyz
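Compatibility normalization maps the letters back to plain Latin, but combining marks such as underline or strike survive it. If those should go as well, one option is to decompose with NFKD and then strip every character in the Unicode Mark category. `resetSketch` is a hypothetical name, and the sketch assumes that losing genuine accents (é becomes e) is acceptable.

```javascript
// Hypothetical helper, not part of string-to-unicode-variant.
function resetSketch(text) {
  return text
    .normalize('NFKD')       // variant letters -> base letters, accents split off
    .replace(/\p{M}/gu, ''); // drop all combining marks (Unicode category M)
}

// Bold italic u followed by a combining double low line
console.log(resetSketch('\u{1D496}\u0333'));
```

The `\p{M}` property escape requires the `u` flag on the regular expression.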
Test
Browser: test/browser.html
Node: test$ node node.js
These tests show all variants and their coverage of a-z, A-Z and 0-9, along with flag combinations.
[Screenshot: the variants as rendered in Chrome 112.x on Ubuntu 20.04]
Alternatively, you can review a sample output, test/result-sample.html.txt. Try it out in different browsers; there are significant differences in coverage.
References
https://www.unicode.org/charts/PDF/UFF00.pdf
https://www.unicode.org/charts/PDF/U1F100.pdf
https://www.unicode.org/charts/PDF/U1D400.pdf
https://www.unicode.org/charts/PDF/U2150.pdf
https://www.unicode.org/charts/PDF/U2460.pdf
https://www.unicode.org/charts//PDF/Unicode-3.2/U32-2000.pdf
https://www.unicode.org/charts//PDF/Unicode-4.0/U40-0300.pdf
Playground
For now, visit https://detfrieord.dk/tekst-til-unicode (in Danish, sorry)