supernormalize
v1.0.2
Published
A package to normalize strings by compatibility, homoglyphs and accent mark removal.
Downloads
12
Maintainers
Readme
supernormalize
supernormalize
is a JavaScript library that agressively normalizes text to a standard form. Use cases include:
- Mitigating homoglyph attacks
- Normalizing text for comparison
- Preparation for indexing text in a search engine
- Preparation for blacklisting text
Steps
The library performs the following steps:
- Remove all marks (i.e. diacritics) and perform compatibility normalization
- Convert the text to lowercase
- Normalize homoglyphs using a mapping based on this list from the Unicode Consortium in version 15.1.0 (the used list does not include homoglyphs that are already normalized in steps 1 and 2)
- Replace all whitespace characters with a single space and trim the text
Installation
npm install supernormalize
Usage
import { supernormalize } from "supernormalize";
const text = "⋿╳⍺rñ⍴lé";
const normalizedText = supernormalize(text);
console.log(normalizedText); // 'examp1e'
Examples
| Input | Output | Note |
| ------------------- | ----------------- | --------------------------------------------------------------------- |
| ⋿╳⍺rñ⍴lé
| examp1e
| Below rules can be combined |
| 𝕋𝕙𝕚𝕤 𝕚𝕤 𝕒 𝕥𝕖𝕤𝕥!
| th1s 1s a test!
| Homoglyphs are normalized to a common form |
| D̴̝̼̅i̴̱̐͊́a̵̢͎͒͝ĉ̵͓̈́̽r̶͂͝ͅi̷͔͜͝ṭ̴͋͆͘i̵͔̅c̷̛͉̪͂͊s̵̞̝̲͊
| d1acr1t1cs
| Diacritics are removed |
| AАΑ
| aaa
| Latin, Cyrillic, and Greek characters are normalized to the same form |
| rn
| m
| Multiletter homoglyphs are normalized |
| ffi…
| ff1...
| Ligatures are normalized to letters |
| \tHELLO WORLD \n
| he110 w0r1d
| Whitespace and casing is normalized |
Functions
supernormalize(text: string): string
Normalizes the given text performing the steps described above.
supernormalize.normalizeCase(text: string): string
Converts the given text to lowercase.
supernormalize.normalizeMarks(text: string): string
Removes all marks (i.e. diacritics) and performs compatibility normalization on the given text.
supernormalize.normalizeHomoglyphs(text: string): string
Normalizes homoglyphs using a mapping based on this list from the Unicode Consortium.
supernormalize.normalizeWhitespace(text: string): string
Replaces all whitespace characters with a single space. Trim the text.
License
This project is licensed under the MIT License - see the LICENSE file for details.