hardcoded-language-detector

v1.0.6

NPM library for detecting the language of given text

A script family detection library that classifies each character by its Unicode range and reports the ratio of each writing system present in the text.

Features

  • Fast and accurate script family detection using Unicode ranges
  • Returns ratios of different script families in the text
  • Identifies the dominant script family
  • Supports mixed script detection
  • Handles special characters and numbers
  • No external dependencies
  • Lightweight (~20KB)

Installation

npm install hardcoded-language-detector

Supported Script Families

Each script family is represented by a two-letter code:

  • Latin (la) - Basic Latin, Latin Extended-A through Latin Extended-E

    • English, French, German, Spanish, Portuguese, Vietnamese, Turkish, etc.
    • Includes diacritics and special characters used in European languages
  • Cyrillic (cy) - Cyrillic and Extensions

    • Russian, Ukrainian, Bulgarian, Serbian, Belarusian, etc.
  • Arabic (ar) - Arabic and Extensions

    • Arabic, Persian (Farsi), Urdu, Kurdish, Sindhi
    • Includes all Arabic presentation forms and supplements
  • Devanagari (de) - Devanagari and Extensions

    • Hindi, Marathi, Sanskrit, Nepali, etc.
  • Brahmic (br) - Various Brahmic family scripts

    • Bengali, Tamil, Telugu, Kannada, Malayalam
    • Gujarati, Gurmukhi (Punjabi), Oriya, Sinhala
  • Han (hz) - CJK Unified Ideographs

    • Chinese (Traditional & Simplified)
    • Japanese Kanji
    • Korean Hanja
    • Includes all CJK extensions (A through H)
  • Kana (kn) - Japanese syllabaries

    • Hiragana
    • Katakana (including half-width forms)
    • Phonetic extensions
  • Hangul (hn) - Korean writing system

    • Modern Hangul syllables
    • Archaic Korean letters
    • Compatibility Jamo
    • Half-width forms
  • Thai (th) - Thai script

    • Thai language characters
    • Thai digits and symbols
  • Hebrew (he) - Hebrew script

    • Modern Hebrew
    • Biblical Hebrew
    • Includes presentation forms
  • Greek (gr) - Greek and Coptic

    • Modern Greek
    • Ancient Greek
    • Extended Greek
    • Ancient Greek numbers
  • Unknown (un) - Unrecognized scripts or special characters

    • Numbers
    • Punctuation marks
    • Special symbols
    • Emojis
    • Other Unicode characters not in above categories
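
To make the range-based approach concrete, here is a minimal sketch of classifying a single character by code point. The `SCRIPT_RANGES` table and the `classifyChar` helper are hypothetical names invented for this illustration, and the table covers only one well-known Unicode block per family. It is not the library's actual table, which this README does not show.

```javascript
// Hypothetical range table: a small subset of Unicode blocks, for illustration only.
// The library's real table is not shown in this README and is presumably larger.
const SCRIPT_RANGES = [
  { code: 'la', start: 0x0041, end: 0x005a }, // Basic Latin uppercase A-Z
  { code: 'la', start: 0x0061, end: 0x007a }, // Basic Latin lowercase a-z
  { code: 'gr', start: 0x0370, end: 0x03ff }, // Greek and Coptic
  { code: 'cy', start: 0x0400, end: 0x04ff }, // Cyrillic
  { code: 'he', start: 0x0590, end: 0x05ff }, // Hebrew
  { code: 'ar', start: 0x0600, end: 0x06ff }, // Arabic
  { code: 'de', start: 0x0900, end: 0x097f }, // Devanagari
  { code: 'th', start: 0x0e00, end: 0x0e7f }, // Thai
  { code: 'kn', start: 0x3040, end: 0x309f }, // Hiragana
  { code: 'kn', start: 0x30a0, end: 0x30ff }, // Katakana
  { code: 'hz', start: 0x4e00, end: 0x9fff }, // CJK Unified Ideographs
  { code: 'hn', start: 0xac00, end: 0xd7af }, // Hangul Syllables
];

// Return the two-letter family code for the first code point of `char`,
// or 'un' when no range in the table contains it.
function classifyChar(char) {
  const codePoint = char.codePointAt(0);
  const match = SCRIPT_RANGES.find(
    (range) => codePoint >= range.start && codePoint <= range.end
  );
  return match ? match.code : 'un';
}

console.log(classifyChar('H'));  // la
console.log(classifyChar('안')); // hn
console.log(classifyChar('こ')); // kn
console.log(classifyChar('7'));  // un
```

A linear scan keeps the sketch short. With a full table of many blocks, a sorted array and binary search would be a reasonable alternative.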

Usage

const detectScriptFamily = require('hardcoded-language-detector');

// Single script
console.log(detectScriptFamily('Hello World'));
// Output: { top: 'la', la: 1 }

console.log(detectScriptFamily('안녕하세요'));
// Output: { top: 'hn', hn: 1 }

// Mixed scripts
console.log(detectScriptFamily('Hello 안녕 こんにちは'));
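// Output (illustrative): { top: 'la', la: 0.33, hn: 0.33, kn: 0.34 }
// The input has 5 Latin, 2 Hangul, and 5 Kana letters, so the actual ratios and `top` may differ.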

// Special cases
console.log(detectScriptFamily('123!@#'));
// Output: { top: 'un', un: 1 }

console.log(detectScriptFamily(''));
// Output: { top: 'un', un: 1 }

Return Value Format

The function returns an object with:

  • top: The dominant script family code (highest ratio)
  • Script family codes as keys with their ratios as values
  • Ratios are rounded to 2 decimal places
  • Only ratios >= 0.01 (1%) are included
  • Unknown or special characters are marked as 'un'
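
The rules above can be sketched as a small function that turns per-family character counts into a result object. `buildResult` is a hypothetical helper written for this illustration. How the library itself counts characters and breaks ties is not documented here, so treat this as one plausible reading of the format rather than the actual implementation.

```javascript
// Hypothetical helper: build a result object from per-family character counts.
// Assumes `counts` maps two-letter family codes to non-negative integers.
function buildResult(counts) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);

  // Nothing to classify: fall back to the documented "unknown" result.
  if (total === 0) return { top: 'un', un: 1 };

  const ratios = {};
  let top = 'un';
  let topCount = -1;

  for (const [code, count] of Object.entries(counts)) {
    // Track the family with the highest raw count (first one wins on ties).
    if (count > topCount) {
      top = code;
      topCount = count;
    }
    // Round to 2 decimal places and keep only ratios of at least 1%.
    const ratio = Math.round((count / total) * 100) / 100;
    if (ratio >= 0.01) ratios[code] = ratio;
  }

  return { top, ...ratios };
}

console.log(buildResult({ la: 3, hn: 1 })); // { top: 'la', la: 0.75, hn: 0.25 }
console.log(buildResult({}));               // { top: 'un', un: 1 }
```

Because each ratio is rounded independently, the reported values are not guaranteed to sum to exactly 1.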

Error Handling

  • Empty strings return { top: 'un', un: 1 }
  • Strings with only numbers/special characters return { top: 'un', un: 1 }
  • Invalid input (null/undefined) returns { top: 'un', un: 1 }
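
Since empty, symbol-only, and invalid inputs all produce the same `{ top: 'un', un: 1 }` result, callers may want a small guard before acting on `top`. The helpers below are hypothetical and not part of the library. They rely only on the result shape documented above and take a result object as input.

```javascript
// Hypothetical helpers that operate on a result object shaped like
// { top: 'la', la: 0.75, hn: 0.25 }. They do not call the library.

// True when the dominant family is something other than 'un'.
function hasRecognizedScript(result) {
  return Boolean(result) && typeof result.top === 'string' && result.top !== 'un';
}

// Return the dominant family code, or `fallback` when nothing was recognized.
function dominantScriptOr(result, fallback) {
  return hasRecognizedScript(result) ? result.top : fallback;
}

console.log(hasRecognizedScript({ top: 'hn', hn: 1 }));    // true
console.log(hasRecognizedScript({ top: 'un', un: 1 }));    // false
console.log(dominantScriptOr({ top: 'un', un: 1 }, 'la')); // la
```

Note that this guard cannot tell an empty string apart from a string of digits or emoji. If that distinction matters, check the input before calling the detector.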

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Note on Code Blocks

When documenting code that uses this library, take care with code blocks containing CJK characters. Some markdown processors mishandle Unicode characters in code blocks, so always check how the CJK examples render.