@fgv/ts-bcp47

v4.0.2

BCP-47 Tag Utilities

Summary

Typescript utilities for parsing, manipulating and comparing BCP-47 language tags.

Installation

with npm:

npm install @fgv/ts-bcp47

API Documentation

Extracted API documentation is here

Overview

Classes and functions to parse, manipulate and compare BCP-47 language tags.

TL;DR

For those who already understand BCP-47 language tags and just want to get started, here are a few examples:

import { Bcp47 } from '@fgv/ts-bcp47';

// parse a tag to extract primary language and region
const { primaryLanguage, region } = Bcp47.tag('en-us').orThrow().subtags;
// primaryLanguage is 'en', region is 'us'

// parse a tag to extract primary language and region in canonical form
const canonical = Bcp47.tag('en-us', { normalization: 'canonical' }).orThrow().subtags;
// canonical.primaryLanguage is 'en', canonical.region is 'US'

// normalize a tag to fully-preferred form
const preferred = Bcp47.tag('art-lojban', { normalization: 'preferred' }).orThrow().tag;
// preferred is "jbo"

// tags match regardless of case
const caseMatch = Bcp47.similarity('es-MX', 'es-mx').orThrow(); // 1.0 (exact)

// suppressed script matches explicit script
const scriptMatch = Bcp47.similarity('es-MX', 'es-latn-mx').orThrow(); // 1.0 (exact)

// macro-region matches contained region well
const macroMatch = Bcp47.similarity('es-419', 'es-MX').orThrow(); // 0.7 (macroRegion)
const siblingMatch = Bcp47.similarity('es-419', 'es-ES').orThrow(); // 0.3 (sibling)

// region matches neutral fairly well
const neutralMatch = Bcp47.similarity('es', 'es-MX').orThrow(); // 0.6 (neutralRegion)

// unlike tags do not match
const unlikeMatch = Bcp47.similarity('en', 'es').orThrow(); // 0.0 (none)

// different scripts do not match
const scriptMismatch = Bcp47.similarity('zh-Hans', 'zh-Hant').orThrow(); // 0.0 (none)

Note: This library uses the Result pattern, so the return value from any method that might fail is a Result object that must be tested for success or failure. These examples use orThrow, which converts an error result to an exception; orDefault instead converts an error result to undefined.
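To make the Result pattern concrete, here is a minimal sketch of the idea. The Result type, the free functions orThrow and orDefault, and the firstSubtag helper are all hypothetical names written for this illustration; the library exposes orThrow and orDefault as methods on its own Result object, and its implementation may differ.

```typescript
// Hypothetical, simplified Result type for illustration only.
type Result<T> =
  | { success: true; value: T }
  | { success: false; message: string };

// Convert a failure into an exception, otherwise return the value.
function orThrow<T>(result: Result<T>): T {
  if (result.success) {
    return result.value;
  }
  throw new Error(result.message);
}

// Convert a failure into undefined, otherwise return the value.
function orDefault<T>(result: Result<T>): T | undefined {
  return result.success ? result.value : undefined;
}

// Hypothetical operation that can fail: return the first subtag of a tag.
function firstSubtag(tag: string): Result<string> {
  const first = tag.split('-')[0] ?? '';
  if (first.length === 0) {
    return { success: false, message: `"${tag}": no subtags found` };
  }
  return { success: true, value: first };
}

const language = orThrow(firstSubtag('en-US')); // 'en'
const missing = orDefault(firstSubtag('')); // undefined
```

The caller chooses how a failure surfaces: orThrow suits code where a bad tag is a bug, while orDefault suits code that can fall back to another value.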

Anatomy of a BCP-47 language tag.

As specified in RFC 5646, a language tag consists of a series of subtags (mostly optional), each of which describes some aspect of the language being referenced.

Subtags

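The full set of subtags that make up a language tag is, in order of appearance:

  • primary language
  • extlang (extended language)
  • script
  • region
  • variant
  • extension
  • private-use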

Grandfathered Tags

The RFC allows for a handful of grandfathered tags which do not meet the current specification. Those tags are recognized in their entirety and are not composed of subtags, so for grandfathered tags only, even primary language is undefined.

Validation

Tag validation considers the tag in its current form and never changes the tag itself.

The specification defines two levels of conformance for language tags, and this library defines a third.

Well-Formed Tags

A well-formed tag meets the basic syntactic requirements of the specification, but might not be valid in terms of content.
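As a rough illustration of a purely syntactic check, the sketch below tests a tag against the langtag grammar from RFC 5646 using a regular expression. The isWellFormed function is a hypothetical helper, not the library's API, and it deliberately ignores grandfathered tags and tags that consist only of private-use subtags.

```typescript
// Assumption: a simplified rendering of the RFC 5646 "langtag" production.
// Grandfathered and private-use-only tags are not handled here.
const langtagPattern = new RegExp(
  '^' +
    '(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})' + // language, with up to 3 extlang subtags
    '(?:-[a-z]{4})?' + // optional script
    '(?:-(?:[a-z]{2}|[0-9]{3}))?' + // optional region
    '(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*' + // zero or more variants
    '(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*' + // zero or more extensions
    '(?:-x(?:-[a-z0-9]{1,8})+)?' + // optional private-use sequence
    '$',
  'i'
);

// Hypothetical helper: true if the tag is syntactically well-formed.
function isWellFormed(tag: string): boolean {
  return langtagPattern.test(tag);
}

isWellFormed('eng-US'); // true: syntax is fine even though eng is not registered
isWellFormed('en--US'); // false: empty subtag
```

A check like this says nothing about whether the subtags are registered, which is why validity is a separate, stricter level.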

Valid Tags

A valid tag meets both the syntactic and semantic requirements of the specification, meaning that either all subtags or the full tag (in the case of grandfathered tags) are registered in the IANA language subtag registry, and neither extension nor variant subtags are repeated.

Strictly Valid Tags

A strictly valid tag is valid according to the specification and also meets the rules for variant and extlang prefixes defined by the specification and recorded in the language registry.

Examples

Some examples:

  • eng-US is well-formed because it meets the language tag syntax but is not valid because eng is not a registered language subtag.
  • en-US is both well-formed and valid, because en is a registered language subtag.
  • es-valencia-valencia is well-formed but not valid, because the valencia variant subtag is repeated.
  • es-valencia is well-formed and valid, but it is not strictly valid because the language subtag registry defines a ca prefix for the valencia subtag.
  • ca-valencia is well-formed, valid, and strictly valid.
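The three levels in the examples above can be sketched as a small classifier. Everything here is an assumption made for illustration: the classify function, the Conformance type, and the tiny hand-written registry excerpt (which covers only the subtags used in these examples) are hypothetical, and the syntax is reduced to language, optional region and variants. The library itself consults the full IANA registry.

```typescript
type Conformance = 'not-well-formed' | 'well-formed' | 'valid' | 'strictly-valid';

// Hypothetical registry excerpt covering only the examples above.
const registeredLanguages = new Set(['en', 'es', 'ca']);
const registeredRegions = new Set(['US']);
// Variant subtag -> language prefixes recorded for it.
const variantPrefixes = new Map<string, string[]>([['valencia', ['ca']]]);

// Simplified syntax: language[-region][-variant...]; real tags allow more.
const simpleTag = /^([a-z]{2,3})(?:-([a-z]{2}|[0-9]{3}))?((?:-[a-z0-9]{5,8})*)$/i;

function classify(tag: string): Conformance {
  const match = simpleTag.exec(tag);
  if (match === null) {
    return 'not-well-formed';
  }
  const language = (match[1] ?? '').toLowerCase();
  const region = match[2]?.toUpperCase();
  const variants = (match[3] ?? '')
    .split('-')
    .filter((v) => v.length > 0)
    .map((v) => v.toLowerCase());

  // Valid: every subtag is registered and no variant is repeated.
  const registered =
    registeredLanguages.has(language) &&
    (region === undefined || registeredRegions.has(region)) &&
    variants.every((v) => variantPrefixes.has(v));
  const repeated = new Set(variants).size !== variants.length;
  if (!registered || repeated) {
    return 'well-formed';
  }

  // Strictly valid: each variant is used with one of its recorded prefixes.
  const prefixesOk = variants.every(
    (v) => (variantPrefixes.get(v) ?? []).indexOf(language) >= 0
  );
  return prefixesOk ? 'strictly-valid' : 'valid';
}

classify('es-valencia'); // 'valid'
classify('ca-valencia'); // 'strictly-valid'
```

Each level builds on the previous one, so a single pass can stop at the first rule that fails and report the highest level reached.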

Normalization

Normalization transforms a tag to produce a new tag which is semantically identical, but preferred for some reason.

Not-normalized

A non-normalized tag must be well-formed and might be valid or strictly valid, but it does not use the letter case conventions recommended in the spec.

Canonical Form

A tag in canonical form meets all of the letter case conventions recommended by the specification, in addition to being at least well-formed.

Preferred Form

In addition to being strictly-valid and canonical, tags in preferred form do not have any deprecated, redundant or suppressed subtags.

Examples

  • zh-cmn-hans is strictly valid, but not canonical or preferred.
  • zh-cmn-Hans is strictly valid and canonical, but not preferred, because the subtag registry lists zh-cmn-Hans as redundant, with the preferred value cmn-Hans.
  • cmn-Hans is strictly valid, canonical and preferred.
  • en-latn-us is strictly valid, but not canonical or preferred.
  • en-Latn-US is strictly valid and canonical, but not preferred, because the subtag registry lists Latn as the suppressed script for the en language.
  • en-US is strictly valid, canonical and preferred.
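The letter case conventions behind canonical form can be sketched in a few lines. This follows the RFC 5646 recommendation: all subtags are lowercase, except that two-letter subtags become uppercase and four-letter subtags become title case when they are neither the first subtag nor located after a singleton. The toCanonicalCase function is a hypothetical helper that handles casing only; it does not validate the tag or apply any preferred-value substitutions.

```typescript
// Hypothetical helper: apply RFC 5646 letter case conventions to a tag.
function toCanonicalCase(tag: string): string {
  let afterSingleton = false;
  return tag
    .split('-')
    .map((subtag, index) => {
      const lower = subtag.toLowerCase();
      if (lower.length === 1) {
        // Singletons (extension or private-use markers) force lowercase afterwards.
        afterSingleton = true;
        return lower;
      }
      if (index === 0 || afterSingleton) {
        return lower;
      }
      if (lower.length === 2) {
        return lower.toUpperCase(); // region-style subtag
      }
      if (lower.length === 4) {
        return lower.charAt(0).toUpperCase() + lower.slice(1); // script-style subtag
      }
      return lower;
    })
    .join('-');
}

toCanonicalCase('zh-cmn-hans'); // 'zh-cmn-Hans'
toCanonicalCase('en-latn-us'); // 'en-Latn-US'
```

Getting from canonical to preferred form needs registry data (suppressed scripts, redundant tags, deprecated subtags), which casing rules alone cannot supply.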

Tag Matching

The match function compares language tags using semantic similarity, unlike RFC 4647, which relies on purely syntactic rules. Because it understands relationships such as suppressed scripts and macro-regions, the semantic match yields much better results in many cases.

For any given language tag pair, the match function returns a similarity score in the range 0.0 (no similarity) to 1.0 (exact match).

The degrees of similarity are (from most to least similar):

  • exact (1.0) - The two language tags are semantically identical.
  • variant (0.9) - The tags vary only in extension or private-use subtags.
  • region (0.8) - The tags match on language, script and region but vary in variant, extension or private-use subtags.
  • macroRegion (0.7) - The tags match on language and script, and one of the region subtags is a macro-region (e.g. 419 for Latin America) which encompasses the other region subtag.
  • neutralRegion (0.6) - The tags match on language and script, and only one of the tags contains a region subtag.
  • affinity (0.5) - The tags match on language and script, and two region subtags have an orthographic affinity. Orthographic affinity is defined in this package in the overrides.json file.
  • preferredRegion (0.4) - The tags match on language and script, and one of the tags is the preferred region subtag for the language. Preferred region is also defined in this package in overrides.json.
  • sibling (0.3) - The tags match on language and script but both have region subtags that are otherwise unrelated.
  • undetermined (0.2) - One of the languages is the special language und.
  • none (0.0) - The tags do not match at all.
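One common use of such scores is choosing the best available language for a requested tag. The sketch below records the scores listed above as constants and picks the highest-scoring candidate. The bestMatch function, the SimilarityFn type and the stubSimilarity lookup are hypothetical; the stub stands in for a real scorer and only knows the two es-419 pairs shown in the TL;DR section.

```typescript
// Scores as listed above, keyed by degree of similarity.
const similarityScores = {
  exact: 1.0,
  variant: 0.9,
  region: 0.8,
  macroRegion: 0.7,
  neutralRegion: 0.6,
  affinity: 0.5,
  preferredRegion: 0.4,
  sibling: 0.3,
  undetermined: 0.2,
  none: 0.0,
} as const;

// Hypothetical scorer signature: returns a score from 0.0 to 1.0.
type SimilarityFn = (desired: string, candidate: string) => number;

// Return the candidate with the highest non-zero score, if any.
function bestMatch(
  desired: string,
  candidates: string[],
  similarity: SimilarityFn
): string | undefined {
  let best: string | undefined;
  let bestScore: number = similarityScores.none;
  for (const candidate of candidates) {
    const score = similarity(desired, candidate);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}

// Stub scorer for demonstration only; a real scorer would compare subtags.
const stubSimilarity: SimilarityFn = (desired, candidate) => {
  if (desired === 'es-419' && candidate === 'es-MX') {
    return similarityScores.macroRegion;
  }
  if (desired === 'es-419' && candidate === 'es-ES') {
    return similarityScores.sibling;
  }
  return similarityScores.none;
};

bestMatch('es-419', ['es-ES', 'es-MX'], stubSimilarity); // 'es-MX'
```

Passing the scorer as a parameter keeps the selection logic independent of how similarity is computed.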

See Also

  • RFC 5646 - Tags for Identifying Languages
  • IANA Language Subtag Registry