@cto.af/string-width

v3.2.0

Get width of a Unicode string in fixed-width display cells, accounting for combining characters, emoji, flags, Hangul, East Asian Width, default ignorable characters, and a few more edge cases.

Installation

npm install @cto.af/string-width

API

Full documentation is available.

import {StringWidth, AMBIGUOUS, POTENTIAL_EMOJI} from '@cto.af/string-width'

const sw = new StringWidth()
sw.width('foo') // 3
sw.width('\u{1F4A9}') // 2: Emoji take two cells
sw.width('#\ufe0f\u20e3') // 2: More complicated emoji
sw.break('foobar', 3) // [
  //   {string: 'foo', cells: 3},
  //   {string: 'bar', cells: 3, last: true}
  // ]

const custom = new StringWidth({
  locale: 'ko-KR',
  isCJK: true,
  extraWidths: new Map([
    // This example is not actually useful, but demonstrates how to customize
    // widths.
    // 'K' now has ambiguous East Asian Width
    [0x4b, AMBIGUOUS],
    // 'O' might now start an Emoji sequence
    [0x4f, POTENTIAL_EMOJI],
    // 'R' now has a width of 3 cells
    [0x52, 3]
  ])
})

Approach

  • All of the "interesting" characters are put into a Trie in widths.js at build time. POTENTIAL_EMOJI (14) is a sentinel for "possible emoji", AMBIGUOUS (15) is a sentinel for "ambiguous East Asian Width", and every other value is the width of that code point. The default result from the Trie is 1.
  • If a string is all ASCII, just count characters. This happens a lot, so it's worth a performance shortcut.
  • For each grapheme cluster:
    • Get the width of the first code point from extraWidths or the Trie.
    • If the width is AMBIGUOUS, return 2 if we're in a CJK context, otherwise 1.
    • If the width is POTENTIAL_EMOJI, check if the whole grapheme cluster is an emoji.
  • Since backspace has a negative width, ensure that the total width is never less than zero.
  • ANSI escape sequences are ignored for width, unless the includeANSI option is enabled.
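
The steps above can be sketched roughly as follows. This is not the library's implementation: the tiny TABLE map is a hypothetical stand-in for the generated Trie, the emoji check uses the Extended_Pictographic property as an assumed approximation, the ASCII shortcut is narrowed to printable ASCII so that backspace still goes through the table, and a non-emoji cluster is assumed to fall back to a width of 1. ANSI escape handling is left out.

```javascript
// Sentinel values as described in the Approach list.
const POTENTIAL_EMOJI = 14
const AMBIGUOUS = 15

// Hypothetical stand-in for the Trie: only a handful of code points.
const TABLE = new Map([
  [0x08, -1],                 // backspace has a negative width
  [0x00a1, AMBIGUOUS],        // '¡' has East Asian Width "A"
  [0x4e2d, 2],                // '中' is wide
  [0x1f4a9, POTENTIAL_EMOJI], // might be an emoji
])

const segmenter = new Intl.Segmenter(undefined, {granularity: 'grapheme'})
const PRINTABLE_ASCII = /^[\x20-\x7e]*$/
// Assumption: approximate "is this cluster an emoji?" with a property test.
const EMOJI = /^\p{Extended_Pictographic}/u

function sketchWidth(str, isCJK = false) {
  // Shortcut: printable ASCII is one cell per character.
  if (PRINTABLE_ASCII.test(str)) {
    return str.length
  }
  let total = 0
  for (const {segment} of segmenter.segment(str)) {
    // Only the first code point of each grapheme cluster is looked up.
    let w = TABLE.get(segment.codePointAt(0)) ?? 1
    if (w === AMBIGUOUS) {
      w = isCJK ? 2 : 1
    } else if (w === POTENTIAL_EMOJI) {
      w = EMOJI.test(segment) ? 2 : 1 // assumed fallback of 1
    }
    total += w
  }
  // Backspace can push the sum below zero, so clamp.
  return Math.max(0, total)
}
```

Keeping the sentinels in the same table as the real widths means a single lookup per grapheme cluster decides both the common case and whether extra checks are needed.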

Chinese, Japanese, or Korean (CJK) contexts

Some code points have ambiguous width, which depends upon whether we are counting in a CJK context or not. By default, StringWidth looks at the locale that is given (or derived from the environment) and uses the default script of that locale to decide whether this is a Chinese, Japanese, or Korean context. The script identifiers 'Hans', 'Hant', 'Jpan', and 'Kore' signal a CJK context. If desired, this detection can be overridden by passing the isCJK field in the constructor options.
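
One way to derive a default script from a locale is the standard Intl.Locale API, as in the minimal sketch below. The function name looksCJK is hypothetical, and whether the library performs the detection exactly this way is an assumption; only the four script identifiers and the isCJK override come from the description above.

```javascript
// Script identifiers that signal a CJK context.
const CJK_SCRIPTS = new Set(['Hans', 'Hant', 'Jpan', 'Kore'])

// Hypothetical helper: an explicit boolean wins, otherwise the locale's
// likely script is consulted.
function looksCJK(locale, isCJK) {
  if (typeof isCJK === 'boolean') {
    return isCJK
  }
  // maximize() fills in the likely script, e.g. 'ko-KR' -> 'ko-Kore-KR'.
  const {script} = new Intl.Locale(locale).maximize()
  return CJK_SCRIPTS.has(script)
}

console.log(looksCJK('ko-KR'))       // true
console.log(looksCJK('en-US'))       // false
console.log(looksCJK('en-US', true)) // true: explicit override
```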

Width breaking

The break(string, N) method slices a string into chunks, each of which is at most N cells wide. This logic was so entangled with the width calculation that it made sense to include it in this library. It is useful for strings wider than N cells that need a hyphen inserted between segments, since it ensures that the hyphen never lands in the middle of a grapheme cluster.
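
As an illustration of the hyphenation use case, the sketch below joins chunks of the shape shown in the API example ({string, cells} with last: true on the final chunk). The helper name joinWithHyphens is hypothetical, and the chunks are written out by hand here rather than produced by the library.

```javascript
// Hypothetical helper: append a hyphen to every chunk except the last,
// then put each chunk on its own line.
function joinWithHyphens(chunks) {
  return chunks
    .map(chunk => (chunk.last ? chunk.string : `${chunk.string}-`))
    .join('\n')
}

// Same shape as the result of sw.break('foobar', 3) in the API example.
const chunks = [
  {string: 'foo', cells: 3},
  {string: 'bar', cells: 3, last: true},
]

console.log(joinWithHyphens(chunks))
// foo-
// bar
```

If the hyphen must also fit within the available columns, one option is to break at one cell fewer than the column count so that each hyphenated line still fits.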

Known Limitations

  • Font ligatures are not taken into account.
  • Variable width fonts are not considered. Calculated widths are in display cells, not pixels.

Development

When a new Unicode version is released, delete the tools/*.txt files, then run npm run build to regenerate the Trie.

