fix-broken-utf8-encodings

v0.0.1

Fix UTF-8 encodings which have been broken into single byte sequences

Sometimes, encoding issues happen and you end up with sequences of improperly escaped unicode code units in your JSON. For example, the string

"Don\\u00e2\\u0080\\u0099t forget to drink your Ovaltine."

which will decode as

"Donât forget to drink your Ovaltine."

(the â is followed by two invisible control characters, U+0080 and U+0099).

This occurs because the original string was encoded as UTF-8, but its bytes were then treated as a single-byte encoding (usually Latin-1 or ASCII) when the JSON was written, so each byte was escaped as a separate code unit.

The string should be encoded in JSON as

"Don\\u2019t forget to drink your Ovaltine."

or simply

"Don’t forget to drink your Ovaltine."

(note the fancy apostrophe).
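To make the failure mode concrete, here is a minimal Node.js sketch that reproduces it. It is an illustration only and not part of this library: the `breakString` helper is hypothetical, and it assumes the bytes were misread as Latin-1.

```javascript
// Hypothetical helper that reproduces the breakage (not part of this library).
function breakString(text) {
  // 1. Encode the text as UTF-8 bytes: the fancy apostrophe (U+2019) becomes e2 80 99.
  const utf8Bytes = Buffer.from(text, "utf8");
  // 2. Misread those bytes as Latin-1, so every byte becomes its own character.
  const misread = utf8Bytes.toString("latin1");
  // 3. Escape every non-ASCII character as a \uXXXX code unit, as a JSON encoder might.
  return misread.replace(/[^\x00-\x7f]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

console.log(breakString("Don\u2019t forget to drink your Ovaltine."));
// prints: Don\u00e2\u0080\u0099t forget to drink your Ovaltine.
```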

This small library provides a function which will make a best-effort attempt to reinterpret these broken sequences into their original Unicode code points.
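The idea behind reinterpreting a single broken run can be sketched as follows. This is a hedged illustration, not the library's actual implementation: the `decodeRun` name is hypothetical, and it assumes the run consists only of `\u00XX` escapes that together form valid UTF-8.

```javascript
// Hypothetical sketch of the reinterpretation step; the library's real code may differ.
function decodeRun(run) {
  // Pull the low byte out of each \u00XX escape, e.g. "\\u00e2" -> 0xe2.
  const bytes = Array.from(run.matchAll(/\\u00([0-9a-fA-F]{2})/g), (m) =>
    parseInt(m[1], 16)
  );
  // Decode the bytes as UTF-8; `fatal: true` throws on invalid input instead of guessing.
  const text = new TextDecoder("utf-8", { fatal: true }).decode(
    Uint8Array.from(bytes)
  );
  // Re-escape each UTF-16 code unit so the result is still JSON-escaped text.
  return Array.from({ length: text.length }, (_, i) =>
    "\\u" + text.charCodeAt(i).toString(16).padStart(4, "0")
  ).join("");
}

console.log(decodeRun("\\u00e2\\u0080\\u0099")); // prints: \u2019
```

Because the decoder is strict, a run that is not valid UTF-8 raises an error in this sketch, which a caller could catch in order to leave the original text untouched.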

Usage

const {reinterpret} = require("fix-broken-utf8-encodings");

// simply call reinterpret on your broken string
reinterpret("Don\\u00e2\\u0080\\u0099t forget to drink your Ovaltine.");
// returns "Don\\u2019t forget to drink your Ovaltine."

Additionally, there is a convenience function, unescape, which both reinterprets the string and converts the escaped Unicode code points into their actual characters.

const {unescape} = require("fix-broken-utf8-encodings");
unescape("Don\\u00e2\\u0080\\u0099t forget to drink your Ovaltine.");
// returns "Don’t forget to drink your Ovaltine."

Usually, though, if you are interpreting a JSON string, you'll want to call reinterpret and then JSON.parse the result.
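The difference this makes when parsing can be seen in a small sketch. The `fixed` text below is written out by hand to match what the usage example says `reinterpret` returns for this run; whether `reinterpret` is meant to be applied to a whole JSON document rather than a single string is an assumption here.

```javascript
// Broken JSON text: the apostrophe arrives as three one-byte code units.
const broken = '{"note":"Don\\u00e2\\u0080\\u0099t forget"}';
// Hand-written stand-in for the reinterpreted text (assumed output, see above).
const fixed = '{"note":"Don\\u2019t forget"}';

// Parsing the broken text succeeds, but yields three junk characters.
console.log(JSON.parse(broken).note.length); // prints: 14
// Parsing the reinterpreted text yields the intended single apostrophe.
console.log(JSON.parse(fixed).note.length); // prints: 12
```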

Caveats

Since it's difficult to know when a series of code units is intentional or a manifestation of this encoding issue, this library only attempts to fix sequences of at least three one-byte code units. This means that it will fix sequences like

"\\u00e2\\u0080\\u0099"; // fancy apostrophe
"\\u00f0\\u009f\\u008e\\u00bb"; // violin emoji

but not sequences like

"\\u0020"; // space
"\\uD83C\\uDFBB"; // violin UTF-16 surrogate pair
"\\uD83C\\uDFBB\\uD83C\\uDFBB"; // two violin UTF-16 surrogate pairs (won't be modified since these are not one-byte code units)

Additionally, this library does not make an effort to find and fix unicode sequences which are not escaped, since it's difficult to know in general when sequences of characters are intentional.
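The selection rule described in this section can be approximated with a regular expression. The pattern and the `isCandidate` helper are hypothetical and only illustrate the "at least three one-byte code units" rule; the library's own matching logic may be stricter.

```javascript
// Hypothetical pattern: three or more consecutive \u00XX escapes (one-byte code units).
const CANDIDATE_RUN = /(?:\\u00[0-9a-fA-F]{2}){3,}/;

// Returns true when the text contains a run this sketch would try to fix.
function isCandidate(text) {
  return CANDIDATE_RUN.test(text);
}

const samples = [
  "\\u00e2\\u0080\\u0099",        // three one-byte units: candidate
  "\\u00f0\\u009f\\u008e\\u00bb", // four one-byte units: candidate
  "\\u0020",                      // a single unit: left alone
  "\\uD83C\\uDFBB",               // a UTF-16 surrogate pair: left alone
  "\u2019",                       // an unescaped character: left alone
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), isCandidate(sample));
}
```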

License

MIT