uwutf8

v3.0.2

Published

3 years ago

A well-tested UTF-8 encoder/decoder written in JavaScript.

Downloads

0High
0Medium
0Low

alanlu1023

charset encoding unicode utf8

uwutf8

uwutf8 is an uwu fork of utf8.js, a well-tested UTF-8 encoder/decoder written in JavaScript. It uwu encode/decode any scalar Unicode code point values, as per the Encoding Standard.

Feel free to fork if you see possible improvements!

Installation

Via npm:

npm install uwutf8

In a browser:

<script src="uwutf8.js"></script>

In Node.js:

const uwutf8 = require('uwutf8');

API

Encoding

Forms available:

uwutf8.encode(string, opts)
uwutf8.encodeAsArray(string, opts)
uwutf8.encodeAsUint8Array(string, opts)

Encodes any given JavaScript string (string) as UTF-8 (the opts object being optional), and returns the UTF-8-encoded version of the string. Depending on whether strict is set, it either throws an error if the input string contains a non-scalar value, i.e. a lone surrogate, or replaces that value with the character U+FFFD. (If you need to be able to encode non-scalar values as well, use WTF-8 instead.)

Available options:

strict: whether encountering a lone surrogate should throw an error (defaults to true). Else, each lone surrogate is replaced by the character U+FFFD.

// U+00A9 COPYRIGHT SIGN; see http://codepoints.net/U+00A9
uwutf8.encode('\xA9');
// → '\xC2\xA9'
// U+10001 LINEAR B SYLLABLE B038 E; see http://codepoints.net/U+10001

uwutf8.encode('\uD800\uDC01');
// → '\xF0\x90\x80\x81'

uwutf8.encode('\uDC00');
// → throws 'Lone surrogate is not a scalar value' error

uwutf8.encode('\uDC00', { strict: false });
// → '\xEF\xBF\xBD'

Decoding

Forms available:

uwutf8.decode(byteString, opts)
uwutf8.decodeArray(array, opts)

Decodes any given bytes sequence (byteString being a String containing the bytes, and array being an Array or Uint8Array of the bytes) as UTF-8 (the opts object being optional), and returns the UTF-8-decoded version of the string. If strict mode is set, it throws an error when malformed UTF-8 is detected.

Available options:

strict: whether encountering a non-scalar value should throw an error (defaults to true). Else, each non-scalar value is decoded as U+FFFD.

uwutf8.decode('\xC2\xA9');
// → '\xA9'

uwutf8.decode('\xF0\x90\x80\x81');
// → '\uD800\uDC01'
// → U+10001 LINEAR B SYLLABLE B038 E

uwutf8.decode('\xED\xB0\x80');
// → throws 'Lone surrogate is not a scalar value' error

uwutf8.decode('\xED\xB0\x80', { strict: false });
// → '\uFFFD'

`uwutf8.version`

A string representing the semantic version number.

Support

uwutf8 has tests. uwu

Unit tests & code coverage

After cloning this repository, run npm install to install the dependencies needed for development and testing. You may want to install Istanbul globally using npm install istanbul -g.

Once that’s done, you can run the unit tests in Node using npm test or node tests/tests.js.

To run the tests in Rhino, Ringo, Narwhal, PhantomJS, and web browsers as well, use grunt test. At least I think this works but I haven't tried it.

To generate the code coverage report, use grunt cover. Also I have no idea how to do this.

FAQ

Why is the first release named v3.0.0? Haven’t you heard of semantic versioning?

OG

| | |---| | Mathias Bynens |

License

uwutf8 is available under the MIT license. I actually dunno what I'm supposed to do here now that I've forked it.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

uwutf8

Installation

API

Encoding

Decoding

uwutf8.version

Support

Unit tests & code coverage

FAQ

Why is the first release named v3.0.0? Haven’t you heard of semantic versioning?

OG

License

`uwutf8.version`