uwutf8
v3.0.2
Published
A well-tested UTF-8 encoder/decoder written in JavaScript.
Downloads
3
Maintainers
Readme
uwutf8
uwutf8 is an uwu fork of utf8.js, a well-tested UTF-8 encoder/decoder written in JavaScript. It uwu encode/decode any scalar Unicode code point values, as per the Encoding Standard.
Feel free to fork if you see possible improvements!
Installation
Via npm:
npm install uwutf8
In a browser:
<script src="uwutf8.js"></script>
In Node.js:
const uwutf8 = require('uwutf8');
API
Encoding
Forms available:
uwutf8.encode(string, opts)
uwutf8.encodeAsArray(string, opts)
uwutf8.encodeAsUint8Array(string, opts)
Encodes any given JavaScript string (string
) as UTF-8 (the opts
object being optional), and returns the UTF-8-encoded version of the string. Depending on whether strict
is set, it either throws an error if the input string contains a non-scalar value, i.e. a lone surrogate, or replaces that value with the character U+FFFD. (If you need to be able to encode non-scalar values as well, use WTF-8 instead.)
Available options:
strict
: whether encountering a lone surrogate should throw an error (defaults totrue
). Else, each lone surrogate is replaced by the character U+FFFD.
// U+00A9 COPYRIGHT SIGN; see http://codepoints.net/U+00A9
uwutf8.encode('\xA9');
// → '\xC2\xA9'
// U+10001 LINEAR B SYLLABLE B038 E; see http://codepoints.net/U+10001
uwutf8.encode('\uD800\uDC01');
// → '\xF0\x90\x80\x81'
uwutf8.encode('\uDC00');
// → throws 'Lone surrogate is not a scalar value' error
uwutf8.encode('\uDC00', { strict: false });
// → '\xEF\xBF\xBD'
Decoding
Forms available:
uwutf8.decode(byteString, opts)
uwutf8.decodeArray(array, opts)
Decodes any given bytes sequence (byteString
being a String containing the bytes, and array
being an Array or Uint8Array of the bytes) as UTF-8 (the opts
object being optional), and returns the UTF-8-decoded version of the string. If strict
mode is set, it throws an error when malformed UTF-8 is detected.
Available options:
strict
: whether encountering a non-scalar value should throw an error (defaults totrue
). Else, each non-scalar value is decoded as U+FFFD.
uwutf8.decode('\xC2\xA9');
// → '\xA9'
uwutf8.decode('\xF0\x90\x80\x81');
// → '\uD800\uDC01'
// → U+10001 LINEAR B SYLLABLE B038 E
uwutf8.decode('\xED\xB0\x80');
// → throws 'Lone surrogate is not a scalar value' error
uwutf8.decode('\xED\xB0\x80', { strict: false });
// → '\uFFFD'
uwutf8.version
A string representing the semantic version number.
Support
uwutf8 has tests. uwu
Unit tests & code coverage
After cloning this repository, run npm install
to install the dependencies needed for development and testing. You may want to install Istanbul globally using npm install istanbul -g
.
Once that’s done, you can run the unit tests in Node using npm test
or node tests/tests.js
.
To run the tests in Rhino, Ringo, Narwhal, PhantomJS, and web browsers as well, use grunt test
. At least I think this works but I haven't tried it.
To generate the code coverage report, use grunt cover
. Also I have no idea how to do this.
FAQ
Why is the first release named v3.0.0? Haven’t you heard of semantic versioning?
:3
OG
| | |---| | Mathias Bynens |
License
uwutf8 is available under the MIT license. I actually dunno what I'm supposed to do here now that I've forked it.