qb-utf8-ez
v3.1.3
Easy-to-use functions for encoding/decoding UTF-8 in the browser
qb-utf8-ez
Easy-to-use UTF-8 encoding and decoding that works in all browsers (except ancient ones older than IE 5.5). Built on the small implementations qb-utf8-to-str-tiny and qb-utf8-from-str-tiny, which suit small decoding jobs but are not fast for very large files.
Complies with the 100% test coverage and minimum dependency requirements of qb-standard.
Install
npm install qb-utf8-ez
API Update 2.x -> 3.x
Functions that take array-like parameters and ranges have been updated to work with terms defined in the glossary. Namely, functions of the form
- function ( buf, beg, end )
- function ( buf, {beg:0, end:10} )
have been updated to
Usage
var utf8 = require('qb-utf8-ez');
var buf = utf8.buffer('hello. 你好');
console.log(buf);
Prints an array of UTF-8 encoded bytes:
> [ 104, 101, 108, 108, 111, 46, 32, 228, 189, 160, 229, 165, 189 ]
and
var s = utf8.string(buf);
console.log(s);
Prints:
> hello. 你好
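The byte values shown above can be cross-checked without this package. The sketch below uses the standard TextEncoder and TextDecoder built-ins, which are not part of qb-utf8-ez and are not available in the very old browsers this package targets; the variable names are illustrative only.

```javascript
// Cross-check with the platform's built-in encoder and decoder
// (standard APIs, unrelated to qb-utf8-ez).
const bytes = Array.from(new TextEncoder().encode('hello. 你好'));
console.log(bytes);
// [ 104, 101, 108, 108, 111, 46, 32, 228, 189, 160, 229, 165, 189 ]

const text = new TextDecoder('utf-8').decode(Uint8Array.from(bytes));
console.log(text);
// hello. 你好
```

The seven ASCII characters take one byte each, and each of the two Chinese characters takes three bytes, which accounts for the 13 values.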
buffer(value, options)
same as utf8 (old function name)
utf8(value, options)
Return an array or buffer of UTF-8 encoded bytes for the given value. The accepted forms of value are listed after the options.
options:
ret_type: (string) 'array', 'buffer', or 'uint8array'. The type to create and return.
fill_to_length: (integer) if set, an array of the given length will be returned, filled with encoded values copied from value. Invalid truncated encodings are replaced with the fill_byte.
fill_byte: (integer or string) ASCII code or single-character string used, if needed, to fill the buffer at the end to prevent truncated UTF-8.
For convenience, value may be:
a Unicode code point, such as 0x10400 '𐐀'
an array of code points
a string
In each of these cases, buffer(value) will return an array.
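To make the three input forms concrete, here is a minimal standalone sketch of UTF-8 encoding. It is not the package's implementation; encodeCodePoint and encodeSketch are hypothetical names, and the options (ret_type, fill_to_length, fill_byte) are not modelled.

```javascript
// Hypothetical helper, not the library's code: append the UTF-8 bytes
// (1 to 4 of them) for one Unicode code point to the array `out`.
function encodeCodePoint(cp, out) {
  if (cp < 0x80) {
    out.push(cp);
  } else if (cp < 0x800) {
    out.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out.push(0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
  } else {
    out.push(0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
             0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
  }
  return out;
}

// Accepts a single code point, an array of code points, or a string.
function encodeSketch(value) {
  let codePoints;
  if (typeof value === 'number') {
    codePoints = [value];
  } else if (typeof value === 'string') {
    // Array.from walks a string by code point, so surrogate pairs stay together.
    codePoints = Array.from(value, (ch) => ch.codePointAt(0));
  } else {
    codePoints = value;
  }
  const out = [];
  for (const cp of codePoints) encodeCodePoint(cp, out);
  return out;
}

console.log(encodeSketch(0x10400));          // [ 240, 144, 144, 128 ]
console.log(encodeSketch([0x4F60, 0x597D])); // [ 228, 189, 160, 229, 165, 189 ]
console.log(encodeSketch('你好'));            // [ 228, 189, 160, 229, 165, 189 ]
```

The sketch does not validate its input, so surrogate values or numbers above 0x10FFFF would produce bytes that are not legal UTF-8.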
string(src, options)
Convert an array-like object (buffer) to a JavaScript string.
options
off: index to start at
lim: index to stop before
escape: a single ASCII character, an integer ASCII code, or a string expression (default is '?').
If an ASCII integer or single character is given, illegal bytes are replaced 1-for-1 by that value. If an expression of the form "!{%H}" is given, each run of illegal bytes is prefixed with the text before %H (here '!{'), suffixed with the text after %H (here '}'), and written as ASCII hex between the two.
string() makes use of qb-utf8-illegal-bytes to automatically detect and escape illegal UTF-8 encodings. The default decoding behavior is to replace illegal values with '?':
var utf8 = require('qb-utf8-ez');
utf8.string([... some buffer with three illegal bytes *** here *** then ok again.. ]);
> ... some buffer with illegal characters ??? then ok again..
Another option is to use an escape expression to substitute bad bytes in situ, keeping all other buffer contents in place.
utf8.string([...], { escape: '!{%H}' });
> ... some buffer with illegal characters !{F09082} then ok again..
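The two escape styles can be illustrated with a standalone sketch. This is an assumption-laden illustration, not the package's code: readSequence and stringSketch are hypothetical names, only string escapes are handled (not integer ASCII codes), and the way qb-utf8-illegal-bytes groups illegal bytes into runs may differ from the simple rule used here.

```javascript
// Hypothetical helper: read one well-formed UTF-8 sequence starting at index i.
// Returns { len, cp }, or null when the bytes at i are not a legal sequence.
function readSequence(bytes, i) {
  const b0 = bytes[i];
  if (b0 < 0x80) return { len: 1, cp: b0 };
  let len, min;
  if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; min = 0x80; }
  else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; min = 0x800; }
  else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; min = 0x10000; }
  else return null;                        // stray continuation or invalid lead byte
  if (i + len > bytes.length) return null; // truncated at end of input
  let cp = b0 & (0xFF >> (len + 1));       // payload bits of the lead byte
  for (let k = 1; k < len; k++) {
    const b = bytes[i + k];
    if ((b & 0xC0) !== 0x80) return null;  // not a continuation byte
    cp = (cp << 6) | (b & 0x3F);
  }
  // Reject overlong forms, surrogates, and values beyond U+10FFFF.
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return null;
  return { len, cp };
}

// Decode bytes to a string, escaping runs of illegal bytes.
// escape is either a single character ('?') or an expression such as '!{%H}'.
function stringSketch(bytes, escape = '?') {
  const expr = /^(.*)%H(.*)$/.exec(escape);
  const hex = (b) => b.toString(16).toUpperCase().padStart(2, '0');
  let out = '';
  let i = 0;
  while (i < bytes.length) {
    const seq = readSequence(bytes, i);
    if (seq) {
      out += String.fromCodePoint(seq.cp);
      i += seq.len;
      continue;
    }
    // Collect the run of bytes that cannot start a legal sequence.
    const start = i;
    while (i < bytes.length && readSequence(bytes, i) === null) i++;
    const run = Array.from(bytes.slice(start, i));
    out += expr
      ? expr[1] + run.map(hex).join('') + expr[2] // prefix + hex + suffix
      : escape.repeat(run.length);                // 1-for-1 replacement
  }
  return out;
}

const sample = [97, 98, 0xF0, 0x90, 0x82, 99, 100]; // 'ab', 3 bad bytes, 'cd'
console.log(stringSketch(sample));          // ab???cd
console.log(stringSketch(sample, '!{%H}')); // ab!{F09082}cd
```

The single-character form keeps the output the same length in characters as the number of bad bytes, while the expression form preserves the original byte values so they can be inspected later.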
compare(src1, off1, lim1, src2, off2, lim2)
Compare code points of two byte ranges holding UTF-8 encoded data. The function works like a sort comparator in JavaScript.
Returns:
- 1 if the src1 selection is greater
- -1 if the src2 selection is greater
- 0 if the selections are equal
(compare is also available as a separate package with zero dependencies)
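A minimal sketch of such a comparator follows. It assumes both ranges hold well-formed UTF-8, in which case comparing byte by byte gives the same ordering as comparing code points. compareSketch is a hypothetical name, and the package's own implementation may work differently.

```javascript
// Hypothetical comparator over two byte ranges [off, lim).
// Assumes well-formed UTF-8, where byte order matches code point order.
function compareSketch(src1, off1, lim1, src2, off2, lim2) {
  const len1 = lim1 - off1;
  const len2 = lim2 - off2;
  const n = Math.min(len1, len2);
  for (let i = 0; i < n; i++) {
    const d = src1[off1 + i] - src2[off2 + i];
    if (d !== 0) return d > 0 ? 1 : -1;
  }
  // One range is a prefix of the other: the longer one is greater.
  if (len1 === len2) return 0;
  return len1 > len2 ? 1 : -1;
}

const a = [240, 144, 144, 128]; // U+10400 '𐐀'
const b = [239, 189, 158];      // U+FF5E '～'
console.log(compareSketch(a, 0, a.length, b, 0, b.length)); // 1
console.log('\u{10400}' < '\uFF5E');                        // true
```

The last two lines show why comparing encoded bytes can differ from comparing JavaScript strings: string comparison works on UTF-16 code units, so a character outside the Basic Multilingual Plane can sort before a character with a smaller code point.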
fill(dst, sample, options)
Fill a buffer with repeated copies of a smaller sample, which may be a string or an array-like object.
options
off: index to start at
lim: index to stop before ( < lim )
escape: handling for illegal bytes (same as string(), above) (default is '?')
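The basic idea of filling a range with a repeated sample can be sketched as follows. This is only an illustration under stated assumptions: fillSketch is a hypothetical name, off and lim are passed as plain arguments rather than an options object, string samples are encoded with the standard TextEncoder, and the escape handling for a sample cut off mid-sequence at the end of the range is not modelled.

```javascript
// Hypothetical sketch: repeat `sample` into dst[off..lim).
// A multi-byte character cut off at `lim` is NOT repaired here.
function fillSketch(dst, sample, off = 0, lim = dst.length) {
  const src = typeof sample === 'string'
    ? new TextEncoder().encode(sample) // standard API, not part of qb-utf8-ez
    : sample;
  if (src.length === 0) return dst;    // nothing to copy
  for (let i = off; i < lim; i++) {
    dst[i] = src[(i - off) % src.length];
  }
  return dst;
}

console.log(Array.from(fillSketch(new Uint8Array(8), 'ab', 1, 7)));
// [ 0, 97, 98, 97, 98, 97, 98, 0 ]
```

Bytes outside the off/lim range are left untouched, which is why the first and last values in the example stay 0.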
join(buffers, joinbuf)
Like Array.prototype.join(), but for arrays/buffers of bytes: joins the given buffers into one buffer, with joinbuf as the separator between each. buffers can be an array of array-like objects with byte/integer values. joinbuf can be any value accepted by the buffer() function, such as a string or an array of code points.
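As a rough illustration of the joining behavior, here is a standalone sketch. joinSketch is a hypothetical name, it always returns a plain array, and unlike the real function it expects joinbuf to already be an array-like object of bytes rather than a string or code points.

```javascript
// Hypothetical sketch: concatenate byte buffers with a separator between them.
function joinSketch(buffers, joinbuf) {
  const out = [];
  const appendAll = (src) => {
    // Index-based loop so plain array-like objects work too.
    for (let i = 0; i < src.length; i++) out.push(src[i]);
  };
  for (let i = 0; i < buffers.length; i++) {
    if (i > 0) appendAll(joinbuf); // separator only between items
    appendAll(buffers[i]);
  }
  return out;
}

const parts = [[97, 98], new Uint8Array([99]), [100, 101]];
console.log(joinSketch(parts, [44, 32])); // bytes of ", "
// [ 97, 98, 44, 32, 99, 44, 32, 100, 101 ]
```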
escape_illegal(src, opt)
Return a buffer with illegal characters replaced. If a single character or number escape is given, the buffer will be changed in place and returned. If an escape expression is given, a new (longer) buffer will be returned copied from the old with the escaped areas. Options escape, off, and lim work as they do with string(), above.
illegal_bytes(src, off, lim)
Return the ranges of illegal UTF-8 encoding found in src. See qb-utf8-illegal-bytes.