object-serializer
v0.0.2
Serialize JavaScript values in binary format (including multi reference to one object)
WARNING: This module has so far hardly been tested, and neither the API
nor the file format is stable yet; both may change. Do not use it except
for experimental purposes and for testing.
This module exports a constructor function which should be called by using
the new operator. It takes two arguments, the internal version (default 0)
and the user version (default -1).
The internal version is the number of standard types to implement, and is
an unsigned 16-bit number. It will be written to the output, and when read
back it will be an error if it does not match. The standard types
implemented in this way must be taken from a prefix of the following list:
Array, ArrayBuffer, Date. Additional standard types include: Map, RegExp,
WeakMap, WeakSet.
The user version is a signed 16-bit number and can be used for whatever
purpose you want to use it for. It will be written to the output and when
read back it will be an error if it does not match.
There is no asynchronous operation; all operations are synchronous. It is
designed for saving the state of a roguelike game or other standalone
game, or possibly a MUD state; it is not meant for realtime applications.
(If it were asynchronous, then the values could be changed while it is
working and that would make a mess of the serialization.)
=== Instance properties ===
These properties can be used on serializer instances. Some belong to the
instance itself and others are inherited from the serializer prototype.
Note: External values/types have to be defined in the same order for
serializing as for unserializing, otherwise it won't work.
.defineObject(obj)
Defines the given object or symbol as an external value which the
serialized data can reference.
.defineStandardType(obj)
If obj is a key in the standard type list, then defines obj.prototype as
an external type with the standard implementation function.
.defineType(obj,fn)
Given obj which is a prototype object and fn which is the implementation
function, defines an external type, using the given function for
serializing and unserializing objects with the given prototype.
.internalVersion
The internal version number.
.serialize(stream,value)
Serialize the given value using the given streaming function.
.unserialize(stream)
Unserialize using the given streaming function and return the value that
has been read.
.userVersion
The user version number.
=== External types ===
An implementation of an external type is a function that takes three
arguments. The first argument is the context (either a reading context or
a writing context) to use. The second argument is the object to write; in
the case of reading, it is an empty object with the correct prototype,
which you might or might not use. The third argument is the set function.
The set function does nothing during writing. During reading, it replaces
the empty object it created with the object specified as its argument;
your function also needs to return that same object. If the object to be
read is not the same object this function was given, then it is necessary
to call the set function before returning or calling the .key or .value or
.properties methods of the context object.
This function returns the object read/written.
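The shape of such a function can be sketched as follows. This is only a sketch: Point, pointType and the two mock contexts are hypothetical names made up for this example, and the mocks stand in for the real reading/writing contexts (they implement nothing except .float64). With a real serializer the function would be registered by .defineType(Point.prototype, pointType).

```javascript
// Hypothetical example class whose instances we want to serialize.
function Point(x, y) {
  this.x = x;
  this.y = y;
}

// External type implementation: (context, object, set) => object.
// Each context call returns the value read or written, so the same
// statements serve both directions.
function pointType(ctx, obj, set) {
  obj.x = ctx.float64(obj.x);
  obj.y = ctx.float64(obj.y);
  // The object that was passed in is reused, so set() is not needed here.
  return obj;
}

// Hypothetical stand-ins for the real contexts, for demonstration only.
function mockWritingContext(out) {
  return {
    writing: true,
    reading: false,
    float64(x) { out.push(x); return x; },
  };
}
function mockReadingContext(input) {
  let pos = 0;
  return {
    writing: false,
    reading: true,
    float64() { return input[pos++]; },
  };
}

// Writing: the object is a real Point, and set does nothing.
const written = [];
pointType(mockWritingContext(written), new Point(1.5, -2), () => {});

// Reading: the object is an empty object with the correct prototype.
const restored = pointType(
  mockReadingContext(written),
  Object.create(Point.prototype),
  () => {}
);
console.log(restored instanceof Point, restored.x, restored.y);
```

Because the empty object given during reading is filled in and returned, the set function is never called. An implementation that builds a different object during reading would have to pass it to set first.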
=== Stream functions ===
When specifying the stream function for serialize/unserialize, you can
also specify a number or a non-callable object.
If a number is given, it is assumed to be a file descriptor number.
Serializing treats a non-callable object as a Node.js writable stream and
calls the write method of that object.
Unserializing treats a non-callable object as a Buffer, ArrayBuffer, or
typed array. It reads from that buffer starting at the beginning.
The stream function takes one argument which is a Buffer instance. If
serializing it should write out the contents of that buffer, and if
unserializing it should read data into that buffer (using its full size).
The return value of a stream function is irrelevant and is not used. (If
you explicitly provide your own stream function, it is possible to use its
return value for something in an external type implementation; I do not
see how that can be useful, but maybe you have a use for it.)
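A pair of in-memory stream functions following the description above might look like this. It is a sketch, not part of this module: makeMemoryWriter and makeMemoryReader are hypothetical helper names, and copying each chunk on write is only a precaution in case the caller reuses its buffer.

```javascript
// Writing side: returns a stream function plus a way to get the result.
function makeMemoryWriter() {
  const chunks = [];
  return {
    stream(buf) {
      // Copy the chunk, in case the caller reuses the buffer (assumption).
      chunks.push(Buffer.from(buf));
    },
    result() {
      return Buffer.concat(chunks);
    },
  };
}

// Reading side: fills each requested buffer completely from `source`,
// which must be a Node.js Buffer.
function makeMemoryReader(source) {
  let pos = 0;
  return function stream(buf) {
    if (pos + buf.length > source.length) {
      throw new RangeError("unexpected end of data");
    }
    source.copy(buf, 0, pos, pos + buf.length);
    pos += buf.length;
  };
}

// Simulate what a serializer would do: write chunks, then read them back.
const writer = makeMemoryWriter();
writer.stream(Buffer.from([1, 2, 3]));
writer.stream(Buffer.from([4]));

const read = makeMemoryReader(writer.result());
const first = Buffer.alloc(3);
const second = Buffer.alloc(1);
read(first);
read(second);
console.log(first, second);
```

With a serializer instance, writer.stream would be the first argument of .serialize and the function returned by makeMemoryReader the argument of .unserialize.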
=== Static properties ===
The properties directly on the object exported by this module are:
.ReadingContext(owner,stream)
A function that is the constructor for a reading context, where owner is
a serializer instance and stream is a reading stream function.
.WritingContext(owner,stream,root)
A function that is the constructor for a writing context, where owner is
a serializer instance and stream is a writing stream function.
.prototype
The prototype for serializers.
.standardTypes
A WeakMap of standard types. The keys are functions which have a
property called "prototype" designating the object which is the
prototype for this standard type, and the values are functions
which are called to implement this standard type. (It does include
RegExp even though that is not in the list of automatics.)
=== Reading/writing contexts ===
The following properties exist on reading/writing context instances or on
the prototype for them. Such instances will be passed as the first
argument to a function for implementing external types.
Most functions will return the value read/written; most will ignore the
argument when it is a reading context. There are some special cases.
.buffer(buf)
Read/write the given buffer (a Node.js Buffer instance) and return the
buffer. For reading only, the argument can also be a number giving how
many bytes to read; in that case a new buffer is returned.
.float32(x)
Read/write a 32-bit floating point number.
.float64(x)
Read/write a 64-bit floating point number.
.int16(x)
Read/write a signed 16-bit integer.
.int32(x)
Read/write a signed 32-bit integer.
.int8(x)
Read/write a signed 8-bit integer.
.integer(x)
Read/write a signed 32-bit integer using a variable representation. If
most of the numbers are small but there are some large numbers too, then
this results in a smaller file size than using int32.
.key(x)
Read/write a key, which is any string or symbol. It will keep track of
any keys previously used to shorten further uses of them, as well as to
use a special case for nonnegative integer keys. If you write int8(0)
and then try to read it with key() you will get null as the result.
.owner
The serializer that this context belongs to.
.properties(obj,omitkeys)
Read/write properties of obj (which must be specified even for reading),
excluding those listed in omitkeys (ignored during reading). The
omitkeys, if specified, is an object whose own keys (the prototype is
ignored) are keys that should not be written (probably because they were
already written by an external type implementation function). Returns
obj (the object whose properties are read/written).
.queue(fn)
Enqueue a function to be executed after the main value is finished. You
can also enqueue during an enqueued function, and it will execute after
all other enqueued functions are finished. This is used in the internal
implementation of serialization of weak sets/maps, although you can also
use it in your own external type implementations. The function enqueued
is not given any arguments.
.reading
True for reading contexts, or false for writing contexts.
.root
Only for writing contexts; it is the root value being serialized.
.stream(buf)
The stream function.
.string(x)
Read/write a string. It does not have to be a valid Unicode text; any
sequence of 16-bit characters can be used.
.uint16(x)
Read/write an unsigned 16-bit integer.
.uint32(x)
Read/write an unsigned 32-bit integer.
.uint8(x)
Read/write an unsigned 8-bit integer.
.value(data)
Serialize or unserialize any value. (Note: The format is different than
using functions like .integer or .string; .value uses a different header
than the other functions (some of which use no header).)
.writing
True for writing contexts, or false for reading contexts.
=== File format ===
For the authoritative specification of the file format you must look at
the program; I am sorry if this document is incomplete or incorrect.
The file starts with a header of two little-endian 16-bit numbers; first
the internal version number and then the user version number. Immediately
after this header is the value to be serialized.
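The header layout can be sketched with Node.js Buffer methods, taking the byte order to be least significant byte first and using the signedness given earlier (internal version unsigned, user version signed). The names writeHeader and checkHeader are hypothetical; they are not functions of this module.

```javascript
// Build the 4-byte header: internal version (unsigned 16-bit) first,
// then user version (signed 16-bit), both least significant byte first.
function writeHeader(internalVersion, userVersion) {
  const header = Buffer.alloc(4);
  header.writeUInt16LE(internalVersion, 0);
  header.writeInt16LE(userVersion, 2);
  return header;
}

// Parse the header and treat a version mismatch as an error.
function checkHeader(header, internalVersion, userVersion) {
  const gotInternal = header.readUInt16LE(0);
  const gotUser = header.readInt16LE(2);
  if (gotInternal !== internalVersion || gotUser !== userVersion) {
    throw new Error("version mismatch");
  }
  return { internalVersion: gotInternal, userVersion: gotUser };
}

// The documented defaults: internal version 0, user version -1.
const header = writeHeader(0, -1);
console.log(header); // <Buffer 00 00 ff ff>
```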
A value is stored as a mode byte, possibly followed by other data
depending on the contents of the mode byte. The mode byte is split in two
nybbles. The high nybble can be:
[0] Short value
No data follows. Low nybble specifies exact value:
0 = undefined
1 = null
2 = false
3 = true
4 = +0
5 = NaN
6 = ""
7 = -0
8 = +Infinity
9 = -Infinity
10 = A new empty array (saved)
11 = +1
12 = +2
13 = +3
14 = A new symbol (saved)
15 = -1
[1] Object (not using external types)
The low nybble specifies what prototype should be used:
0 = null
1 = Use a value that follows
2 = Use default prototype (Object.prototype)
3-15 = An external value
If 1, then another value follows before the property list.
If 3-15, then the external value number modulo 13 is used and is 0-12,
and a varint follows which is 31 less than the quotient (rounded down).
After any extra bytes needed to define the prototype, the property list
follows (described below). The new object is saved before reading
anything else (including the prototype value if applicable).
[2] External value
An external value, identified by a typeid.
[3] Saved value
Access a previously saved object or symbol which has been created during
the unserialization. Identified by a typeid, where 0 means the first
saved value, 1 is the second saved value, and so on.
[4] String of 8-bit characters
A typeid which is one less than the number of characters, followed by
the characters as one byte each.
[5] String of 16-bit characters
A typeid which is one less than the number of characters, followed by
the characters which are each unsigned little-endian 16-bit numbers.
[6] Signed 12-bit integer
A signed 12-bit integer in big-endian format. The low nybble and the
next byte together form the number.
[7] Signed 20-bit integer
A signed 20-bit integer in big-endian format. The low nybble and the
next two bytes together form the number.
[8] Signed 32-bit integer
The low nybble is always zero. Followed by a 32-bit integer in big-endian
format. If the low nybble isn't zero, reading a value in an external
type implementation throws the value of the mode byte; this applies for
both major types 8 and 9.
[9] Floating point number
The low nybble is always zero. Followed by a 64-bit floating point number
in big-endian format.
[10-15] Objects with external types
The external type is identified by a typeid; the external type number is
the typeid multiplied by six, plus the high nybble of the mode byte, minus
ten. The data that follows depends on the definition of the external type.
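The scalar modes above (high nybbles 0, 6, 7, 8 and 9) can be decoded roughly as follows. This is a sketch written from the description in this document, not code taken from the module; decodeScalar and SHORT_VALUES are hypothetical names, and the cases that create saved values (low nybbles 10 and 14 of mode 0) are left out.

```javascript
// Values for high nybble 0 that need no state. Low nybbles 10 and 14
// create saved values, so they are not in this table.
const SHORT_VALUES = {
  0: undefined, 1: null, 2: false, 3: true, 4: 0, 5: NaN, 6: "",
  7: -0, 8: Infinity, 9: -Infinity, 11: 1, 12: 2, 13: 3, 15: -1,
};

// Decode one scalar value starting at `offset`. Returns { value, size },
// where size is the number of bytes consumed including the mode byte.
function decodeScalar(buf, offset = 0) {
  const mode = buf[offset];
  const high = mode >> 4;
  const low = mode & 0x0f;
  switch (high) {
    case 0:
      if (low === 10 || low === 14) {
        throw new Error("saved values are not handled in this sketch");
      }
      return { value: SHORT_VALUES[low], size: 1 };
    case 6: {
      // 12 bits: low nybble plus one byte, then sign-extend.
      const raw = (low << 8) | buf[offset + 1];
      return { value: (raw << 20) >> 20, size: 2 };
    }
    case 7: {
      // 20 bits: low nybble plus two bytes, then sign-extend.
      const raw = (low << 16) | buf.readUInt16BE(offset + 1);
      return { value: (raw << 12) >> 12, size: 3 };
    }
    case 8:
    case 9:
      if (low !== 0) {
        throw new Error("nonzero low nybble is not handled in this sketch");
      }
      return high === 8
        ? { value: buf.readInt32BE(offset + 1), size: 5 }
        : { value: buf.readDoubleBE(offset + 1), size: 9 };
    default:
      throw new Error("not a scalar mode: " + high);
  }
}

console.log(decodeScalar(Buffer.from([0x6f, 0xff])).value); // -1
console.log(decodeScalar(Buffer.from([0x70, 0x01, 0x00])).value); // 256
```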
A typeid consists of the low nybble of the mode byte and may be followed
by additional bytes. It is always an unsigned integer. If bit3 of the mode
byte is clear then no additional bytes follow; the value is the low 3 bits
of the mode byte. If bit3 is set then there are one or three more bytes
(one if bit2 is clear, three if bit2 is set). With one extra byte, the
value is the low 2 bits of the mode byte multiplied by 256, plus the value
of the extra byte, plus 8. With three extra bytes, the low 2 bits of the
mode byte and the extra bytes together form a big-endian 26-bit unsigned
integer.
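A decoder for the typeid, together with the external type number rule for high nybbles 10-15, might be sketched like this. The function names are hypothetical, and the three-byte form is read as the low 2 bits of the mode byte followed by 24 more bits with no offset added, which is my reading of the text rather than something checked against the program.

```javascript
// Decode a typeid whose first bits are the low nybble of the mode byte
// at `offset`. Returns { id, size }, where size includes the mode byte.
function decodeTypeid(buf, offset = 0) {
  const low = buf[offset] & 0x0f;
  if ((low & 0x08) === 0) {
    // bit3 clear: the low 3 bits are the value, no extra bytes.
    return { id: low & 0x07, size: 1 };
  }
  const low2 = low & 0x03;
  if ((low & 0x04) === 0) {
    // bit2 clear: one extra byte, value = low2 * 256 + byte + 8.
    return { id: low2 * 256 + buf[offset + 1] + 8, size: 2 };
  }
  // bit2 set: three extra bytes, 2 + 24 bits as one big-endian number.
  return { id: low2 * 0x1000000 + buf.readUIntBE(offset + 1, 3), size: 4 };
}

// For high nybbles 10-15: type number = typeid * 6 + (high nybble - 10).
function decodeExternalTypeNumber(buf, offset = 0) {
  const high = buf[offset] >> 4;
  if (high < 10) {
    throw new RangeError("not an external type mode byte");
  }
  const { id, size } = decodeTypeid(buf, offset);
  return { type: id * 6 + (high - 10), size };
}

console.log(decodeTypeid(Buffer.from([0x2b, 0xff]))); // { id: 1031, size: 2 }
console.log(decodeExternalTypeNumber(Buffer.from([0xb1]))); // { type: 7, size: 1 }
```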
A varint represents any signed 32-bit integer. If bit7 of the first byte
is set then the actual value is the bitwise complement of the rest of the
encoded value. Bit6 and bit5 tell the size of the remaining data, and the
low 5 bits are the low 5 bits of the encoded number. Bit6 and bit5 of the
first byte are interpreted as follows:
[00]
No more bytes follow.
[01]
Has one byte following, which is eight more bits (bit12-bit5) of the
resulting number.
[10]
Has two bytes following which is a big-endian 16-bit number; multiply by
32 and add to the other number.
[11]
Has three or four bytes following. If the high bit of the first
following byte is set then only three bytes; otherwise all four bytes.
In either case it is the remaining higher bits of the number as
big-endian, but the high bit of the first following byte isn't any part
of it.
A varstring consists of a varint followed by the data. If the number is
zero or positive then it is the length of the string in 8-bit characters.
If the number is negative then the bitwise complement of that number is
the length of the string in little-endian 16-bit characters.
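The varint and varstring rules can be sketched as below. This follows the wording of this document only and has not been checked against the program, so treat details such as the handling of a zero length as assumptions. decodeVarint and decodeVarstring are hypothetical names, and the 16-bit characters are read least significant byte first.

```javascript
// Decode a varint at `offset`. Returns { value, size }.
function decodeVarint(buf, offset = 0) {
  const first = buf[offset];
  const negative = (first & 0x80) !== 0;
  const low5 = first & 0x1f;
  let high = 0; // the bits above the low five
  let size = 1;
  switch ((first >> 5) & 0x03) {
    case 1: // one more byte
      high = buf[offset + 1];
      size = 2;
      break;
    case 2: // a big-endian 16-bit number
      high = buf.readUInt16BE(offset + 1);
      size = 3;
      break;
    case 3: // three or four bytes; the top bit selects which
      if (buf[offset + 1] & 0x80) {
        high = buf.readUIntBE(offset + 1, 3) & 0x7fffff;
        size = 4;
      } else {
        high = buf.readUInt32BE(offset + 1);
        size = 5;
      }
      break;
  }
  const magnitude = high * 32 + low5;
  // A set bit7 means the result is the bitwise complement (-n - 1).
  return { value: negative ? -magnitude - 1 : magnitude, size };
}

// Decode a varstring: a varint length followed by the characters.
function decodeVarstring(buf, offset = 0) {
  const { value: n, size } = decodeVarint(buf, offset);
  let pos = offset + size;
  let text = "";
  if (n >= 0) {
    // n characters of one byte each.
    for (let i = 0; i < n; i++) text += String.fromCharCode(buf[pos++]);
  } else {
    // The bitwise complement of n is the count of 16-bit characters.
    const length = -n - 1;
    for (let i = 0; i < length; i++, pos += 2) {
      text += String.fromCharCode(buf.readUInt16LE(pos));
    }
  }
  return { value: text, size: pos - offset };
}

console.log(decodeVarint(Buffer.from([0x41, 0x00, 0x02]))); // { value: 65, size: 3 }
console.log(decodeVarstring(Buffer.from([0x02, 0x68, 0x69]))); // { value: 'hi', size: 3 }
```

String.fromCharCode is used instead of a text decoder so that any sequence of 16-bit characters survives, including ones that are not valid Unicode text.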
A property list consists of pairs of keys and values, terminated by a zero
byte. A key is encoded as listed below (as hex bytes):
[01-7F] Existing keys
Access one of the first 127 existing keys.
[80-D7] Short numeric
Make a numeric key 0 to 87.
[D8] Long string
Followed by a varstring which is the key. It is saved in the list of
existing keys if the length is nonzero.
[D9] New symbol
Make a new symbol and save it in the list of existing values (not in the
list of existing keys).
[DA] Existing symbol
Followed by a varint which is 31 less than an existing value number; this
value is expected to be a symbol (not an object).
[DB] Long numeric
Followed by a varint. The numeric key is 128 more than that number.
[DC] Symbol from external value
Followed by a varint which is 31 less than an external value number; this
value needs to be a symbol (not an object).
[DD-DF] Long existing keys
Access an existing key. Its zero-based existing key number is
(id-0xDD)+3*(varint+31), where id is this key byte.
[E0-FF] Short string
Make a string of length from 1 to 32 characters and store it in the list
of existing keys if the length isn't 1. String consists of 8-bit
characters only.
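The first byte of a key can be classified by range as in the sketch below. classifyKeyByte and the kind names are hypothetical. The mappings 0x80 to numeric key 0 and 0xE0 to length 1 are assumptions derived from the stated ranges (0 to 87 and 1 to 32). How bytes 01-7F map to an existing key index is not stated here, so the sketch does not compute it.

```javascript
// Classify the first byte of a key in a property list.
function classifyKeyByte(byte) {
  if (byte === 0x00) return { kind: "end" }; // list terminator
  if (byte <= 0x7f) return { kind: "existing-key" };
  if (byte <= 0xd7) return { kind: "short-numeric", key: byte - 0x80 };
  if (byte >= 0xe0) return { kind: "short-string", length: byte - 0xe0 + 1 };
  if (byte >= 0xdd) return { kind: "long-existing-key", base: byte - 0xdd };
  // Remaining single-byte codes D8 to DC.
  const kinds = {
    0xd8: "long-string",
    0xd9: "new-symbol",
    0xda: "existing-symbol",
    0xdb: "long-numeric",
    0xdc: "external-symbol",
  };
  return { kind: kinds[byte] };
}

console.log(classifyKeyByte(0xd7)); // { kind: 'short-numeric', key: 87 }
console.log(classifyKeyByte(0xff)); // { kind: 'short-string', length: 32 }
```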