sdch
v0.0.3
SDCH encoder/decoder for node.js
node-sdch
Refer to the SDCH spec for more information.
Keep in mind that the spec is not accurate in all aspects. For instance:
- Chromium already supports SDCH over HTTPS, as it is now considered not to introduce additional risks.
- Chromium does not support a comma-separated port list. Use multiple headers.
- Chromium downloads only the first dictionary from the Get-Dictionary header.
This package mimics Chromium behavior rather than following the spec precisely, since Chromium, not the spec, is the real consumer of SDCH:)
Quick overview
Based on node-vcdiff. In a nutshell, SDCH adds an HTTP layer to VCDIFF compression:
var assert = require('assert');
var sdch = require('sdch');
var dict = new sdch.SdchDictionary({
  url: '/dicts/dict1', // required by the constructor (see "Creation" below)
  domain: 'kotiki.cc',
  path: '/',
  data: 'Yo dawg I heard you like common substrings in your documents so we ' +
        'put them in your vcdiff dictionary so you can compress while you compress'
});
var testData =
  'Yo dawg I heard you like common substrings somewhere else so we put ' +
  'them in your vcdiff dictionary so you can decompress while you decompress';
// Synchronous API
var encoded = sdch.sdchEncodeSync(testData, dict);
var decoded = sdch.sdchDecodeSync(encoded, [ dict ]);
assert(testData === decoded.toString());
// Callback API
sdch.sdchEncode(testData, dict, function(err, enc) {
  if (err) throw err;
  sdch.sdchDecode(enc, [ dict ], function(err, dec) {
    if (err) throw err;
    assert(testData === dec.toString());
  });
});
// Streaming API (the create*Somehow() calls are placeholders)
var input = createInputStreamSomehow(); // `in` is a reserved word in JavaScript
var out = createOutputStreamSomehow();
var encoder = sdch.createSdchEncoder(dict);
input.pipe(encoder).pipe(out);
// Assumes `out` is also readable, e.g. a PassThrough stream
var decoder = sdch.createSdchDecoder([ dict ]);
out.pipe(decoder).pipe(process.stdout);
You may want to use connect-sdch, which provides all the basic server-side pieces required to serve sdch-encoded content.
Slow overview
An HTTP server may provide a dictionary to the client, and the client may use it to decode server responses. A dictionary in SDCH has to be associated with a domain and, optionally, a path and ports, and it carries some other properties. These properties are prepended to the VCDIFF dictionary in HTTP-header format:
domain: kotiki.cc
path: /
port: 80
port: 3000
max-age: 86400
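As an illustration of that layout, the sketch below assembles such a dictionary by hand. `buildSdchDictionary` is a hypothetical helper, not part of the sdch API (the package's SdchDictionary does this for you). It assumes, as in the SDCH spec, that the header block is terminated by an empty line before the raw VCDIFF data, and it emits one `port` header per port because Chromium does not accept a comma-separated list.

```javascript
// Hypothetical helper, not part of the sdch package API.
// Serializes SDCH dictionary properties into header lines and prepends them
// to the raw VCDIFF dictionary data, separated by an empty line.
function buildSdchDictionary(props, data) {
  var lines = ['domain: ' + props.domain];
  if (props.path !== undefined) lines.push('path: ' + props.path);
  // One "port" header per port: Chromium does not accept a comma-separated list.
  (props.ports || []).forEach(function (port) {
    lines.push('port: ' + port);
  });
  if (props.maxAge !== undefined) lines.push('max-age: ' + props.maxAge);
  var headerBlock = lines.join('\n') + '\n\n'; // header lines + empty line
  return Buffer.concat([Buffer.from(headerBlock, 'utf8'), Buffer.from(data)]);
}

var dictText = buildSdchDictionary(
  { domain: 'kotiki.cc', path: '/', ports: [80, 3000], maxAge: 86400 },
  'Yo dawg I heard you like common substrings');
console.log(dictText.toString());
```

Printing `dictText` reproduces the header block above followed by an empty line and the dictionary text.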
When the client makes a request to the server, it appends the client hashes of its available dictionaries (which the client may have downloaded earlier). The server chooses the dictionary to encode with and proceeds. This is why SdchDecoder accepts a list of dictionaries instead of a single one: the decoder does not know which particular dictionary the server will choose, and it finds out only when parsing the response.
An SDCH-encoded entity differs from a VCDIFF-encoded one only by the dictionary's server hash prepended to the front of the VCDIFF-encoded body. So the SDCH encoder just prepends this hash + '\0' and then streams the VCDIFF-encoded data. The decoder parses this hash, selects the matching dictionary from the provided ones, and decodes the data.
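A minimal sketch of that first decoding step, assuming the server hash is the 8-character URL-safe base64 string described in the hash example below (6 bytes encode to 8 characters), so that hash plus '\0' forms a 9-byte prefix. `splitSdchBody` is hypothetical and not part of the package; the dictionaries are only assumed to carry a `serverHash` string, as SdchDictionary does.

```javascript
// Hypothetical sketch: split an SDCH-encoded body into the server hash
// prefix and the VCDIFF payload, then pick the matching dictionary.
var SERVER_HASH_LENGTH = 8; // 6 hash bytes, base64-encoded

function splitSdchBody(body, dictionaries) {
  // The hash must be followed by a '\0' terminator byte.
  if (body.length < SERVER_HASH_LENGTH + 1 || body[SERVER_HASH_LENGTH] !== 0) {
    throw new Error('Malformed SDCH body: no server hash prefix');
  }
  var serverHash = body.slice(0, SERVER_HASH_LENGTH).toString('ascii');
  var dict = dictionaries.find(function (d) {
    return d.serverHash === serverHash;
  });
  if (!dict) {
    throw new Error('No dictionary with server hash ' + serverHash);
  }
  // Everything after the terminator is plain VCDIFF data for that dictionary.
  return { dictionary: dict, vcdiffData: body.slice(SERVER_HASH_LENGTH + 1) };
}
```

The returned `vcdiffData` would then go to a VCDIFF decoder together with the selected dictionary's data.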
A well-behaved SDCH client should perform a number of security checks on the dictionaries proposed by the server, particularly scheme, domain, port and path matching. This package includes utility functions for all of these checks (
sdch.clientUtils
). See how the connect-sdch example client uses them to validate server-provided dictionaries and to choose which ones to advertise. You may also refer to the Chromium code for more information.
Here is a quick example of how server and client hashes are created:
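var crypto = require('crypto');
// URL-safe base64: '+' -> '-', '/' -> '_' (6 bytes always encode to 8 chars, no padding)
function urlsafeEncode(b64) { return b64.replace(/\+/g, '-').replace(/\//g, '_'); }
// sdchHeaders and dictData are placeholders: header lines (each ending in '\n') and VCDIFF data
var shasum = crypto.createHash('sha256');
shasum.update(sdchHeaders + '\n'); // concatenated SDCH headers + \n
shasum.update(dictData);           // vcdiff dictionary data
var hash = shasum.digest();
var clientHash = urlsafeEncode(hash.slice(0, 6).toString('base64'));
var serverHash = urlsafeEncode(hash.slice(6, 12).toString('base64'));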
API Reference
Encoding/Decoding
All encoding/decoding functions accept an options parameter:
sdchEncodeSync(input, dictionary, options)
sdchDecodeSync(input, dictionaries, options)
sdchEncode(input, dictionary, options, callback)
sdchDecode(input, dictionaries, options, callback)
createSdchEncoder(dictionary, options)
createSdchDecoder(dictionaries, options)
These options are passed to the underlying vcdiff module. For their meaning, please refer to the node-vcdiff docs.
Note: you should not provide dictionary or hashedDictionary; this module supplies them.
For the decoder, 2 additional options are available:
- url, String. URL of the resource being decoded.
- validationCallback, Function. Used to check whether the dictionary is valid for decoding the resource. If not provided, the default implementation (which uses clientUtils.canUseDictionary) is used.
The validation callback is used only if the url option is provided:
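sdch.sdchDecode(
  input,
  dictionaries,
  {
    url: 'http://resource.com/path',
    validationCallback: function (dict,          // selected dictionary
                                  resourceUrl) { // 'http://resource.com/path'
      // Example extra rule: refuse expired dictionaries, then fall back to
      // the default checks
      if (dict.expiration && Date.now() > dict.expiration.getTime())
        return false;
      return sdch.clientUtils.canUseDictionary(dict, resourceUrl);
    }
  }, function (err, data) {
    if (err) {
      console.error(err.message);
      return;
    }
    // data holds the decoded resource
  });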
Dictionaries
Creation
You may create an SDCH dictionary from a buffer containing VCDIFF dictionary data (such as one generated by femtozip) and a set of SDCH-related options using the SdchDictionary constructor:
var fs = require('fs');
var dict = new sdch.SdchDictionary({
  domain: 'kotiki.cc',                         // String
  url: '/dicts/dict1',                         // String
  data: fs.readFileSync('path-to-your-dict'),  // Buffer or String
  path: '/somepath',                           // String
  formatVersion: '1.0',                        // String
  maxAge: 84600,                               // Int
  ports: [80, 443, 3000]                       // Array of Ints
});
Only domain, url and data are required; the others are optional. The constructor will throw if required args are missing or any arg has the wrong type.
Sometimes you need to parse the dictionary from some source. For instance, the client parses the dictionary from the server response. Or, if you want to serve dictionaries as static resources using nginx, you have to prepare the files in advance (prepend the SDCH headers to the data). Then you may create the options from that data using createDictionaryOptions(dictionaryURL, data).
dictionaryURL is the URL where the dictionary is served. For the client it should be a valid web URL, including at least the scheme and domain parts, since the security checks have to be performed against it. For the server any URL will do, unless you use connect-sdch to serve dictionaries; then you need to pass the absolute path from which the dictionary will be served.
Client:
// Client just received dict from some URL
var dict;
try {
var opts = sdch.createDictionaryOptions(
dictUrl, // String
responseBody); // String
dict = sdch.clientUtils.createDictionaryFromOptions(opts);
} catch (e) {
// Whoops... dictionary was invalid for some reason
console.log(e.message);
}
Server:
var dict;
try {
var opts = sdch.createDictionaryOptions(
dictUrl, // String
fs.readFileSync(...)); // String
// On the server we may be sure that the dictionary is more or less correct.
// So create it directly.
dict = new sdch.SdchDictionary(opts);
} catch (e) {
// Whoops... this time this may only mean syntax error in headers, missing
// domain header or no url.
console.log(e.message);
}
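To give an idea of the parsing step, here is a rough sketch of splitting raw dictionary text into its header block and VCDIFF data. `parseSdchDictionary` is hypothetical and is not how createDictionaryOptions is implemented. It assumes the headers end at the first empty line and use the header names shown in the overview (plus `format-version`), keeps repeated `port` headers as a list, leaves out the url argument, and throws on a header line without a colon, mirroring the "syntax error in headers" case above.

```javascript
// Hypothetical sketch, not the sdch package implementation.
function parseSdchDictionary(text) {
  var sep = text.indexOf('\n\n');
  if (sep === -1) throw new Error('No empty line after SDCH headers');
  var headers = {};
  var ports = [];
  text.slice(0, sep).split('\n').forEach(function (line) {
    var colon = line.indexOf(':');
    if (colon === -1) throw new Error('Bad SDCH header line: ' + line);
    var name = line.slice(0, colon).trim().toLowerCase();
    var value = line.slice(colon + 1).trim();
    if (name === 'port') ports.push(parseInt(value, 10)); // repeated header
    else headers[name] = value;
  });
  return {
    domain: headers['domain'],
    path: headers['path'],
    formatVersion: headers['format-version'],
    maxAge: headers['max-age'] !== undefined ? parseInt(headers['max-age'], 10) : undefined,
    ports: ports.length ? ports : undefined,
    data: text.slice(sep + 2) // VCDIFF dictionary data
  };
}
```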
SdchDictionary
properties
The class has the following properties (optional params and their derivatives may be undefined):
- url - String. URL from which the client downloaded it, or at which it is served.
- domain - String. Only this domain and its subdomains are allowed to use this dictionary.
- path - String. The dictionary is used only for the specified path, or also for all subpaths if the path ends with /. For instance:
  - / matches all paths
  - /path matches only /path
  - /path/ matches /path/123, /path/123/4, but not /path
- ports - Array of Integers. Usage of the dictionary is restricted to the provided ports.
- data - Buffer. VCDIFF dictionary data.
- hashedDict - vcdiff.HashedDictionary instance for encoding.
- formatVersion - String. Format version; Chromium supports only '1.0'.
- maxAge - Integer. Dictionary lifetime in seconds.
- expiration - Date. Invalidation time for the dictionary. The client should not use the dictionary if the current time is later than expiration. expiration is set to the creation time of the SdchDictionary plus maxAge.
- clientHash - String. See above.
- serverHash - String. See above.
- etag - String. Like clientHash and serverHash, a substring of the SHA256 checksum.
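The path rules above can be captured in a few lines. `pathMatches` is a hypothetical helper that follows the three examples literally (a restriction ending in '/' matches itself and everything below it, anything else must match exactly); it is not taken from the package source.

```javascript
// Hypothetical helper following the path rules listed above.
function pathMatches(requestPath, restriction) {
  if (restriction === undefined) return true; // no path restriction at all
  if (restriction.endsWith('/')) {
    // '/path/' covers '/path/' and all of its subpaths
    return requestPath.startsWith(restriction);
  }
  return requestPath === restriction; // '/path' covers only '/path'
}

console.log(pathMatches('/anything/at/all', '/')); // true
console.log(pathMatches('/path/123', '/path'));    // false
console.log(pathMatches('/path/123/4', '/path/')); // true
console.log(pathMatches('/path', '/path/'));       // false
```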
methods
- getLength() returns the SDCH dictionary length (SDCH headers + dictionary data). May be used to set the Content-Length header.
- getOutputStream(opts) returns the dictionary content as a stream. opts may include any options valid for stream.Readable. opts may also include range: { start: startPos, end: endPos }, which is used to stream only part of the dictionary (useful for serving content ranges). Please make sure you pass a valid range for that dictionary; use a range parser to validate it. Multi-ranges are not supported.
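Since the range has to be validated by the caller, a minimal sketch of that check might look like this. `rangeHeaders` is hypothetical; it takes the total length (for example from getLength()) and assumes `end` is inclusive, as in HTTP byte ranges. Whether getOutputStream treats `end` the same way is an assumption to verify against the package.

```javascript
// Hypothetical helper: validates a single byte range against the dictionary
// length (e.g. from dict.getLength()) and builds 206 response headers.
// Assumes `end` is inclusive, as in HTTP byte ranges.
function rangeHeaders(range, totalLength) {
  var valid = Number.isInteger(range.start) && Number.isInteger(range.end) &&
    range.start >= 0 && range.start <= range.end && range.end < totalLength;
  if (!valid) return null; // caller should answer 416 Range Not Satisfiable
  return {
    'Content-Range': 'bytes ' + range.start + '-' + range.end + '/' + totalLength,
    'Content-Length': String(range.end - range.start + 1)
  };
}
```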
clientUtils
A set of functions, more or less copied from the Chromium code, to aid in creating SDCH clients for testing SDCH servers.
Basic client behavior:
- A client willing to accept SDCH should advertise it in the Accept-Encoding HTTP header:
Accept-Encoding: sdch,gzip,deflate
- When the client receives a response, it should check the Get-Dictionary header. If it is present, the client may follow the URL from that header and download the dictionary.
NOTE: Use canFetchDictionary to determine whether the client should really fetch that dictionary or something is wrong.
- When the client downloads the dictionary, it parses it and stores it somewhere.
NOTE: Use createDictionaryOptions to parse the dictionary params from the response and createDictionaryFromOptions to check whether the dictionary is valid and create it if so.
- The client may advertise the dictionary to the server in future requests to that domain by appending its clientHash to the Avail-Dictionary request header. Avail-Dictionary contains a comma-separated list of client hashes of the dictionaries the client is willing to advertise to the server. Chromium advertises all available dictionaries for the domain (subject to some checks).
NOTE: Use canAdvertiseDictionary to determine whether the client should advertise the dictionary to the server.
- If the server chooses to SDCH-encode the resource, it will append sdch to the Content-Encoding header. The server may also compress the response with gzip or deflate, so the header will be something like:
Content-Encoding: sdch, gzip
The client should first decompress the response with gzip/deflate and then pass it to the sdch decoder.
NOTE: There is a lot of mess here. Some proxies in the wild tend to tamper with the CE header, so it may be just sdch or gzip even if the resource is SDCH'ed and gzipped, or just SDCH'ed, or just gzipped, or... you get the idea:) So Chromium tries to perform gzip and sdch decoding for every resource for which it has advertised a dictionary, regardless of the Content-Encoding header value, and falls back if that fails. Since we're not writing a browser and just want to test our SDCH server directly, we may skip all that magic and trust the header.
If the server decides not to SDCH-encode the response even though the client has advertised some valid dictionaries, it adds the X-SDCH-Encode: 0 header to the response. The client should be prepared to handle it.
NOTE: after the client has parsed the dictionary server hash from the response, it may use canUseDictionary to check whether the dictionary is valid to use.
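For a test client that trusts the headers, the decoding order can be derived directly from Content-Encoding: codings are listed in the order they were applied, so they are undone from last to first. The sketch below is hypothetical (`planDecoding` is not part of clientUtils); it assumes lower-cased header names, as in Node's `http.IncomingMessage#headers`, and only plans the steps, leaving the actual gunzip/inflate and sdchDecode calls to the caller.

```javascript
// Hypothetical helper: which codings to undo, and in which order,
// trusting the Content-Encoding header as-is.
function planDecoding(headers) {
  var codings = (headers['content-encoding'] || '')
    .split(',')
    .map(function (c) { return c.trim().toLowerCase(); })
    .filter(function (c) { return c !== '' && c !== 'identity'; });
  return {
    // Codings were applied left to right, so undo them right to left:
    // 'sdch, gzip' -> gunzip first, then the sdch decoder.
    steps: codings.reverse(),
    // The server declined SDCH even though dictionaries were advertised.
    sdchDeclined: headers['x-sdch-encode'] === '0'
  };
}

console.log(planDecoding({ 'content-encoding': 'sdch, gzip' }).steps); // [ 'gzip', 'sdch' ]
```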
createDictionaryFromOptions(opts)
Creates a dictionary from the provided options (parsed from the dictionary using createDictionaryOptions) if they seem valid. Otherwise throws an Error.
Chromium performs these checks after it downloads the dictionary. The dictionary will not be created if:
- opts.domain is null or an empty string
- opts.domain specifies a TLD (checked via the tldjs package)
- the opts.url hostname is not opts.domain, or opts.domain starts with . and the opts.url hostname does not belong to opts.domain. Theoretically both of these mean just not belonging to the domain:)
- opts.ports is provided but does not include the opts.url port. No port means port 80 for http and 443 for https, naturally.
- opts.formatVersion is provided but does not equal '1.0' (this is what Chromium does).
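A rough sketch of the non-TLD checks listed above. `dictionaryOptionsError` is hypothetical and skips the tldjs check; it assumes opts.url is an absolute URL (the client case), Chromium-style domain matching (the hostname equals the domain or ends with '.' + domain, ignoring a leading dot), and the default ports 80/443. It is not the package's implementation.

```javascript
// Hypothetical sketch of the checks above (the tldjs TLD check is omitted).
// Returns an error message, or null if the options look acceptable.
function dictionaryOptionsError(opts) {
  if (!opts.domain) return 'domain is missing or empty';
  var url = new URL(opts.url); // global WHATWG URL (Node >= 10)
  var domain = opts.domain.toLowerCase().replace(/^\./, '');
  var host = url.hostname;
  if (host !== domain && !host.endsWith('.' + domain)) {
    return 'dictionary URL host ' + host + ' does not belong to ' + opts.domain;
  }
  if (opts.ports) {
    // An absent port means the scheme's default port.
    var port = url.port ? parseInt(url.port, 10) : (url.protocol === 'https:' ? 443 : 80);
    if (opts.ports.indexOf(port) === -1) return 'port ' + port + ' is not allowed';
  }
  if (opts.formatVersion !== undefined && opts.formatVersion !== '1.0') {
    return 'unsupported format version ' + opts.formatVersion;
  }
  return null;
}
```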
canUseDictionary(dictionary, referringUrl)
Determines whether the client can actually use the dictionary for decoding the resource at referringUrl. Chromium performs this check when it parses the server hash from the first 9 bytes of the response body and needs to verify that the dictionary suggested by the server is valid for that resource.
The function returns false if:
- the referringUrl domain does not match dictionary.domain
- dictionary.ports is present but the referringUrl port does not match them
- dictionary.path is present but the referringUrl path does not match it (see the description of the path field)
- the dictionary.url scheme does not match the referringUrl scheme. Simply speaking, you should not use non-secure dictionaries for secure resources and vice versa.
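The conditions above map onto a short predicate. This is a hedged sketch: `canUseSketch` is hypothetical and assumes Chromium-style domain matching (the host equals the domain or is a subdomain of it), default ports 80/443, an absolute dictionary.url, and the path rules from the path property description. The real clientUtils.canUseDictionary may differ in details.

```javascript
// Hypothetical sketch of the canUseDictionary conditions listed above.
// `dict` needs url and domain, optionally ports and path (as on SdchDictionary).
function canUseSketch(dict, referringUrl) {
  var ref = new URL(referringUrl);
  var domain = dict.domain.toLowerCase().replace(/^\./, '');
  if (ref.hostname !== domain && !ref.hostname.endsWith('.' + domain)) return false;
  if (dict.ports) {
    var port = ref.port ? Number(ref.port) : (ref.protocol === 'https:' ? 443 : 80);
    if (dict.ports.indexOf(port) === -1) return false;
  }
  if (dict.path) {
    var pathOk = dict.path.endsWith('/')
      ? ref.pathname.startsWith(dict.path)  // '/path/' covers subpaths
      : ref.pathname === dict.path;         // '/path' must match exactly
    if (!pathOk) return false;
  }
  // No mixing of secure and non-secure: schemes must agree.
  return new URL(dict.url).protocol === ref.protocol;
}
```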
canAdvertiseDictionary(dictionary, targetUrl)
Determines whether the client should advertise that the dictionary is available when it requests targetUrl.
The function returns false if:
- the targetUrl domain does not match dictionary.domain
- dictionary.ports is present but the targetUrl port does not match them
- the dictionary.url scheme does not match the targetUrl scheme
- the dictionary is expired (current time > dictionary.expiration).
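Combining the expiration check with the Avail-Dictionary format from the client behavior notes, a test client might assemble the request header like this. `availDictionaryHeader` is hypothetical; the domain, port and scheme checks are represented by a caller-supplied predicate (for example clientUtils.canAdvertiseDictionary) so that the sketch runs on plain objects with `clientHash` and `expiration` fields.

```javascript
// Hypothetical sketch: build the Avail-Dictionary header value from the
// dictionaries a client holds. `isAllowed(dict, targetUrl)` stands in for
// the domain/port/scheme checks.
function availDictionaryHeader(dicts, targetUrl, isAllowed, now) {
  now = now || new Date();
  var hashes = dicts
    .filter(function (d) {
      // Expired dictionaries are never advertised.
      if (d.expiration && now.getTime() > d.expiration.getTime()) return false;
      return isAllowed(d, targetUrl);
    })
    .map(function (d) { return d.clientHash; });
  // Comma-separated client hashes; undefined means "omit the header".
  return hashes.length ? hashes.join(',') : undefined;
}
```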
canFetchDictionary(dictionaryUrl, referringUrl)
Determines whether the client should fetch the dictionary located at dictionaryUrl, advertised by the server in the referringUrl response.
The function returns false if:
- referringUrl and dictionaryUrl have different schemes (http vs https)
- the referringUrl host does not equal the dictionaryUrl host. Note that this is an exact match.
- the schemes are not http or https. This is just an additional check which exists in Chromium but is not specified anywhere.
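A minimal sketch of these fetch checks. `canFetchSketch` is hypothetical; it assumes the exact-match comparison is between the hostnames of the two URLs (as in Chromium's SdchManager) and resolves a relative dictionaryUrl against referringUrl, which is an extra assumption about how Get-Dictionary values may look.

```javascript
// Hypothetical sketch of the canFetchDictionary checks listed above.
function canFetchSketch(dictionaryUrl, referringUrl) {
  var ref = new URL(referringUrl);
  var dict = new URL(dictionaryUrl, referringUrl); // base is ignored for absolute URLs
  if (dict.protocol !== ref.protocol) return false; // http vs https
  if (dict.hostname !== ref.hostname) return false; // exact host match, no subdomains
  // Only http and https dictionaries are fetched at all.
  return ref.protocol === 'http:' || ref.protocol === 'https:';
}
```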