jmdict-streaming-parser
v4.2.0
Streaming parser for JMdict and related files.
API
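import { createGunzip } from 'zlib'
import { createReadStream } from 'fs'
import { pipeline } from 'stream'
import { JmdictTransform } from 'jmdict-streaming-parser'

// Stream style: pipeline() returns its last stream, so records can be
// read from the transform's 'data' event.
pipeline(
  createReadStream('JMdict.gz'),
  createGunzip(),
  new JmdictTransform(),
  (err) => {
    // pipeline() requires a completion callback; report failures here.
    if (err) console.error(err)
  }
).on('data', (record) => {
  // Each record is an object of the form { type, data }.
  console.log(record)
})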
JmdictTransform
class JmdictTransform extends Duplex
A duplex stream that accepts XML data on its writable side and emits plain objects on its readable side, following the rules in § Object structure.
transform = new JmdictTransform(opts?: DuplexOptions)
Each object emitted by the transform has one of the following four types. The payload is stored in its data property and the type name in its type property.
type === entity
An object with the keys name and value, describing a detected entity.
type === mdate
The modification date of the file as a string, if one is detected.
type === entities
The value of transform.entities at the moment mdate is encountered.
type === node
A child of the root XML element, converted according to the rules in § Object structure.
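The four record types above can be handled with a single switch on type. The following is a minimal sketch, not part of the package: the function name collectRecords and the onNode callback are hypothetical, and it assumes each record has the { type, data } shape described above, with entity records carrying { name, value }. It uses for await, so it accepts a plain array as well as a Node.js readable stream.

```javascript
// Hypothetical consumer for records emitted by JmdictTransform.
// Assumes records look like { type, data } and entity data is { name, value }.
async function collectRecords(records, onNode) {
  const entities = new Map()
  let mdate = null
  let entitySnapshot = null
  let nodeCount = 0

  // for await works on arrays and on async iterables such as readable streams.
  for await (const { type, data } of records) {
    switch (type) {
      case 'entity':
        entities.set(data.name, data.value)
        break
      case 'mdate':
        mdate = data
        break
      case 'entities':
        // Snapshot of transform.entities taken when mdate was encountered.
        entitySnapshot = data
        break
      case 'node':
        // Hand each node to the caller instead of buffering the whole file.
        nodeCount += 1
        onNode(data)
        break
      default:
        throw new Error(`unexpected record type: ${type}`)
    }
  }
  return { entities, mdate, entitySnapshot, nodeCount }
}

// Usage with hand-made sample records (not real JMdict output).
const sample = [
  { type: 'entity', data: { name: 'n', value: 'noun (common) (futsuumeishi)' } },
  { type: 'mdate', data: '2024-01-01' },
  { type: 'node', data: { ent_seq: ['1234567'] } },
]
const seen = []
collectRecords(sample, (node) => seen.push(node)).then((result) => {
  console.log(result.mdate, result.nodeCount, seen.length)
})
```

Because Node.js readable streams are async iterable, the stream returned by pipeline() in the API example could be passed as records in place of the sample array. Passing nodes to a callback keeps memory use flat, which matters for a file the size of JMdict.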
Object structure
Each node object is built from the source XML as follows:
- Text nodes are transformed into a string value keyed by $text.
  - If the parent XML element has a text node as its only child, the resulting object is collapsed into just that string.
    - This relies on the fact that JMdict never mixes text nodes and XML elements within one element.
  - Text nodes whose sole content is a newline are ignored.
- XML elements are transformed into objects and appended to an array in the parent object, keyed by the element's name.
  - Attributes of the element are merged into the object.
- Children of the root node are streamed as output.
- Entities are represented by the entity name.
The mapping is deliberately generic, so that files structured like JMdict can be parsed as well.
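To make the interplay of these rules concrete, here is an illustrative re-implementation that runs on a pre-parsed element tree. It is a sketch, not the package's code: the input shape { name, attributes, children } (children being strings for text nodes or nested elements), the helper names toObject and el, and the sample entry are all hypothetical. The README does not say whether an element with attributes and text is collapsed; this sketch assumes it keeps its object form so the attributes are not lost.

```javascript
// Illustrative re-implementation of the "Object structure" rules.
// Input shape { name, attributes, children } is an assumption of this sketch.
function toObject(element) {
  // Attributes of the element are merged into the object.
  const obj = { ...element.attributes }

  for (const child of element.children) {
    if (typeof child === 'string') {
      // Text nodes whose sole content is a newline are ignored.
      if (child === '\n') continue
      obj.$text = child
    } else {
      // Child elements are appended to an array keyed by the element name.
      if (!Array.isArray(obj[child.name])) obj[child.name] = []
      obj[child.name].push(toObject(child))
    }
  }

  // Collapse text-only elements into a plain string.
  // Assumption: an element that also has attributes keeps its object form.
  const keys = Object.keys(obj)
  return keys.length === 1 && keys[0] === '$text' ? obj.$text : obj
}

// Hypothetical helper for building test trees.
const el = (name, children, attributes = {}) => ({ name, attributes, children })

// A made-up, simplified entry used only to exercise the rules.
const entry = el('entry', [
  '\n',
  el('ent_seq', ['1234567']),
  '\n',
  el('k_ele', ['\n', el('keb', ['例']), '\n']),
  '\n',
  el('sense', [el('gloss', ['example'], { 'xml:lang': 'eng' })]),
])

console.log(JSON.stringify(toObject(entry)))
// {"ent_seq":["1234567"],"k_ele":[{"keb":["例"]}],"sense":[{"gloss":[{"xml:lang":"eng","$text":"example"}]}]}
```

Every child element ends up in an array, even when it occurs once, so consumers can index repeated elements such as k_ele or sense uniformly.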
transform.entities
Maps entity names to entity values.
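Since node objects carry entity names rather than their expansions, a lookup against transform.entities can recover the full text. A minimal sketch follows; the helper expandEntity is hypothetical, and because the README only says that transform.entities maps names to values, it accepts either a Map or a plain object. The sample pos field and its contents are assumptions about where entity names appear.

```javascript
// Hypothetical helper: resolves an entity name to its value, falling back
// to the name itself when it is unknown. Accepts a Map or a plain object.
function expandEntity(entities, name) {
  if (entities instanceof Map) {
    return entities.has(name) ? entities.get(name) : name
  }
  // Own-property check avoids matching inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(entities, name) ? entities[name] : name
}

// Sample data; the field name pos and its contents are assumptions.
const entities = { n: 'noun (common) (futsuumeishi)' }
const sense = { pos: ['n'] }
console.log(sense.pos.map((name) => expandEntity(entities, name)))
```

Falling back to the raw name keeps the output usable if a node references an entity that was not captured.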