streaming-markov-chain-builder
v1.0.1
A Markov chain builder that accepts input text as a stream and outputs a stream of n-grams.
streaming-markov-chain-builder
streaming-markov-chain-builder is a Markov chain builder that accepts input text as a stream and outputs a stream of n-grams.
Installation
npm i --save streaming-markov-chain-builder
Usage
const fs = require('fs')
const { MarkovBuilder } = require('streaming-markov-chain-builder')
const builder = MarkovBuilder({
  // number of "context words" to add to each individual word
  // defaults to 1
  order: 1,
  // optional - return true if this `word` should be considered 'proper'
  // see `src/is-proper.ts` for the default implementation, exported as { isProperFn }
  // (this example treats any word starting with an uppercase letter as proper)
  isProperFn: (word) => /^\p{Lu}/u.test(word),
  // optional - given a single line, return a list of sub-sentences
  // see `src/sentence-splitter.ts` for the default implementation, exported as { sentenceSplitterFn }
  sentenceSplitterFn: (line) => line.split(/[.?!]/)
})
// now, you can start ingesting data by writing it...
builder.write('the quick brown fox jumped over the lazy dog')
// or by streaming it in from a file...
fs.createReadStream('/tmp/corpus.txt').pipe(builder)
// since MarkovBuilder is a Transform stream, you can consume the output by reading from it...
builder.on('data', (ngram) => {
// see below for structure of the `ngram`
})
// you can also pipe the Transform stream to a consumer that accepts object-mode streams
// (`storage` stands for any object-mode Writable that you provide)
builder.pipe(storage)
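The usage above pipes into a `storage` consumer without defining it. As a rough sketch of what such a consumer could look like, the snippet below builds an object-mode Writable with Node's built-in `stream` module. The `createMemoryStorage` helper and its `ngrams` property are hypothetical names made up for this example and are not part of streaming-markov-chain-builder.

```javascript
const { Writable } = require('stream')

// Hypothetical helper (not part of the package): an object-mode Writable
// that keeps every n-gram it receives in an in-memory array.
function createMemoryStorage () {
  const ngrams = []
  const stream = new Writable({
    objectMode: true, // accept plain objects instead of Buffers or strings
    write (ngram, _encoding, callback) {
      ngrams.push(ngram)
      callback() // signal that this n-gram has been handled
    }
  })
  stream.ngrams = ngrams // expose the collected n-grams for inspection
  return stream
}

const storage = createMemoryStorage()
storage.on('finish', () => {
  console.log(`stored ${storage.ngrams.length} n-grams`)
})
```

With this in place, `builder.pipe(storage)` should collect the emitted n-grams in `storage.ngrams`. A real consumer would more likely write to a database or a file than hold everything in memory.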
Ngram structure
export type MarkovNgram = {
// list of the words in this ngram
ngram: string[],
// is this a sentence starter?
sentenceStart: boolean,
// is this a sentence ender?
sentenceEnd: boolean,
// is ngram[0] proper?
startsWithProper: boolean
}
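To illustrate how these objects might be consumed, here is a minimal sketch that folds n-grams into a table of transition counts. It assumes that the last entry of `ngram` is the word itself and that the entries before it are its context words; the README does not state the ordering, so treat that as a guess. `addNgram` is a hypothetical helper, and the sample objects are hand-written for illustration rather than actual output from the builder.

```javascript
// Hypothetical aggregator (not part of the package): counts how often each
// word follows each context.
// Assumption: the last entry of `ngram` is the word, and the entries before
// it are its context words.
function addNgram (table, { ngram }) {
  const context = ngram.slice(0, -1).join(' ')
  const word = ngram[ngram.length - 1]
  if (!table.has(context)) table.set(context, new Map())
  const followers = table.get(context)
  followers.set(word, (followers.get(word) || 0) + 1)
  return table
}

// Hand-written sample n-grams shaped like MarkovNgram
const table = new Map()
addNgram(table, { ngram: ['the', 'quick'], sentenceStart: true, sentenceEnd: false, startsWithProper: false })
addNgram(table, { ngram: ['the', 'lazy'], sentenceStart: false, sentenceEnd: false, startsWithProper: false })
addNgram(table, { ngram: ['the', 'quick'], sentenceStart: true, sentenceEnd: false, startsWithProper: false })

console.log(table.get('the').get('quick')) // 2
```

The same function could be called from a `builder.on('data', ...)` handler to build the table incrementally as n-grams arrive.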