string-markov-js
v1.3.2
A nodejs package for probabilistically generating text using markov chains.
www.npmjs.com/package/string-markov-js
To install, change into the directory of your Node package and run:
npm install string-markov-js
Including the module:
var markov = require('string-markov-js');
Creating a new training data set
A data set is trained on one or more texts and then generates new text from what it has learned. To initialize a new data set, use:
var dataset = markov.newDataSet();
This way, many data sets can be trained on different texts and used side by side without affecting one another.
Training
From a string
var string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit";
var ngram = 2;
var preserveLineBreaks = true;
dataset.trainOnString(string, ngram, preserveLineBreaks);
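To give a feel for what training with ngram = 2 means, here is a minimal sketch of a table that maps each two-word gram to the words seen right after it. The buildTable helper is hypothetical and only illustrates the general markov chain idea; it is not the package's actual implementation.

```javascript
// Hypothetical helper, NOT part of string-markov-js.
// Maps each n-gram (words joined by a space) to the list of words
// that were observed immediately after it in the text.
function buildTable(text, n) {
    var words = text.split(/\s+/).filter(Boolean);
    var table = Object.create(null); // no inherited keys to collide with
    for (var i = 0; i + n < words.length; i++) {
        var key = words.slice(i, i + n).join(' ');
        if (!table[key]) {
            table[key] = [];
        }
        table[key].push(words[i + n]);
    }
    return table;
}

var table = buildTable('the cat sat on the cat mat', 2);
console.log(table['the cat']); // [ 'sat', 'mat' ]
```

A gram that occurs more than once collects several followers, which is what lets generated text diverge from the original.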
From a file
var filename = 'training.txt';
var ngram = 3;
var preserveLineBreaks = true;
dataset.trainOnFile(filename, ngram, preserveLineBreaks, function() {
    console.log("Training complete.");
});
Line breaks can be preserved to maintain a similar structure to the training corpus (e.g. in the case of poetry), or they can be removed.
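As a rough illustration of the difference, the sketch below splits a text into tokens either with or without its line breaks. The tokenize function is hypothetical, and treating each line break as its own token is an assumption about how such an option could work, not a description of the package's internals.

```javascript
// Hypothetical tokenizer, NOT part of string-markov-js.
// When preserveLineBreaks is true, every newline is kept as a separate
// "\n" token; otherwise all whitespace is treated as a word separator.
function tokenize(text, preserveLineBreaks) {
    if (!preserveLineBreaks) {
        return text.split(/\s+/).filter(Boolean);
    }
    return text
        .replace(/\r?\n/g, ' \n ')   // isolate newlines between spaces
        .split(/[ \t]+/)             // split on spaces and tabs only
        .filter(Boolean);
}

var poem = 'roses are red\nviolets are blue';
console.log(tokenize(poem, true).length);  // 7 (six words plus one "\n")
console.log(tokenize(poem, false).length); // 6
```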
If you wish to train on a set of files, trainOnFile can also take an array of filenames:
dataset.trainOnFile(['beemoviescript.txt', 'constitution.txt'], 3, true, function() {
    console.log("Training complete.");
});
Clearing data
If you wish to remove all training data from a given data set, call:
dataset.clearData();
Generating Text
var startWithCapitalNGram = true;
// generate 100 words of text, beginning with an ngram that was capitalized in the training corpus
var text = dataset.generate(100, startWithCapitalNGram);
Setting startWithCapitalNGram to true keeps the generated text from starting in the middle of a sentence, provided the training data capitalizes the start of its sentences.
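The sketch below shows one way a generator could honor such an option: it restricts the starting gram to those beginning with a capital letter, then repeatedly picks a random follower. The generate function, its optional random parameter, and the hand-written table are all hypothetical and are not taken from the package.

```javascript
// Hypothetical generator, NOT part of string-markov-js.
// table maps a gram (words joined by a space) to its possible next words.
// random is an optional function returning a number in [0, 1).
function generate(table, wordCount, startWithCapital, random) {
    random = random || Math.random;
    var keys = Object.keys(table);
    if (keys.length === 0) {
        return '';
    }
    function pick(list) {
        return list[Math.floor(random() * list.length)];
    }
    var starts = keys;
    if (startWithCapital) {
        var capitalized = keys.filter(function (k) { return /^[A-Z]/.test(k); });
        if (capitalized.length > 0) {
            starts = capitalized; // fall back to all grams if none qualify
        }
    }
    var current = pick(starts).split(' ');
    var output = current.slice();
    while (output.length < wordCount) {
        var followers = table[current.join(' ')];
        if (!followers) {
            break; // dead end: this gram was never followed by anything
        }
        var next = pick(followers);
        output.push(next);
        current = current.slice(1).concat(next); // slide the window forward
    }
    return output.slice(0, wordCount).join(' ');
}

var table = {
    'The cat': ['sat'],
    'cat sat': ['down.'],
    'sat down.': ['The'],
    'down. The': ['cat']
};
console.log(generate(table, 6, true)); // The cat sat down. The cat
```

Because only one gram in this table starts with a capital letter and every gram has a single follower, the output here is the same on every run.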
To generate a single complete sentence, use the sentence() function, which takes a requested line length and, optionally, a variance in that line length.
var s = dataset.sentence(lineLength, lineLengthVariance);
Ensuring originality
To check whether a segment of generated text has accidentally copied the training corpus word-for-word, call the checkOriginality function:
dataset.checkOriginality("Is this string in the training corpus?");
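Conceptually, a check like this comes down to asking whether the generated words occur as one contiguous run in the corpus. The isOriginal function below is a hypothetical sketch of that idea; its name and its return convention (true means not copied) are assumptions and may differ from what checkOriginality returns.

```javascript
// Hypothetical check, NOT part of string-markov-js.
// Returns true when candidate does not occur word-for-word in corpus.
function isOriginal(corpus, candidate) {
    // Collapse all whitespace and pad with spaces so that only
    // whole-word sequences can match.
    function normalize(s) {
        return ' ' + s.split(/\s+/).filter(Boolean).join(' ') + ' ';
    }
    return normalize(corpus).indexOf(normalize(candidate)) === -1;
}

var corpus = 'the cat sat on the mat';
console.log(isOriginal(corpus, 'cat sat on'));  // false (copied verbatim)
console.log(isOriginal(corpus, 'the mat sat')); // true
```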
Manually interacting with a data set
If you're looking for more direct interaction with a data set, you can use getPossibilities to get all the possible words that follow a given gram:
dataset.getPossibilities(['words', 'that', 'follow', 'this']);
Or, if you want to manually add an entry to the data set, you can use updateGram:
dataset.updateGram(['manually', 'added'], 'ngram');
This adds a new ngram or updates an existing one.
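To picture how these two calls relate, here is a small stand-in for a data set's lookup table. TinyDataSet, addEntry and possibilitiesFor are hypothetical names chosen for this sketch; they loosely mirror updateGram and getPossibilities but make no claim about how the package stores its data.

```javascript
// Hypothetical stand-in, NOT part of string-markov-js.
function TinyDataSet() {
    this.table = Object.create(null); // gram -> list of following words
}

// Add a following word to a gram, creating the gram if it is new.
TinyDataSet.prototype.addEntry = function (gram, nextWord) {
    var key = gram.join(' ');
    if (!this.table[key]) {
        this.table[key] = [];
    }
    this.table[key].push(nextWord);
};

// Return a copy of the words recorded after a gram (empty if unknown).
TinyDataSet.prototype.possibilitiesFor = function (gram) {
    return (this.table[gram.join(' ')] || []).slice();
};

var ds = new TinyDataSet();
ds.addEntry(['manually', 'added'], 'ngram');
ds.addEntry(['manually', 'added'], 'entry');
console.log(ds.possibilitiesFor(['manually', 'added'])); // [ 'ngram', 'entry' ]
console.log(ds.possibilitiesFor(['never', 'seen']));     // []
```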