markovian-nlp
v7.0.4
NLP tools generate Markov sentences & models.
Quick start
Because this is an isomorphic JavaScript package, clients, servers, and bundlers can start using it in several ways, some of which require no installation.
RunKit
RunKit provides one of the easiest ways to get started, with no local installation required.
CodePen
Declare imports in the JS section to get started:
import {
  ngramsDistribution,
  sentences,
} from 'https://unpkg.com/markovian-nlp@latest?module';
const sentence = sentences({ document: 'oh me, oh my' });
console.log(sentence);
// example output: ['oh me oh me oh my']
Browsers
Insert the following element within the <head> tag of an HTML document:
<script src="https://unpkg.com/markovian-nlp@latest"></script>
After the script is loaded, the markovian browser global is exposed:
const sentence = markovian.sentences({ document: 'oh me, oh my' });
console.log(sentence);
// example output: ['oh me oh me oh my']
Node.js
With npm installed, run the following terminal command:
npm i markovian-nlp
Once installed, declare the method imports at the top of each JavaScript file in which they are used.
ES2015 (recommended)
import {
  ngramsDistribution,
  sentences,
} from 'markovian-nlp';
CommonJS
const {
  ngramsDistribution,
  sentences,
} = require('markovian-nlp');
Usage
Markov text generation
Generate text sentences from a Markov process.
Potential applications: Natural language generation
Generate sentences
Providing an optional seed generates deterministic (repeatable) sentences. In this example, document is text from this source:
sentences({
  count: 3,
  document: 'That there is constant succession and flux of ideas in our minds...',
  seed: 1,
});
// output: [
//   'i would promote introduce a constant succession and hindering the path...',
//   'he that train they seem to be glad to be done as may be avoided of our thoughts...',
//   'this wandering of attention and yet for ought i know this wandering thoughts i would promote...',
// ]
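The seeded walk described above can be sketched without the package itself. This is a minimal illustration, not markovian-nlp's implementation: every helper name is hypothetical, mulberry32 stands in for the Chance(seed) generator the library uses, and the distribution is written by hand in the same shape as ngramsDistribution's output. It shows why a fixed seed yields repeatable sentences: every random choice is drawn from the same deterministic sequence.

```javascript
// Sketch only: NOT markovian-nlp's implementation. mulberry32 stands in for the
// library's Chance(seed) PRNG; every function name here is hypothetical.

// mulberry32: tiny deterministic PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick a key from a { key: count } object with probability proportional to count.
function weightedPick(counts, random) {
  const entries = Object.entries(counts).filter(([, n]) => n > 0);
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  let r = random() * total;
  for (const [key, n] of entries) {
    r -= n;
    if (r < 0) return key;
  }
  return entries[entries.length - 1][0]; // guard against floating-point rounding
}

// Walk the chain: weighted startgram, then weighted followers until _end is drawn.
function generateSentence(distribution, random, maxWords = 50) {
  const starts = Object.fromEntries(
    Object.entries(distribution).map(([word, counts]) => [word, counts._start]),
  );
  let word = weightedPick(starts, random);
  const words = [word];
  while (words.length < maxWords) {
    const { _start, ...outcomes } = distribution[word]; // followers plus _end
    const next = weightedPick(outcomes, random);
    if (next === '_end') break;
    word = next;
    words.push(word);
  }
  return words.join(' ');
}

// Rough analogue of sentences({ distribution, count, seed }): same seed, same output.
function generateSentences(distribution, count = 1, seed) {
  const random = seed === undefined ? Math.random : mulberry32(seed);
  return Array.from({ length: count }, () => generateSentence(distribution, random));
}

// Hand-written distribution for the single sentence "oh me oh my".
const ohMy = {
  oh: { _end: 0, _start: 1, me: 1, my: 1 },
  me: { _end: 0, _start: 0, oh: 1 },
  my: { _end: 1, _start: 0 },
};

console.log(generateSentences(ohMy, 3, 1));
```

Omitting the seed falls back to Math.random, mirroring the nondeterministic default.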
View n-grams distribution
View the n-grams distribution of text.
Potential applications: Markov models
ngramsDistribution('birds have featured in culture and art since prehistoric times');
// output: {
//   and: { _end: 0, _start: 0, art: 1 },
//   art: { _end: 0, _start: 0, since: 1 },
//   birds: { _end: 0, _start: 1, have: 1 },
//   culture: { _end: 0, _start: 0, and: 1 },
//   featured: { _end: 0, _start: 0, in: 1 },
//   have: { _end: 0, _start: 0, featured: 1 },
//   in: { _end: 0, _start: 0, culture: 1 },
//   prehistoric: { _end: 0, _start: 0, times: 1 },
//   since: { _end: 0, _start: 0, prehistoric: 1 },
//   times: { _end: 1, _start: 0 },
// }
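Each number is an occurrence count: how often the key word followed the unigram or, for _start and _end, how often the unigram began or ended a sentence.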
startgram | endgram | bigrams
--------- | ------- | -------
"birds" | "times" | all remaining keys ("have featured", "featured in", etc.)
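The counts in this table come from scanning adjacent word pairs. The sketch below rebuilds a distribution of the same shape under stated assumptions: buildDistribution is a hypothetical helper, and its naive tokenization (lowercase words, sentences split on ., ! and ?) replaces the compromise-based tokenization the library uses, so results on real text may differ. For the example sentence above it yields the same counts.

```javascript
// Sketch only: buildDistribution is a hypothetical helper, not markovian-nlp's
// code. Tokenization is deliberately naive (lowercase words, sentences split on
// . ! ?), whereas the library tokenizes with compromise.
function buildDistribution(text) {
  const distribution = {};
  // Own-property check avoids clashing with inherited keys like "constructor".
  const counter = (word) => {
    if (!Object.hasOwn(distribution, word)) distribution[word] = { _end: 0, _start: 0 };
    return distribution[word];
  };
  for (const sentence of text.toLowerCase().split(/[.!?]+/)) {
    const words = sentence.match(/[a-z]+(?:'[a-z]+)*/g) ?? [];
    words.forEach((word, i) => {
      const counts = counter(word);
      if (i === 0) counts._start += 1; // startgram
      if (i === words.length - 1) {
        counts._end += 1; // endgram
      } else {
        const next = words[i + 1]; // bigram "word next"
        counts[next] = (Object.hasOwn(counts, next) ? counts[next] : 0) + 1;
      }
    });
  }
  return distribution;
}

console.log(buildDistribution('birds have featured in culture and art since prehistoric times'));
```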
API
ngramsDistribution(document || ngramsDistribution)
ngramsDistribution(Array(document || ngramsDistribution[, ...]))
Input
type | description
---- | -----------
String | document (corpus or text)
Object | ngramsDistribution (this method's own output; passing one alone is equivalent to identity)
Array[Strings...] | combine multiple documents
Array[Objects...] | combine multiple ngramsDistribution objects
Array[Strings, Objects...] | combine multiple documents and ngramsDistribution objects
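Combining inputs presumably means adding up their counts. The sketch below assumes exactly that: mergeDistributions is a hypothetical helper, not part of the package, that sums _start, _end, and follower counts key by key without mutating its inputs.

```javascript
// Sketch only, assuming "combine" means summing counts key by key.
// mergeDistributions is hypothetical and not exported by markovian-nlp.
function mergeDistributions(...distributions) {
  const merged = {};
  for (const distribution of distributions) {
    for (const [word, counts] of Object.entries(distribution)) {
      if (!Object.hasOwn(merged, word)) merged[word] = { _end: 0, _start: 0 };
      const target = merged[word];
      for (const [key, n] of Object.entries(counts)) {
        target[key] = (Object.hasOwn(target, key) ? target[key] : 0) + n;
      }
    }
  }
  return merged;
}

// Two hand-written distributions: "oh me" and "oh my".
const ohMe = { oh: { _end: 0, _start: 1, me: 1 }, me: { _end: 1, _start: 0 } };
const ohMy = { oh: { _end: 0, _start: 1, my: 1 }, my: { _end: 1, _start: 0 } };

console.log(mergeDistributions(ohMe, ohMy).oh); // { _end: 0, _start: 2, me: 1, my: 1 }
```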
Return value
type | description
---- | -----------
Object | distributions of unigrams to startgrams, endgrams, and following bigrams
// pseudocode signature representation (does not run)
ngramsDistribution(document) => ({
  ...unigrams: {
    ...{ ...bigram: bigramsDistribution },
    _end: endgramsDistribution,
    _start: startgramsDistribution,
  },
});
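Because each unigram's object holds raw counts, normalizing them gives the probabilities of a first-order Markov model. The helpers below are hypothetical and operate on a hand-written distribution; treating _end as a transition outcome and _start as the initial-word weight is an interpretation of the structure above, not documented library behavior.

```javascript
// Sketch only: hypothetical helpers (not part of markovian-nlp) that read a
// distribution as a first-order Markov model. _start counts give the initial
// word probabilities; each unigram's followers plus _end give its transitions.
function startProbabilities(distribution) {
  const entries = Object.entries(distribution);
  const total = entries.reduce((sum, [, counts]) => sum + counts._start, 0);
  return Object.fromEntries(
    entries.map(([word, counts]) => [word, total === 0 ? 0 : counts._start / total]),
  );
}

function transitionProbabilities(distribution, word) {
  const { _start, ...outcomes } = distribution[word]; // _start is not a transition
  const total = Object.values(outcomes).reduce((sum, n) => sum + n, 0);
  return Object.fromEntries(
    Object.entries(outcomes).map(([next, n]) => [next, total === 0 ? 0 : n / total]),
  );
}

// Hand-written distribution for the sentences "oh me" and "oh my".
const example = {
  oh: { _end: 0, _start: 2, me: 1, my: 1 },
  me: { _end: 1, _start: 0 },
  my: { _end: 1, _start: 0 },
};

console.log(startProbabilities(example)); // { oh: 1, me: 0, my: 0 }
console.log(transitionProbabilities(example, 'oh')); // { _end: 0, me: 0.5, my: 0.5 }
```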
sentences({ distribution || document[, count][, seed] })
Input
user-defined parameter | type | optional | default value | implements | description
---------------------- | ---- | -------- | ------------- | ---------- | -----------
options.count | Number | true | 1 | | Number of sentences to output.
options.distribution | Object | required if options.document omitted | | | n-grams distribution used in place of text.
options.document | String | required if options.distribution omitted | | compromise(document) | Text used in place of n-grams distribution.
options.seed | Number | true | undefined | Chance(seed) | Leave undefined (default) for nondeterministic results, or specify seed for deterministic results.
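The table's rules can be restated as a small validation function. normalizeSentencesOptions is hypothetical and not exported by markovian-nlp; rejecting a count that is not a positive integer is an added assumption the table does not state.

```javascript
// Sketch only: normalizeSentencesOptions is hypothetical (not exported by
// markovian-nlp). It restates the table's rules in code: count defaults to 1,
// one of distribution/document is required, and seed controls determinism.
// Rejecting a non-positive or non-integer count is an added assumption.
function normalizeSentencesOptions({ count = 1, distribution, document, seed } = {}) {
  if (distribution === undefined && document === undefined) {
    throw new TypeError('options.distribution or options.document is required');
  }
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError('options.count must be a positive integer');
  }
  return {
    count,
    distribution,
    document,
    seed,
    deterministic: seed !== undefined, // an undefined seed means nondeterministic output
  };
}

console.log(normalizeSentencesOptions({ document: 'oh me, oh my' }));
console.log(normalizeSentencesOptions({ document: 'oh me, oh my', count: 3, seed: 1 }));
```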
Return value
type | description
---- | -----------
Array[Strings...] | generated sentences
Glossary
Learn more about computational linguistics and natural language processing (NLP) on Wikipedia.
The following terms are used in the API documentation:
term | description
---- | -----------
bigram | 2-gram sequence
deterministic | repeatable, non-random
endgram | final gram in a sequence
n-gram | contiguous gram (word) sequence
startgram | first gram in a sequence
unigram | 1-gram sequence
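To make these terms concrete, the sketch below extracts them from one tokenized sentence. The ngrams helper is hypothetical; it simply returns every contiguous n-word sequence, joined with spaces.

```javascript
// Sketch only: ngrams is a hypothetical helper that returns every contiguous
// n-word sequence of a token list.
function ngrams(tokens, n) {
  const result = [];
  for (let i = 0; i + n <= tokens.length; i += 1) {
    result.push(tokens.slice(i, i + n).join(' '));
  }
  return result;
}

const tokens = ['birds', 'have', 'featured', 'in', 'culture'];
console.log(ngrams(tokens, 1)); // unigrams: birds, have, featured, in, culture
console.log(ngrams(tokens, 2)); // bigrams: birds have, have featured, featured in, in culture
console.log(tokens[0]); // startgram: birds
console.log(tokens[tokens.length - 1]); // endgram: culture
```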