lingthing
v0.1.1
Character-level ngram-based language modeling
⚡lingthing⚡
A library for n-gram-based character-level language modeling in JavaScript, intended for use in the browser.
A JSON file containing counts of n-grams in a training corpus can be created using the script scripts/count_grams.py (or you can use my example based on the Lancaster-Oslo/Bergen corpus, in scripts/LOB_ngrams.json).
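The counting step can be illustrated in JavaScript. This is a sketch, not a port of scripts/count_grams.py: the flat {gram: count} table and the choice to record every order from 1 up to n are assumptions made for illustration, and the real JSON file may be laid out differently. countGrams is a hypothetical name.

```javascript
// Count every character n-gram of length 1..n in a training text.
// Assumed output shape: a flat table mapping each gram to its count.
function countGrams(text, n) {
  // A prototype-free object, so grams such as "toString" are ordinary keys.
  const counts = Object.create(null);
  for (let order = 1; order <= n; order++) {
    for (let i = 0; i + order <= text.length; i++) {
      const gram = text.slice(i, i + order);
      counts[gram] = (counts[gram] || 0) + 1;
    }
  }
  return counts;
}
```

Passing the result to JSON.stringify gives a string that could be saved and loaded later with JSON.parse.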
The resulting JSON data can then be used, along with the lingthing.log_prob function, to estimate the (log) probability of a string (with Laplace smoothing applied, and maybe other smoothing options in the future if we're lucky).
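Laplace (add-one) smoothing adds one to every n-gram count so that strings containing unseen n-grams still get a nonzero probability. The sketch below shows one common formulation for a character n-gram model, P(c | h) = (C(hc) + 1) / (C(h) + V). It is not lingthing's implementation: the meaning of the n, d and N values returned by corpus_info is not documented here, so the sketch uses its own hypothetical parameters (n for the model order, vocabSize for the number of distinct characters, totalChars for the corpus length) and assumes the flat {gram: count} table described above.

```javascript
// Laplace-smoothed natural-log probability of `text` under a character
// n-gram model. `counts` is assumed to hold counts for orders 1..n.
function laplaceLogProb(text, counts, n, vocabSize, totalChars) {
  // Missing n-grams (and inherited keys such as "toString") count as 0.
  const countOf = (gram) =>
    Object.prototype.hasOwnProperty.call(counts, gram) ? counts[gram] : 0;
  let logProb = 0;
  for (let i = 0; i < text.length; i++) {
    // Condition on up to n - 1 preceding characters (fewer near the start).
    const context = text.slice(Math.max(0, i - (n - 1)), i);
    const gramCount = countOf(context + text[i]);
    // An empty context is the unigram case, normalised by the corpus length.
    const contextCount = context === "" ? totalChars : countOf(context);
    // Add-one smoothing: (C(hc) + 1) / (C(h) + V).
    logProb += Math.log((gramCount + 1) / (contextCount + vocabSize));
  }
  return logProb;
}
```

Summing logarithms rather than multiplying raw probabilities keeps the result representable even for long strings, which is presumably why the library exposes a log probability.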
Installation:
npm install lingthing
Usage Examples:
In Node:
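const lt = require('lingthing');
const fs = require('fs');

// Load n-gram counts produced by scripts/count_grams.py
// (the path is resolved relative to the current working directory).
const counts = JSON.parse(fs.readFileSync('scripts/LOB_ngrams.json', 'utf8'));

const test_sentence = "Test sentence.";
// corpus_info summarises the counts; its n, d and N fields are passed to log_prob.
const info = lt.corpus_info(counts);
const log_probability = lt.log_prob(test_sentence, counts, "laplace",
                                    info.n, info.d, info.N);
console.log("Probability of sentence '" + test_sentence + "' is " +
            Math.exp(log_probability));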
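In double precision, Math.exp(log_probability) underflows to exactly 0 once the log probability falls below roughly -745, which a long string can easily reach. Comparing strings in log space, or normalising by length, avoids that. The helper below is hypothetical (not part of lingthing) and assumes log_prob returns a natural logarithm, as the Math.exp call in the example implies.

```javascript
// Hypothetical helper, not part of lingthing: convert a natural-log probability
// of a whole string into average bits per character (lower means more likely).
function bitsPerChar(logProb, text) {
  if (text.length === 0) {
    throw new Error("bitsPerChar needs a non-empty string");
  }
  return -logProb / (Math.LN2 * text.length);
}

// The raw probability of a long string can underflow to exactly 0:
console.log(Math.exp(-800)); // 0
// whereas the per-character figure stays finite:
console.log(bitsPerChar(-800, "x".repeat(400)));
```

Bits per character also lets strings of different lengths be compared fairly, since the whole-string probability shrinks with every extra character.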
In the browser:
<script src="lingthing-browser-0.0.1.js"></script>
<script src="ngrams.js"></script> <!-- var counts = {
... data generated by scripts/count_grams.py ...};
... or you could load the json data by e.g. XMLHttpRequest -->
<script type="text/javascript">
  // Note: importing the browser script is equivalent to:
  // var lingthing = require('lingthing');
  var test_sentence = "Test sentence.";
  var info = lingthing.corpus_info(counts);
  var log_probability = lingthing.log_prob(test_sentence, counts, "laplace",
                                           info.n, info.d, info.N);
  console.log("Probability of sentence '" + test_sentence + "' is " +
              Math.exp(log_probability));
</script>
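Instead of embedding the counts in ngrams.js, the JSON file can be loaded at runtime. The sketch below uses fetch, the modern replacement for XMLHttpRequest; loadCounts is a hypothetical helper, and the file name you pass to it is whatever you choose to serve.

```javascript
// Hypothetical helper: fetch the n-gram counts JSON and parse it.
// Rejects if the server does not answer with a 2xx status.
async function loadCounts(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error("Could not load " + url + ": HTTP " + response.status);
  }
  return response.json();
}
```

In the page, call loadCounts with the URL of your JSON file and, inside the .then callback, run the same lingthing.corpus_info and lingthing.log_prob calls as above on the resolved counts. Browsers usually refuse fetch requests for file:// URLs, so serve the page and the JSON over HTTP.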
Build:
To build the browser-friendly distribution, run npm install to install the dev-dependencies, and then run npm run-script browser. The bundled file will appear in the dist directory.