ngramma

v1.1.0

Published

3 years ago

Count the occurrences of a given character set in a given text file with different combinations.

Vulnerabilities: 0 High, 0 Medium, 0 Low

bahodursaidov

ngramma unigramma count letters duplicates occurrence search compare match alphabet recursive combination

N-gramma

Count how often the characters of a given character set, and their combinations of a chosen length (n-grams), occur in a given text file. I use this in my PhD research.

Usage

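// Inside the repository the module is loaded by relative path;
// when installed from npm this would presumably be require('ngramma').
const g = require('../index')

const n_combination = 1        // combination length: 1 = unigramma, 2 = bigramma, ...
const textIn = "textIn.txt"    // input text file
const textOut = "textOut.txt"  // file the counts are written to
const charSet = "abcdefg"      // characters to count

// Note the callback argument order: results first, then err.
g.ngramma(n_combination, textIn, textOut, charSet, function (results, err) {
	console.log(results)
	console.log(err)
})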

// terminal/cmd/console output
// Map {
//   'a' => 29,
//   'b' => 3,
//   'c' => 16,
//   'd' => 19,
//   'e' => 38,
//   'f' => 3,
//   'g' => 3
// }

// file: textIn.txt
// Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
// tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
// quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
// consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
// cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
// proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

// file: textOut.txt
// a: 29,
// b: 3,
// c: 16,
// d: 19,
// e: 38,
// f: 3,
// g: 3
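
The output file shown above can be read back into a Map with a few lines of Node.js. This is only a sketch: the helper names parseCounts and readCounts are hypothetical and not part of the package, and the code assumes the file contains exactly the "key: count," lines shown above (without the leading "//").

```javascript
const fs = require('fs')

// Hypothetical helper: turns lines such as "a: 29," into Map entries.
// Lines that do not look like "key: number" are skipped.
function parseCounts(content) {
  const counts = new Map()
  for (const line of content.split(/\r?\n/)) {
    const match = line.match(/^\s*(\S+)\s*:\s*(\d+)\s*,?\s*$/)
    if (match) counts.set(match[1], Number(match[2]))
  }
  return counts
}

// Hypothetical helper: reads a file such as textOut.txt and parses it.
function readCounts(path) {
  return parseCounts(fs.readFileSync(path, 'utf8'))
}
```

For example, parseCounts("a: 29,\nb: 3") yields a Map with 'a' mapped to 29 and 'b' mapped to 3.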

Algorithm

The logic is based on the unigramma, bigramma and trigramma (1-, 2- and 3-character combinations) used in the Gamma-Classifier researched by Academician Usmanov Zafar Juraevich, Tajik Technical University (Russian: Гамма-классификатор, академик Усманов Зафар Джураевич, ТТУ).

Important points to note:

The given text file is first cleaned of all symbols EXCEPT letters.
Within each line, all words/letters are concatenated to each other.
Case-insensitive comparison is used.
Line breaks in the text are respected (paragraphs are not concatenated). Therefore, for bigramma/trigramma, the results may change when the paragraph breaks (new lines, "Enters") in the provided text file change.
Combination can be any integer from 1 to 10, BUT for higher combinations watch out for RAM overload.
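
The points above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the helper name countNgrams is hypothetical, and it assumes that "letters" means Unicode letters, that combinations are read with an overlapping sliding window, and that only combinations actually found in the text are stored (the package may instead enumerate every combination of the character set first).

```javascript
// Hypothetical sketch of the counting rules listed above.
function countNgrams(text, n, charSet) {
  // Case-insensitive: compare everything in lower case.
  const allowed = new Set(charSet.toLowerCase())
  const counts = new Map()

  // Lines are processed separately, so combinations never span a line break.
  for (const line of text.split(/\r?\n/)) {
    // Drop everything except letters; this also joins the words of the line.
    const cleaned = Array.from(line.toLowerCase().replace(/[^\p{L}]/gu, ''))

    // Slide a window of length n over the cleaned line.
    for (let i = 0; i + n <= cleaned.length; i++) {
      const gram = cleaned.slice(i, i + n)
      // Count the window only if every character belongs to charSet.
      if (gram.every(ch => allowed.has(ch))) {
        const key = gram.join('')
        counts.set(key, (counts.get(key) || 0) + 1)
      }
    }
  }
  return counts
}

// Two lines: "ab-a B" is cleaned to "abab", "ba" stays "ba".
const sample = 'ab-a B\nba'
console.log(countNgrams(sample, 2, 'ab')) // ab => 2, ba => 2
```

If the two lines were joined into "ababba", the bigram "bb" would also appear, which illustrates why paragraph breaks affect bigramma/trigramma results. A character set of k symbols allows k^n combinations of length n, so 7 characters at n = 10 already give 7^10 = 282,475,249 possible keys, which is presumably why higher combinations can strain RAM.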
