rdn-naive-bayes
v0.1.3
Implements naive bayes classifier
CS 5860 - Naive Bayes Classifier
Ross Nordstrom
University of Colorado - Colorado Springs
CS 5860 - Machine Learning
Assignment
Write a program in a language of your choice that classifies datasets into two classes. The two classes here are Charles Dickens and Thomas Hardy.
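As a rough illustration of the technique named in the title, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing. It is not the code shipped in this package: the function names `train` and `classify` and the `{ label, words }` point shape are assumptions made for this example.

```javascript
// Illustrative sketch only; names are hypothetical, not the rdn-naive-bayes API.

// Count documents and word occurrences per class.
function train(points) {
  const classes = new Map(); // label -> { docs, totalWords, counts: Map(word -> n) }
  const vocab = new Set();
  for (const { label, words } of points) {
    if (!classes.has(label)) {
      classes.set(label, { docs: 0, totalWords: 0, counts: new Map() });
    }
    const c = classes.get(label);
    c.docs += 1;
    for (const w of words) {
      c.counts.set(w, (c.counts.get(w) || 0) + 1);
      c.totalWords += 1;
      vocab.add(w);
    }
  }
  return { classes, vocab, totalDocs: points.length };
}

// Return the label with the highest log-probability score for the given words.
function classify(model, words) {
  let best = null;
  let bestScore = -Infinity;
  for (const [label, c] of model.classes) {
    // log P(class) + sum of log P(word | class); logs avoid underflow on long texts.
    let score = Math.log(c.docs / model.totalDocs);
    for (const w of words) {
      const count = c.counts.get(w) || 0;
      // Add-one smoothing keeps unseen words from zeroing out the probability.
      score += Math.log((count + 1) / (c.totalWords + model.vocab.size));
    }
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}
```

Working in log space matters for the book-length inputs in this assignment, where multiplying thousands of small probabilities directly would underflow to zero.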
Assignment Details
Dataset
In addition to the required Dickens and Hardy books, several datasets were taken from the UCI Machine Learning Repository. All datasets used are described below.
Datasets used, and their location in this project:
Dataset | Source | Path | Type *
---|---|---|---
SMS | UCI - SMS Spam Collection | ./data/sms | inline
Badges | UCI - Badges | ./data/badges | inline
Main | Gutenberg - Dickens, Hardy | ./data/main | gutenberg

Dataset Types: *
Type | Description
---|---
inline | The dataset is stored as a single file in which each line represents a training point. The first word in each line is the class/category; the rest of the line is the list of words used as the training "text blob."
gutenberg | The dataset is stored as a set of directories, one per class/category (e.g. "dickens", "hardy"). Each file within a class directory represents a training point. These files are actually books, but are treated abstractly as "text blobs," just like points in the inline dataset type.

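To make the inline format concrete, here is a minimal sketch of a parser for it. The helper name `parseInlineDataset` and the returned `{ label, words }` shape are hypothetical and chosen for illustration; the package's own loader may differ.

```javascript
// Hypothetical helper: turn the contents of an "inline" dataset file
// into an array of { label, words } training points.
function parseInlineDataset(text) {
  return text
    .split(/\r?\n/)                      // one training point per line
    .map((line) => line.trim())
    .filter((line) => line.length > 0)   // skip blank lines
    .map((line) => {
      // The first word is the class/category; the rest is the "text blob".
      const [label, ...words] = line.split(/\s+/);
      return { label, words };
    });
}
```

For example, a file containing the two lines `ham see you soon` and `spam win cash now` would yield one point labeled `ham` and one labeled `spam`, each with three words.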
Usage
This project is intended to be used via the CLI, and is exposed as an NPM package.
Installation
From NPM:
npm install -g rdn-naive-bayes
From local:
git clone git@github.com:ross-nordstrom/cs5860-naive_bayes.git
cd cs5860-naive_bayes
npm install
npm link
Running
View Usage: Rather than document the usage here, please see the tool's help documentation. In general, the tool expects to be given a dataset which it will divide into Training/Testing data.
rdn-naive-bayes -h
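The training/testing division mentioned above can be sketched as follows. This README does not state how the tool splits the data, so the shuffle, the `trainFraction` parameter, and its 0.8 default are assumptions for illustration only; the help output is the authoritative description of the real options.

```javascript
// Hypothetical sketch of a train/test split; not the tool's actual logic.

// Fisher-Yates shuffle on a copy, with an injectable random source for testing.
function shuffle(items, random = Math.random) {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Shuffle the points, then keep the first `trainFraction` for training
// and the remainder for testing.
function splitDataset(points, trainFraction = 0.8, random = Math.random) {
  const shuffled = shuffle(points, random);
  const cut = Math.floor(shuffled.length * trainFraction);
  return { training: shuffled.slice(0, cut), testing: shuffled.slice(cut) };
}
```

Shuffling before the cut avoids a split in which all points of one class land in the same partition, which can happen when the dataset file is sorted by class.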
Testing
npm install
npm test