word-frequency-analyzer
v0.0.1
Given a document as a string, return a list of the most frequently used words sorted by frequency.
Word Frequency Analyzer
The word frequency analyzer takes a string of text, parses it into words, and returns a list of words sorted by their frequency in the text. It is designed to support multiple languages and has several options that can be toggled to control how words are matched. Currently the parser only supports English; additional character sets can be added to enable other languages.
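As a rough illustration of the idea, the sketch below counts words in a string and sorts them by frequency. It is not the package's actual API: the function name `wordFrequencies`, the `ENGLISH_WORD` pattern, and the shape of the returned objects are assumptions made for this example. Passing a different pattern is one plausible way a character set for another language could be plugged in.

```javascript
// Hypothetical sketch, not the package's real implementation.
// Assumed English character set: ASCII letters, with inner apostrophes allowed.
const ENGLISH_WORD = /[A-Za-z]+(?:'[A-Za-z]+)*/g;

function wordFrequencies(text, wordPattern = ENGLISH_WORD) {
  const counts = new Map();
  // String.prototype.match with a global regex returns every match, or null.
  for (const word of text.match(wordPattern) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  // Highest count first; ties are broken alphabetically so output is stable.
  return [...counts.entries()]
    .map(([word, count]) => ({ word, count }))
    .sort((a, b) =>
      b.count - a.count || (a.word < b.word ? -1 : a.word > b.word ? 1 : 0));
}
```

For example, `wordFrequencies('the cat and the hat and the bat')` puts `the` (3 occurrences) first, then `and` (2), followed by the words that appear once.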
You can alter how words are determined to be the same or significant by the parser. Currently there is support for these modes:
- Case sensitivity
- Filter stop words
- Use root of the word
Multiple modes can be enabled at the same time. This allows for several different possible analyzers depending on your specific needs.
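One way to picture how the modes combine is as a function that maps each word to the key it is counted under. The sketch below is an assumption for illustration only: the option names (`caseSensitive`, `filterStopWords`, `useRoot`), the tiny stop word list, and the naive suffix stripping are all hypothetical and stand in for whatever the analyzer really uses.

```javascript
// Hypothetical option names and word list, for illustration only.
const STOP_WORDS = new Set(['a', 'an', 'and', 'of', 'the', 'to']);

// Naive suffix stripping as a stand-in for a real stemmer.
function naiveRoot(word) {
  for (const suffix of ['ing', 'ed', 's']) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Returns the key a word is counted under, or null if it is filtered out.
function matchKey(word, options = {}) {
  const { caseSensitive = false, filterStopWords = false, useRoot = false } = options;
  if (filterStopWords && STOP_WORDS.has(word.toLowerCase())) return null;
  let key = caseSensitive ? word : word.toLowerCase();
  if (useRoot) key = naiveRoot(key);
  return key;
}
```

With stop word filtering and root matching both enabled, `Jumping` and `jumps` are counted under the same key while `the` is dropped, which is the kind of combined behavior described above.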
There are several ways to interface with the analyzer:
- Include the npm lib in your project
- HTTP API (uses cluster api to make n workers for n cores)
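The "n workers for n cores" pattern mentioned for the HTTP API can be sketched with Node.js's built-in cluster module. This is a minimal sketch under assumptions, not the project's actual server: the helper `forkWorkers`, the `main` entry point, the port, and the response body are all hypothetical placeholders.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Fork one worker per core. `clusterModule` is a parameter so the loop
// can be exercised without spawning real processes.
function forkWorkers(clusterModule, coreCount) {
  const workers = [];
  for (let i = 0; i < coreCount; i++) {
    workers.push(clusterModule.fork());
  }
  return workers;
}

// Hypothetical entry point; the port and response body are placeholders.
function main(port = 3000) {
  // Newer Node.js releases expose `isPrimary`; older ones only `isMaster`.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (isPrimary) {
    forkWorkers(cluster, os.cpus().length);
  } else {
    // Each worker listens on the same port; the cluster module
    // distributes incoming connections among the workers.
    http.createServer((req, res) => {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ worker: cluster.worker.id }));
    }).listen(port);
  }
}
```

Calling `main()` at the bottom of the file would start the primary process, which forks the workers; each worker then runs the same file and takes the server branch.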
Programming Interface
Documentation can be viewed here.
It can also be built locally using the steps under Building Documentation. The docs define the classes and modules available in the project.
Tools Used
The analyzer is written in CoffeeScript and runs on Node.js. Grunt manages compilation, starting and restarting the service, running the test suites (written with Mocha and Should.js), and building the documentation. Codo is the underlying documentation generator.
Provisioning for the service is done using Chef. Vagrant is used with Chef to create an isolated and reproducible working environment. Berkshelf is used for iterating on the Chef cookbook and can be re-enabled in the Vagrantfile if needed.
Travis CI is used for continuous integration. It's currently configured to run the test suites on new git commits. David-dm is used to track whether the npm modules used in the project, both dev dependencies and core dependencies, are at their latest versions.
Installation
The word frequency analyzer uses Vagrant for building an isolated environment with everything necessary for usage or development. Vagrant uses VirtualBox to create VMs programmatically.
$ git clone [email protected]:aslong/word-frequency-analyzer.git
$ cd word-frequency-analyzer
$ vagrant up
At this point you may want to grab a coffee. The first run of vagrant up needs to download a base VM image and provision the VM with the software dependencies.
Usage
After installation is complete, SSH into the newly created VM and cd to the analyzer's directory.
$ vagrant ssh
$ cd word_frequency_analyzer
Running Tests
grunt is the primary command to use when running the various test suites. The suites are made up of unit and performance tests.
You can run any suite in isolation or all together. There is also a watch mode that can be used to re-run the tests on file updates.
All Test Suites:
$ grunt test
All Test Suites (re-run on file updates):
$ grunt watch:tdd
Unit Test Suite:
$ grunt test:unit
Unit Test Suite (re-run on file updates):
$ grunt watch:unit
Perf Test Suite:
$ grunt test:perf
Perf Test Suite (re-run on file updates):
$ grunt watch:perf
Running Service
Starting:
$ grunt start
Compiles the source and starts the Node.js service.
Restarting:
$ grunt restart
Cleans the bin directory, compiles the source, and starts the Node.js service.
Building Documentation
Generate documentation and start doc server:
$ grunt docs
After running, visit here to view the documentation.
Watch and generate new docs on change:
$ grunt watch:docs
Any time a source file is updated, its docs are regenerated. You should only have to refresh your browser to see the updates, assuming you also have grunt docs running.
Cleaning up VM resources
Pause the VM
$ vagrant suspend
Shut down the VM
$ vagrant halt
Shut down and delete the VM image
$ vagrant destroy
Contributing
If there are any changes or improvements you want to make to this project, create an issue, or fork the project and submit a pull request with the intended change. Please include a description of the feature. Pull requests should have accompanying tests. Thank you for helping to improve this tool for others.
License
(The MIT License)
Copyright (c) 2013 Andrew Long [email protected]
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.