@jimtkelly/aho-corasick
v1.0.1
Published
An implementation of the Aho-Corasick algorithm in TypeScript.
Downloads
113
Maintainers
Readme
About The Project
The Aho—Corasick algorithm is a string-searching algorithm invented by Alfred V. Aho and Margaret J. Corasick in 1975. [1]. This repository contains an implementation of this algorithm in TypeScript for use in efficient string matching as an aid to bibliographic searches.
This package is referenced and inspired from tanishiking's aho-corasick-js package, providing some minor improvements to optimisation and additional functionality.
This package is hosted on npm
and is accessible here.
Built With
Getting Started
Installation
To use this project, please install the required packages as available from npm
.
npm
npm i @jimtkelly/aho-corasick
Usage
Create Trie
and Add Keywords
It's important to note that if the param
, keywords
is an array of empty strings, e.g., [' ', '']
, then an error will be thrown.
const trie = new Trie();
trie.addKeyword('apple');
trie.addKeyword('banana');
Get Matches
const trie = new Trie(['apple', 'banana']);
const matches = trie.getMatches('apple banana');
console.log(matches);
// Output: [{ start: 0, end: 4, keyword: 'apple' }, { start: 6, end: 11, keyword: 'banana' }]
Get Inverse Matches
const trie = new Trie(['cat', 'dog']);
const nonMatches = trie.getNonMatches('The cat and the dog');
console.log(nonMatches);
// Output: ['The ', ' and the ']
Get String Occurrences
const trie = new Trie(['apple', 'banana']);
const occurrences = trie.getStringOccurrences('apple banana apple');
console.log(occurrences);
// Output: [{ keyword: 'apple', occurrences: 2 }, { keyword: 'banana', occurrences: 1 }]
For more examples, please refer to the Documentation
Roadmap
- [x] Base implementation of the Aho-corasick algorithm.
- [x] Add GitHub workflow to test & build the package.
- [x] Implement feature to return
non-matches
, i.e., all words that did not match.- [x] Additional nice to have is to only return the words that matched and their counts, i.e., without position in the string.
See the open issues for a full list of proposed features (and known issues).
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (
git checkout -b feature/AmazingFeature
) - Commit your Changes (
git commit -m 'Add some AmazingFeature'
) - Push to the Branch (
git push origin feature/AmazingFeature
) - Open a Pull Request
License
Distributed under the MIT License. See LICENSE.txt
for more information.
Contact
Jim Kelly - [email protected]
Project Link: https://github.com/jamestkelly/aho-corasick
Acknowledgments
References
[1] "Aho–Corasick algorithm," in Wikipedia: The Free Encyclopedia; (Wikimedia Foundation Inc., updated 11 December 2023, 05:39 UTC) [encyclopedia on-line]; available from https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm; Internet; retrieved 14 December 2023.