fm-index.jsx
v0.3.0
Published
FM-index is the fastest full text search algorithm using a compressed index file. This is FM-index for JSX/JS/AMD/Common.js.
Downloads
9
Readme
fm-index.jsx
Synopsis
FM-index is the fastest full text search algorithm using a compressed index file. This is FM-index for JSX/JS/AMD/Common.js.
Motivation
FM-index is the alternate search algorithm of an inverse index algorithm. FM-index has the following advantages:
- It doesn't need to split word (like N-gram). It is good for CJK languages.
- It can recreate original document from the index file
- Index file is compressed.
- Easy to control the performance and the index file size.
Code Example
Use from JSX
import "fm-index.jsx";
class _Main {
static function main(argv : string[]) : void
{
var fm = new FMIndex();
fm.push("hello");
fm.push("world");
this.fm.build(5);
console.log(this.fm.search('world')); // -> [5]
}
}
Use from node.js
var FMIndex = require('fm-index.common.js').FMIndex;
Use from require.js
// use fm-index.amd.js
define(['fm-index.jsx'], function (fmindex) {
var fmindex = fmindex.FMIndex();
// Write simple usage here!
});
Use via standard JSX function
<script src="fm-index.js}}" type="text/javascript"></script>
<script type="text/javascript">
window.onload = function () {
var FMIndex = JSX.require("src/fm-index.js").FMIndex;
});
</script>
Use via global variables
<script src="fm-index.global.js}}" type="text/javascript"></script>
<script type="text/javascript">
window.onload = function () {
var fmindex = new FMIndex();
});
</script>
Installation
$ npm install fm-index.jsx
You should add the following modules to package.json
if you want to use from JSX:
- burrows-wheeler-transform.jsx
- wavelet-matrix.jsx
- binary-io.jsx
- binary-support.jsx
API Reference
class FMIndex()
Constructor.
FMIndex.push(str : string) : void
Append string.
FMIndex.contentSize()
Return total length of pushed string. It is available before build()
.
FMIndex.build(ddic : int, maxChar : int = 65535) : void
Build search index. ddic
is a cache density. (1 / ddic) * 100
% is a actual cache rate.
If ddic == 1, densty = 100%, it provides maximum speed but it use match memory and storage.
Initial recommendation value is 50.
maxChar
is a maximum character code. If you reduce this, you can save memory.
FMIndex.size()
Return contetn size. It is available after build()
.
FMIndex.search(keyword : string) : int[]
Return position list that includes keyword
.
FMIndex.getSubstring(pos : int, len : int) : string
Return original document content
FMIndex.dump(output : BinaryOutput) : void
Export bit-vector.
FMIndex.load(input : BinaryInput) : void
Import bit-vector.
Development
JSX
Don't be afraid JSX! If you have an experience of JavaScript, you can learn JSX quickly.
- Static type system and unified class syntax.
- All variables and methods belong to class.
- JSX includes optimizer. You don't have to write tricky unreadalbe code for speed.
- You can use almost all JavaScript API as you know. Some functions become static class functions. See reference.
Setup
To create development environment, call following command:
$ npm install
Repository
- Repository: git://github.com/shibukawa/fm-index.jsx.git
- Issues: https://github.com/shibukawa/fm-index.jsx/issues
Run Test
$ grunt test
Build
$ grunt build
Generate API reference
$ grunt doc
Author
- shibukawa / [email protected]
License
MIT
Complete license is written in LICENSE.md
.