@quantleaf/probly-search
v1.2.4
A lightweight full-text search engine with a fully customizable scoring function
probly-search
A full-text search library, optimized for insertion speed, that provides full control over the scoring calculations.
This started as a port of the Node library NDX.
Demo
Recipe (title) search with 50k documents.
https://quantleaf.github.io/probly-search-demo/
Features
Three ways to do scoring
- BM25 ranking function to rank matching documents. The same ranking function that is used by default in Lucene >= 6.0.0.
- zero-to-one, a scoring function unique to this library that produces a normalized score between 0 and 1. Well suited to matching titles or labels against queries.
- Ability to fully customize your own scoring function by implementing the ScoreCalculator trait.
Trie based dynamic Inverted Index.
Multiple fields full-text indexing and searching.
Per-field score boosting.
Configurable tokenizer and term filter.
Free text queries with query expansion.
Fast allocation, but latent deletion (a removed document is only purged completely when the index is vacuumed).
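To illustrate what a trie-based inverted index with query expansion means, here is a minimal, self-contained sketch. It is not the library's implementation: TrieNode, insert, and expand are hypothetical names, and a real index would also track fields and term frequencies. The sketch only shows how a query term can be expanded to every indexed term that shares its prefix.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One node of a character trie. `docs` holds the keys of the documents
/// that contain the term ending at this node.
#[derive(Default)]
struct TrieNode {
    children: BTreeMap<char, TrieNode>,
    docs: BTreeSet<usize>,
}

impl TrieNode {
    /// Index `term` for document `doc`, creating nodes as needed.
    fn insert(&mut self, term: &str, doc: usize) {
        let mut node = self;
        for c in term.chars() {
            node = node.children.entry(c).or_default();
        }
        node.docs.insert(doc);
    }

    /// Collect the documents of every indexed term that starts with `prefix`.
    fn expand(&self, prefix: &str) -> BTreeSet<usize> {
        // Walk down to the node that represents the prefix.
        let mut node = self;
        for c in prefix.chars() {
            match node.children.get(&c) {
                Some(next) => node = next,
                None => return BTreeSet::new(),
            }
        }
        // Visit the whole subtree below the prefix node.
        let mut found = BTreeSet::new();
        let mut stack = vec![node];
        while let Some(n) = stack.pop() {
            found.extend(n.docs.iter().copied());
            stack.extend(n.children.values());
        }
        found
    }
}

fn main() {
    let mut root = TrieNode::default();
    root.insert("abc", 0);
    root.insert("abcd", 1);
    root.insert("dfg", 0);
    // "abc" matches the exact term in doc 0 and the longer term "abcd" in doc 1.
    println!("{:?}", root.expand("abc"));
}
```

In this sketch an exact match and an expanded match are treated alike; a scoring function would normally weight them differently.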
Documentation
Documentation is under development. For now read the source tests.
Example
Create an index for documents with 2 fields, query it, then remove a document.
use std::collections::HashSet;
use probly_search::{
    // vacuum_index is needed for the vacuum step further down
    index::{add_document_to_index, create_index, remove_document_from_index, vacuum_index, Index},
    query::{
        query,
        score::default::{bm25, zero_to_one},
        QueryResult,
    },
};
// Create index with 2 fields
let mut index = create_index::<usize>(2);
// Create docs from a custom, user-defined Doc struct (it must implement Clone,
// because doc_1 is cloned when it is added to the index)
let doc_1 = Doc {
    id: 0,
    title: "abc".to_string(),
    description: "dfg".to_string(),
};
let doc_2 = Doc {
    id: 1,
    title: "dfgh".to_string(),
    description: "abcd".to_string(),
};
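// Add documents to index. title_extract and description_extract are user-supplied
// field accessors (one per indexed field); tokenizer and filter are user-supplied too.
add_document_to_index(
    &mut index,
    &[title_extract, description_extract],
    tokenizer,
    filter,
    doc_1.id,
    doc_1.clone(),
);
add_document_to_index(
    &mut index,
    &[title_extract, description_extract],
    tokenizer,
    filter,
    doc_2.id,
    doc_2,
);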
// Search, expected 2 results
let mut result = query(
    &mut index,
    &"abc",
    &mut bm25::new(),
    tokenizer,
    filter,
    &[1., 1.],
    None,
);
assert_eq!(result.len(), 2);
assert_eq!(
    result[0],
    QueryResult {
        key: 0,
        score: 0.6931471805599453
    }
);
assert_eq!(
    result[1],
    QueryResult {
        key: 1,
        score: 0.28104699650060755
    }
);
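// Remove a document from the index. The key is recorded in removed_docs,
// which is later passed to query so that the removed document is skipped.
let mut removed_docs = HashSet::new();
remove_document_from_index(&mut index, &mut removed_docs, doc_1.id);
// Vacuum to remove completely
vacuum_index(&mut index, &mut removed_docs);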
// Search, expect 1 result
result = query(
    &mut index,
    &"abc",
    &mut bm25::new(),
    tokenizer,
    filter,
    &[1., 1.],
    Some(&removed_docs),
);
assert_eq!(result.len(), 1);
assert_eq!(
    result[0],
    QueryResult {
        key: 1,
        score: 0.1166450426074421
    }
);
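The example above uses a Doc struct and the functions title_extract, description_extract, tokenizer, and filter without defining them. The following is a minimal sketch of what they could look like. The signatures are an assumption made for illustration and may not match what this version of the library expects, so check them against the source tests before relying on them.

```rust
// Hypothetical helper definitions. The signatures are assumptions,
// not the library's documented API.

/// Document type with two text fields; Clone is needed for doc_1.clone().
#[derive(Clone)]
struct Doc {
    id: usize,
    title: String,
    description: String,
}

/// Accessor for the first indexed field.
fn title_extract(d: &Doc) -> Option<&str> {
    Some(d.title.as_str())
}

/// Accessor for the second indexed field.
fn description_extract(d: &Doc) -> Option<&str> {
    Some(d.description.as_str())
}

/// Split text into terms on whitespace.
fn tokenizer(s: &str) -> Vec<String> {
    s.split_whitespace().map(str::to_owned).collect()
}

/// Normalize a term; lowercasing is used here as an example.
fn filter(s: &str) -> String {
    s.to_lowercase()
}

fn main() {
    let doc = Doc {
        id: 0,
        title: "Abc Def".to_string(),
        description: "dfg".to_string(),
    };
    // Tokenize the title, then normalize every term.
    let terms: Vec<String> = tokenizer(title_extract(&doc).unwrap())
        .iter()
        .map(|t| filter(t))
        .collect();
    println!("doc {} title terms: {:?}", doc.id, terms);
    println!("description: {:?}", description_extract(&doc));
}
```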
For more query examples, go through the source tests for the BM25 implementation and the zero-to-one implementation.
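For orientation, the textbook BM25 score of a single term can be sketched as below. This is a generic formulation with the Lucene-style IDF and the common defaults k1 = 1.2 and b = 0.75. The function name and parameters are hypothetical, and the library's own bm25 module may differ in details such as how expanded terms and field boosts are weighted.

```rust
/// Textbook BM25 score of one term in one field of one document.
/// Hypothetical helper, not part of the probly-search API.
fn bm25_term_score(
    doc_count: f64,      // N: number of documents in the index
    docs_with_term: f64, // n: number of documents containing the term
    term_freq: f64,      // occurrences of the term in this field
    field_len: f64,      // number of terms in this field
    avg_field_len: f64,  // average field length over all documents
) -> f64 {
    const K1: f64 = 1.2; // term-frequency saturation
    const B: f64 = 0.75; // strength of length normalization

    // Lucene-style inverse document frequency, never negative.
    let idf = (1.0 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln();
    // Fields longer than average are penalized, shorter ones are favored.
    let norm = 1.0 - B + B * field_len / avg_field_len;
    idf * term_freq * (K1 + 1.0) / (term_freq + K1 * norm)
}

fn main() {
    // Two documents, one of which contains the term once in a one-term field.
    let score = bm25_term_score(2.0, 1.0, 1.0, 1.0, 1.0);
    println!("{}", score);
}
```

With these inputs the length normalization cancels out and the score reduces to ln 2, about 0.6931, which agrees with the first score asserted in the example above. The scores of expanded matches are not modeled by this sketch.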