@quantleaf/probly-search

v1.2.4

Published 3 years ago

A lightweight full-text search engine with a fully customizable scoring function

marcus-quantleaf

search query bm25 index

probly-search

A full-text search library, optimized for insertion speed, that provides full control over the scoring calculations.

This library started as a port of the Node library NDX.

Demo

Recipe (title) search with 50k documents.

https://quantleaf.github.io/probly-search-demo/

Features

- Three ways to do scoring:
  - BM25 ranking function to rank matching documents. This is the same ranking function that Lucene uses by default since version 6.0.0.
  - zero-to-one, a scoring function unique to this library that produces a normalized score bounded by 0 and 1. Well suited to matching titles/labels against queries.
  - Full control over scoring by implementing your own scoring function through the ScoreCalculator trait.
- Trie-based dynamic inverted index.
- Full-text indexing and searching over multiple fields.
- Per-field score boosting.
- Configurable tokenizer and term filter.
- Free-text queries with query expansion.
- Fast allocation, but latent deletion: removed documents are only marked as removed until the index is vacuumed.
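To make the BM25 bullet concrete, here is a small standalone sketch of classic Okapi BM25 over pre-tokenized documents. It is not probly-search's implementation or API: the names `Bm25Params`, `idf`, `bm25_term` and `bm25_scores` are hypothetical, the IDF uses the Lucene-style form ln(1 + (N - n + 0.5) / (n + 0.5)), and k1 = 1.2, b = 0.75 are the conventional defaults, which the crate may or may not use.

```rust
/// Okapi BM25 parameters (hypothetical type for this sketch).
/// k1 controls term-frequency saturation, b controls length normalization.
struct Bm25Params {
    k1: f64,
    b: f64,
}

/// Lucene-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5)); never negative.
fn idf(num_docs: usize, doc_freq: usize) -> f64 {
    let n = num_docs as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one query term to one document.
fn bm25_term(p: &Bm25Params, tf: f64, doc_len: f64, avg_doc_len: f64, term_idf: f64) -> f64 {
    // Longer-than-average documents are penalized when b > 0
    let norm = p.k1 * (1.0 - p.b + p.b * doc_len / avg_doc_len);
    term_idf * tf * (p.k1 + 1.0) / (tf + norm)
}

/// Scores every document (a list of tokens) against the query terms,
/// using exact token matches only (no query expansion).
fn bm25_scores(docs: &[Vec<&str>], query_terms: &[&str], p: &Bm25Params) -> Vec<f64> {
    let num_docs = docs.len();
    if num_docs == 0 {
        return Vec::new();
    }
    let avg_len = docs.iter().map(|d| d.len()).sum::<usize>() as f64 / num_docs as f64;
    let mut scores = vec![0.0; num_docs];
    for term in query_terms {
        // Document frequency: how many documents contain the term at least once
        let df = docs.iter().filter(|d| d.contains(term)).count();
        if df == 0 {
            continue;
        }
        let term_idf = idf(num_docs, df);
        for (i, doc) in docs.iter().enumerate() {
            let tf = doc.iter().filter(|t| *t == term).count() as f64;
            if tf > 0.0 {
                scores[i] += bm25_term(p, tf, doc.len() as f64, avg_len, term_idf);
            }
        }
    }
    scores
}

fn main() {
    let docs = vec![
        vec!["abc", "dfg"],
        vec!["dfgh", "abcd"],
        vec!["abc", "abc", "xyz"],
    ];
    let params = Bm25Params { k1: 1.2, b: 0.75 };
    println!("{:?}", bm25_scores(&docs, &["abc"], &params));
}
```

Because probly-search also expands query terms through its trie and supports per-field boosts, its scores for similar data will generally differ from this exact-match sketch.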

Documentation

Documentation is under development. For now, read the tests in the source code.

Example

Create an index for documents with 2 fields, query it, and remove a document.

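use std::collections::HashSet;
use probly_search::{
    index::{add_document_to_index, create_index, remove_document_from_index, vacuum_index},
    query::{query, score::default::bm25, QueryResult},
};

// Helpers the example relies on but does not define. These definitions are
// assumptions: check the crate's field-accessor, tokenizer and filter types
// (and its source tests) for the exact signatures.
#[derive(Clone)]
struct Doc {
    id: usize,
    title: String,
    description: String,
}

// One field accessor per indexed field
fn title_extract(d: &Doc) -> Option<&str> {
    Some(d.title.as_str())
}

fn description_extract(d: &Doc) -> Option<&str> {
    Some(d.description.as_str())
}

// Splits text on single spaces
fn tokenizer(s: &str) -> Vec<String> {
    s.split(' ').map(|t| t.to_owned()).collect()
}

// Identity term filter (no lowercasing or stemming)
fn filter(s: &str) -> String {
    s.to_owned()
}

fn main() {
    // Create index with 2 fields
    let mut index = create_index::<usize>(2);

    // Create docs from the custom Doc struct
    let doc_1 = Doc {
        id: 0,
        title: "abc".to_string(),
        description: "dfg".to_string(),
    };

    let doc_2 = Doc {
        id: 1,
        title: "dfgh".to_string(),
        description: "abcd".to_string(),
    };

    // Add documents to index (doc_1 is cloned so its id can be reused later)
    add_document_to_index(
        &mut index,
        &[title_extract, description_extract],
        tokenizer,
        filter,
        doc_1.id,
        doc_1.clone(),
    );

    add_document_to_index(
        &mut index,
        &[title_extract, description_extract],
        tokenizer,
        filter,
        doc_2.id,
        doc_2,
    );

    // Search for "abc" with equal field boosts, expect 2 results
    let mut result = query(
        &mut index,
        "abc",
        &mut bm25::new(),
        tokenizer,
        filter,
        &[1., 1.],
        None,
    );
    assert_eq!(result.len(), 2);
    assert_eq!(
        result[0],
        QueryResult {
            key: 0,
            score: 0.6931471805599453
        }
    );
    assert_eq!(
        result[1],
        QueryResult {
            key: 1,
            score: 0.28104699650060755
        }
    );

    // Mark doc_1 as removed (latent deletion)
    let mut removed_docs = HashSet::new();
    remove_document_from_index(&mut index, &mut removed_docs, doc_1.id);

    // Vacuum to physically remove the marked documents
    vacuum_index(&mut index, &mut removed_docs);

    // Search, expect 1 result
    result = query(
        &mut index,
        "abc",
        &mut bm25::new(),
        tokenizer,
        filter,
        &[1., 1.],
        Some(&removed_docs),
    );
    assert_eq!(result.len(), 1);
    assert_eq!(
        result[0],
        QueryResult {
            key: 1,
            score: 0.1166450426074421
        }
    );
}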
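The remove-then-vacuum steps above follow a tombstone pattern: removal only records the key, queries skip recorded keys, and a later vacuum pass does the expensive cleanup. The sketch below illustrates that pattern on a toy inverted index. It is a hypothetical model, not probly-search's data structure or API; `TinyIndex` and its methods are invented for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical toy inverted index: term -> set of document keys.
#[derive(Default)]
struct TinyIndex {
    postings: HashMap<String, HashSet<usize>>,
    // Terms of each document, kept so that vacuum knows what to clean up
    doc_terms: HashMap<usize, Vec<String>>,
}

impl TinyIndex {
    fn add(&mut self, key: usize, text: &str) {
        let terms: Vec<String> = text.split_whitespace().map(str::to_owned).collect();
        for t in &terms {
            self.postings.entry(t.clone()).or_default().insert(key);
        }
        self.doc_terms.insert(key, terms);
    }

    /// Cheap removal: records a tombstone, postings are left untouched.
    fn remove(&self, removed: &mut HashSet<usize>, key: usize) {
        if self.doc_terms.contains_key(&key) {
            removed.insert(key);
        }
    }

    /// Exact-term lookup that skips tombstoned keys; results sorted by key.
    fn query(&self, term: &str, removed: &HashSet<usize>) -> Vec<usize> {
        let mut hits: Vec<usize> = self
            .postings
            .get(term)
            .map(|keys| keys.iter().copied().filter(|k| !removed.contains(k)).collect())
            .unwrap_or_default();
        hits.sort_unstable();
        hits
    }

    /// Physically deletes tombstoned documents and empties the tombstone set.
    fn vacuum(&mut self, removed: &mut HashSet<usize>) {
        for key in removed.drain() {
            if let Some(terms) = self.doc_terms.remove(&key) {
                for t in terms {
                    let now_empty = match self.postings.get_mut(&t) {
                        Some(keys) => {
                            keys.remove(&key);
                            keys.is_empty()
                        }
                        None => false,
                    };
                    // Drop terms that no longer point at any document
                    if now_empty {
                        self.postings.remove(&t);
                    }
                }
            }
        }
    }
}

fn main() {
    let mut index = TinyIndex::default();
    index.add(0, "abc dfg");
    index.add(1, "abc abcd");

    let mut removed = HashSet::new();
    index.remove(&mut removed, 0);
    // Doc 0 is hidden from queries, but its postings are still stored
    println!("{:?}", index.query("abc", &removed));

    index.vacuum(&mut removed);
    // Doc 0 is now physically gone and the tombstone set is empty
    println!("{:?} {}", index.query("abc", &removed), removed.is_empty());
}
```

The trade-off matches the "Fast allocation, but latent deletion" feature: removal is constant-time, at the cost of carrying stale postings (and filtering them at query time) until vacuum runs.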

See the source tests of the BM25 and zero-to-one implementations for more query examples.

License

MIT
