@axeptio/links-classifier
v0.0.6
This library filters and classifies the links inside a web page, and works with any language.
Links Classifier
Use Case
We want to filter links from a given webpage and classify them into different document types, like Privacy Policy, Terms of Service, etc.
Approach
We expose two functions. The first filters the links, removing external, invalid and duplicate ones. The second classifies the remaining links into document types.
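To make the filtering step concrete, here is a minimal sketch of what removing invalid, external and duplicate links could look like. It is not the library's filterLinks implementation: the function name sketchFilterLinks, the plain-string inputs, and the choices to treat non-HTTP schemes as invalid and to ignore #fragments when de-duplicating are all assumptions made for illustration.

```javascript
// Hypothetical sketch, NOT the library's filterLinks implementation.
// Keeps only parseable, same-host http(s) URLs, without duplicates.
function sketchFilterLinks(hrefs, baseUrl) {
  const base = new URL(baseUrl);
  const seen = new Set();
  const kept = [];

  for (const href of hrefs) {
    let url;
    try {
      // Resolve relative hrefs against the page URL.
      url = new URL(href, base);
    } catch (err) {
      continue; // invalid: cannot be parsed as a URL
    }

    // Assumption: mailto:, javascript:, tel: and similar are not documents.
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;

    // External: a different host than the page itself.
    if (url.hostname !== base.hostname) continue;

    // Assumption: two links that differ only by #fragment are duplicates.
    url.hash = '';

    if (seen.has(url.href)) continue;
    seen.add(url.href);
    kept.push(url.href);
  }

  return kept;
}
```

This sketch compares exact hostnames, so links to a subdomain are dropped. That roughly corresponds to passing false for the "follow subdomains" argument in the usage example, although the library's actual matching rules may differ.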
Usage
// The package is published under the @axeptio scope.
const { filterLinks, classifyLinks, keywords } = require('@axeptio/links-classifier');

// Collect every anchor element on the current page.
const links = document.querySelectorAll('a');

const filteredLinks = filterLinks(
  links, // the links to filter
  window.location, // the context
  ['en', 'fr', 'it'], // valid locales (other languages will be ignored)
  false, // follow subdomains
  console.log // logger function
);

// Classify the remaining links using the bundled dataset and the 'fr' locale.
const classifiedLinks = classifyLinks(filteredLinks, keywords, 'fr');
console.log(classifiedLinks);
/*
{
  'privacy_policy': Array(2),
  'terms_of_service': Array(1),
}
*/
Data
This module imports its own dataset, located in data/keywords.js, which contains variations for each document type. It is exported from the index (as keywords in the usage example), but you are free to use your own dataset.
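As an illustration of how a dataset of variations can drive classification, the sketch below matches link text against keyword lists per document type and locale. The dataset shape, the customKeywords object, the sketchClassify function and the { text, href } link objects are all hypothetical: the actual structure of data/keywords.js and the matching logic of classifyLinks are not described here and may differ, so check the bundled file before writing your own dataset.

```javascript
// Hypothetical dataset shape: { documentType: { locale: [variations] } }.
// The real data/keywords.js may be structured differently.
const customKeywords = {
  privacy_policy: {
    en: ['privacy policy', 'privacy'],
    fr: ['politique de confidentialité'],
  },
  terms_of_service: {
    en: ['terms of service', 'terms'],
    fr: ['conditions générales'],
  },
};

// Hypothetical sketch, NOT the library's classifyLinks implementation.
// Groups links by the first document type whose variations appear in the link text.
function sketchClassify(links, dataset, locale) {
  const result = {};

  for (const link of links) {
    const text = link.text.toLowerCase();

    for (const [type, byLocale] of Object.entries(dataset)) {
      const variations = byLocale[locale] || [];
      if (variations.some((variation) => text.includes(variation))) {
        if (!result[type]) result[type] = [];
        result[type].push(link);
        break; // a link is assigned to at most one document type
      }
    }
  }

  return result;
}

const sampleLinks = [
  { text: 'Politique de confidentialité', href: '/fr/privacy' },
  { text: 'Conditions Générales de Vente', href: '/fr/cgv' },
  { text: 'Contact', href: '/contact' },
];

const sampleResult = sketchClassify(sampleLinks, customKeywords, 'fr');
console.log(Object.keys(sampleResult));
```

Links that match no variation, such as the Contact link here, are simply left out of the result.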