@giancosta86/cervantes
v2.0.0
CervantesJS
Extract and classify Spanish terms from wiki pages, with TypeScript
CervantesJS is a TypeScript library for extracting Spanish terms from wiki pages; more importantly, it is a plugin for JardineroJS that creates a SQLite dictionary of Spanish terms by parsing Wikcionario.
Installation
To install the package as a plugin, please refer to the documentation of JardineroJS.
The current version of the plugin requires Jardinero 2.x.
Otherwise, to install it as a library reference within a project:
npm install @giancosta86/cervantes
or
yarn add @giancosta86/cervantes
The public API entirely resides in the root package index, so you shouldn't reference specific modules.
Usage
CervantesJS is first and foremost a plugin for JardineroJS: please consult its documentation for details.
However, you can also use the package as a standalone library for extracting Spanish terms from wiki pages.
In this case, you can just import names directly from its root:
import {...} from "@giancosta86/cervantes"
In particular, you may want to consider:
the SpanishTerm union type - and the related types like Noun, Article, ...
extractTerms() - to extract Spanish terms from a given wiki page
SpanishTransform - a transform stream applying extractTerms() to a flow of wiki pages
SPANISH_SQLITE_SCHEMA: a string containing the DDL code for SQLite
createSpanishWritableBuilder() - creating a WritableBuilder (from the sqlite-writable library) with the required type registrations and with a suitable transaction capacity
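To give a rough idea of how a transform stream can apply an extraction function to a flow of wiki pages, here is a minimal, self-contained sketch built only on Node's stream module. It does not use CervantesJS at all: DemoPage, DemoTerm, extractDemoTerms() and DemoTermTransform are hypothetical stand-ins invented for this illustration, and the "{{noun}}" / "{{verb}}" markers are made up. The actual types and signatures exported by the package may well differ.

```typescript
import { Readable, Transform } from "node:stream";

// Hypothetical page and term shapes, for illustration only:
// they are NOT the types exported by CervantesJS.
interface DemoPage {
  title: string;
  text: string;
}

// A small union type discriminated by its "kind" field.
type DemoTerm =
  | { kind: "noun"; entry: string }
  | { kind: "verb"; entry: string };

// Stand-in for an extraction function: it only looks for made-up
// "{{noun}}" / "{{verb}}" markers instead of parsing real wiki markup.
function extractDemoTerms(page: DemoPage): DemoTerm[] {
  const terms: DemoTerm[] = [];

  if (page.text.includes("{{noun}}")) {
    terms.push({ kind: "noun", entry: page.title });
  }

  if (page.text.includes("{{verb}}")) {
    terms.push({ kind: "verb", entry: page.title });
  }

  return terms;
}

// Object-mode transform: pages flow in, zero or more terms flow out per page.
class DemoTermTransform extends Transform {
  constructor() {
    super({ objectMode: true });
  }

  _transform(
    page: DemoPage,
    _encoding: BufferEncoding,
    callback: (error?: Error | null) => void
  ): void {
    for (const term of extractDemoTerms(page)) {
      this.push(term);
    }
    callback();
  }
}

// Usage: feed a few in-memory pages through the transform and log each term.
const pages: DemoPage[] = [
  { title: "casa", text: "{{noun}} Edificio para habitar." },
  { title: "cantar", text: "{{verb}} Producir sonidos melodiosos." },
  { title: "Portada", text: "Página sin términos." }
];

Readable.from(pages)
  .pipe(new DemoTermTransform())
  .on("data", (term: DemoTerm) => console.log(term.kind, term.entry));
```

Keeping the extraction logic in a plain function and wrapping it in an object-mode transform means the same function can be called directly on a single page or plugged into a pipeline, which mirrors the split between extractTerms() and SpanishTransform described above.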
Further reference
Feel free to explore:
JardineroJS - the web stack itself, designed for extensible linguistic analysis
JardineroJS - SDK - the development kit for creating your own plugins