@ascari/reco
v1.0.1
Published
Generic text classifier for Mexican electronic invoices.
reco
A text recognition engine that classifies concept descriptions found in electronic invoices used in Mexico.
Installation
Command Line Utility
You must install reco globally to use the command line interface.
npm i @ascari/reco -g
Module
npm i @ascari/reco --save
Usage
Command Line Utility
Create reco.json
reco init
A ./reco.json file will be created with default values.
By default it uses a sqlite3 database.
You may edit the configuration now.
Scaffold a new project
Requires a valid ./reco.json file to be present.
reco create
A database will be scaffolded in the current directory.
By default it creates a ./database folder where a sqlite3 database file will be stored.
You may edit the first migration, ./database/migrations/0.js, to better accommodate your database structure.
Keep in mind that the autogenerated tables and columns are required.
NOTE The following commands can only be called after creating a project.
Add a single invoice
Loads an XML invoice, parses it, and stores the unique suppliers, clients, and concepts found.
reco xml path/to/invoice.xml
Note: Ideally, valid SAT invoices should be fed to reco. However, reco does not verify their integrity, so you can also feed it non-compliant XML invoices as long as they follow a similar structure:
<?xml version="1.0" encoding="utf-8"?>
<Comprobante fecha="{{INVOICE_DATE}}" sello="{{SELLO_DIGITAL}}">
  <Emisor rfc="{{CLIENT_RFC}}" name="{{CLIENT_NAME}}" />
  <Receptor rfc="{{SUPPLIER_RFC}}" name="{{SUPPLIER_NAME}}" />
  <Conceptos>
    <Concepto descripcion="{{CONCEPT_DESCRIPTION_A}}" />
    <Concepto descripcion="{{CONCEPT_DESCRIPTION_B}}" />
  </Conceptos>
</Comprobante>
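To check that a non-compliant file follows this structure before feeding it to reco, a small script can pull out the same fields. The following is a minimal sketch: `readAttribute` and `extractInvoiceFields` are hypothetical helpers, not part of reco, and the regular expressions assume attribute values contain no `>` or `"` characters. reco's own parsing may differ.

```javascript
// Hypothetical helpers, not part of reco: read the fields shown in the
// template above from an invoice string. A real XML parser is more robust;
// this only illustrates the structure reco expects.
function readAttribute(tag, name) {
  // Match name="value" inside a single opening tag.
  const match = new RegExp('\\s' + name + '="([^"]*)"').exec(tag);
  return match ? match[1] : null;
}

function extractInvoiceFields(xml) {
  const emisor = /<Emisor\b[^>]*>/.exec(xml);
  const receptor = /<Receptor\b[^>]*>/.exec(xml);
  // \b keeps <Conceptos> from matching, so only <Concepto ...> tags remain.
  const conceptos = xml.match(/<Concepto\b[^>]*>/g) || [];
  return {
    emisorRfc: emisor ? readAttribute(emisor[0], 'rfc') : null,
    receptorRfc: receptor ? readAttribute(receptor[0], 'rfc') : null,
    concepts: conceptos.map((tag) => readAttribute(tag, 'descripcion')),
  };
}
```

A file for which `concepts` comes back empty, or for which either RFC is `null`, probably does not follow the structure closely enough.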
Add all invoices in a folder
Load and store all invoice files found in a folder.
reco xmls path/to/invoices
Currently, reco cannot read folders recursively.
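One way to work around this limitation is to list the nested folders yourself and run the command once per folder. The sketch below uses only Node's built-in `fs` and `path` modules. `foldersWithInvoices` is a hypothetical helper, not part of reco, and it assumes invoice files carry an `.xml` extension.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper, not part of reco: return every folder under `root`
// (including `root` itself) that directly contains at least one .xml file.
function foldersWithInvoices(root) {
  const found = [];
  const entries = fs.readdirSync(root, { withFileTypes: true });
  const hasXml = entries.some(
    (entry) => entry.isFile() && entry.name.toLowerCase().endsWith('.xml')
  );
  if (hasXml) {
    found.push(root);
  }
  for (const entry of entries) {
    if (entry.isDirectory()) {
      // Descend into subfolders and collect their matches as well.
      found.push(...foldersWithInvoices(path.join(root, entry.name)));
    }
  }
  return found;
}
```

Each returned folder can then be passed to reco xmls in turn.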
Label a concept for training
Creates a label that is used to train a classifier.
reco label "LABEL" "CONCEPT"
You may use the --rfc option to scope a label to a supplier.
reco label "LABEL" "CONCEPT" --rfc XXX0123456X7
Scoping a label improves recognition accuracy. The improvement comes from giving more weight to classifications that belong to a supplier when the recognition test is also scoped to that supplier. The reasoning is that a supplier generally has its own set of concepts for its products and services, and these are more likely to match a label scoped to the same supplier.
In other words, when we classify a concept from a pizza supplier, labels scoped to that supplier will tend to identify concepts containing the word "pizza" better, since its products have "pizza" in their names. Otherwise, a toy supplier that sells toy pizza games may rank higher.
To summarize: when we know an invoice concept comes from a certain supplier, it is better to test it against a classifier that has been trained only on concepts and labels from that supplier.
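The pizza example can be made concrete with a toy ranking function. This is only an illustration of the weighting idea, not reco's actual algorithm: `rankCandidates`, the `boost` factor, the scores, and the RFC values are all made up for the example.

```javascript
// Hypothetical scoring sketch, not reco's algorithm: boost candidates whose
// label was scoped to the supplier being tested, then sort best match first.
function rankCandidates(candidates, supplierRfc, boost = 2) {
  return candidates
    .map((candidate) => ({
      label: candidate.label,
      score:
        supplierRfc && candidate.rfc === supplierRfc
          ? candidate.score * boost
          : candidate.score,
    }))
    .sort((a, b) => b.score - a.score);
}

// Made-up raw scores for the concept "PIZZA GRANDE".
const candidates = [
  { label: 'toy pizza game', rfc: 'TOY0123456X7', score: 0.6 },
  { label: 'pizza', rfc: 'PIZ0123456X7', score: 0.5 },
];

// Unscoped test: the toy supplier's label ranks first.
const unscoped = rankCandidates(candidates, null);
// Test scoped to the pizza supplier: its own label now ranks first.
const scoped = rankCandidates(candidates, 'PIZ0123456X7');
```

With these numbers, `unscoped[0].label` is `'toy pizza game'` while `scoped[0].label` is `'pizza'`, which is the effect described above.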
Add all labels found in a file
Add labels found in a list.
reco labels path/to/labels.lst
Labels are separated by new lines. Within each line, the label and the concept are separated by a colon (:).
Example:
apple:I WANT AN APPLE
orange:ORANGE YOU GLAD?
lemon:EAT SOME LEMON PIE
You may use the -v option to see progress.
You may use the --rfc option to scope labels to a supplier.
You may use the --delim option to specify a different delimiter.
You may use the --no-delim option to specify that the list is not delimited. That is, each line does not have a separate label and concept; instead, the label and the concept are the same.
This is useful for adding a supplier's catalog when identifying their concepts.
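To make the file format and the two delimiter options concrete, here is a minimal sketch of how such a list could be read. `parseLabels` is a hypothetical function, not reco's implementation. Skipping blank lines and splitting on the first delimiter only are assumptions of this sketch.

```javascript
// Hypothetical parser, not reco's implementation: turn the contents of a
// labels file into { label, concept } pairs.
// `delim` mirrors the --delim option; passing null mirrors --no-delim.
function parseLabels(text, delim = ':') {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      if (delim === null) {
        // Undelimited list: the label and the concept are the same text.
        return { label: line, concept: line };
      }
      const at = line.indexOf(delim);
      if (at === -1) {
        throw new Error('Missing delimiter in line: ' + line);
      }
      // Split on the first delimiter only, so a concept may contain it.
      return {
        label: line.slice(0, at),
        concept: line.slice(at + delim.length),
      };
    });
}
```

For the example file above, `parseLabels(contents)` yields one pair per line, such as `{ label: 'apple', concept: 'I WANT AN APPLE' }`.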
Train classifiers
Train classifiers.
Be patient, it may take a while.
reco train
Test an arbitrary concept
Tests recognition by classifying the specified concept. Returns the label with the best score.
reco test "CONCEPT"
You may use the --rfc option to scope the test to a supplier.
You may use the -v option to return classification information.
By default, 10 rows are returned. Pass a number, as in -v 20, to set how many classification rows to return, ordered from best match to worst.
Module
// The package is published under the @ascari scope
const Reco = require('@ascari/reco');
// Reco configuration (see the options under API)
const recoConfig = { /* ... */ };
// instantiate
const reco = new Reco(recoConfig);
API
Reco::constructor(recoConfig);
Where recoConfig can have the following options:
Note: the database property is passed to knex.
{
  database: {
    client: 'sqlite3',
    connection: {
      filename: './database/database.sqlite'
    },
    migrations: {
      tableName: 'migrations',
      directory: './database/migrations',
      stub: './database/stub.migration.js'
    },
    seeds: {
      directory: './database/seeds',
    },
    useNullAsDefault: true,
  },
}
Promise Reco::addLabel(String label, String concept, String [supplierRfc=null]);
Add a label.
Promise Reco::addXmlInvoice(String xml);
Add an invoice.
Promise Reco::train();
Train classifiers.
Promise Reco::test(String input, String [supplierRfc=null]);
Test classifiers.
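The four methods above are typically called in sequence: add data, label it, train, then test. The sketch below shows one possible order. `labelTrainAndTest` is a hypothetical wrapper, not part of reco. It takes the `reco` instance as a parameter and returns whatever `test()` resolves to, since the shape of that value is not documented here.

```javascript
// Hypothetical wrapper, not part of reco: one possible call order for the
// promise-based methods listed above.
// `labels` is an array of { label, concept } pairs.
async function labelTrainAndTest(reco, xml, labels, input, supplierRfc = null) {
  // Store the suppliers, clients and concepts found in the invoice.
  await reco.addXmlInvoice(xml);
  // Add the labels one at a time, optionally scoped to a supplier.
  for (const { label, concept } of labels) {
    await reco.addLabel(label, concept, supplierRfc);
  }
  // Training may take a while, so wait for it before testing.
  await reco.train();
  // Resolve with whatever test() resolves to.
  return reco.test(input, supplierRfc);
}
```

Passing the instance in keeps the sketch independent of how `reco` was configured and makes it easy to try with a stub object.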
License
See LICENSE in repository.