sbx-parse-api
v0.1.6
JavaScript API for Securibox Parse
Securibox Parse - JavaScript API
A JavaScript API for Securibox Parse
Community
Securibox Parse JavaScript API is open source software released under the LGPL-3.0 license.
You are welcome to report bugs or create pull requests on GitHub.
Installation
The easiest way to install sbx-parse-api is with npm:
npm install sbx-parse-api
Alternatively, download the source.
git clone https://github.com/Securibox/parse-api-js.git
Usage
Creating a parser
import {Parse, AuthMethods} from "sbx-parse-api";
// Base URL of your Securibox Parse endpoint (placeholder value, replace it with your own)
const url = "https://parse.example.com";
// Using JWT authentication
const jwt = "thisIsMyEncodedToken";
const authMethod = AuthMethods.JWT;
const parser = new Parse(url, authMethod, jwt);
// OR, with basic authentication:
// const user = "MyUsername";
// const password = "MySecretPassword";
// const authMethod = AuthMethods.BASIC;
// const parser = new Parse(url, authMethod, user, password);
Authentication
Supported authentication methods:
AuthMethods.BASIC: basic authentication using a username and password
AuthMethods.JWT: authentication using a JSON Web Token
API Methods
After you have instantiated a Parse object, you can use it to call the API. Every call returns a Promise. Only requests that return an HTTP 200 status code result in a fulfilled promise and trigger the .then() method; everything else falls into the .catch() method with an error structured as {"error": [Error Object]}.
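The promise behaviour described above can be handled in one place, as in the following minimal sketch. safeCall is a hypothetical helper and not part of sbx-parse-api; it only assumes that rejected calls carry the {"error": ...} shape described above.

```javascript
// Hypothetical helper (not part of sbx-parse-api): wraps any promise-returning
// API call and normalises the outcome into a single result object.
function safeCall(promise) {
  return promise.then(
    // Fulfilled: the request returned HTTP 200.
    (docs) => ({ ok: true, docs }),
    // Rejected: assumed to follow the documented {"error": ...} shape;
    // anything else is passed through unchanged.
    (failure) => ({
      ok: false,
      error: failure && failure.error ? failure.error : failure,
    })
  );
}

// Usage sketch, assuming `parser` was created as shown under "Creating a parser":
// safeCall(parser.classify(docs)).then((result) => {
//   if (result.ok) console.log(result.docs);
//   else console.error("Request failed:", result.error);
// });
```

With this shape, calling code can branch on result.ok instead of attaching a separate .catch() to every call.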
The API has four methods:
classify(docs, take=5): takes a set of documents and labels them. Internally, the classification is done in two steps: first, a fast algorithm returns a list of candidate labels; then a slower, high-precision algorithm chooses among the take most probable labels and determines the document's specific layout. The optional take parameter is a number between 1 and 9 (5 is the default value).
parse(docs, take=5, mode=undefined): takes a set of documents, classifies them, and parses them. Along with the take parameter (same as in classify), it accepts an optional mode parameter, which can be one of the following:
- undefined (default): handles every document as it is.
- "split": splits the document into pages and handles every page as a separate document.
guess(docs): takes a set of partially parsed documents with a similar layout and tries to infer the missing data. This method can be used to speed up data entry when the parse method fails.
feed(docs): takes a set of documents and stores them for the next training cycles. This method must be used with wrongly classified or wrongly parsed documents after the errors have been corrected by the user; it allows the application to learn and improve over time.
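As an illustration of the take and mode parameters described above, the sketch below wraps parser.parse() in a small function that checks take before sending the request. parseBatch and its options object are hypothetical names, not part of sbx-parse-api, and requiring take to be an integer is an assumption (the text above only says "a number between 1 and 9").

```javascript
// Hypothetical wrapper (not part of sbx-parse-api) around parser.parse().
// `parser` is expected to expose parse(docs, take, mode) as described above.
function parseBatch(parser, docs, options = {}) {
  // 5 is the documented default for `take`.
  const take = options.take === undefined ? 5 : options.take;

  // Assumption: `take` must be an integer in the documented 1..9 range.
  if (!Number.isInteger(take) || take < 1 || take > 9) {
    return Promise.reject({
      error: new RangeError("take must be an integer between 1 and 9"),
    });
  }

  // "split" handles every page as a separate document;
  // undefined handles every document as it is.
  const mode = options.split ? "split" : undefined;
  return parser.parse(docs, take, mode);
}

// Usage sketch:
// parseBatch(parser, docs, { take: 3, split: true }).then(console.log);
```

Rejecting with the same {"error": ...} shape as the API keeps a single .catch() handler sufficient for both local validation failures and failed requests.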
Objects
The docs object is used in both requests and responses. It is always an array of objects with the following fields:
id: the document identifier; must be unique in the set.
buffer or bytes or content: the content of the PDF document. buffer expects an ArrayBuffer, bytes expects an array of bytes, and content expects the content of the PDF in base64 encoding. Only present on requests.
labelId (optional): the document label identifier.
- parse() and classify(): if filled, the document will only be layout-classified.
- feed(): used to train the models.
- Response: will be filled with the best matching label.
detailedLabelId (optional): the document layout identifier.
- parse() and classify(): if filled, the document will not be classified.
- Response: will be filled with the best matching layout.
extractedData: the extracted data fields. This is an array in which every item contains a name and a value field. It is returned by parse() and guess(), and should be filled for feed() and (for some documents) for guess().
errors: an array containing processing errors for the specific document. Storing errors per document allows you to successfully process the rest of the batch.
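The document structure above can be built and consumed with a few small helpers, sketched below for Node.js. makeDoc, partitionByErrors, and fieldsOf are hypothetical names, not part of sbx-parse-api; treating an absent or empty errors array as success is an assumption.

```javascript
// Hypothetical helpers (not part of sbx-parse-api).

// Builds a request document from a Node.js Buffer holding a PDF,
// using the base64 `content` field described above.
function makeDoc(id, pdfBuffer) {
  return { id, content: pdfBuffer.toString("base64") };
}

// Separates response documents that were processed cleanly from those that
// carry per-document errors.
// Assumption: an absent or empty `errors` array means the document succeeded.
function partitionByErrors(docs) {
  const succeeded = [];
  const failed = [];
  for (const doc of docs) {
    const hasErrors = Array.isArray(doc.errors) && doc.errors.length > 0;
    (hasErrors ? failed : succeeded).push(doc);
  }
  return { succeeded, failed };
}

// Turns the extractedData array of {name, value} items into a plain object.
function fieldsOf(doc) {
  const fields = {};
  for (const item of doc.extractedData || []) {
    fields[item.name] = item.value;
  }
  return fields;
}
```

Because errors are reported per document, partitionByErrors lets the rest of a batch be used even when some documents fail.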
Sample
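let docs = [];
let doc = {id: "Doc_01", content: "Base64ContentMustGoHere"};
docs.push(doc);
parser.parse(docs).then(function(parsedDocs){
    // parsedDocs is an array of documents; extractedData is an array of {name, value} items
    alert("The doc contains " + JSON.stringify(parsedDocs[0].extractedData));
}).catch(function(err){
    // Any response other than HTTP 200 lands here as {"error": [Error Object]}
    console.error(err.error);
});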
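When a parsed document turns out to be wrong, the corrected version is meant to be sent back through feed(). The following is a minimal sketch of that step, assuming the corrected values come from the user as a plain object of field names and values. feedCorrected is a hypothetical helper, not part of sbx-parse-api, and which fields feed() actually requires is not specified here.

```javascript
// Hypothetical helper (not part of sbx-parse-api): applies user corrections to
// a document and sends it to feed() for use in the next training cycles.
// `parser` is expected to expose feed(docs) as described under "API Methods".
function feedCorrected(parser, doc, correctedFields) {
  const corrected = {
    // Copy the document so the caller's object is not modified.
    ...doc,
    // Replace the extracted fields with the user-validated values,
    // in the {name, value} shape used by extractedData.
    extractedData: Object.entries(correctedFields).map(([name, value]) => ({
      name,
      value,
    })),
  };
  return parser.feed([corrected]);
}

// Usage sketch:
// feedCorrected(parser, parsedDocs[0], { total: "12.50" })
//   .then(() => console.log("Document stored for training"))
//   .catch((err) => console.error(err.error));
```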