sbx-parse-api

v0.1.6

Published

7 months ago

JavaScript API for Securibox Parse

Downloads

0High
0Medium
0Low

leandro.luz.securibox

mariavaragilal

joaomrodrigues

Securibox Parse Securibox Parse Document Processing

Securibox Parse - JavaScript API

A Javascript API for Securibox Parse

Community

Securibox Parse JavaScript API is an open source software released under LGPL-3.0 license.

You are welcome to report bugs or create pull requests on github.

Installation

The easiest way to install sbx-parse-api is with npm.

npm install sbx-parse-api

Alternatively, download the source.

git clone https://github.com/Securibox/parse-api-js.git

Usage

Creating a parser

import {Parse, AuthMethods} from "sbx-parse-api";

// Using JWT authentication
let jwt = "thisIsMyEncodedToken";
authMethod = AuthMethods.JWT;
var parser = new Parse(url, authMethod, jwt);

// OR, with basic authentication:
// user = "MyUsername";
// password = "MySecretPassword";
// authMethod = AuthMethods.BASIC;
// var parser = new Parse(url, authMethod, user, password);

Authentication

Supported authentication methods:

AuthMethods.BASIC: basic authentication using username and password
AuthMethods.JWT: JSON Web Tokens using tokens

API Methods

After you have instanciated a Parse object, you can use it to call the API. Every call will return a Promise. Only requests returning a 200 HTTP code will result in a fulfilled promise and trigger the .then() method; everything else will fall into the .catch() method and return an error structured as {"error": [Error Object]}.

The API has four methods:

classify(docs, take=5): takes a set of documents and labels them. Internally, the classification is done in two steps: first a fast algorithm returns a list of candidate labels; then a slower high-precision algorithm choses among the take most probable labels and determines the document's specific layout. The take optional parameter is a number between 1 and 9 (5 is the default value).
parse(docs, take=5, mode=undefined): takes a set of documents, classifies and parses them. Along with the take parameter (same as in classify), it accepts an optional mode parameter, that can be one of the following:
- undefined (default) - handles every document as it is
- "split" - splits the document into pages and handles every page as a separate document.
guess(docs): takes a set of partially parsed documents with similar layout and tries to infere the missing data. This method can be used to speed up data entry when the parse method fails.
feed(docs): takes a set of documents and stores them for the next training cycles. This method must be used with wrongly classified or wrongly parsed documents after the errors have been corrected by the user; it allows the application to learn and improve over time.

Objects

The docs object is used on both Requests and Responses. The structure is always an array of the following dictionary:

id: the document identifier, must be unique in the set
buffer or bytes or content: the content of the PDF document. The buffer is waiting for an ArrayBuffer, the bytes is waiting for an array of bytes while the content is the content of the PDF in base64 encoding. Only exists on Requests.
labelId (optional): the document label identifier
- parse() and classify(): if filled, the document will be only layout-classified
- feed(): used to train the models
- Response: will be filled with the best matching label
detailedLabelId (optional): the document layout identifier
- parse() and classify(): if filled, the document will not be classified
- Response: will be filled with the best matching layout
extractedData: the extracted data fields. Array object, every item contains a name and a value field. Returned on parse() and guess(), should be filled on feed() and (for some documents) on guess().
errors: an array containing processing errors for the specific document. Storing errors by document allows you to successfully process the rest of the batch.

Sample

let docs = [];
let doc = {id: "Doc_01", content: "Base64ContentMustGoHere"};
docs.push(doc);
parser.parse(docs).then(function(parsedDocs){
    // parsedDocs is an array of documents
    alert("The doc contains " + parsedDocs[0].extractedData);
});

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme