© 2024 – Pkg Stats / Ryan Hefner

@datafire/google_documentai

v3.0.0

DataFire integration for Cloud Document AI API

Downloads: 4

Readme

@datafire/google_documentai

Client library for Cloud Document AI API

Installation and Usage

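npm install --save @datafire/google_documentai

let google_documentai = require('@datafire/google_documentai').create({
  access_token: "",
  refresh_token: "",
  client_id: "",
  client_secret: "",
  redirect_uri: ""
});

// Actions return Promises; for example, listing locations:
google_documentai.documentai.projects.locations.list({
  "name": ""
}).then(data => {
  console.log(data);
}).catch(err => {
  console.error(err);
});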

Description

Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, translation, and AutoML.

Actions

oauthCallback

Exchange the code passed to your redirect URI for an access_token

google_documentai.oauthCallback({
  "code": ""
}, context)

Input

  • input object
    • code required string

Output

  • output object
    • access_token string
    • refresh_token string
    • token_type string
    • scope string
    • expiration string

oauthRefresh

Exchange a refresh_token for an access_token

google_documentai.oauthRefresh(null, context)

Input

This action has no parameters

Output

  • output object
    • access_token string
    • refresh_token string
    • token_type string
    • scope string
    • expiration string

documentai.projects.locations.processors.humanReviewConfig.reviewDocument

Send a document for Human Review. The input document should be processed by the specified processor.

google_documentai.documentai.projects.locations.processors.humanReviewConfig.reviewDocument({
  "humanReviewConfig": ""
}, context)

Input

  • input object
    • humanReviewConfig required string: Required. The resource name of the HumanReviewConfig that the document will be reviewed with.
    • body GoogleCloudDocumentaiV1beta3ReviewDocumentRequest
    • $.xgafv string (values: 1, 2): V1 error format.
    • access_token string: OAuth access token.
    • alt string (values: json, media, proto): Data format for response.
    • callback string: JSONP
    • fields string: Selector specifying which fields to include in a partial response.
    • key string: API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.
    • oauth_token string: OAuth 2.0 token for the current user.
    • prettyPrint boolean: Returns response with indentations and line breaks.
    • quotaUser string: Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters.
    • upload_protocol string: Upload protocol for media (e.g. "raw", "multipart").
    • uploadType string: Legacy upload protocol for media (e.g. "media", "multipart").

Output

documentai.projects.locations.operations.get

Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service.

google_documentai.documentai.projects.locations.operations.get({
  "name": ""
}, context)
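Polling with this action can be sketched as a loop that re-reads the operation until it reports completion, backing off between calls. This is a minimal sketch and is not part of the package. pollOperation is a hypothetical helper, getOperation stands in for a call to the action above, and the done, error and response fields are assumed from Google's standard long-running Operation resource because they are not listed in this README.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper. getOperation(name) is assumed to resolve to an Operation,
// e.g. (name) => google_documentai.documentai.projects.locations.operations.get({ name }).
async function pollOperation(getOperation, name, { intervalMs = 1000, maxIntervalMs = 30000, maxAttempts = 20 } = {}) {
  let delay = intervalMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const op = await getOperation(name);
    // Assumed Operation shape: `done` is omitted or false while running,
    // then true with either `response` (success) or `error` (failure).
    if (op.done) {
      if (op.error) throw new Error(`Operation ${name} failed: ${op.error.message}`);
      return op;
    }
    if (attempt < maxAttempts) {
      await sleep(delay);
      delay = Math.min(delay * 2, maxIntervalMs); // exponential backoff, capped
    }
  }
  throw new Error(`Operation ${name} not done after ${maxAttempts} attempts`);
}

// Usage with a fake operation that finishes on the third poll.
let calls = 0;
const fakeGet = async (name) => ({ name, done: ++calls >= 3, response: { ok: true } });
pollOperation(fakeGet, "projects/p/locations/us/operations/123", { intervalMs: 10 })
  .then((op) => console.log(op.response));
```

Injecting getOperation keeps the loop testable without network access; in real use you would pass a wrapper around the DataFire action.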

Input

  • input object
    • name required string: The name of the operation resource.
    • $.xgafv string (values: 1, 2): V1 error format.
    • access_token string: OAuth access token.
    • alt string (values: json, media, proto): Data format for response.
    • callback string: JSONP
    • fields string: Selector specifying which fields to include in a partial response.
    • key string: API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.
    • oauth_token string: OAuth 2.0 token for the current user.
    • prettyPrint boolean: Returns response with indentations and line breaks.
    • quotaUser string: Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters.
    • upload_protocol string: Upload protocol for media (e.g. "raw", "multipart").
    • uploadType string: Legacy upload protocol for media (e.g. "media", "multipart").

Output

documentai.projects.locations.list

Lists information about the supported locations for this service.

google_documentai.documentai.projects.locations.list({
  "name": ""
}, context)

Input

  • input object
    • name required string: The resource that owns the locations collection, if applicable.
    • filter string: The standard list filter.
    • pageSize integer: The standard list page size.
    • pageToken string: The standard list page token.
    • $.xgafv string (values: 1, 2): V1 error format.
    • access_token string: OAuth access token.
    • alt string (values: json, media, proto): Data format for response.
    • callback string: JSONP
    • fields string: Selector specifying which fields to include in a partial response.
    • key string: API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.
    • oauth_token string: OAuth 2.0 token for the current user.
    • prettyPrint boolean: Returns response with indentations and line breaks.
    • quotaUser string: Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters.
    • upload_protocol string: Upload protocol for media (e.g. "raw", "multipart").
    • uploadType string: Legacy upload protocol for media (e.g. "media", "multipart").
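Because pageSize and pageToken follow Google's standard list conventions, collecting every location can be sketched as a loop over pages. listAllLocations is a hypothetical helper, and it assumes the response carries locations and nextPageToken fields (the Output for this action is not listed here).

```javascript
// Hypothetical helper. listPage(input) is assumed to resolve to
// { locations: [...], nextPageToken: "..." }, with the token omitted on the last page,
// e.g. (input) => google_documentai.documentai.projects.locations.list(input).
async function listAllLocations(listPage, name, pageSize = 100) {
  const all = [];
  let pageToken;
  do {
    const input = { name, pageSize };
    if (pageToken) input.pageToken = pageToken; // only send a token after the first page
    const res = await listPage(input);
    all.push(...(res.locations || []));
    pageToken = res.nextPageToken;
  } while (pageToken);
  return all;
}

// Usage (name is an example value):
// listAllLocations((input) => google_documentai.documentai.projects.locations.list(input), "projects/my-project")
//   .then((locations) => console.log(locations.length));
```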

Output

documentai.projects.locations.processors.processorVersions.batchProcess

Long-running operation (LRO) endpoint to batch process many documents. The output is written to Cloud Storage as JSON in the Document format.

google_documentai.documentai.projects.locations.processors.processorVersions.batchProcess({
  "name": ""
}, context)

Input

  • input object
    • name required string: Required. The resource name of the Processor or ProcessorVersion. Format: projects/{project}/locations/{location}/processors/{processor}, or projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}
    • body GoogleCloudDocumentaiV1beta3BatchProcessRequest
    • $.xgafv string (values: 1, 2): V1 error format.
    • access_token string: OAuth access token.
    • alt string (values: json, media, proto): Data format for response.
    • callback string: JSONP
    • fields string: Selector specifying which fields to include in a partial response.
    • key string: API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.
    • oauth_token string: OAuth 2.0 token for the current user.
    • prettyPrint boolean: Returns response with indentations and line breaks.
    • quotaUser string: Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters.
    • upload_protocol string: Upload protocol for media (e.g. "raw", "multipart").
    • uploadType string: Legacy upload protocol for media (e.g. "media", "multipart").
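The name input accepts either a processor or a specific processor version. A small hypothetical builder for the two documented formats might look like this. The version segment is spelled processorVersions, matching the method name.

```javascript
// Hypothetical helper: builds the `name` input for process / batchProcess.
//   projects/{project}/locations/{location}/processors/{processor}
//   projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}
function processorName({ project, location, processor, processorVersion }) {
  for (const [key, value] of Object.entries({ project, location, processor })) {
    if (!value) throw new Error(`${key} is required`);
  }
  const base = `projects/${project}/locations/${location}/processors/${processor}`;
  // Without a version, the server uses the processor's default version.
  return processorVersion ? `${base}/processorVersions/${processorVersion}` : base;
}

// Usage (example values):
// const name = processorName({ project: "my-project", location: "us", processor: "abc123" });
// google_documentai.documentai.projects.locations.processors.processorVersions.batchProcess({ name, body });
```

Slashes are left unencoded on purpose, since they are part of the resource path.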

Output

documentai.projects.locations.processors.processorVersions.process

Processes a single document.

google_documentai.documentai.projects.locations.processors.processorVersions.process({
  "name": ""
}, context)

Input

  • input object
    • name required string: Required. The resource name of the Processor or ProcessorVersion to use for processing. If a Processor is specified, the server will use its default version. Format: projects/{project}/locations/{location}/processors/{processor}, or projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}
    • body GoogleCloudDocumentaiV1beta3ProcessRequest
    • $.xgafv string (values: 1, 2): V1 error format.
    • access_token string: OAuth access token.
    • alt string (values: json, media, proto): Data format for response.
    • callback string: JSONP
    • fields string: Selector specifying which fields to include in a partial response.
    • key string: API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.
    • oauth_token string: OAuth 2.0 token for the current user.
    • prettyPrint boolean: Returns response with indentations and line breaks.
    • quotaUser string: Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters.
    • upload_protocol string: Upload protocol for media (e.g. "raw", "multipart").
    • uploadType string: Legacy upload protocol for media (e.g. "media", "multipart").
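A request body for this action can be sketched as follows. The field names rawDocument, content and mimeType come from Google's Document AI v1beta3 ProcessRequest rather than from this README, so treat them as an assumption. content carries the file bytes base64-encoded, which is how JSON represents proto bytes fields. buildProcessBody is a hypothetical helper.

```javascript
// Hypothetical helper. Assumes the v1beta3 ProcessRequest accepts
// rawDocument: { content, mimeType }, with content as base64-encoded bytes.
function buildProcessBody(bytes, mimeType) {
  return {
    rawDocument: {
      content: Buffer.from(bytes).toString("base64"),
      mimeType,
    },
  };
}

// Usage (hypothetical file path; `name` as described for the name input):
// const fs = require("fs");
// const body = buildProcessBody(fs.readFileSync("invoice.pdf"), "application/pdf");
// google_documentai.documentai.projects.locations.processors.processorVersions.process({ name, body });
```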

Output

Definitions

GoogleCloudDocumentaiUiv1beta3CommonOperationMetadata

  • GoogleCloudDocumentaiUiv1beta3CommonOperationMetadata object: The common metadata for long running operations.
    • createTime string: The creation time of the operation.
    • state string (values: STATE_UNSPECIFIED, RUNNING, CANCELLING, SUCCEEDED, FAILED, CANCELLED): The state of the operation.
    • stateMessage string: A message providing more details about the current state of processing.
    • updateTime string: The last update time of the operation.

GoogleCloudDocumentaiUiv1beta3CreateLabelerPoolOperationMetadata

GoogleCloudDocumentaiUiv1beta3CreateProcessorVersionMetadata

  • GoogleCloudDocumentaiUiv1beta3CreateProcessorVersionMetadata object: The metadata that represents a processor version being created.
    • createTime string: The creation time of the operation.
    • state string (values: STATE_UNSPECIFIED, QUEUED, PREPARING, RUNNING, SUCCEEDED, FAILED, CANCELLING, CANCELLED): The state of the current create processor version operation.
    • updateTime string: The last update time of the operation.

GoogleCloudDocumentaiUiv1beta3DeleteLabelerPoolOperationMetadata

GoogleCloudDocumentaiUiv1beta3DeleteProcessorMetadata

  • GoogleCloudDocumentaiUiv1beta3DeleteProcessorMetadata object: The long running operation metadata for delete processor method.
    • commonMetadata GoogleCloudDocumentaiUiv1beta3CommonOperationMetadata
    • createTime string: The creation time of the operation.
    • state string (values: STATE_UNSPECIFIED, WAITING, RUNNING, SUCCEEDED, FAILED): The state of the current delete processor operation.
    • stateMessage string: A message providing more details about the current state of processing. For example, the error message if the operation failed.
    • updateTime string: The last update time of the operation.

GoogleCloudDocumentaiUiv1beta3DeleteProcessorVersionMetadata

GoogleCloudDocumentaiUiv1beta3DeployProcessorVersionMetadata

GoogleCloudDocumentaiUiv1beta3DeployProcessorVersionResponse

  • GoogleCloudDocumentaiUiv1beta3DeployProcessorVersionResponse object: Response message for the deploy processor version method.

GoogleCloudDocumentaiUiv1beta3DisableProcessorMetadata

  • GoogleCloudDocumentaiUiv1beta3DisableProcessorMetadata object: The long running operation metadata for disable processor method.
    • commonMetadata GoogleCloudDocumentaiUiv1beta3CommonOperationMetadata
    • createTime string: The creation time of the operation.
    • state string (values: STATE_UNSPECIFIED, WAITING, RUNNING, SUCCEEDED, CANCELLING, CANCELLED, FAILED): The state of the current disable processor operation.
    • stateMessage string: A message providing more details about the current state of processing. For example, the error message if the operation failed.
    • updateTime string: The last update time of the operation.

GoogleCloudDocumentaiUiv1beta3DisableProcessorResponse

  • GoogleCloudDocumentaiUiv1beta3DisableProcessorResponse object: Response message for the disable processor method. Intentionally empty; reserved for fields added in the future.

GoogleCloudDocumentaiUiv1beta3EnableProcessorMetadata

  • GoogleCloudDocumentaiUiv1beta3EnableProcessorMetadata object: The long running operation metadata for enable processor method.
    • commonMetadata GoogleCloudDocumentaiUiv1beta3CommonOperationMetadata
    • createTime string: The creation time of the operation.
    • state string (values: STATE_UNSPECIFIED, WAITING, RUNNING, SUCCEEDED, CANCELLING, CANCELLED, FAILED): The state of the current enable processor operation.
    • stateMessage string: A message providing more details about the current state of processing. For example, the error message if the operation failed.
    • updateTime string: The last update time of the operation.

GoogleCloudDocumentaiUiv1beta3EnableProcessorResponse

  • GoogleCloudDocumentaiUiv1beta3EnableProcessorResponse object: Response message for the enable processor method. Intentionally empty; reserved for fields added in the future.

GoogleCloudDocumentaiUiv1beta3UndeployProcessorVersionMetadata

GoogleCloudDocumentaiUiv1beta3UndeployProcessorVersionResponse

  • GoogleCloudDocumentaiUiv1beta3UndeployProcessorVersionResponse object: Response message for the undeploy processor version method.

GoogleCloudDocumentaiUiv1beta3UpdateHumanReviewConfigMetadata

GoogleCloudDocumentaiUiv1beta3UpdateLabelerPoolOperationMetadata

GoogleCloudDocumentaiV1beta1BatchProcessDocumentsResponse

  • GoogleCloudDocumentaiV1beta1BatchProcessDocumentsResponse object: Response to a batch document processing request. This is returned in the LRO Operation after the operation is complete.

GoogleCloudDocumentaiV1beta1BoundingPoly

GoogleCloudDocumentaiV1beta1Document

  • GoogleCloudDocumentaiV1beta1Document object: Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.

GoogleCloudDocumentaiV1beta1DocumentEntity

GoogleCloudDocumentaiV1beta1DocumentEntityNormalizedValue

  • GoogleCloudDocumentaiV1beta1DocumentEntityNormalizedValue object: Parsed and normalized entity value.
    • addressValue GoogleTypePostalAddress
    • dateValue GoogleTypeDate
    • datetimeValue GoogleTypeDateTime
    • moneyValue GoogleTypeMoney
    • text string: Required. Normalized entity value stored as a string. This field is populated for supported document types (e.g. Invoice). For some entity types, one of the respective structured_value fields may also be populated. Money/Currency values (money_value) use the ISO 4217 text format; Date (date_value) and Datetime (datetime_value) values use the ISO 8601 text format.
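When reading the structured values instead of text, moneyValue and dateValue can be converted back to plain strings. This sketch assumes the standard google.type.Money ({ currencyCode, units, nanos }) and google.type.Date ({ year, month, day }) shapes, whose fields are not listed in this README. units is an int64 and arrives as a string in JSON, so BigInt is used to avoid precision loss. Both helpers are hypothetical.

```javascript
// Hypothetical helper for an assumed google.type.Money value.
function moneyToDecimalString({ currencyCode = "", units = "0", nanos = 0 }) {
  const u = BigInt(units);
  const n = BigInt(nanos);
  // units and nanos carry the same sign in google.type.Money, so format magnitudes.
  const negative = u < 0n || n < 0n;
  const absUnits = u < 0n ? -u : u;
  const absNanos = n < 0n ? -n : n;
  // nanos are billionths: pad to 9 digits, then drop trailing zeros.
  const frac = absNanos.toString().padStart(9, "0").replace(/0+$/, "");
  const amount = `${negative ? "-" : ""}${absUnits}${frac ? "." + frac : ""}`;
  return currencyCode ? `${amount} ${currencyCode}` : amount;
}

// Hypothetical helper for an assumed google.type.Date value.
function dateToIso({ year = 0, month = 0, day = 0 }) {
  // google.type.Date uses 0 for "unspecified"; only full dates are converted here.
  if (!year || !month || !day) return null;
  const pad = (v, w) => String(v).padStart(w, "0");
  return `${pad(year, 4)}-${pad(month, 2)}-${pad(day, 2)}`;
}
```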

GoogleCloudDocumentaiV1beta1DocumentEntityRelation

  • GoogleCloudDocumentaiV1beta1DocumentEntityRelation object: Relationship between Entities.
    • objectId string: Object entity id.
    • relation string: Relationship description.
    • subjectId string: Subject entity id.

GoogleCloudDocumentaiV1beta1DocumentPage

GoogleCloudDocumentaiV1beta1DocumentPageAnchor

  • GoogleCloudDocumentaiV1beta1DocumentPageAnchor object: References the visual context of the entity in the Document.pages. Page anchors can be cross-page, consist of multiple bounding polygons and optionally reference specific layout element types.

GoogleCloudDocumentaiV1beta1DocumentPageAnchorPageRef

  • GoogleCloudDocumentaiV1beta1DocumentPageAnchorPageRef object: Represents a weak reference to a page element within a document.
    • boundingPoly GoogleCloudDocumentaiV1beta1BoundingPoly
    • layoutId string: Optional. Deprecated. Use PageRef.bounding_poly instead.
    • layoutType string (values: LAYOUT_TYPE_UNSPECIFIED, BLOCK, PARAGRAPH, LINE, TOKEN, VISUAL_ELEMENT, TABLE, FORM_FIELD): Optional. The type of the layout element that is being referenced if any.
    • page string: Required. Index into the Document.pages element, for example using Document.pages to locate the related page element.

GoogleCloudDocumentaiV1beta1DocumentPageBlock

GoogleCloudDocumentaiV1beta1DocumentPageDetectedLanguage

  • GoogleCloudDocumentaiV1beta1DocumentPageDetectedLanguage object: Detected language for a structural component.
    • confidence number: Confidence of detected language. Range [0, 1].
    • languageCode string: The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier.

GoogleCloudDocumentaiV1beta1DocumentPageDimension

  • GoogleCloudDocumentaiV1beta1DocumentPageDimension object: Dimension for the page.
    • height number: Page height.
    • unit string: Dimension unit.
    • width number: Page width.

GoogleCloudDocumentaiV1beta1DocumentPageFormField

GoogleCloudDocumentaiV1beta1DocumentPageImage

  • GoogleCloudDocumentaiV1beta1DocumentPageImage object: Rendered image contents for this page.
    • content string: Raw byte content of the image.
    • height integer: Height of the image in pixels.
    • mimeType string: Encoding mime type for the image.
    • width integer: Width of the image in pixels.

GoogleCloudDocumentaiV1beta1DocumentPageLayout

  • GoogleCloudDocumentaiV1beta1DocumentPageLayout object: Visual element describing a layout unit on a page.
    • boundingPoly GoogleCloudDocumentaiV1beta1BoundingPoly
    • confidence number: Confidence of the current Layout within context of the object this layout is for. e.g. confidence can be for a single token, a table, a visual element, etc. depending on context. Range [0, 1].
    • orientation string (values: ORIENTATION_UNSPECIFIED, PAGE_UP, PAGE_RIGHT, PAGE_DOWN, PAGE_LEFT): Detected orientation for the Layout.
    • textAnchor GoogleCloudDocumentaiV1beta1DocumentTextAnchor

GoogleCloudDocumentaiV1beta1DocumentPageLine

GoogleCloudDocumentaiV1beta1DocumentPageMatrix

  • GoogleCloudDocumentaiV1beta1DocumentPageMatrix object: Representation for transformation matrix, intended to be compatible and used with OpenCV format for image manipulation.
    • cols integer: Number of columns in the matrix.
    • data string: The matrix data.
    • rows integer: Number of rows in the matrix.
    • type integer: This encodes information about what data type the matrix uses. For example, 0 (CV_8U) is an unsigned 8-bit image. For the full list of OpenCV primitive data types, please refer to https://docs.opencv.org/4.3.0/d1/d1b/group__core__hal__interface.html
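The type code can be decoded with OpenCV's packing rule (CV_MAKETYPE), where the low three bits hold the depth and the remaining bits hold the channel count minus one. decodeCvType is a hypothetical helper, and the depth table follows OpenCV 4.x.

```javascript
// OpenCV depth codes 0..7 (CV_16F is 7 in OpenCV 4.x).
const CV_DEPTHS = ["CV_8U", "CV_8S", "CV_16U", "CV_16S", "CV_32S", "CV_32F", "CV_64F", "CV_16F"];

// Hypothetical helper: type = depth + ((channels - 1) << 3).
function decodeCvType(type) {
  const depth = type & 7;           // low 3 bits
  const channels = (type >> 3) + 1; // remaining bits store channels - 1
  return { depth: CV_DEPTHS[depth], channels, name: `${CV_DEPTHS[depth]}C${channels}` };
}
```

For example, 0 decodes to CV_8UC1 (the unsigned 8-bit case mentioned above) and 16 decodes to CV_8UC3, a typical 3-channel image.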

GoogleCloudDocumentaiV1beta1DocumentPageParagraph

GoogleCloudDocumentaiV1beta1DocumentPageTable

GoogleCloudDocumentaiV1beta1DocumentPageTableTableCell

GoogleCloudDocumentaiV1beta1DocumentPageTableTableRow

GoogleCloudDocumentaiV1beta1DocumentPageToken

GoogleCloudDocumentaiV1beta1DocumentPageTokenDetectedBreak

  • GoogleCloudDocumentaiV1beta1DocumentPageTokenDetectedBreak object: Detected break at the end of a Token.
    • type string (values: TYPE_UNSPECIFIED, SPACE, WIDE_SPACE, HYPHEN): Detected break type.

GoogleCloudDocumentaiV1beta1DocumentPageVisualElement

GoogleCloudDocumentaiV1beta1DocumentProvenance

  • GoogleCloudDocumentaiV1beta1DocumentProvenance object: Structure to identify provenance relationships between annotations in different revisions.
    • id integer: The Id of this operation. Needs to be unique within the scope of the revision.
    • parents array: References to the original elements that are replaced.
    • revision integer: The index of the revision that produced this element.
    • type string (values: OPERATION_TYPE_UNSPECIFIED, ADD, REMOVE, REPLACE, EVAL_REQUESTED, EVAL_APPROVED): The type of provenance operation.

GoogleCloudDocumentaiV1beta1DocumentProvenanceParent

  • GoogleCloudDocumentaiV1beta1DocumentProvenanceParent object: Structure for referencing parent provenances. When an element replaces one or more other elements, the parent references identify the elements that were replaced.
    • id integer: The id of the parent provenance.
    • revision integer: The index of the [Document.revisions] identifying the parent revision.

GoogleCloudDocumentaiV1beta1DocumentRevision

  • GoogleCloudDocumentaiV1beta1DocumentRevision object: Contains past or forward revisions of this document.
    • agent string: If the change was made by a person, specify the name or id of that person.
    • createTime string: The time that the revision was created.
    • humanReview GoogleCloudDocumentaiV1beta1DocumentRevisionHumanReview
    • id string: Id of the revision. Unique within the context of the document.
    • parent array: The revisions that this revision is based on. This can include one or more parents (when documents are merged). This field represents the index into the revisions field.
      • items integer
    • processor string: If the annotation was made by a processor, identify the processor by its resource name.

GoogleCloudDocumentaiV1beta1DocumentRevisionHumanReview

  • GoogleCloudDocumentaiV1beta1DocumentRevisionHumanReview object: Human Review information of the document.
    • state string: Human review state. e.g. requested, succeeded, rejected.
    • stateMessage string: A message providing more details about the current state of processing. For example, the rejection reason when the state is rejected.

GoogleCloudDocumentaiV1beta1DocumentShardInfo

  • GoogleCloudDocumentaiV1beta1DocumentShardInfo object: For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
    • shardCount string: Total number of shards.
    • shardIndex string: The 0-based index of this shard.
    • textOffset string: The index of the first character in Document.text in the overall document global text.

GoogleCloudDocumentaiV1beta1DocumentStyle

  • GoogleCloudDocumentaiV1beta1DocumentStyle object: Annotation for common text style attributes. This adheres to CSS conventions as much as possible.

GoogleCloudDocumentaiV1beta1DocumentStyleFontSize

  • GoogleCloudDocumentaiV1beta1DocumentStyleFontSize object: Font size with unit.
    • size number: Font size for the text.
    • unit string: Unit for the font size. Follows CSS naming (in, px, pt, etc.).

GoogleCloudDocumentaiV1beta1DocumentTextAnchor

  • GoogleCloudDocumentaiV1beta1DocumentTextAnchor object: Text reference indexing into the Document.text.

GoogleCloudDocumentaiV1beta1DocumentTextAnchorTextSegment

  • GoogleCloudDocumentaiV1beta1DocumentTextAnchorTextSegment object: A text segment in the Document.text. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See ShardInfo.text_offset.
    • endIndex string: TextSegment half open end UTF-8 char index in the Document.text.
    • startIndex string: TextSegment start UTF-8 char index in the Document.text.
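Resolving a text anchor to its text might look like the following sketch. It assumes the anchor holds a textSegments array of these segments (the TextAnchor fields are not listed above), that an omitted startIndex means 0 as in proto3 JSON, and that indices count Unicode characters. Subtracting shardInfo.textOffset maps the global indices onto a shard's own text, and clamping drops the parts that fall into other shards. anchorText is a hypothetical helper.

```javascript
// Hypothetical helper: returns the text covered by an assumed { textSegments: [...] } anchor.
function anchorText(doc, anchor) {
  const chars = Array.from(doc.text || ""); // index by code point, not UTF-16 unit
  const offset = Number((doc.shardInfo && doc.shardInfo.textOffset) || 0);
  return (anchor.textSegments || [])
    .map(({ startIndex = "0", endIndex = "0" }) => {
      // Indices are int64 strings in the global text; shift into this shard and clamp at 0.
      const start = Math.max(0, Number(startIndex) - offset);
      const end = Math.max(0, Number(endIndex) - offset); // slice() already clamps the upper bound
      return chars.slice(start, end).join("");
    })
    .join("");
}
```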

GoogleCloudDocumentaiV1beta1DocumentTextChange

GoogleCloudDocumentaiV1beta1DocumentTranslation

  • GoogleCloudDocumentaiV1beta1DocumentTranslation object: A translation of the text segment.

GoogleCloudDocumentaiV1beta1GcsDestination

  • GoogleCloudDocumentaiV1beta1GcsDestination object: The Google Cloud Storage location where the output file will be written to.
    • uri string

GoogleCloudDocumentaiV1beta1GcsSource

  • GoogleCloudDocumentaiV1beta1GcsSource object: The Google Cloud Storage location where the input file will be read from.
    • uri string

GoogleCloudDocumentaiV1beta1InputConfig

  • GoogleCloudDocumentaiV1beta1InputConfig object: The desired input location and metadata.
    • gcsSource GoogleCloudDocumentaiV1beta1GcsSource
    • mimeType string: Required. MIME type of the input. Currently supported MIME types are application/pdf, image/tiff, and image/gif. In addition, the application/json type is supported for requests with the ProcessDocumentRequest.automl_params field set. The JSON file needs to be in Document format.

GoogleCloudDocumentaiV1beta1NormalizedVertex

  • GoogleCloudDocumentaiV1beta1NormalizedVertex object: A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.
    • x number: X coordinate.
    • y number: Y coordinate.

GoogleCloudDocumentaiV1beta1OperationMetadata

  • GoogleCloudDocumentaiV1beta1OperationMetadata object: Contains metadata for the BatchProcessDocuments operation.
    • createTime string: The creation time of the operation.
    • state string (values: STATE_UNSPECIFIED, ACCEPTED, WAITING, RUNNING, SUCCEEDED, CANCELLED, FAILED): The state of the current batch processing.
    • stateMessage string: A message providing more details about the current state of processing.
    • updateTime string: The last update time of the operation.

GoogleCloudDocumentaiV1beta1OutputConfig

  • GoogleCloudDocumentaiV1beta1OutputConfig object: The desired output location and metadata.
    • gcsDestination GoogleCloudDocumentaiV1beta1GcsDestination
    • pagesPerShard integer: The maximum number of pages to include in each output Document shard JSON file on Google Cloud Storage. The valid range is [1, 100]; if not specified, the default is 20. For example, one 100-page PDF produces 100 parsed pages. With pages_per_shard = 20, five Document shard JSON files, each containing 20 parsed pages, are written under the prefix OutputConfig.gcs_destination.uri with the suffix pages-x-to-y.json, where x and y are 1-indexed page numbers. Example GCS outputs for 157 pages with pages_per_shard = 50: pages-001-to-050.json, pages-051-to-100.json, pages-101-to-150.json, pages-151-to-157.json.
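The shard naming in the 157-page example can be reproduced with a short loop. shardFileNames is a hypothetical helper, and the three-digit zero padding is inferred from that example only.

```javascript
// Hypothetical helper: lists shard file suffixes for a document of totalPages pages.
// Assumption: page numbers are zero-padded to 3 digits, as in the documented example.
function shardFileNames(totalPages, pagesPerShard = 20) {
  if (!Number.isInteger(pagesPerShard) || pagesPerShard < 1 || pagesPerShard > 100) {
    throw new RangeError("pagesPerShard must be an integer in [1, 100]");
  }
  const pad = (n) => String(n).padStart(3, "0");
  const names = [];
  for (let first = 1; first <= totalPages; first += pagesPerShard) {
    const last = Math.min(first + pagesPerShard - 1, totalPages); // final shard may be short
    names.push(`pages-${pad(first)}-to-${pad(last)}.json`);
  }
  return names;
}

console.log(shardFileNames(157, 50));
```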

GoogleCloudDocumentaiV1beta1ProcessDocumentResponse

GoogleCloudDocumentaiV1beta1Vertex

  • GoogleCloudDocumentaiV1beta1Vertex object: A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.
    • x integer: X coordinate.
    • y integer: Y coordinate.
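Converting a NormalizedVertex into the pixel-based Vertex form is a matter of scaling by the original image size. In this sketch, width and height are assumed to be the original image's pixel dimensions. Missing x or y default to 0 because proto3 JSON omits zero values, and results are rounded to match Vertex's integer coordinates. Both helpers are hypothetical.

```javascript
// Hypothetical helper: NormalizedVertex (0..1) -> Vertex in original-image pixels.
function toPixelVertex({ x = 0, y = 0 }, width, height) {
  return { x: Math.round(x * width), y: Math.round(y * height) };
}

// Hypothetical helper: applies the conversion to every vertex of a polygon.
function toPixelPolygon(normalizedVertices, width, height) {
  return normalizedVertices.map((v) => toPixelVertex(v, width, height));
}
```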

GoogleCloudDocumentaiV1beta2BatchProcessDocumentsResponse

  • GoogleCloudDocumentaiV1beta2BatchProcessDocumentsResponse object: Response to a batch document processing request. This is returned in the LRO Operation after the operation is complete.

GoogleCloudDocumentaiV1beta2BoundingPoly

GoogleCloudDocumentaiV1beta2Document

GoogleCloudDocumentaiV1beta2DocumentEntity

GoogleCloudDocumentaiV1beta2DocumentEntityNormalizedValue

  • GoogleCloudDocumentaiV1beta2DocumentEntityNormalizedValue object: Parsed and normalized entity value.
    • addressValue GoogleTypePostalAddress
    • dateValue GoogleTypeDate
    • datetimeValue GoogleTypeDateTime
    • moneyValue GoogleTypeMoney
    • text string: Required. Normalized entity value stored as a string. This field is populated for supported document types (e.g. Invoice). For some entity types, one of the respective structured_value fields may also be populated. Money/Currency values (money_value) use the ISO 4217 text format; Date (date_value) and Datetime (datetime_value) values use the ISO 8601 text format.

GoogleCloudDocumentaiV1beta2DocumentEntityRelation

  • GoogleCloudDocumentaiV1beta2DocumentEntityRelation object: Relationship between Entities.
    • objectId string: Object entity id.
    • relation string: Relationship description.
    • subjectId string: Subject entity id.

GoogleCloudDocumentaiV1beta2DocumentLabel

  • GoogleCloudDocumentaiV1beta2DocumentLabel object: Label attaches schema information and/or other metadata to segments within a Document. Multiple Labels on a single field can denote either different labels, different instances of the same label created at different times, or some combination of both.
    • automlModel string: Label is generated by an AutoML model. This field stores the full resource name of the AutoML model. Format: projects/{project-id}/locations/{location-id}/models/{model-id}
    • confidence number: Confidence score between 0 and 1 for label assignment.
    • name string: Name of the label. When the label is generated from AutoML Text Classification model, this field represents the name of the category.

GoogleCloudDocumentaiV1beta2DocumentPage

GoogleCloudDocumentaiV1beta2DocumentPageAnchor

  • GoogleCloudDocumentaiV1beta2DocumentPageAnchor object: References the visual context of the entity in the Document.pages. Page anchors can be cross-page, consist of multiple bounding polygons and optionally reference specific layout element types.

GoogleCloudDocumentaiV1beta2DocumentPageAnchorPageRef

  • GoogleCloudDocumentaiV1beta2DocumentPageAnchorPageRef object: Represents a weak reference to a page element within a document.
    • boundingPoly GoogleCloudDocumentaiV1beta2BoundingPoly
    • layoutId string: Optional. Deprecated. Use PageRef.bounding_poly instead.
    • layoutType string (values: LAYOUT_TYPE_UNSPECIFIED, BLOCK, PARAGRAPH, LINE, TOKEN, VISUAL_ELEMENT, TABLE, FORM_FIELD): Optional. The type of the layout element that is being referenced if any.
    • page string: Required. Index into the Document.pages element, for example using Document.pages to locate the related page element.

GoogleCloudDocumentaiV1beta2DocumentPageBlock

GoogleCloudDocumentaiV1beta2DocumentPageDetectedLanguage

  • GoogleCloudDocumentaiV1beta2DocumentPageDetectedLanguage object: Detected language for a structural component.
    • confidence number: Confidence of detected language. Range [0, 1].
    • languageCode string: The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier.

GoogleCloudDocumentaiV1beta2DocumentPageDimension

  • GoogleCloudDocumentaiV1beta2DocumentPageDimension object: Dimension for the page.
    • height number: Page height.
    • unit string: Dimension unit.
    • width number: Page width.

GoogleCloudDocumentaiV1beta2DocumentPageFormField

GoogleCloudDocumentaiV1beta2DocumentPageImage

  • GoogleCloudDocumentaiV1beta2DocumentPageImage object: Rendered image contents for this page.
    • content string: Raw byte content of the image.
    • height integer: Height of the image in pixels.
    • mimeType string: Encoding mime type for the image.
    • width integer: Width of the image in pixels.

GoogleCloudDocumentaiV1beta2DocumentPageLayout

  • GoogleCloudDocumentaiV1beta2DocumentPageLayout object: Visual element describing a layout unit on a page.
    • boundingPoly GoogleCloudDocumentaiV1beta2BoundingPoly
    • confidence number: Confidence of the current Layout within context of the object this layout is for. e.g. confidence can be for a single token, a table, a visual element, etc. depending on context. Range [0, 1].
    • orientation string (values: ORIENTATION_UNSPECIFIED, PAGE_UP, PAGE_RIGHT, PAGE_DOWN, PAGE_LEFT): Detected orientation for the Layout.
    • textAnchor GoogleCloudDocumentaiV1beta2DocumentTextAnchor

GoogleCloudDocumentaiV1beta2DocumentPageLine

GoogleCloudDocumentaiV1beta2DocumentPageMatrix

  • GoogleCloudDocumentaiV1beta2DocumentPageMatrix object: Representation for transformation matrix, intended to be compatible and used with OpenCV format for image manipulation.
    • cols integer: Number of columns in the matrix.
    • data string: The matrix data.
    • rows integer: Number of rows in the matrix.
    • type integer: This encodes information about what data type the matrix uses. For example, 0 (CV_8U) is an unsigned 8-bit image. For the full list of OpenCV primitive data types, please refer to https://docs.opencv.org/4.3.0/d1/d1b/group__core__hal__interface.html

GoogleCloudDocumentaiV1beta2DocumentPageParagraph

GoogleCloudDocumentaiV1beta2DocumentPageTable

GoogleCloudDocumentaiV1beta2DocumentPageTableTableCell

GoogleCloudDocumentaiV1beta2DocumentPageTableTableRow

GoogleCloudDocumentaiV1beta2DocumentPageToken

GoogleCloudDocumentaiV1beta2DocumentPageTokenDetectedBreak

  • GoogleCloudDocumentaiV1beta2DocumentPageTokenDetectedBreak object: Detected break at the end of a Token.
    • type string (values: TYPE_UNSPECIFIED, SPACE, WIDE_SPACE, HYPHEN): Detected break type.

GoogleCloudDocumentaiV1beta2DocumentPageVisualElement

GoogleCloudDocumentaiV1beta2DocumentProvenance

  • GoogleCloudDocumentaiV1beta2DocumentProvenance object: Structure to identify provenance relationships between annotations in different revisions.
    • id integer: The Id of this operation. Needs to be unique within the scope of the revision.
    • parents array: References to the original elements that are replaced.
    • revision integer: The index of the revision that produced this element.
    • type string (values: OPERATION_TYPE_UNSPECIFIED, ADD, REMOVE, REPLACE, EVAL_REQUESTED, EVAL_APPROVED): The type of provenance operation.

GoogleCloudDocumentaiV1beta2DocumentProvenanceParent

  • GoogleCloudDocumentaiV1beta2DocumentProvenanceParent object: Structure for referencing parent provenances. When an element replaces one or more other elements, the parent references identify the elements that were replaced.
    • id integer: The id of the parent provenance.
    • revision integer: The index of the [Document.revisions] identifying the parent revision.

GoogleCloudDocumentaiV1beta2DocumentRevision

  • GoogleCloudDocumentaiV1beta2DocumentRevision object: Contains past or forward revisions of this document.
    • agent string: If the change was made by a person, specify the name or id of that person.
    • createTime string: The time that the revision was created.
    • humanReview GoogleCloudDocumentaiV1beta2DocumentRevisionHumanReview
    • id string: Id of the revision. Unique within the context of the document.
    • parent array: The revisions that this revision is based o