n8n-nodes-scan-docx

v0.1.0

Published

3 months ago

The Auto Scan DOCX and Image node scans DOCX and image files, extracts text via OCR, and classifies the content. Notifications can be sent after processing.

Downloads

161

tuananhit1612

n8n-community-node-package scan docx

Auto Scan DOCX and Image Node for n8n

The Auto Scan DOCX and Image node for n8n automatically extracts text from DOCX files and from image files (via Optical Character Recognition, OCR), then classifies the extracted content. Because it handles both input types, the same node can serve document automation workflows that mix text documents and scanned images.

Features:

- OCR for Image Files: Automatically scan and extract text from image files using OCR.
- DOCX Extraction: Extract text content from DOCX files for further processing.
- Document Classification: Classify extracted content into predefined categories such as department and priority.
- Notification Support: Optionally send notifications (e.g., via email, SMS, or print) once processing is complete.

Installation

To install this custom node, follow these steps:

1. Clone or download this repository.
2. Follow the official n8n custom node installation guide.
3. Install the necessary dependencies for OCR and DOCX extraction:
   - For OCR: Ensure Tesseract.js is installed and properly configured.
   - For DOCX extraction: Install Mammoth.js.
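
To make the role of the two dependencies more concrete, here is a minimal sketch of how extraction might dispatch on the input type. The names `InputType`, `Extractors`, and `extractText` are hypothetical and are not taken from this package. In a real setup the `docx` extractor would typically wrap Mammoth.js's `extractRawText` and the `image` extractor would wrap Tesseract.js's `recognize`; stubs are used here so the example runs without either library installed.

```typescript
// Hypothetical sketch: none of these names come from the package itself.
type InputType = "image" | "docx";
type OcrLanguage = "eng" | "vie";

// Extractors are injected so the dispatch logic can run without the real libraries.
interface Extractors {
  docx: (source: string) => Promise<string>;
  image: (source: string, language: OcrLanguage) => Promise<string>;
}

async function extractText(
  inputType: InputType,
  source: string,
  language: OcrLanguage,
  extractors: Extractors,
): Promise<string> {
  if (source.trim() === "") {
    throw new Error("File URL or path must not be empty");
  }
  // The language only matters for OCR; DOCX extraction ignores it.
  const text =
    inputType === "docx"
      ? await extractors.docx(source)
      : await extractors.image(source, language);
  return text.trim();
}

// Stub extractors standing in for Mammoth.js and Tesseract.js.
const stubExtractors: Extractors = {
  docx: async (source) => `docx text from ${source}\n`,
  image: async (source, language) => `ocr(${language}) text from ${source}\n`,
};
```

Injecting the extractors keeps the branching logic testable on its own; swapping the stubs for real library calls should not require changes to `extractText`.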

Configuration

Node Settings

Once you add the Auto Scan DOCX and Image node to your workflow, you can configure the following parameters:

Input Type (options):
- Choose between Image (for OCR) or DOCX (for document extraction).
- Default: docx
- Description: Select the input type for scanning (Image or DOCX file).
File URL or Path (string):
- Provide the URL or local file path to the document or image file you wish to process.
- Default: empty string
- Description: The path or URL to the file.
Language (options):
- English (eng) or Vietnamese (vie).
- Default: eng
- Description: The language to use for OCR processing on image files.
Send Notification (boolean):
- Determines whether to send a notification after the document processing is complete.
- Default: false
- Description: If set to true, a notification will be sent after processing is finished.
Output Format (options):
- Choose between JSON or Plain Text for the output format of the extracted data.
- Default: json
- Description: Choose the output format for the extracted data.
Department Routing (boolean):
- Automatically route the document to the correct department based on the extracted content.
- Default: true
- Description: If set to true, the node will classify the document and route it to the appropriate department.
Notification Method (options):
- Choose the method to notify users or departments about the document status after processing.
- Options: Email, SMS, or Print.
- Default: email
- Description: Select the notification method for alerting users or departments.
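
The parameters and defaults listed above can be summarized as a typed options object. This is only a sketch: the property names (`inputType`, `fileUrlOrPath`, and so on), the option values `"text"`, `"sms"`, and `"print"`, and the `resolveOptions` helper are assumptions made for illustration, since the package's internal names are not documented here. Only the defaults themselves come from the list above.

```typescript
// Hypothetical parameter names; the package's internal names may differ.
interface ScanOptions {
  inputType: "image" | "docx";
  fileUrlOrPath: string;
  language: "eng" | "vie";
  sendNotification: boolean;
  outputFormat: "json" | "text"; // "text" is an assumed value for Plain Text
  departmentRouting: boolean;
  notificationMethod: "email" | "sms" | "print"; // "sms" and "print" are assumed values
}

// Defaults as documented in the parameter list.
const DEFAULT_OPTIONS: ScanOptions = {
  inputType: "docx",
  fileUrlOrPath: "",
  language: "eng",
  sendNotification: false,
  outputFormat: "json",
  departmentRouting: true,
  notificationMethod: "email",
};

// Merge user-supplied values over the defaults and reject a missing file location.
function resolveOptions(overrides: Partial<ScanOptions>): ScanOptions {
  const options: ScanOptions = { ...DEFAULT_OPTIONS, ...overrides };
  if (options.fileUrlOrPath.trim() === "") {
    throw new Error("fileUrlOrPath is required");
  }
  return options;
}
```

For example, `resolveOptions({ fileUrlOrPath: "scan.png", inputType: "image", language: "vie" })` would keep every other setting at its documented default.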

Example Workflow

Input Data:

{
  "documentUrl": "https://example.com/document.docx",
  "documentType": "docx",
  "outputFormat": "json",
  "departmentRouting": true,
  "notificationMethod": "email"
}

Node Configuration:

Input Type: docx
File URL or Path: https://example.com/document.docx
Output Format: json
Department Routing: true
Send Notification: true
Notification Method: email

Output Data (Example):

If the document is a DOCX file, the output might look like this:

{
  "extractedText": "This document contains financial data that needs to be routed to the finance department.",
  "classifiedData": {
    "department": "Finance",
    "priority": "High",
    "summary": "Extracted key financial information."
  },
  "notification": "Notification sent via email."
}

In this example:

- extractedText: The raw text extracted from the document or image.
- classifiedData: The classification result: the department, the priority, and a short summary of the content.
- notification: A message confirming that a notification was sent and by which method.
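
The output shape above can be written down as TypeScript types, together with one possible way of producing it. The README does not describe how the node actually classifies text, so the keyword matching below is a stand-in technique chosen for illustration. The keyword table, the `"General"` and `"Normal"` fallbacks, the optional `notification` field, and the function names `classify` and `buildResult` are all hypothetical.

```typescript
// Shape of the example output shown above.
interface ClassifiedData {
  department: string;
  priority: string;
  summary: string;
}

interface ScanResult {
  extractedText: string;
  classifiedData: ClassifiedData;
  notification?: string; // assumed optional, since sending a notification is optional
}

// Hypothetical keyword rules; the node's real classification logic is not documented.
const DEPARTMENT_KEYWORDS: Record<string, string[]> = {
  Finance: ["financial", "invoice", "budget"],
  HR: ["employee", "recruitment", "leave request"],
};

function classify(text: string): ClassifiedData {
  const lower = text.toLowerCase();
  let department = "General"; // fallback when no keyword matches
  for (const [name, keywords] of Object.entries(DEPARTMENT_KEYWORDS)) {
    if (keywords.some((keyword) => lower.includes(keyword))) {
      department = name;
      break;
    }
  }
  const priority = lower.includes("urgent") ? "High" : "Normal";
  return { department, priority, summary: text.slice(0, 80) };
}

function buildResult(text: string, notifiedVia?: string): ScanResult {
  const result: ScanResult = { extractedText: text, classifiedData: classify(text) };
  if (notifiedVia !== undefined) {
    result.notification = `Notification sent via ${notifiedVia}.`;
  }
  return result;
}
```

A rule table like this is easy to audit and extend, but it is only one option; the priority and summary rules in particular are placeholders and are not meant to reproduce the values in the example output.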
