n8n-nodes-scan-docx

v0.1.0

The Auto Scan DOCX and Image node scans DOCX and image files, extracts text via OCR, and classifies the content. Notifications can be sent after processing.

Auto Scan DOCX and Image Node for n8n

The Auto Scan DOCX and Image node for n8n automatically extracts text from DOCX files and from image files (via Optical Character Recognition, OCR), then classifies the content. Because one node handles both input types, it fits a range of document automation workflows.

Features:

  • OCR for Image Files: Automatically scan and extract text from image files using OCR.
  • DOCX Extraction: Extract text content from DOCX files for further processing.
  • Document Classification: Classify extracted content into predefined categories such as department and priority.
  • Notification Support: Optionally send notifications (e.g., via email, SMS, or print) once processing is complete.

Installation

To install this custom node, follow these steps:

  1. Clone or download this repository.
  2. Follow the official n8n custom node installation guide.
  3. Install the necessary dependencies for OCR and DOCX extraction:
    • For OCR: Ensure Tesseract.js is installed and properly configured.
    • For DOCX extraction: Install Mammoth.js.
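
The two dependencies cover different inputs: Tesseract.js handles images and Mammoth.js handles DOCX files. As a rough sketch of that split, the snippet below decides which library would process a given file. The names `InputType`, `OcrLanguage`, `ExtractionPlan`, and `planExtraction` are hypothetical and not taken from this package, and the value `image` for the input type is an assumption (the README only confirms `docx`).

```typescript
// Hypothetical sketch: decide which library should handle a given input.
// None of these names come from the package itself.
type InputType = "image" | "docx"; // "image" is an assumed value
type OcrLanguage = "eng" | "vie";

interface ExtractionPlan {
  library: "tesseract.js" | "mammoth";
  source: string;
  language?: OcrLanguage; // only set when OCR is used
}

function planExtraction(
  inputType: InputType,
  source: string,
  language: OcrLanguage = "eng",
): ExtractionPlan {
  if (source.trim() === "") {
    throw new Error("File URL or Path must not be empty");
  }
  if (inputType === "image") {
    // Images go through OCR, so the language setting applies.
    return { library: "tesseract.js", source, language };
  }
  // DOCX text is read directly; no OCR language is needed.
  return { library: "mammoth", source };
}
```

In a real node, the `tesseract.js` branch would typically call `Tesseract.recognize(source, language)` and read `data.text` from the result, while the `mammoth` branch would call `mammoth.extractRawText({ path: source })` for a local file and read `value`. Whether this package does exactly that is not stated in the README.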

Configuration

Node Settings

Once you add the Auto Scan DOCX and Image node to your workflow, you can configure the following parameters:

  • Input Type (options):

    • Choose between Image (for OCR) or DOCX (for document extraction).
    • Default: docx
    • Description: Select the input type for scanning (Image or DOCX file).
  • File URL or Path (string):

    • Provide the URL or local file path to the document or image file you wish to process.
    • Default: `` (empty string)
    • Description: The path or URL to the file.
  • Language (options):

    • English (eng) or Vietnamese (vie).
    • Default: eng
    • Description: The language to use for OCR processing on image files.
  • Send Notification (boolean):

    • Determines whether to send a notification after the document processing is complete.
    • Default: false
    • Description: If set to true, a notification will be sent after processing is finished.
  • Output Format (options):

    • Choose between JSON or Plain Text for the output format of the extracted data.
    • Default: json
    • Description: Choose the output format for the extracted data.
  • Department Routing (boolean):

    • Automatically route the document to the correct department based on the extracted content.
    • Default: true
    • Description: If set to true, the node will classify the document and route it to the appropriate department.
  • Notification Method (options):

    • Choose the method to notify users or departments about the document status after processing.
    • Options: Email, SMS, or Print.
    • Default: email
    • Description: Select the notification method for alerting users or departments.
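
The settings and defaults listed above can be summarized as a typed object. This is a sketch only: the property names (`inputType`, `filePath`, and so on) and the option values `image`, `text`, `sms`, and `print` are assumptions, since the README confirms only the defaults `docx`, `eng`, `json`, and `email` plus the language code `vie`. `resolveParameters` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical parameter names; the README lists display names and defaults only.
interface ScanParameters {
  inputType: "image" | "docx";
  filePath: string;
  language: "eng" | "vie";
  sendNotification: boolean;
  outputFormat: "json" | "text";
  departmentRouting: boolean;
  notificationMethod: "email" | "sms" | "print";
}

// Defaults as documented in the Node Settings list.
const DEFAULT_PARAMETERS: ScanParameters = {
  inputType: "docx",
  filePath: "",
  language: "eng",
  sendNotification: false,
  outputFormat: "json",
  departmentRouting: true,
  notificationMethod: "email",
};

// Merge user-supplied values over the defaults and reject an empty file path.
function resolveParameters(overrides: Partial<ScanParameters>): ScanParameters {
  const params: ScanParameters = { ...DEFAULT_PARAMETERS, ...overrides };
  if (params.filePath.trim() === "") {
    throw new Error("File URL or Path is required");
  }
  return params;
}
```

For example, `resolveParameters({ filePath: "https://example.com/document.docx" })` would keep every documented default and only fill in the file location.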

Example Workflow

Input Data:

{
  "documentUrl": "https://example.com/document.docx",
  "documentType": "docx",
  "outputFormat": "json",
  "departmentRouting": true,
  "notificationMethod": "email"
}

Node Configuration:

  • File URL or Path: https://example.com/document.docx
  • Input Type: docx
  • Output Format: json
  • Department Routing: true
  • Notification Method: email

Output Data (Example):

If the document is a DOCX file, the output might look like this:

{
  "extractedText": "This document contains financial data that needs to be routed to the finance department.",
  "classifiedData": {
    "department": "Finance",
    "priority": "High",
    "summary": "Extracted key financial information."
  },
  "notification": "Notification sent via email."
}

In this example:

  • extractedText: The raw text extracted from the document or image.
  • classifiedData: A summary of the classification (e.g., department, priority).
  • notification: A message indicating that a notification was sent.
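
The README does not describe how classification works internally. To make the output shape concrete, here is a minimal sketch that assumes a simple keyword lookup. The keyword table, the `General` and `Normal` fallbacks, and the functions `classify` and `buildOutput` are all hypothetical; only the field names `extractedText`, `classifiedData`, `department`, `priority`, `summary`, and `notification` come from the example output.

```typescript
interface ClassifiedData {
  department: string;
  priority: string;
  summary: string;
}

interface ScanOutput {
  extractedText: string;
  classifiedData?: ClassifiedData;
  notification?: string;
}

// Assumed keyword table; the real node may classify content differently.
const DEPARTMENT_KEYWORDS: Array<{ department: string; keywords: string[] }> = [
  { department: "Finance", keywords: ["financial", "invoice", "budget"] },
  { department: "Legal", keywords: ["contract", "agreement"] },
];

function classify(text: string): ClassifiedData {
  const lower = text.toLowerCase();
  let department = "General"; // fallback when no keyword matches
  for (const entry of DEPARTMENT_KEYWORDS) {
    if (entry.keywords.some((keyword) => lower.indexOf(keyword) !== -1)) {
      department = entry.department;
      break;
    }
  }
  // Assumed rule: the word "urgent" raises the priority.
  const priority = /\burgent\b/.test(lower) ? "High" : "Normal";
  return {
    department,
    priority,
    summary: `Routed to ${department} with ${priority} priority.`,
  };
}

// Assemble the output object, adding optional fields only when enabled.
function buildOutput(
  extractedText: string,
  departmentRouting: boolean,
  notificationMethod?: string,
): ScanOutput {
  const output: ScanOutput = { extractedText };
  if (departmentRouting) {
    output.classifiedData = classify(extractedText);
  }
  if (notificationMethod) {
    output.notification = `Notification sent via ${notificationMethod}.`;
  }
  return output;
}
```

Keeping `classifiedData` and `notification` optional mirrors the Department Routing and Send Notification toggles: when either is switched off, the corresponding field is simply omitted in this sketch.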