groq-ocr
v1.0.6
Published
a library to run OCR with Groq provided models.
Downloads
410
Readme
Table of Contents
Disclaimer
This project is still in development‼️
Multi-page PDF support is experimental and work in progress.
PDF support relies on pdftopic library which requires node>=12 and imagemagick.
JSON mode might fail with json_validate_failed
error
Installation
npm i groq-ocr
to use as an NPM package.
npm i -g groq-ocr
to use as a CLI.
Usage
Use as NPM package:
import { ocr, GroqVisionModel } from "groq-ocr";
const result = await ocr({
filePath: "./filepath.jpg", // Allowed formats: jpg, jpeg, png, pdf.
apiKey: process.env.GROQ_API_KEY, // Get your API key from https://console.groq.com/
model: GroqVisionModel.LLAMA_32_90B, // available models: LLAMA_32_11B, LLAMA_32_90B. Default: LLAMA_32_11B
jsonMode: false, // Default: false. Set to true to get JSON output.
additionalInstructions: "Additional instructions to be included in the prompt.", // Use to give custom instructions to the model.
});
ocr options:
- filePath (required): Path to image/PDF file or URL
- Supported formats:
.jpg
,.jpeg
,.png
,.pdf
- Supported formats:
- apiKey (optional): Groq API key
- Defaults to
GROQ_API_KEY
environment variable
- Defaults to
- model (optional): Vision model to use
GroqVisionModel.LLAMA_32_11B
(default) - Llama 3.2 11B Vision PreviewGroqVisionModel.LLAMA_32_90B
- Llama 3.2 90B Vision Preview
- jsonMode (optional): Return structured JSON instead of markdown
- Defaults to
false
- Defaults to
- additionalInstructions (optional): Additional instructions to be included in the prompt.
- Defaults to "" - use to give custom instructions to the model.
Use as CLI:
Either set your Groq API key as environment variable:
export GROQ_API_KEY=your-api-key
Or provide it as CLI option with -k
flag when running commands.
CLI Examples
# Basic usage
groq-ocr -f image.jpg
# Output as JSON
groq-ocr -f scan.pdf -j
# Save to file
groq-ocr -f receipt.png -o result.txt
# Use specific model and API key
groq-ocr -f document.jpg -m llama-3.2-90b-vision-preview -k your-api-key
CLI Options
-f, --file <path>
(required): Path to input image/PDF file-k, --api-key <key>
: Groq API key (defaults toGROQ_API_KEY
env var)-m, --model <model>
: Vision model to use:llama-3.2-11b-vision-preview
(default)llama-3.2-90b-vision-preview
-j, --json
: Output in JSON format instead of markdown-o, --output <path>
: Write result to file instead of console-V, --version
: Display version number-h, --help
: Display help information
How it works
This library and CLI uses multimodal models with vision capabilities provided by Groq to run OCR on images and PDFs and return markdown or JSON.
PDFs are converted to images using pdftopic.
Models
The plan is to support all models provided by Groq with vision capabilities. Groq vision models
Currently supported models:
enum GroqVisionModel {
LLAMA_32_11B = "llama-3.2-11b-vision-preview",
LLAMA_32_90B = "llama-3.2-90b-vision-preview",
}
Roadmap
- [x] Add support for local images OCR
- [x] Add support for remote images OCR
- [x] Add support for single page PDFs
- [x] Add support for JSON output in addition to markdown
- [x] Add CLI
- [x] extend prompt with custom instructions
- [ ] Add support for multi-page PDFs OCR (Available but experimental)
Credit
This project was highly inspired by llama-ocr.