groq-ocr

v1.0.6


A library to run OCR with Groq-provided models.

Downloads

410

Disclaimer

  • This project is still in development‼️
  • Multi-page PDF support is experimental and a work in progress.
  • PDF support relies on the pdftopic library, which requires Node.js >= 12 and ImageMagick.
  • JSON mode may fail with a json_validate_failed error.
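Because JSON mode can fail with json_validate_failed, a caller might wrap the call in a small retry. This is a hypothetical sketch: retryOnJsonValidateFailed is not part of groq-ocr, and it assumes the failure surfaces as a thrown Error whose message contains the string json_validate_failed.

```typescript
// Hypothetical caller-side helper, not part of groq-ocr.
// Assumes the JSON-mode failure is thrown as an Error whose message
// contains "json_validate_failed".
async function retryOnJsonValidateFailed<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      const message = err instanceof Error ? err.message : String(err);
      // Only the JSON validation failure is retried; anything else is rethrown at once.
      if (!message.includes("json_validate_failed")) throw err;
    }
  }
  // All attempts failed with json_validate_failed: surface the last error.
  throw lastError;
}
```

For example: await retryOnJsonValidateFailed(() => ocr({ filePath: "./scan.png", jsonMode: true })). Retrying only helps if the failure is intermittent; a persistent failure still propagates after the last attempt.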

Installation

Run npm i groq-ocr to use it as an npm package.

Run npm i -g groq-ocr to use it as a CLI.

Usage

Use as an npm package:

import { ocr, GroqVisionModel } from "groq-ocr";
const result = await ocr({
  filePath: "./filepath.jpg", // Allowed formats: jpg, jpeg, png, pdf.
  apiKey: process.env.GROQ_API_KEY, // Get your API key from https://console.groq.com/
  model: GroqVisionModel.LLAMA_32_90B, // available models: LLAMA_32_11B, LLAMA_32_90B. Default: LLAMA_32_11B
  jsonMode: false, // Default: false. Set to true to get JSON output.
  additionalInstructions: "Additional instructions to be included in the prompt.", // Use to give custom instructions to the model.
});

ocr options:

  • filePath (required): Path to image/PDF file or URL
    • Supported formats: .jpg, .jpeg, .png, .pdf
  • apiKey (optional): Groq API key
    • Defaults to GROQ_API_KEY environment variable
  • model (optional): Vision model to use
    • GroqVisionModel.LLAMA_32_11B (default) - Llama 3.2 11B Vision Preview
    • GroqVisionModel.LLAMA_32_90B - Llama 3.2 90B Vision Preview
  • jsonMode (optional): Return structured JSON instead of markdown
    • Defaults to false
  • additionalInstructions (optional): Additional instructions to be included in the prompt.
    • Defaults to "" - use to give custom instructions to the model.
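The defaults listed above can be summarized as a small resolution step. This is a hypothetical sketch, not groq-ocr's source: resolveOcrOptions is an illustrative name, the environment is passed in as a plain object instead of reading process.env directly, and the extension check is an assumption based on the documented list of supported formats.

```typescript
// Hypothetical sketch of the documented ocr() defaults; not groq-ocr's source.
enum GroqVisionModel {
  LLAMA_32_11B = "llama-3.2-11b-vision-preview",
  LLAMA_32_90B = "llama-3.2-90b-vision-preview",
}

interface OcrOptions {
  filePath: string;
  apiKey?: string;
  model?: GroqVisionModel;
  jsonMode?: boolean;
  additionalInstructions?: string;
}

const SUPPORTED_EXTENSIONS = [".jpg", ".jpeg", ".png", ".pdf"];

function resolveOcrOptions(
  options: OcrOptions,
  env: Record<string, string | undefined>, // e.g. process.env in Node
): Required<OcrOptions> {
  // Drop any query string or fragment so URLs like ".../scan.png?x=1" still match.
  const path = options.filePath.split(/[?#]/)[0].toLowerCase();
  if (!SUPPORTED_EXTENSIONS.some((ext) => path.endsWith(ext))) {
    throw new Error(`Unsupported file type: ${options.filePath}`);
  }
  // An explicit apiKey wins; otherwise fall back to GROQ_API_KEY.
  const apiKey = options.apiKey ?? env.GROQ_API_KEY;
  if (!apiKey) {
    throw new Error("Missing API key: pass apiKey or set GROQ_API_KEY");
  }
  return {
    filePath: options.filePath,
    apiKey,
    model: options.model ?? GroqVisionModel.LLAMA_32_11B,
    jsonMode: options.jsonMode ?? false,
    additionalInstructions: options.additionalInstructions ?? "",
  };
}
```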

Use as a CLI:

Either set your Groq API key as an environment variable:

export GROQ_API_KEY=your-api-key

Or provide it with the -k flag when running commands.

CLI Examples

# Basic usage
groq-ocr -f image.jpg

# Output as JSON
groq-ocr -f scan.pdf -j

# Save to file
groq-ocr -f receipt.png -o result.txt

# Use specific model and API key
groq-ocr -f document.jpg -m llama-3.2-90b-vision-preview -k your-api-key

CLI Options

  • -f, --file <path> (required): Path to input image/PDF file
  • -k, --api-key <key>: Groq API key (defaults to GROQ_API_KEY env var)
  • -m, --model <model>: Vision model to use:
    • llama-3.2-11b-vision-preview (default)
    • llama-3.2-90b-vision-preview
  • -j, --json: Output in JSON format instead of markdown
  • -o, --output <path>: Write result to file instead of console
  • -V, --version: Display version number
  • -h, --help: Display help information
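To make the flag-to-option mapping concrete, here is a hypothetical, hand-rolled parser for the documented flags. It is not how groq-ocr necessarily parses its arguments; parseCliArgs and CliOptions are illustrative names, -V and -h are left out, and the --flag=value form is not handled.

```typescript
// Hypothetical parser for the documented CLI flags; not groq-ocr's source.
interface CliOptions {
  file: string;
  apiKey?: string;
  model: string;
  json: boolean;
  output?: string;
}

// Returns the argument after a flag, rejecting a missing value or another flag.
function valueAfter(argv: string[], index: number, flag: string): string {
  const value = argv[index + 1];
  if (value === undefined || value.startsWith("-")) {
    throw new Error(`Missing value for ${flag}`);
  }
  return value;
}

function parseCliArgs(argv: string[]): CliOptions {
  const opts: Partial<CliOptions> = {
    model: "llama-3.2-11b-vision-preview", // documented default
    json: false,
  };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    switch (flag) {
      case "-f":
      case "--file":
        opts.file = valueAfter(argv, i++, flag); // i++ skips the consumed value
        break;
      case "-k":
      case "--api-key":
        opts.apiKey = valueAfter(argv, i++, flag);
        break;
      case "-m":
      case "--model":
        opts.model = valueAfter(argv, i++, flag);
        break;
      case "-j":
      case "--json":
        opts.json = true;
        break;
      case "-o":
      case "--output":
        opts.output = valueAfter(argv, i++, flag);
        break;
      default:
        throw new Error(`Unknown option: ${flag}`);
    }
  }
  if (!opts.file) throw new Error("Missing required option -f, --file <path>");
  return opts as CliOptions;
}
```

A missing apiKey here would still fall back to GROQ_API_KEY at the point where the options are handed to the OCR call.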

How it works

The library and CLI use Groq-hosted multimodal models with vision capabilities to run OCR on images and PDFs, and return the result as markdown or JSON.

PDFs are converted to images using pdftopic.
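Groq exposes an OpenAI-compatible chat completions API that accepts images as image_url content parts (a remote URL or a base64 data URL) and offers a JSON mode through response_format. The sketch below builds such a request body for one image; it does not send it. It is an assumption about the general shape, not groq-ocr's code: buildOcrRequest, toDataUrl, and the prompt wording are hypothetical, and base64 encoding is left to the caller (in Node, Buffer.from(bytes).toString("base64")).

```typescript
// Hypothetical request builder; the prompt text is illustrative only.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatCompletionRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
  response_format?: { type: "json_object" };
}

// Maps a file name to the MIME type used in the data URL.
function mimeTypeFor(fileName: string): "image/png" | "image/jpeg" {
  const lower = fileName.toLowerCase();
  if (lower.endsWith(".png")) return "image/png";
  if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) return "image/jpeg";
  throw new Error(`Not a supported image: ${fileName}`);
}

// Wraps already base64-encoded image bytes in a data URL.
function toDataUrl(base64: string, fileName: string): string {
  return `data:${mimeTypeFor(fileName)};base64,${base64}`;
}

function buildOcrRequest(
  imageUrl: string, // remote URL or data URL
  model: string,
  jsonMode: boolean,
  additionalInstructions: string,
): ChatCompletionRequest {
  const format = jsonMode ? "JSON" : "markdown";
  // Custom instructions are appended to the base prompt when present.
  const prompt = [
    `Extract all text from this image and return it as ${format}.`,
    additionalInstructions,
  ]
    .filter((part) => part.length > 0)
    .join("\n");
  const request: ChatCompletionRequest = {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
  // JSON mode asks the API to return a JSON object.
  if (jsonMode) request.response_format = { type: "json_object" };
  return request;
}
```

Such a body would be POSTed to https://api.groq.com/openai/v1/chat/completions with an Authorization: Bearer header. For a PDF, each page image produced by pdftopic would go through the same step.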

Models

The plan is to support all Groq models with vision capabilities (see Groq vision models).

Currently supported models:

enum GroqVisionModel {
  LLAMA_32_11B = "llama-3.2-11b-vision-preview",
  LLAMA_32_90B = "llama-3.2-90b-vision-preview",
}
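The CLI's -m flag takes the enum's string values, while the package API takes enum members. A hypothetical helper such as toGroqVisionModel (not part of groq-ocr) could bridge the two and reject unknown ids; it relies only on the enum shown above.

```typescript
enum GroqVisionModel {
  LLAMA_32_11B = "llama-3.2-11b-vision-preview",
  LLAMA_32_90B = "llama-3.2-90b-vision-preview",
}

// Hypothetical helper: converts a model id string (e.g. a -m value) to an enum member.
function toGroqVisionModel(id: string): GroqVisionModel {
  // String enums have no reverse mapping, so Object.values lists only the ids.
  const models = Object.values(GroqVisionModel) as string[];
  if (!models.includes(id)) {
    throw new Error(`Unsupported model "${id}". Expected one of: ${models.join(", ")}`);
  }
  return id as GroqVisionModel;
}
```

Because the list comes from the enum itself, adding a new vision model to GroqVisionModel would make it accepted here without further changes.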

Roadmap

  • [x] Add support for local images OCR
  • [x] Add support for remote images OCR
  • [x] Add support for single page PDFs
  • [x] Add support for JSON output in addition to markdown
  • [x] Add CLI
  • [x] Extend the prompt with custom instructions
  • [ ] Add support for multi-page PDFs OCR (Available but experimental)

Credit

This project was heavily inspired by llama-ocr.

Formatted with Biome