@warren-bank/node-hocr-resizer
v1.0.1
hocr-resizer
Command-line utility for resizing the coordinates in an hOCR (html-formatted ocr text) file.
Use Case:
- let's say that we've already done the following:
- what if:
  - the size of the PDF is too big
- we can:
  - use ImageMagick to resize all images to a lower resolution
- the problem:
  - the coordinates in the hOCR files no longer correspond to the dimensions of the images
    - the aspect ratio hasn't changed
    - the width and height of the images (in pixels) have decreased
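To make the problem concrete: hOCR stores each element's position as a `bbox x0 y0 x1 y1` property (in pixels) inside its `title` attribute, so shrinking the images means every one of those numbers has to shrink by the same factor. The following is a minimal sketch of that idea, not the code this package actually runs. The function name `scaleBboxes` and the sample markup are hypothetical, rounding to whole pixels is an assumption, and the sketch only touches `bbox` values.

```javascript
// Hypothetical helper: not part of the hocr-resizer package.
// Multiplies every "bbox x0 y0 x1 y1" quadruple in an hOCR string by the
// given horizontal and vertical scale factors, rounding to whole pixels.
function scaleBboxes(hocr, scaleX, scaleY) {
  const bboxPattern = /\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/g;
  return hocr.replace(bboxPattern, (match, x0, y0, x1, y1) => {
    const scaled = [
      Math.round(Number(x0) * scaleX),
      Math.round(Number(y0) * scaleY),
      Math.round(Number(x1) * scaleX),
      Math.round(Number(y1) * scaleY),
    ];
    return `bbox ${scaled.join(" ")}`;
  });
}

// Example: a page scanned at 2550x3300 px whose image was resized to half size.
const sample =
  '<div class="ocr_page" title="bbox 0 0 2550 3300">' +
  '<span class="ocrx_word" title="bbox 300 400 520 460; x_wconf 95">Hello</span>' +
  "</div>";

console.log(scaleBboxes(sample, 0.5, 0.5));
```

A plain regular expression is used here for brevity; it would also rewrite the literal text "bbox 1 2 3 4" if that ever appeared in the recognized page content, so a real tool may prefer to parse the markup.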
Existing Solution:
- this problem has been discussed in an issue in the hocr-tools repo
- as mentioned in an issue comment:
  - hocr_resizer.rb is a Ruby script that can rescale the coordinates in an hOCR file
  - hocr_resize.rake is a Ruby rakefile that calls the script
Reason for Yet-Another Solution:
- I fkn hate Ruby, and its enormous non-portable runtime
- ..why not?
Installation:
npm install --global @warren-bank/node-hocr-resizer
Usage:
hocr-resizer <options>

options:
========
"--help"
    Print a help message describing all command-line options.

"-v"
"--version"
    Display the version.

"-w" <integer>
"--width" <integer>
    [required] Width of new/resized image.

"-h" <integer>
"--height" <integer>
    [optional] Height of new/resized image.
    Default: calculated from old aspect ratio and new/resized width.

"-i" <filepath>
"--input" <filepath>
    [required] Filepath to input hOCR file.

"-o" <filepath>
"--output" <filepath>
    [optional] Filepath to output hOCR file.
    Default: overwrite input hOCR file.
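
The default for "--height" follows from keeping the aspect ratio fixed: new height = old height × (new width / old width). A small sketch of that arithmetic is shown below. The function name `defaultHeight` is hypothetical, and rounding to the nearest pixel is an assumption; the package may round differently.

```javascript
// Hypothetical helper: illustrates the documented default, not the package's code.
// Keeps the old aspect ratio and derives the height that matches the new width.
function defaultHeight(oldWidth, oldHeight, newWidth) {
  return Math.round((oldHeight * newWidth) / oldWidth);
}

// A letter-size page at 300dpi (2550x3300 px) resized to 1275 px wide.
console.log(defaultHeight(2550, 3300, 1275)); // 1650
```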
Example:
- overwrite an hOCR file with updated coordinates, keeping the same aspect ratio and using a new image width of 1275px (i.e. 150dpi @ 8.5"):
hocr-resizer -w 1275 -i '/path/to/file.hocr'
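
Since a scanned document usually has one hOCR file per page, the same command tends to be run many times. The sketch below shows one way this might be scripted from Node.js, using only the flags documented above. The helper names `resizerArgs` and `resizeAll` are hypothetical, and it assumes `hocr-resizer` is installed globally and available on the PATH (on Windows, the npm command shim may additionally need the `shell: true` option).

```javascript
const { execFileSync } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

// Build the argument list for one hocr-resizer call, using only the
// documented flags: -w (width), -i (input), -o (output, optional).
function resizerArgs(width, input, output) {
  const args = ["-w", String(width), "-i", input];
  if (output) args.push("-o", output);
  return args;
}

// Hypothetical batch helper: resize every *.hocr file in `dir` in place.
// Assumes `hocr-resizer` is installed globally and on the PATH.
function resizeAll(dir, width) {
  for (const name of fs.readdirSync(dir)) {
    if (path.extname(name) !== ".hocr") continue;
    execFileSync("hocr-resizer", resizerArgs(width, path.join(dir, name)), {
      stdio: "inherit",
    });
  }
}
```

Calling `resizeAll("/path/to/hocr-files", 1275)` would overwrite each file in that directory, because no output path is passed; keeping a backup first is advisable.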
Legal:
- copyright: Warren Bank
- license: GPL-2.0