pdf-figure-extractor
v0.0.10
Published
Extract figure from pdf without text in it
Downloads
6
Readme
Pdf-figure-extractor
Extract figure from pdf without text in it
Required Packages
Install dependencies:
sudo apt-get install libopencv-dev libcv-dev libtesseract-dev tesseract-ocr
Installation
Install project dependencies:
npm install
Run
If you want to execute in command line interface:
npm install -g pdf-figure-extractor
Usage:
Usage: pdf-figure-extractor [options]
Options:
-h, --help output usage information
-V, --version output the version number
-o, --output <path> Directory to put results
-i, --input <path> Directory to process
-t, --tmp <path> Directory to put temporary files
-p, --partials <path> Directory to put figure directory
For instance:
pdf-figure-extractor --input "pdf" --output "output"
If you want to execute as a module:
const pfe = require('pdf-figure-extractor')
const config = {
pdfInputPath: input,
directoryOutputPath: output,
directoryPartialPath: partials,
tmp: tmp,
debug:true
}
new pfe(config).then((self) => {
return self.exec()
}).then((partials)=>{
console.log(partials)
}).catch(err=>console.log(err))
TODO
- ~~Extract array~~
- Extract graphs (partial: heritage from array when graph have grid inside)
- Extract images