pdf-parse2
v1.0.4
PDF Parse
A pure JavaScript, cross-platform module designed for extracting text from PDF files using pdf.js.
Features
- Extract text from PDF files.
- Supports both browser and Node.js environments.
- Easy to use, with a promise-based API.
Installation
npm install pdf-parse2
Or
yarn add pdf-parse2
Usage
Node.js
const fs = require('fs');
const PDFParse = require('pdf-parse2');

(async () => {
  // Read the PDF from disk into a Buffer.
  const dataBuffer = fs.readFileSync('path/to/your/document.pdf');
  // Give the instance its own name so it does not shadow the imported class.
  const parser = new PDFParse();
  try {
    const pdfData = await parser.loadPDF(dataBuffer);
    console.log('Text:', pdfData.text);
  } catch (error) {
    console.error(error);
  }
})();
Browser
Make sure the pdf.js library is included in your project. You can then use PDFParse
much as in the Node.js example, except that the PDF file is fetched with the Fetch API or XMLHttpRequest instead of being read from disk.
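The browser flow described above might look roughly like the sketch below. The helper name `extractTextFromUrl` and its parameters are hypothetical and not part of pdf-parse2. The sketch assumes only what this README documents: `loadPDF` accepts an `ArrayBuffer` and resolves to an object with a `text` property. How `PDFParse` is exposed in a browser build is not covered here, so the parser instance is passed in rather than constructed.

```javascript
// Hypothetical helper (not part of pdf-parse2): fetch a PDF and return its text.
// `parser` is assumed to be a PDFParse instance, or any object exposing loadPDF().
// `fetchImpl` defaults to the global Fetch API and can be swapped out for testing.
async function extractTextFromUrl(url, parser, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  // loadPDF is documented as accepting an ArrayBuffer.
  const arrayBuffer = await response.arrayBuffer();
  const pdfData = await parser.loadPDF(arrayBuffer);
  return pdfData.text;
}
```

With a `PDFParse` instance in scope, a call such as `extractTextFromUrl('path/to/your/document.pdf', parser)` would return a promise for the extracted text. Passing the parser in keeps the helper independent of how the library is loaded on the page.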
API Reference
- loadPDF(src, options): Loads a PDF file and extracts text. src can be a Buffer or ArrayBuffer. options is optional.
- renderPage(pageData, options): A helper function for rendering a single page. This function is used internally by loadPDF.
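Since loadPDF is documented as taking a Buffer or an ArrayBuffer, a caller may want to check the input before handing it over. The following is a minimal sketch of such a guard. `isSupportedSource` and `loadText` are hypothetical names, not part of pdf-parse2, and the sketch passes `options` through untouched because its fields are not documented here.

```javascript
// Hypothetical guard (not part of pdf-parse2): true for the two documented
// source types, a Node.js Buffer or an ArrayBuffer.
function isSupportedSource(src) {
  // Buffer is not defined in browsers, so check for it before using it.
  const isNodeBuffer = typeof Buffer !== 'undefined' && Buffer.isBuffer(src);
  return isNodeBuffer || src instanceof ArrayBuffer;
}

// Hypothetical wrapper: validate the source, call loadPDF, return the text.
// `parser` is assumed to be a PDFParse instance, or any object exposing loadPDF().
async function loadText(parser, src, options) {
  if (!isSupportedSource(src)) {
    throw new TypeError('src must be a Buffer or an ArrayBuffer');
  }
  const pdfData = await parser.loadPDF(src, options);
  return pdfData.text;
}
```

Failing early with a `TypeError` keeps an unsupported input, such as a file path string, from surfacing later as a less obvious parsing error.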
Contributing
Contributions are welcome! Please feel free to submit a Pull Request or open an issue for any bugs or feature requests.
License
This project is licensed under the MIT License - see the LICENSE file for details.