pdf-parser-client-side
v1.1.1
PDF Parser Client Side
A lightweight, easy-to-use package for extracting text from PDF files on the client side, with no server dependency.
How to Install
Use npm or yarn to install the package:
npm i pdf-parser-client-side
or
yarn add pdf-parser-client-side
Include the package
import extractTextFromPDF from "pdf-parser-client-side";
`variant` Parameter
The `variant` parameter specifies the type of text extraction and replacement performed on the `extractedText`. Depending on its value, different types of characters are removed or retained.
| `variant` Value | Description | Regular Expression | Retained Characters |
| --- | --- | --- | --- |
| `clean` | Removes all non-ASCII characters and any spaces that follow them. | `/[^\x00-\x7F]+\ *(?:[^\x00-\x7F]\| )*/g` | ASCII characters only |
| `alphanumeric` | Retains only alphanumeric characters (letters and numbers). | `/[^a-zA-Z0-9]+/g` | A-Z, a-z, 0-9 |
| `alphanumericwithspace` | Retains alphanumeric characters and spaces. | `/[^a-zA-Z0-9 ]+/g` | A-Z, a-z, 0-9, space |
| `alphanumericwithspaceandpunctuation` | Retains alphanumeric characters, spaces, and basic punctuation marks (`.` `,` `!` `?`). | `/[^a-zA-Z0-9 .,!?]+/g` | A-Z, a-z, 0-9, space, `.` `,` `!` `?` |
| `alphanumericwithspaceandpunctuationandnewline` | Retains alphanumeric characters, spaces, basic punctuation marks (`.` `,` `!` `?`), and newlines. | `/[^a-zA-Z0-9 .,!?]+/g` | A-Z, a-z, 0-9, space, `.` `,` `!` `?` |
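To make the effect of each variant concrete, here is a minimal sketch that applies the patterns listed in the table to a plain string. The names `VARIANT_PATTERNS` and `applyVariant` are hypothetical and are not part of the package. The sketch also assumes that matched characters are replaced with an empty string, which the table does not state. The newline variant is left out because its listed pattern is the same as the previous row's.

```javascript
// Hypothetical illustration only: the patterns are taken from the table,
// but the package's real internals may differ.
const VARIANT_PATTERNS = {
  clean: /[^\x00-\x7F]+\ *(?:[^\x00-\x7F]| )*/g,
  alphanumeric: /[^a-zA-Z0-9]+/g,
  alphanumericwithspace: /[^a-zA-Z0-9 ]+/g,
  alphanumericwithspaceandpunctuation: /[^a-zA-Z0-9 .,!?]+/g,
};

// Removes every match of the variant's pattern from the text.
// Assumption: matches are replaced with "" (not with a space).
// Unknown variants return the text unchanged.
function applyVariant(text, variant) {
  const pattern = VARIANT_PATTERNS[variant];
  if (!pattern) return text;
  return text.replace(pattern, "");
}

const sample = "Hello, World! 123";
console.log(applyVariant(sample, "alphanumeric"));          // HelloWorld123
console.log(applyVariant(sample, "alphanumericwithspace")); // Hello World 123
console.log(applyVariant("café au lait", "clean"));         // cafau lait
```

Note how `clean` also swallows the space after the non-ASCII character, which joins the neighbouring words.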
Example Usage
JavaScript
import React from "react";
import extractTextFromPDF from "pdf-parser-client-side";

export default function Test() {
  // Extract text from the selected PDF using the given variant.
  const handleFileChange = async (e, variant) => {
    const file = e.target.files?.[0];
    if (file) {
      try {
        const text = await extractTextFromPDF(file, variant);
        console.log("Extracted Text:", text);
      } catch (error) {
        console.error("Error extracting text from PDF:", error);
      }
    }
  };

  return (
    <div>
      <input
        type="file"
        name=""
        id="file-selector"
        accept=".pdf"
        onChange={(e) => handleFileChange(e, "clean")}
      />
    </div>
  );
}
TypeScript
import React from "react";
import extractTextFromPDF, { Variant } from "pdf-parser-client-side";

export default function Test() {
  // Extract text from the selected PDF using the given variant.
  const handleFileChange = async (
    e: React.ChangeEvent<HTMLInputElement>,
    variant: Variant
  ) => {
    const file = e.target.files?.[0];
    if (file) {
      try {
        const text = await extractTextFromPDF(file, variant);
        console.log("Extracted Text:", text);
      } catch (error) {
        console.error("Error extracting text from PDF:", error);
      }
    }
  };

  return (
    <div>
      <input
        type="file"
        name=""
        id="file-selector"
        accept=".pdf"
        onChange={(e) => handleFileChange(e, "clean")}
      />
    </div>
  );
}
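The `accept=".pdf"` attribute in the examples above only filters what the file picker offers; it does not guarantee that the chosen file is a PDF. A minimal sketch of an explicit check before extraction might look like the following. `isPdfFile` and `extractIfPdf` are hypothetical helpers, not part of the package, and the extractor is passed in as an argument so the sketch assumes nothing beyond the `(file, variant)` call shape shown above.

```javascript
// Hypothetical helpers, not part of pdf-parser-client-side.

// Returns true when a File-like object looks like a PDF, judged by its
// MIME type or, failing that, by its file name extension.
function isPdfFile(file) {
  if (!file) return false;
  if (file.type === "application/pdf") return true;
  return typeof file.name === "string" && file.name.toLowerCase().endsWith(".pdf");
}

// Calls the supplied extractor only for PDF-looking files.
// `extract` is assumed to accept (file, variant), like the default export
// used in the examples above.
async function extractIfPdf(file, variant, extract) {
  if (!isPdfFile(file)) {
    throw new Error("Selected file does not look like a PDF");
  }
  return extract(file, variant);
}
```

Inside `handleFileChange`, this could be used as `await extractIfPdf(file, variant, extractTextFromPDF)`, with the existing `catch` block handling the rejection.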
Contributing
Feel free to contribute!
- Fork the repository
- Make changes
- Submit a pull request