pdf-text-tools
v0.0.0-development
Tools for processing text extracted from PDFs (splitting, etc.) for use with AI and LLMs
pdf-text-tools
A collection of tools for processing text extracted from a PDF, for use with LLMs: for example, finding headers and splitting text at headers. Particularly useful for pages of PDF text that are not structured in a way that is easy to process.
Install
npm install pdf-text-tools
Usage
/**
* Find header titles in PDF text using regex-like matching
*/
import { findHeaderTitles } from 'pdf-text-tools';
findHeaderTitles('..some text string from pdf..');
//=> ['header1', 'header2']
/**
* Split text at header titles
* - Useful for grabbing the last part of a page
*/
import { splitAtHeader } from 'pdf-text-tools';
splitAtHeader('..some text string from pdf..', 'last');
//=> ['text before the header', 'text from the header onward, including the header']
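
To make the two return shapes above concrete, here is a minimal standalone sketch of the same idea. It is not the code inside pdf-text-tools: the `sketch*` functions are hypothetical stand-ins, and the header rule (a short line that is either numbered or written entirely in capitals) is an assumption chosen for illustration only.

```typescript
// Hypothetical stand-ins, NOT the implementation inside pdf-text-tools.
// Assumption: a "header" is a short line that is either numbered
// ("2. Methods", "3.1 Results") or written entirely in capitals.
function sketchIsHeader(line: string): boolean {
  const t = line.trim();
  if (t.length === 0 || t.length > 60) return false;
  const numbered = /^\d+(\.\d+)*\.?\s+\S/.test(t);
  const allCaps = /[A-Z]/.test(t) && t === t.toUpperCase();
  return numbered || allCaps;
}

// Return every line that looks like a header, in document order.
function sketchFindHeaderTitles(text: string): string[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter(sketchIsHeader);
}

// Split at the last detected header:
// [text before it, the header plus everything after it].
function sketchSplitAtLastHeader(text: string): [string, string] {
  const lines = text.split("\n");
  let lastHeaderIndex = -1;
  lines.forEach((line, i) => {
    if (sketchIsHeader(line)) lastHeaderIndex = i;
  });
  // No header found: everything counts as "before".
  if (lastHeaderIndex === -1) return [text, ""];
  return [
    lines.slice(0, lastHeaderIndex).join("\n"),
    lines.slice(lastHeaderIndex).join("\n"),
  ];
}

const samplePage = [
  "1. Introduction",
  "PDF text often arrives as one flat string.",
  "2. Methods",
  "We split the page at the last header.",
].join("\n");

const foundTitles = sketchFindHeaderTitles(samplePage);
const [beforeLast, fromLast] = sketchSplitAtLastHeader(samplePage);
console.log(foundTitles);
console.log(beforeLast);
console.log(fromLast);
```

Splitting at the last header is handy when a section starts near the bottom of one page and continues on the next: the second element can be prepended to the following page's text before it is passed to an LLM.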