@mimik/dataset-cli
v1.0.0
A CLI tool to parse a PDF file, generate text chunks with embeddings, and save as JSON for Retrieval-Augmented Generation (RAG).
Installation
To install the tool globally from npm, use the following command:
npm install -g @mimik/dataset-cli
Usage
After installing the tool globally, you can use the dataset-cli command to process a PDF file.
Options
• -i, --input: (required) Path to the input PDF file.
• -o, --output: (required) Path to the output MDF file.
• -u, --url: (optional) Embedding model URL. Default is http://localhost:8083/api/mim/v1/embeddings.
• -k, --apiKey: (optional) API key for the embedding model.
• -m, --model: (optional) Embedding model name. Default is "nomic-embed-text-v1.5.Q8_0".
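Since only -i and -o are required, the shortest invocation passes just those two options and relies on the documented defaults for the URL and model. The sketch below illustrates this. The file paths are placeholders, and the fallback branch that prints the command instead of running it is an addition for illustration, not part of the tool.

```shell
#!/bin/sh
# Placeholder paths (assumptions); replace them with your own files.
INPUT="./input.pdf"
OUTPUT="./output.mdf"

# Only the two required options are set, so the documented defaults
# for --url and --model apply.
set -- -i "$INPUT" -o "$OUTPUT"

if command -v dataset-cli >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  dataset-cli "$@"
else
  # The tool or the input file is missing: show the command instead of running it.
  echo "Would run: dataset-cli $*"
fi
```

With this setup, the default embedding endpoint at http://localhost:8083/api/mim/v1/embeddings presumably needs to be reachable for the run to succeed.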
Example Command
dataset-cli -i /path/to/your/input.pdf -o /path/to/your/output.mdf -u http://localhost:1234/v1/embeddings -k your-api-key -m your-model-name
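The tool takes one PDF per invocation, so processing several files means calling it in a loop. The following is a minimal sketch of that idea, not a feature of dataset-cli itself. The PDF_DIR and OUT_DIR variables and the output_path helper are hypothetical names introduced here, and the default URL and model are assumed to be suitable.

```shell
#!/bin/sh
# Hypothetical directories (assumptions); override them via the environment.
PDF_DIR="${PDF_DIR:-./pdfs}"
OUT_DIR="${OUT_DIR:-./datasets}"

# Map an input PDF path to its output path,
# e.g. ./pdfs/report.pdf -> ./datasets/report.mdf
output_path() {
  printf '%s/%s.mdf\n' "$OUT_DIR" "$(basename "$1" .pdf)"
}

mkdir -p "$OUT_DIR"

for pdf in "$PDF_DIR"/*.pdf; do
  # Skip the unexpanded glob pattern when the directory has no PDFs.
  [ -f "$pdf" ] || continue
  # One dataset-cli call per file; report a failure and keep going.
  dataset-cli -i "$pdf" -o "$(output_path "$pdf")" || echo "failed: $pdf" >&2
done
```

Each output file is named after its input, so re-running the loop writes to the same output paths.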