@disane-dev/private-ci-test
v1.0.11
Universal and extensible document scraper to get documents from pages (like customer portals etc.)
Document scraper for getting invoices automagically as pdf (useful for taxes or DMS)
🏠 Homepage
Prerequisites
- npm >=9.1.2
- node >=18.12.1
Configuration
All settings can be changed via CLI flags or environment variables (the latter also work when using Docker).
| Setting | Description | Default value |
|---|---|---|
| AMAZON_USERNAME | Your Amazon username | null |
| AMAZON_PASSWORD | Your Amazon password | null |
| AMAZON_TLD | Amazon top level domain | de |
| AMAZON_YEAR_FILTER | Only extracts invoices from this year (e.g. 2023) | 2023 |
| AMAZON_PAGE_FILTER | Only extracts invoices from this page (e.g. 2) | null |
| AMAZON_ONLY_NEW | Tracks already scraped documents and starts a new run at the last scraped one | true |
| FILE_DESTINATION_FOLDER | Destination path for all scraped documents | ./documents/ |
| FILE_FALLBACK_EXTENSION | Fallback extension when no extension can be determined | .pdf |
| DEBUG | Debug flag (sets the log level to DEBUG) | false |
| SUBFOLDER_FOR_PAGES | Creates subfolders for every scraped page/plugin | false |
| LOG_PATH | Sets the log path | ./logs/ |
| LOG_LEVEL | Log level (see https://github.com/winstonjs/winston#logging-levels) | info |
| RECURRING | Flag for executing the script periodically. Needs RECURRING_PATTERN to be set. Defaults to true when using the Docker container | false |
| RECURRING_PATTERN | Cron pattern to execute periodically. Needs RECURRING to be true | */30 * * * * |
| TZ | Timezone used for Docker environments | Europe/Berlin |
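As a rough illustration of the settings table, the variables could be exported in a shell session before starting the scraper. This is only a sketch: the credential values are placeholders, and it assumes that `docudigger scrape all` picks up these variables as the table suggests.

```shell
# Placeholder credentials (hypothetical values; replace with your own).
export AMAZON_USERNAME='you@example.com'
export AMAZON_PASSWORD='change-me'

# Optional settings from the table; anything left unset should use its listed default.
export AMAZON_TLD='de'
export AMAZON_YEAR_FILTER='2023'
export FILE_DESTINATION_FOLDER='./documents/'
export LOG_LEVEL='debug'

# Start a run only when the CLI is available on this machine.
if command -v docudigger >/dev/null 2>&1; then
  docudigger scrape all
else
  echo "docudigger not found; install it first (see Usage)"
fi
```

The same variable names are the ones passed with `-e` in the Docker example.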
Install
npm install
Usage
$ npm install -g @disane-dev/private-ci-test
$ docudigger COMMAND
running command...
$ docudigger (--version)
@disane-dev/private-ci-test/1.0.11 linux-x64 node-v18.16.1
$ docudigger --help [COMMAND]
USAGE
$ docudigger COMMAND
...
docudigger scrape all
Scrapes all websites periodically (default for docker environment)
USAGE
$ docudigger scrape all [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l <value>] [-c <value> -r]
FLAGS
-c, --recurringCron=<value> [default: * * * * *] Cron pattern to execute periodically
-d, --debug
-l, --logPath=<value> [default: ./logs/] Log path
-r, --recurring
--logLevel=<option> [default: info] Specify level for logging.
<options: trace|debug|info|warn|error>
GLOBAL FLAGS
--json Format output as json.
DESCRIPTION
Scrapes all websites periodically
EXAMPLES
$ docudigger scrape all
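For a recurring run, the cron pattern has to reach the CLI as a single argument, so it should be quoted; an unquoted pattern would be split into words and its asterisks expanded into file names by the shell. The following is a minimal sketch that uses only flags from the help output above; the 30-minute pattern and the `CRON_PATTERN` variable name are illustrative choices.

```shell
# Every 30 minutes; single quotes keep the shell from expanding the asterisks.
CRON_PATTERN='*/30 * * * *'

# Assemble the command as positional parameters so the pattern stays one argument.
set -- docudigger scrape all --recurring --recurringCron "$CRON_PATTERN" --logLevel info

# Show what would be executed, then run it if the CLI is installed.
printf 'Command: %s\n' "$*"
if command -v "$1" >/dev/null 2>&1; then
  "$@"
fi
```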
docudigger scrape amazon
Used to get invoices from amazon
USAGE
$ docudigger scrape amazon -u <value> -p <value> [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l
<value>] [-c <value> -r] [--fileDestinationFolder <value>] [--fileFallbackExentension <value>] [-t <value>]
[--yearFilter <value>] [--pageFilter <value>] [--onlyNew]
FLAGS
-c, --recurringCron=<value> [default: * * * * *] Cron pattern to execute periodically
-d, --debug
-l, --logPath=<value> [default: ./logs/] Log path
-p, --password=<value> (required) Password
-r, --recurring
-t, --tld=<value> [default: de] Amazon top level domain
-u, --username=<value> (required) Username
--fileDestinationFolder=<value> [default: ./data/] Amazon top level domain
--fileFallbackExentension=<value> [default: .pdf] Amazon top level domain
--logLevel=<option> [default: info] Specify level for logging.
<options: trace|debug|info|warn|error>
--onlyNew Gets only new invoices
--pageFilter=<value> Filters a page
--yearFilter=<value> Filters a year
GLOBAL FLAGS
--json Format output as json.
DESCRIPTION
Used to get invoices from amazon
Scrapes amazon invoices
EXAMPLES
$ docudigger scrape amazon
Docker
docker run \
-e AMAZON_USERNAME='[YOUR MAIL]' \
-e AMAZON_PASSWORD='[YOUR PW]' \
-e AMAZON_TLD='de' \
-e AMAZON_YEAR_FILTER='2020' \
-e AMAZON_PAGE_FILTER='1' \
-e LOG_LEVEL='info' \
-v "C:/temp/docudigger/:/home/node/docudigger" \
ghcr.io/disane87/docudigger
Dev-Time 🪲
NPM
npm install
[Adjust the created .env file to your needs]
npm run start
Author
👤 Marco Franke
- Website: http://byte-style.de
- Github: @Disane87
- LinkedIn: @marco-franke-799399136
🤝 Contributing
Contributions, issues and feature requests are welcome! Feel free to check the issues page. You can also take a look at the contributing guide.
Show your support
Give a ⭐️ if this project helped you!
This README was generated with ❤️ by readme-md-generator