wwweb
v1.0.3
WWWEB
An autonomous webcrawler for indexing robots.txt files.
Requirements
- Node.js ^6.0.0
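To check the requirement before installing, something like the following may help. It is a minimal sketch: `node_major` is a hypothetical helper, not part of wwweb, and treating 6 as a minimum (rather than the strict 6.x range that `^6.0.0` denotes) is an assumption this README does not confirm.

```shell
# Hypothetical helper: extract the major version from a string such as "v6.11.0".
node_major() {
  version="${1#v}"                  # drop the leading "v"
  printf '%s\n' "${version%%.*}"    # keep everything before the first dot
}

if command -v node >/dev/null 2>&1; then
  major="$(node_major "$(node --version)")"
  # Assumption: any major version >= 6 is acceptable.
  if [ "$major" -ge 6 ]; then
    echo "Node.js major version $major: OK"
  else
    echo "Node.js major version $major is older than 6" >&2
  fi
else
  echo "node not found on PATH" >&2
fi
```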
Usage
wwweb -d <domain> [-s <interval>] -o <directory> [--rest <seconds>] [[-v] -v] [-t <timeout>]
Options
| Flag | Alias | Description | Info |
|:--|:--|:--|:--|
| --domain | -d | Initial domain | required |
| --save-interval | -s | Interval in seconds for outputting reports | default: 30 |
| --output | -o | Name of the output directory | required |
| --help | -h | Show help | |
| --rest | -r | Seconds to rest between requests | default: 0 |
| --timeout | -t | Milliseconds before a request times out | default: 15000 |
| --verbose | -v | Verbose output of what is going on | -vv for debug output |
| --no-color | | Disable colorful output | |
Examples
Crawl from example.org and output files to the current working directory:
wwweb -d example.org -o .
Crawl from example.org, write files to ./reports/, enable verbose output, time out requests after eight seconds, and save a report every minute:
wwweb -d example.org -o reports/ -v -t 8000 -s 60
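The long-form flags from the options table combine the same way. The following is a sketch only, assuming `wwweb` is installed and on your `PATH`; the domain, directory name, and numeric values are illustrative, and the guard around the call is there so the snippet does nothing harmful when the command is missing.

```shell
# Illustrative values; only the save interval matches a documented default.
domain="example.org"
outdir="reports"

# Build the argument list: rest 2 s between requests, time out after 10000 ms,
# save a report every 30 s, print debug output (-vv), and disable colors.
set -- --domain "$domain" --output "$outdir" \
       --rest 2 --timeout 10000 --save-interval 30 \
       -vv --no-color

# Run the crawler only when it is actually available.
if command -v wwweb >/dev/null 2>&1; then
  wwweb "$@"
else
  echo "wwweb not found; would run: wwweb $*"
fi
```

Resting between requests slows the crawl down but is gentler on the hosts being visited, and `--no-color` keeps the output clean when it is redirected to a log file.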