fiend
v0.1.0
The most advanced web crawler for JavaScript
Fiend
Fiend is still a work in progress. The planned features are listed below:
- Be used via different interfaces:
  - [x] API
  - [ ] CLI
- Choose a processor for different types of websites:
  - [x] Static HTML
  - [ ] Dynamic web apps
- Retrieve data in variable formats:
  - [x] Raw HTML
  - [x] Cheerio
  - [ ] A list of links and assets from the page
- Use a queue broker to persist tasks and distribute load:
  - [ ] RabbitMQ
  - [ ] Redis
- Set resource restrictions:
  - [x] Concurrency limit
  - [x] Random or static delay
  - [x] Timeouts
  - [ ] Retries
- Spoof User Agent:
  - [x] Custom static
  - [x] Random from a custom array
  - [x] Random from a predefined array of the most common ones
  - [ ] Random from an automatically updated array of the most common ones
- [x] Restrict requests to certain domains
- [ ] Detect and bypass CloudFlare or other protection
- [ ] Force requests to respect robots.txt
- [ ] Schedule tasks and requests
- [ ] Measure performance and memory consumption
- [ ] Log every important event
- [ ] Use proxies
- [ ] Cache responses
- [ ] Authenticate in a site and keep the state between requests
- [ ] Search a website for some info
- [ ] Monitor a page for changes
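
This README does not document Fiend's API, so the following is only a rough sketch of what some of the implemented items above (domain restriction, User Agent selection, random or static delay, timeouts, and a concurrency limit) could look like in plain JavaScript. Every function and option name here (`isAllowedDomain`, `pickUserAgent`, `resolveDelay`, `withTimeout`, `runAll`, `concurrency`, `delay`, `timeoutMs`) is hypothetical and chosen for illustration; none of them is taken from Fiend itself.

```javascript
// Illustrative helpers only; none of these names come from Fiend's API.

// Returns true when the URL's hostname is one of the allowed domains
// or a subdomain of one. Throws if `url` is not a valid absolute URL.
function isAllowedDomain(url, allowedDomains) {
  const { hostname } = new URL(url);
  return allowedDomains.some(
    (domain) => hostname === domain || hostname.endsWith("." + domain)
  );
}

// Picks a User-Agent string: a static string is returned as-is,
// a non-empty array yields one random element.
function pickUserAgent(userAgent, random = Math.random) {
  if (typeof userAgent === "string") return userAgent;
  return userAgent[Math.floor(random() * userAgent.length)];
}

// Resolves a delay setting to milliseconds: a number is a static delay,
// a { min, max } object yields a random integer delay in [min, max].
function resolveDelay(delay, random = Math.random) {
  if (typeof delay === "number") return delay;
  return delay.min + Math.floor(random() * (delay.max - delay.min + 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Rejects with an error if the async task does not settle within timeoutMs.
function withTimeout(task, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("Timed out")), timeoutMs);
  });
  return Promise.race([Promise.resolve().then(task), timeout]).finally(() =>
    clearTimeout(timer)
  );
}

// Runs async tasks with at most `concurrency` in flight, waiting
// `delay` before each task and applying a per-task timeout.
// Results are returned in the same order as `tasks`.
async function runAll(tasks, { concurrency, delay, timeoutMs }) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      await sleep(resolveDelay(delay));
      results[index] = await withTimeout(tasks[index], timeoutMs);
    }
  }
  const workerCount = Math.min(concurrency, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In this sketch, a crawler would call `isAllowedDomain` before queueing a URL, pass the result of `pickUserAgent` as the `User-Agent` request header, and wrap each request in a task handed to `runAll`. A single failed or timed-out task rejects the whole `runAll` call, which is the simplest behaviour to show given that retries are not implemented yet.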