spidey-redis
v1.0.4
Spidey with the power of Redis
Redis Spidey - Distributed Web Scraping Solution Powered by Redis
RedisSpidey combines Spidey with Redis to enable distributed crawling and web scraping. Multiple RedisSpidey instances can run in parallel, all listening to the same Redis queue. RedisSpidey can also push scraped data back to Redis queues, so post-processing can be distributed as well.
Features
- Distributed Crawling: RedisSpidey enables seamless operation of multiple instances of crawlers, all listening to the same queue, for efficient distributed crawling.
- RedisPipeline: RedisSpidey can push crawled data back to Redis queues for distributed post-processing.
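The idea behind both features can be sketched without Redis at all. The following is a minimal in-memory simulation, not the package's implementation: plain arrays stand in for the Redis input and output queues, and `runWorkers` and `ScrapedItem` are hypothetical names introduced only for this illustration. It shows that when several workers take from one shared queue, each URL is handled by exactly one of them, and every result lands in a common output queue.

```typescript
// Hypothetical in-memory stand-ins for the Redis queues; the real package
// talks to Redis instead of using arrays.
type ScrapedItem = { url: string; worker: number };

function runWorkers(urls: string[], workerCount: number): ScrapedItem[] {
  if (workerCount < 1) {
    throw new Error("workerCount must be at least 1");
  }
  const urlsQueue: string[] = [...urls]; // plays the role of the input queue (urlsKey)
  const dataQueue: ScrapedItem[] = []; // plays the role of the output queue (dataKey)

  // Workers take turns; on each turn one worker removes the next URL.
  let worker = 0;
  while (urlsQueue.length > 0) {
    // Once a URL is taken from the shared queue, no other worker can see it.
    const url = urlsQueue.shift() as string;
    // A real crawler would fetch and parse the page here.
    dataQueue.push({ url, worker });
    worker = (worker + 1) % workerCount;
  }
  return dataQueue;
}

const items = runWorkers(
  ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
  2,
);
console.log(items.map((item) => `worker ${item.worker}: ${item.url}`));
```

In a real deployment the turn-taking is not scripted like this; whichever instance asks the queue first gets the next URL.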
Installation
npm install spidey-redis
Options
RedisSpidey supports all Spidey options in addition to the following specific options.
| Configuration | Type | Description | Default | Required |
| --- | --- | --- | --- | --- |
| redisUrl | string | Redis URL, such as redis://localhost:6379 | null | Yes |
| urlsKey | string | Redis input queue name, such as urls:queue | null | Yes |
| dataKey | string | Redis output queue name, such as data:queue | null | Yes, if using RedisPipeline |
| sleepDelay | number | Time in milliseconds to wait for new items when the queue is empty | 5000 | No |
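The rules in the table can be expressed as a small options check. This is only a sketch of what the table says, not code from the package: `QueueOptions`, `ResolvedQueueOptions`, `resolveOptions`, and the `usesRedisPipeline` flag are hypothetical names, and the package may validate its options differently or not at all.

```typescript
// Hypothetical types mirroring the options table; not exported by spidey-redis.
interface QueueOptions {
  redisUrl?: string;
  urlsKey?: string;
  dataKey?: string;
  sleepDelay?: number;
}

interface ResolvedQueueOptions {
  redisUrl: string;
  urlsKey: string;
  dataKey: string | null;
  sleepDelay: number;
}

// Default from the table: 5000 ms.
const DEFAULT_SLEEP_DELAY_MS = 5000;

function resolveOptions(
  options: QueueOptions,
  usesRedisPipeline: boolean,
): ResolvedQueueOptions {
  if (!options.redisUrl) {
    throw new Error("redisUrl is required");
  }
  if (!options.urlsKey) {
    throw new Error("urlsKey is required");
  }
  // dataKey is only mandatory when crawled data is pushed back to Redis.
  if (usesRedisPipeline && !options.dataKey) {
    throw new Error("dataKey is required when using RedisPipeline");
  }
  return {
    redisUrl: options.redisUrl,
    urlsKey: options.urlsKey,
    dataKey: options.dataKey ?? null,
    sleepDelay: options.sleepDelay ?? DEFAULT_SLEEP_DELAY_MS,
  };
}

const resolved = resolveOptions(
  { redisUrl: "redis://localhost:6379", urlsKey: "amazon:urls" },
  false,
);
console.log(resolved.sleepDelay); // 5000
```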
Usage
import { RedisSpidey, RedisPipeline } from 'spidey-redis';

class AmazonSpidey extends RedisSpidey {
  constructor() {
    super({
      // spidey options ...
      redisUrl: 'redis://localhost:6379',
      // Input queue
      urlsKey: 'amazon:urls',
      // Output queue
      dataKey: 'amazon:data',
      // Redis pipeline to push crawled data to data queue
      pipelines: [RedisPipeline],
    });
  }
}
Conclusion
RedisSpidey brings distributed crawling to Spidey. Multiple instances share one Redis input queue, and scraped data can be pushed back to Redis for distributed post-processing, which lets large scraping jobs be spread across instances.
License
Spidey is licensed under the MIT License.