sqlite-simplecrawler-queue
v1.1.0
SQLite FetchQueue Implementation for Simplecrawler
SQLite queue for Simplecrawler
This is an implementation of the simplecrawler FetchQueue interface that uses SQLite as its backend.
Advantage: a running job can be paused, stopped, killed, or terminated without losing the queue state.
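To illustrate what a FetchQueue backend has to provide, here is a minimal in-memory sketch of a few of its callback-style methods. The method names follow simplecrawler's FetchQueue; the class name MemoryQueueSketch, the stored fields, and the simplified return conventions are assumptions made for illustration, and this is not the code of this package. Because the sketch keeps its state in a plain array, that state disappears when the process exits, which is the problem a SQLite-backed queue avoids.

```javascript
// Hypothetical in-memory queue showing the callback style of a few
// FetchQueue methods. Not the implementation used by this package.
class MemoryQueueSketch {
  constructor () {
    this.items = [] // lost on process exit, unlike a SQLite table
  }

  // Add an item unless its URL is already queued (or `force` is set).
  add (queueItem, force, callback) {
    const known = this.items.some(item => item.url === queueItem.url)
    if (known && !force) {
      return callback(new Error('Resource already exists in queue!'))
    }
    const stored = Object.assign({}, queueItem, {
      id: this.items.length,
      status: 'queued',
      fetched: false
    })
    this.items.push(stored)
    callback(null, stored)
  }

  // Report whether a URL is already queued.
  exists (url, callback) {
    callback(null, this.items.some(item => item.url === url))
  }

  // Return the first item that has not been fetched yet, or null.
  oldestUnfetchedItem (callback) {
    const next = this.items.find(item => !item.fetched)
    callback(null, next || null)
  }

  // Report the number of queued items.
  getLength (callback) {
    callback(null, this.items.length)
  }
}
```

Every method reports its result through a Node-style callback (error first), which is what lets a backend answer asynchronously from a database instead of from memory.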
Installation
Install from GitHub
npm install git+https://github.com/LeMoussel/SQLite-simplecrawler-queue#master
Install from npm
npm install --save sqlite-simplecrawler-queue
Usage
All you need is the path to the SQLite database file.
// Assumed imports; check the package's exports for the exact form.
const Crawler = require('simplecrawler')
const SQLiteFetchQueue = require('sqlite-simplecrawler-queue')

try {
  const sqliteDatabaseName = 'crawlsite.sqlite3'
  // Drop the database if it exists
  SQLiteFetchQueue.dropDatabase(sqliteDatabaseName)
  // Connect to a database stored on disk by passing the path to the database file
  const crawlerQueue = new SQLiteFetchQueue(sqliteDatabaseName)
  // Initialize the database
  crawlerQueue.init()
  // Initialize simplecrawler
  const crawler = new Crawler('http://example.com')
  crawler.maxDepth = 3
  crawler.allowInitialDomainChange = false
  crawler.filterByDomain = true
  crawler.queue = crawlerQueue
  crawler.start()
} catch (err) {
  console.error(err)
}
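Since the queue state lives in the database file, a crawl can be interrupted without losing it. The following is a minimal sketch of stopping a crawler on Ctrl+C. The helper name stopOnInterrupt and the onStopped callback are hypothetical; the only crawler method it relies on is stop(), which simplecrawler's Crawler provides.

```javascript
// Hypothetical helper: stop a crawler cleanly when the process receives SIGINT.
// `crawler` is any object exposing a stop() method, as simplecrawler's Crawler does.
// `onStopped` is an optional callback invoked after stop() has been called.
function stopOnInterrupt (crawler, onStopped) {
  process.once('SIGINT', () => {
    crawler.stop()
    if (typeof onStopped === 'function') {
      onStopped()
    }
  })
}
```

In the usage example this would be called as stopOnInterrupt(crawler) before crawler.start(). Resuming later would presumably mean running the script again without the dropDatabase call, so that the existing queue is reused; that step is an assumption and is not confirmed here.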
Test
npm test
Check the test folder for further usage examples.
Additional utilities
- Drop the queue using the dropQueue method.
// Connect to a database stored on disk by passing the path to the database file
const crawlerQueue = new SQLiteFetchQueue('sqliteDatabaseName', 'queue')
// Drop the 'queue' table
crawlerQueue.dropQueue()
// Initialize the database
crawlerQueue.init()
- Drop the database using the SQLiteFetchQueue.dropDatabase static method.
// Drop the database if it exists
SQLiteFetchQueue.dropDatabase('sqliteDatabaseName')
// Connect to a disk file database, you pass the path to the database file.
const crawlerQueue = new SQLiteFetchQueue('sqliteDatabaseName', 'queue')
// Initialization of the database
crawlerQueue.init()
- Export (freeze) the queue to a JSON file on disk.
// Flexible queue system which can be frozen to disk
crawlerQueue.freeze('./test/www.test.com.sqlite3.json', (err, result) => {
  if (err) {
    console.error(err)
    return
  }
  console.log(`Number of rows saved to JSON File: ${result}`)
})
- Import from a frozen JSON file on disk.
// Flexible queue system which can be defrosted from disk
crawlerQueue.defrost('./test/www.test.com.sqlite3.json', (err, result) => {
  if (err) {
    console.error(err)
    process.exit(1)
  }
  console.log(`Number of rows inserted: ${result}`)
})
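Both freeze and defrost report their result through a Node-style callback, so they can be wrapped with util.promisify for use with async/await. The sketch below assumes only the (filename, callback) shape shown above. The helper names saveQueue and loadQueue and the stand-in object fakeQueue are hypothetical; fakeQueue exists only so the snippet runs without the package installed.

```javascript
const { promisify } = require('util')

// Freeze a queue to a JSON file and resolve with the number of rows saved.
// bind() keeps `this` pointing at the queue once the method is detached.
async function saveQueue (queue, filename) {
  const freeze = promisify(queue.freeze.bind(queue))
  return freeze(filename)
}

// Defrost a queue from a JSON file and resolve with the number of rows inserted.
async function loadQueue (queue, filename) {
  const defrost = promisify(queue.defrost.bind(queue))
  return defrost(filename)
}

// Stand-in with the same callback shape as the real queue (illustration only).
const fakeQueue = {
  rows: 3,
  freeze (filename, callback) {
    setImmediate(() => callback(null, this.rows))
  },
  defrost (filename, callback) {
    setImmediate(() => callback(null, this.rows))
  }
}

saveQueue(fakeQueue, './queue.json')
  .then(count => console.log(`Number of rows saved to JSON File: ${count}`))
  .catch(err => console.error(err))
```

With a real queue, fakeQueue would be replaced by the crawlerQueue instance; errors passed to the callback surface as promise rejections.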
Resources
License
MIT licensed; all its dependencies are MIT or BSD licensed.