@tanaloua/links-scraper

v1.0.4

Published 8 months ago

Web scraper for links


nekena-tanaloua

Links Scraper

Links Scraper is a Node.js package for crawling web pages and extracting links recursively. It provides a simple and efficient way to collect links from a given website, allowing you to build applications such as web crawlers, site mapping tools, or link analysis tools.

Disclaimer: Always respect a website's robots.txt file, and do not crawl sites that prohibit web scraping. This package is intended for educational purposes and should be used responsibly.
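The README does not say whether Links Scraper reads robots.txt itself, so you may want to check it before calling `crawl()`. Below is a minimal sketch that is not part of this package. The helper names `isDisallowed` and `canCrawl` are hypothetical. The parser is deliberately simplified: it only honours plain path-prefix `Disallow` rules in `User-agent: *` groups and ignores `Allow` rules, wildcards, and agent-specific groups, so it is not a full RFC 9309 implementation. `canCrawl` assumes Node.js 18 or later, where `fetch` is available globally.

```javascript
// Simplified robots.txt check (hypothetical helper, not part of the package).
// Only "User-agent: *" groups and plain path-prefix Disallow rules are handled;
// Allow rules, wildcards (*, $) and agent-specific groups are ignored.
function isDisallowed(robotsTxt, path) {
  let inWildcardGroup = false;
  let lastWasAgent = false;
  const disallowed = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // strip comments and whitespace
    if (!line) continue;
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines belong to the same group.
      if (!lastWasAgent) inWildcardGroup = false;
      if (value === '*') inWildcardGroup = true;
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      // An empty Disallow value means "nothing is disallowed".
      if (field === 'disallow' && inWildcardGroup && value !== '') {
        disallowed.push(value);
      }
    }
  }
  return disallowed.some((prefix) => path.startsWith(prefix));
}

// Fetches /robots.txt for the page's origin (assumes Node.js 18+ global fetch).
async function canCrawl(pageUrl) {
  const { origin, pathname } = new URL(pageUrl);
  const res = await fetch(`${origin}/robots.txt`);
  if (res.status >= 400 && res.status < 500) return true; // no robots.txt: allowed
  if (!res.ok) return false; // server error: be conservative and do not crawl
  return !isDisallowed(await res.text(), pathname);
}
```

You could then call `canCrawl('https://www.scrapethissite.com/')` and only start `linksScraper.crawl(...)` when it resolves to `true`.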

Installation

You can install Links Scraper via npm:

npm install @tanaloua/links-scraper

Usage

const LinksScraper = require('@tanaloua/links-scraper');

const linksScraper = new LinksScraper();

// Crawl a website and extract links
linksScraper.crawl('https://www.scrapethissite.com').then((links) => {
    console.log(links);
}).catch((error) => {
    console.error('An error occurred:', error);
});

API

`LinksScraper`

`constructor(progressiveRetrieval = false, onProgress)`

progressiveRetrieval (Boolean): Whether to use progressive retrieval (default: false).
onProgress (Function): Callback invoked during progressive retrieval.

Creates a new instance of the LinksScraper class.

`crawl(url, ignore)`

Crawls the provided URL and extracts links recursively.

url (String): The URL to crawl.
ignore (String): Optional URL pattern to ignore while crawling.

Returns a Promise that resolves to an array of links found on the website.
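The README does not describe how `crawl()` traverses a site. As an illustration of the general technique of recursive link collection, and not of this package's actual implementation, here is a breadth-first sketch with a visited set. Several details are assumptions: `collectLinks` and `getLinks` are hypothetical names, the crawl is restricted to the start URL's origin, and `ignore` is treated as a plain substring match. The page fetcher is backed by an in-memory map so the sketch runs without network access.

```javascript
// Breadth-first link collection with a visited set (illustrative sketch).
// `getLinks(url)` is a hypothetical async function returning the hrefs on a page.
async function collectLinks(startUrl, getLinks, ignore) {
  const start = new URL(startUrl).href; // normalise, e.g. add a trailing slash
  const { origin } = new URL(start);
  const visited = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const href of await getLinks(current)) {
      const absolute = new URL(href, current).href; // resolve relative links
      if (new URL(absolute).origin !== origin) continue; // stay on the same site (assumption)
      if (ignore && absolute.includes(ignore)) continue; // substring match (assumption)
      if (!visited.has(absolute)) {
        visited.add(absolute);
        queue.push(absolute);
      }
    }
  }
  return [...visited];
}

// Tiny fake site: page URL -> links found on that page.
const fakeSite = {
  'https://example.com/': ['/a', '/b', 'https://other.org/x'],
  'https://example.com/a': ['/b', '/admin/panel'],
  'https://example.com/b': ['/'],
  'https://example.com/admin/panel': [],
};
const getLinks = async (url) => fakeSite[url] ?? [];

collectLinks('https://example.com/', getLinks, '/admin').then((links) => {
  console.log(links);
});
```

The visited set is what keeps the recursion finite: the page `/b` links back to `/`, and without the set the crawl would loop forever.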

Example 1

const linksScraper = new LinksScraper();

linksScraper.crawl('https://www.scrapethissite.com').then((links) => {
    console.log(links);
}).catch((error) => {
    console.error('An error occurred:', error);
});
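Rather than only logging the resolved array, you will often want to post-process it. The sketch below assumes, without confirmation from the README, that the array contains absolute URL strings. `summarizeLinks` is a hypothetical helper. It removes exact duplicates, skips entries that are not valid absolute URLs, and groups the rest by hostname using Node's built-in WHATWG `URL` class.

```javascript
// Hypothetical helper: group crawl results by hostname.
// Assumes `links` is an array of absolute URL strings.
function summarizeLinks(links) {
  const byHost = new Map();
  for (const link of new Set(links)) { // the Set drops exact duplicates
    let host;
    try {
      host = new URL(link).hostname;
    } catch {
      continue; // skip anything that is not a valid absolute URL
    }
    if (!byHost.has(host)) byHost.set(host, []);
    byHost.get(host).push(link);
  }
  return byHost;
}

// Usage with the package (not run here):
// linksScraper.crawl('https://www.scrapethissite.com').then((links) => {
//   for (const [host, urls] of summarizeLinks(links)) console.log(host, urls.length);
// });

const demo = summarizeLinks(['https://a.com/x', 'https://a.com/x', 'https://b.com/', 'not a url']);
console.log([...demo.keys()]); // [ 'a.com', 'b.com' ]
```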

Example 2

With progressive retrieval.

const onProgress = (url) => {
    console.log('Crawling:', url);
};

const linksScraper = new LinksScraper(true, onProgress);
linksScraper.crawl('https://www.scrapethissite.com').then((links) => {
    console.log(links);
}).catch((error) => {
    console.error('An error occurred:', error);
});
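A more informative `onProgress` callback can keep a count and timing information. This is a minimal sketch that assumes, as Example 2 suggests, that the callback receives the URL currently being crawled. `createProgressTracker` is a hypothetical helper. Its `onProgress` method uses closure state rather than `this`, so it still works when passed detached to the constructor.

```javascript
// Hypothetical progress tracker whose onProgress can be passed to LinksScraper.
// Assumes the callback is called with the URL being crawled (see Example 2).
function createProgressTracker(log = console.log) {
  const seen = [];
  const startedAt = Date.now();
  return {
    // Uses closure state, not `this`, so it is safe to pass as a bare function.
    onProgress(url) {
      seen.push(url);
      const seconds = ((Date.now() - startedAt) / 1000).toFixed(1);
      log(`[${seen.length}] ${url} (${seconds}s)`);
    },
    get count() {
      return seen.length;
    },
    get urls() {
      return [...seen]; // copy, so callers cannot mutate internal state
    },
  };
}

// Usage with the package (not run here):
// const tracker = createProgressTracker();
// const linksScraper = new LinksScraper(true, tracker.onProgress);
// linksScraper.crawl('https://www.scrapethissite.com')
//   .then(() => console.log('Pages reported:', tracker.count));
```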

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please read the CONTRIBUTING.md file for details on how to contribute to this project.

Issues

Please report any issues or feature requests on the issues page.

Author

Nekena RATAFITA
