LeadScraper
LeadScraper is an efficient web scraping library for extracting contact information and website details. It's designed to be easy to use while providing robust functionality for gathering leads from websites.
Features
- Extract phone numbers, email addresses, and social media links
- Gather website information (domain, creation year, platform)
- Caching support to improve performance and reduce redundant scraping
- TypeScript support for enhanced development experience
- Built with performance in mind using got-scraping
Installation
Install LeadScraper using npm:
npm install lead-scraper
Or with Bun:
bun add lead-scraper
Usage
Basic Usage
import LeadScraper from 'lead-scraper';
const scraper = new LeadScraper();
const url = 'https://example.com';
scraper.scrape(url).then(data => {
console.log(data);
});
With Caching
const scraper = new LeadScraper({
cache: true,
cacheOptions: {
maxSize: 1000, // Store up to 1,000 items in memory
maxAge: 7 * 24 * 60 * 60 * 1000 // Cache items for up to 7 days
}
});
const url = 'https://example.com';
scraper.scrape(url).then(data => {
console.log(data);
});
// Clear the cache if needed
await scraper.clearCache();
Individual Methods
const scraper = new LeadScraper();
const url = 'https://example.com';
await scraper.loadPage(url);
const emails = scraper.getEmails(url);
const phones = scraper.getPhones(url);
const socials = scraper.getSocials(url);
const websiteInfo = await scraper.getWebsiteInfo(url);
console.log({ emails, phones, socials, websiteInfo });
API
new LeadScraper(options)
Creates a new LeadScraper instance.
- options.cache (boolean): Enable caching (default: false)
- options.expiration (string): Cache expiration time (default: '6m')
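For example (a minimal sketch; note that the "With Caching" example above passes a cacheOptions object rather than expiration, so check the package's TypeScript definitions for the exact option shape in your installed version):
import LeadScraper from 'lead-scraper';

// Caching disabled (the default).
const plainScraper = new LeadScraper();

// Caching enabled with a 7-day expiration, using the format described
// in the Caching section below.
const cachingScraper = new LeadScraper({ cache: true, expiration: '7d' });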
scraper.scrape(url: string): Promise<ScrapeResult>
Scrapes the given URL for all available information.
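For example, with async/await (the exact fields of ScrapeResult are not documented here; the shape below mirrors the individual methods and is an assumption):
import LeadScraper from 'lead-scraper';

const scraper = new LeadScraper();

async function run(): Promise<void> {
  // scrape() loads the page and runs all extractors in a single call.
  const result = await scraper.scrape('https://example.com');
  // Field names assumed from the individual methods (getEmails, getPhones, ...).
  console.log(result.emails, result.phones, result.socials, result.websiteInfo);
}

run().catch(console.error);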
scraper.loadPage(url: string): Promise<boolean>
Loads the page content. Must be called before using the individual methods if not using scrape().
scraper.getEmails(url: string): string[]
Returns an array of email addresses found on the page.
scraper.getPhones(url: string): string[]
Returns an array of phone numbers found on the page.
scraper.getSocials(url: string): Record<string, string>
Returns an object with social media links found on the page.
scraper.getWebsiteInfo(url: string): Promise<Record<string, string>>
Returns website information including domain, creation year, and platform.
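Because the result is a plain Record<string, string>, it can be iterated without knowing the key names in advance (a small sketch; the actual keys are whatever the library returns):
const info = await scraper.getWebsiteInfo('https://example.com');

// Print every key/value pair the library returned (e.g. domain, creation year, platform).
for (const [key, value] of Object.entries(info)) {
  console.log(`${key}: ${value}`);
}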
Caching
When caching is enabled, scraped data is stored locally to avoid redundant scraping of the same URLs. The cache expiration can be set using the following format:
- h: Hours (e.g., '24h' for 24 hours)
- d: Days (e.g., '7d' for 7 days)
- m: Months (e.g., '1m' for 1 month)
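For instance (a sketch assuming the expiration constructor option from the API section above):
// Keep cached results for 24 hours.
const daily = new LeadScraper({ cache: true, expiration: '24h' });

// Keep cached results for 7 days.
const weekly = new LeadScraper({ cache: true, expiration: '7d' });

// Keep cached results for 1 month.
const monthly = new LeadScraper({ cache: true, expiration: '1m' });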
License
This project is licensed under the ISC License.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Disclaimer
Ensure you have permission to scrape websites and always respect robots.txt files and rate limits. Use this library responsibly and in accordance with the terms of service of the websites you are scraping.
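For example, one simple way to respect rate limits when scraping several URLs (the throttling below is not part of LeadScraper; it is a hypothetical pattern shown only for illustration):
import LeadScraper from 'lead-scraper';

const scraper = new LeadScraper({ cache: true });

// Hypothetical helper that pauses between requests.
const delay = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function scrapePolitely(urls: string[]): Promise<void> {
  for (const url of urls) {
    const data = await scraper.scrape(url);
    console.log(url, data);
    // Wait a few seconds between requests so the target site is not hammered.
    await delay(3000);
  }
}

scrapePolitely(['https://example.com', 'https://example.org']).catch(console.error);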