LeadScraper
LeadScraper is an efficient web scraping library for extracting contact information and website details. It's designed to be easy to use while providing robust functionality for gathering leads from websites.
Features
- Extract phone numbers, email addresses, and social media links
- Gather website information (domain, creation year, platform)
- Caching support to improve performance and reduce redundant scraping
- TypeScript support for enhanced development experience
- Built with performance in mind using got-scraping
Installation
Install LeadScraper using npm:
npm install lead-scraper
Or with Bun:
bun add lead-scraper
Usage
Basic Usage
import LeadScraper from 'lead-scraper';
const scraper = new LeadScraper();
const url = 'https://example.com';
scraper.scrape(url).then(data => {
console.log(data);
});
With Caching
const scraper = new LeadScraper({
cache: true,
cacheOptions: {
maxSize: 1000, // Store up to 1,000 items in memory
maxAge: 7 * 24 * 60 * 60 * 1000 // Cache items for up to 7 days
}
});
const url = 'https://example.com';
scraper.scrape(url).then(data => {
console.log(data);
});
// Clear the cache if needed
await scraper.clearCache();
Individual Methods
const scraper = new LeadScraper();
const url = 'https://example.com';
await scraper.loadPage(url);
const emails = scraper.getEmails(url);
const phones = scraper.getPhones(url);
const socials = scraper.getSocials(url);
const websiteInfo = await scraper.getWebsiteInfo(url);
console.log({ emails, phones, socials, websiteInfo });
API
new LeadScraper(options)
Creates a new LeadScraper instance.
- options.cache (boolean): Enable caching (default: false)
- options.expiration (string): Cache expiration time (default: '6m')
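For example (a minimal sketch; note that the "With Caching" example above passes a cacheOptions object rather than expiration, so check the package's TypeScript definitions for the exact option shape in your installed version):
import LeadScraper from 'lead-scraper';

// Caching disabled (the default).
const plainScraper = new LeadScraper();

// Caching enabled with a 7-day expiration, using the format described
// in the Caching section below.
const cachingScraper = new LeadScraper({ cache: true, expiration: '7d' });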
scraper.scrape(url: string): Promise<ScrapeResult>
Scrapes the given URL for all available information.
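For example, with async/await (the exact fields of ScrapeResult are not documented here; the shape below mirrors the individual methods and is an assumption):
import LeadScraper from 'lead-scraper';

const scraper = new LeadScraper();

async function run(): Promise<void> {
  // scrape() loads the page and runs all extractors in a single call.
  const result = await scraper.scrape('https://example.com');
  // Field names assumed from the individual methods (getEmails, getPhones, ...).
  console.log(result.emails, result.phones, result.socials, result.websiteInfo);
}

run().catch(console.error);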
scraper.loadPage(url: string): Promise<boolean>
Loads the page content. Must be called before using the individual methods if not using scrape().
scraper.getEmails(url: string): string[]
Returns an array of email addresses found on the page.
scraper.getPhones(url: string): string[]
Returns an array of phone numbers found on the page.
scraper.getSocials(url: string): Record<string, string>
Returns an object with social media links found on the page.
scraper.getWebsiteInfo(url: string): Promise<Record<string, string>>
Returns website information including domain, creation year, and platform.
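Because the result is a plain Record<string, string>, it can be iterated without knowing the key names in advance (a small sketch; the actual keys are whatever the library returns):
const info = await scraper.getWebsiteInfo('https://example.com');

// Print every key/value pair the library returned (e.g. domain, creation year, platform).
for (const [key, value] of Object.entries(info)) {
  console.log(`${key}: ${value}`);
}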
Caching
When caching is enabled, scraped data is stored locally to avoid redundant scraping of the same URLs. The cache expiration can be set using the following format:
- h: Hours (e.g., '24h' for 24 hours)
- d: Days (e.g., '7d' for 7 days)
- m: Months (e.g., '1m' for 1 month)
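For instance (a sketch assuming the expiration constructor option from the API section above):
// Keep cached results for 24 hours.
const daily = new LeadScraper({ cache: true, expiration: '24h' });

// Keep cached results for 7 days.
const weekly = new LeadScraper({ cache: true, expiration: '7d' });

// Keep cached results for 1 month.
const monthly = new LeadScraper({ cache: true, expiration: '1m' });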
License
This project is licensed under the ISC License.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Disclaimer
Ensure you have permission to scrape websites and always respect robots.txt files and rate limits. Use this library responsibly and in accordance with the terms of service of the websites you are scraping.
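For example, one simple way to respect rate limits when scraping several URLs (the throttling below is not part of LeadScraper; it is a hypothetical pattern shown only for illustration):
import LeadScraper from 'lead-scraper';

const scraper = new LeadScraper({ cache: true });

// Hypothetical helper that pauses between requests.
const delay = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function scrapePolitely(urls: string[]): Promise<void> {
  for (const url of urls) {
    const data = await scraper.scrape(url);
    console.log(url, data);
    // Wait a few seconds between requests so the target site is not hammered.
    await delay(3000);
  }
}

scrapePolitely(['https://example.com', 'https://example.org']).catch(console.error);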