Eshop Scraper (eshop-scraper)
Eshop Scraper is a powerful npm package designed for web scraping e-commerce websites.
Installation
To install the package, use one of the following commands:
npm install eshop-scraper
pnpm add eshop-scraper
yarn add eshop-scraper
What it does
This package allows you to extract important data such as price, currency, and name from various well-known e-commerce websites, including Amazon, Steam, Ebay, and many others. It facilitates efficient web scraping for obtaining detailed product information.
Support
{
  "node": ">=20.11.0",
  "npm": ">=10.2.4"
}
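If you want npm to flag an unsupported runtime in your own project, you can mirror these requirements in your package.json (a minimal sketch; the engines field only produces a warning on install unless engine-strict is enabled):
{
  "engines": {
    "node": ">=20.11.0",
    "npm": ">=10.2.4"
  }
}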
Getting Started
Create an Instance of EshopScraper
First, you need to create an instance of the EshopScraper class. Configure it with optional parameters as needed:
import { EshopScraper } from 'eshop-scraper';
const scraper: EshopScraper = new EshopScraper({
  timeout: 15, // Timeout for requests in seconds
  // Additional configuration options
});
Use the getData Method to Scrape Data
Call the getData method to scrape data from the provided URL:
import { EshopScraper, ResultData } from 'eshop-scraper';
const scraper = new EshopScraper({
  timeout: 15,
});

(async () => {
  try {
    const result: ResultData = await scraper.getData('https://example.com/product-page');
    if (result.isError) {
      console.error('Error:', result.errorMsg);
    } else {
      console.log('Product Data:', result);
    }
  } catch (error) {
    console.error('Unexpected Error:', error);
  }
})();
Methods
getData
This method scrapes data from a website based on the provided configuration.
Parameters:
The method takes the following parameters:
- link: string - The absolute URI of the item you want to scrape.
- timeoutAmount?: number - Timeout for the request, in seconds.
Usage:
await scraper.getData(uri);
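If you need a per-request timeout, the optional timeoutAmount can be supplied as well (a sketch, assuming it is passed as the second positional argument, as the parameter list above suggests):
// Override the instance-level timeout for this request only (10 seconds, illustrative value).
await scraper.getData(uri, 10);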
Output:
It returns a Promise that resolves to an object with the following structure:
{
  price?: number; // The price of the product
  currency?: string; // The currency of the price
  name?: string; // The name of the product
  site?: string; // The source website's name
  link?: string; // The link to the product page
  isError?: boolean; // Whether an error occurred
  errorMsg?: string; // The error message, if any
}
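As an illustration of consuming this shape, the sketch below checks isError before reading the optional fields (the summarize helper is purely illustrative and not part of the package):
import { EshopScraper, ResultData } from 'eshop-scraper';

const scraper = new EshopScraper();

// Illustrative helper: turn a ResultData into a printable summary, or null on error.
function summarize(result: ResultData): string | null {
  if (result.isError) {
    console.error('Scrape failed:', result.errorMsg);
    return null;
  }
  return `${result.name ?? 'Unknown product'}: ${result.price ?? '?'} ${result.currency ?? ''} (${result.site ?? result.link ?? 'unknown source'})`;
}

(async () => {
  const result = await scraper.getData('https://example.com/product-page');
  console.log(summarize(result));
})();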
updateCurrencyMap
Updates entries in the _currencyMap.
Parameters:
- key: string[][] | string[] - The key(s) to be updated.
- value: string[] | string - The value(s) to be assigned.
Usage:
scraper.updateCurrencyMap([['$', 'usd']], 'USD');
scraper.updateCurrencyMap(['$', 'usd'], 'USD');
deleteCurrencyMap
Deletes entries from the _currencyMap.
Parameters:
- key: string[][] | string[] - The key(s) to be deleted.
Usage:
scraper.deleteCurrencyMap([['$', 'usd']]);
scraper.deleteCurrencyMap(['$', 'usd']);
updateWebProps
Updates entries in the _webProps.
Parameters:
- site: string | string[] - The site(s) to be updated.
- properties: { site: string; selector: { price: string[]; name: string[] } } | { site: string; selector: { price: string[]; name: string[] } }[] - The properties to be assigned.
Usage:
scraper.updateWebProps('exampleSite', { site: 'exampleSite', selector: { price: ['priceSelector'], name: ['nameSelector'] } });
scraper.updateWebProps(['site1.com', 'site2.com'], [
  { site: 'site1', selector: { price: ['priceSelector1'], name: ['nameSelector1'] } },
  { site: 'site2', selector: { price: ['priceSelector2'], name: ['nameSelector2'] } }
]);
deleteWebProps
Deletes entries from the _webProps.
Parameters:
- site: string | string[] - The site(s) to be deleted.
Usage:
scraper.deleteWebProps('exampleSite');
scraper.deleteWebProps(['site1', 'site2']);
updateReplaceObj
Updates entries in the _replaceObj.
Parameters:
- key: string | string[] - The key(s) to be updated.
- value: string | string[] - The value(s) to be assigned.
Usage:
scraper.updateReplaceObj('oldString', 'newString');
scraper.updateReplaceObj(['oldString1', 'oldString2'], ['newString1', 'newString2']);
deleteReplaceObj
Deletes entries from the _replaceObj.
Parameters:
- key: string | string[] - The key(s) to be deleted.
Usage:
scraper.deleteReplaceObj('oldString');
scraper.deleteReplaceObj(['oldString1', 'oldString2']);
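These update and delete methods can also be combined to adjust an existing instance at startup; the sketch below strings together the calls documented above (the site name, selectors, and strings are placeholders, not defaults shipped with the package):
import { EshopScraper } from 'eshop-scraper';

const scraper = new EshopScraper({ timeout: 15 });

// Map extra currency aliases to a currency code.
scraper.updateCurrencyMap(['£', 'gbp'], 'GBP');

// Register selectors for an additional site (placeholder selectors).
scraper.updateWebProps('example.com', {
  site: 'Example',
  selector: { price: ['span.price'], name: ['h1.product-title'] },
});

// Strip a recurring marketing string from scraped text.
scraper.updateReplaceObj('deal price:', '');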
Configuration
You can customize the scraper by providing additional configuration options.
Insert New Entries
Add new website configurations to the scraper:
import { EshopScraper } from 'eshop-scraper';
const propsList = new Map([
  ['test.com', {
    site: 'Test',
    selectors: {
      priceSelector: ['span[itemprop="price"]'],
      nameSelector: ['h1[itemprop="name"]'],
    },
  }],
]);

const scraper = new EshopScraper({
  webProps: propsList,
});
Replace or Exclude Strings
Modify or exclude certain strings in the scraped data:
import { EshopScraper } from 'eshop-scraper';
const replaceObj = {
  'price is:': '',
  now: '',
  usd: '$',
};

const scraper = new EshopScraper({
  replaceObj: replaceObj,
});
Insert New Currencies
Map additional currency symbols or names to currency codes so they are recognized correctly:
import { EshopScraper } from 'eshop-scraper';
const currencyList = new Map([
  [['$'], 'USD'],
  [['euro', '€'], 'EUR'],
]);

const scraper = new EshopScraper({
  currencyMap: currencyList,
});
Insert New Set of Headers
Provide custom headers to mimic realistic browser requests:
import { EshopScraper } from 'eshop-scraper';
const newHeaders = [
  {
    Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36',
  },
  {
    Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0',
  },
];

const scraper = new EshopScraper({
  headersArr: newHeaders,
});
Set Timeout
Configure the request timeout:
import { EshopScraper } from 'eshop-scraper';
const scraper = new EshopScraper({
  timeout: 10, // Timeout in seconds
});
Set Retry Attempts
Configure the number of retry attempts for failed requests:
import { EshopScraper } from 'eshop-scraper';
const scraper = new EshopScraper({
  retry: 3, // Number of retry attempts
});
Check Default Values
Use this script to inspect default values for supported websites, replaced strings, headers, and more:
import { EshopScraper } from 'eshop-scraper';
const scraper = new EshopScraper();
(async () => {
  console.log('Supported websites:', scraper._webProps);
  console.log('Replaced strings:', scraper._replaceObj);
  console.log('Headers:', scraper._headers);
  console.log('Currency map:', scraper._currencyMap);
  console.log('Timeout amount:', scraper._timeoutAmount);
  console.log('Retry attempts:', scraper._retry);
  process.exit(0);
})();
Supported Websites
The eshop-scraper package supports 8 websites by default. Additional websites can be added through configuration.
Default Supported Websites
- Steam (store.steampowered.com)
- Amazon (amazon.com, amazon.in)
- Crutchfield (crutchfield.com)
- Playstation (store.playstation.com, gear.playstation.com)
- Ebay (ebay.com)
- Bikroy (bikroy.com)
Note
Limitations
Static vs. Dynamic Websites: This scraper is designed for static websites. It does not support dynamic websites or Single Page Applications (SPAs) at this time; future versions may include support for dynamic content.
Price Format Issues: Some websites might display prices in an unexpected format. For instance, prices may appear without a decimal point or use a comma instead of a dot. Because the scraper cannot execute JavaScript, it cannot convert these formats on the fly, so prices may be reported incorrectly (e.g., "2345" instead of "23.45"). A possible workaround is sketched at the end of this section.
Language and Currency: The scraper processes prices in English. If a website displays prices in a local language or script, the scraper might not interpret them correctly. Ensure that the price format is in English for accurate results.
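For the comma-as-decimal case, one possible mitigation is to map the comma to a dot through replaceObj (a sketch under the assumption that replacements are applied to the raw price text before it is parsed; this would be unsafe on sites that use the comma as a thousands separator):
import { EshopScraper } from 'eshop-scraper';

// Assumed workaround: rewrite ',' to '.' so a scraped "23,45" can parse as 23.45.
const scraper = new EshopScraper({
  replaceObj: {
    ',': '.',
  },
});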
Contribute
We welcome contributions to the eshop-scraper project! To contribute, please open a pull request on GitHub. Your input helps improve the scraper for everyone.