imdb-watchlist-scraper
v2.0.4
Published
Scrape IMDb ratings list
Downloads
1,160
Readme
IMDb Watchlist Scraper
Overview
This library provides a WatchlistScraper
class for scraping IMDb watchlist data, including rating IDs and usernames, using the IMDb user ID.
Class Initialization
Parameters
userId
(string, required):
IMDb user ID string, formatted like'ur125655832'
.headless
(boolean, optional):
Default:true
.
Determines whether the browser runs in headless mode. Set tofalse
for testing purposes.timeoutInMs
(number, optional):
Default: 3 minutes (180000 ms).
The script timeout duration. If a timeout occurs, thewatchlistGrabIds()
function will console.error() and returnnull
.userAgent
(string, optional):
Default:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
.
Refer to Playwright documentation for user agent formatting details. This is used in Playwright's browser.newContext()
Usage
Main Function: watchlistGrabIds()
Description
watchlistGrabIds()
opens a playwright browser, navigates to the watchlist and scrapes all of the IDs in that watchlist.
Return Value
On success, the function returns an object:
{
idArr: string[], // Array of rating IDs
username: string | null // IMDb username, or null if unavailable
}
Error cases
The function throws an error explicitly in the following cases:
- The target watchlist is privated
- Parsing time exceeds the time set in
timeoutInMs
- Parsing completes but the returned idArr is empty
Example
Scraping All Rating IDs and Username by userId
const scraper = new WatchlistScraper({ userId: "ur125655832" });
try {
const scrapingResults = await scraper.watchlistGrabIds();
// username can be null
const username = scrapingResults.username;
if (!!username) {
console.log(username);
}
// this is never null and always has at least 1 element
const idArr = scrapingResults.idArr;
console.log(idArr);
} catch (error) {
// handle error
}