@datadasher/mlb-scrape
v1.2.0
MLB Transactions Scraper
Description
Using this module, you can get information on a player's transactions (when they were injured, in rehab, or changed teams). Note that this module does not adapt to the page structure, meaning it will break if the HTML structure changes on MLB.com.
Usage
The module exports two classes: Scraper and Parser.
Scraper
The Scraper class provides methods to scrape transaction history for baseball players.
Here’s an example of how to use it:
const { Scraper } = require('@datadasher/mlb-scrape');
const scraper = new Scraper();
scraper.getTransactionsFromPlayerNames(["José Soriano", "Jhonathan Diaz", "Carlos Estévez"]).then(transactions => console.log(JSON.stringify(transactions, null, 2)));
/* Transaction history for the specified players:
[
{
"playerName": "José Soriano",
"transactions": [
{
"date": "June 3, 2023",
"event": "Los Angeles Angels recalled RHP José Soriano from Rocket City Trash Pandas."
},
{
"date": "March 30, 2023",
"event": "RHP José Soriano and assigned to Rocket City Trash Pandas from Salt Lake Bees."
},
]
},
{
"playerName": "Jhonathan Diaz",
"transactions": [
{
"date": "February 7, 2024",
"event": "LHP Jhonathan Diaz assigned to Tacoma Rainiers."
},
{
"date": "February 7, 2024",
"event": "Seattle Mariners signed free agent LHP Jhonathan Diaz to a minor league contract and invited him to spring training."
},
{
"date": "November 6, 2023",
"event": "LHP Jhonathan Diaz elected free agency."
},
{
"date": "October 16, 2023",
"event": "Los Angeles Angels sent LHP Jhonathan Diaz outright to Salt Lake Bees."
},
{
"date": "October 2, 2023",
"event": "Los Angeles Angels recalled LHP Jhonathan Diaz from Salt Lake Bees."
},
]
},
{
"playerName": "Carlos Estévez",
"transactions": [
{
"date": "July 10, 2023",
"event": "RHP Carlos Estévez assigned to American League All-Stars."
},
{
"date": "February 9, 2023",
"event": "Dominican Republic activated RHP Carlos Estévez."
},
{
"date": "December 5, 2022",
"event": "Los Angeles Angels activated RHP Carlos Estévez."
},
{
"date": "December 5, 2022",
"event": "Los Angeles Angels activated RHP Carlos Estévez."
},
]
}
]
*/
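The result above is an array of objects, each with a playerName and a transactions list. As a minimal sketch of post-processing that shape, the helper below keeps only the transactions from a given year. The names transactionsInYear and yearOf are illustrative and not part of the module, and the sketch assumes dates keep the "Month D, YYYY" format shown above.

```javascript
// Hypothetical helpers (not exported by @datadasher/mlb-scrape).

// Pull the trailing four-digit year out of a date string such as "June 3, 2023".
// Returns null when the string does not end in a year.
function yearOf(dateText) {
  const match = /(\d{4})\s*$/.exec(dateText);
  return match ? Number(match[1]) : null;
}

// Keep each player's entry, but only with the transactions dated in `year`.
function transactionsInYear(results, year) {
  return results.map(({ playerName, transactions }) => ({
    playerName,
    transactions: transactions.filter((t) => yearOf(t.date) === year),
  }));
}
```

It could be chained onto the promise from the example, for instance `.then(results => transactionsInYear(results, 2023))`.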
Another usage:
const { Parser } = require('@datadasher/mlb-scrape');
const parseBot = new Parser();
parseBot.getTeamHistory(["Brandon Drury"], 2023)
.then(result => console.log(result));
/*
{
"Brandon Drury": [
{
"start_date": "June 20, 2010",
"end_date": "January 24, 2013",
"team": "Atlanta Braves",
"league": "MLB"
},
{
"start_date": "January 24, 2013",
"end_date": "February 20, 2018",
"team": "Arizona Diamondbacks",
"league": "MLB"
},
{
"start_date": "February 20, 2018",
"end_date": "July 26, 2018",
"team": "New York Yankees",
"league": "MLB"
},
{
"start_date": "July 26, 2018",
"end_date": "October 6, 2020",
"team": "Toronto Blue Jays",
"league": "MLB"
},
{
"start_date": "January 5, 2021",
"end_date": "October 14, 2021",
"team": "New York Mets",
"league": "MLB"
},
{
"start_date": "March 21, 2022",
"end_date": "August 2, 2022",
"team": "Cincinnati Reds",
"league": "MLB"
},
{
"start_date": "August 2, 2022",
"end_date": "November 6, 2022",
"team": "San Diego Padres",
"league": "MLB"
},
{
"start_date": "December 22, 2022",
"end_date": "Knowledge cut-off date",
"team": "Los Angeles Angels",
"league": "MLB"
}
]
}
*/
const scraper2 = new Scraper();
scraper2.getMlbUrl('Jordyn Adams').then(url => console.log(url));
// https://www.mlb.com/player/jordyn-adams-677941
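The URL above looks like a lowercase, hyphenated player name followed by a numeric player ID. The sketch below rebuilds that pattern from a name and an ID. It is inferred from this single example, not taken from the module's source, and buildMlbPlayerUrl is a hypothetical name; stripping accents from names is also an assumption.

```javascript
// Hypothetical helper: builds a URL in the shape of the example above.
// Assumes the slug is the accent-free, lowercase name joined by hyphens.
function buildMlbPlayerUrl(playerName, playerId) {
  const slug = playerName
    .normalize('NFD')                 // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')      // collapse spaces and punctuation to hyphens
    .replace(/^-+|-+$/g, '');         // trim leading or trailing hyphens
  return `https://www.mlb.com/player/${slug}-${playerId}`;
}

console.log(buildMlbPlayerUrl('Jordyn Adams', 677941));
// https://www.mlb.com/player/jordyn-adams-677941
```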
Parser
The Parser class provides methods to analyze the injury history of baseball players.
Here’s an example of how to use it:
const { Parser } = require('@datadasher/mlb-scrape');
const injuryAnalyzer = new Parser();
injuryAnalyzer.analyzeInjuries(["José Soriano", "Jhonathan Diaz", "Carlos Estévez"])
.then(result => console.log(result));
/* Injury and rehab history:
{
"Jose_Soriano": [
{
"start_injury": "April 5, 2022",
"start_rehab": "July 28, 2022",
"end_injury": "August 25, 2022",
"reason": "Unknown"
},
{
"start_injury": "February 17, 2021",
"start_rehab": "May 20, 2021",
"end_injury": "November 6, 2021",
"reason": "Unknown"
},
{
"start_injury": "July 14, 2019",
"start_rehab": "August 6, 2019",
"end_injury": "August 22, 2019",
"reason": "Unknown"
}
],
"Jhonathan_Diaz": [
{
"start_injury": "August 3, 2022",
"start_rehab": "Unknown",
"end_injury": "November 10, 2022",
"reason": "Unknown"
},
{
"start_injury": "June 30, 2021",
"start_rehab": "Unknown",
"end_injury": "July 16, 2021",
"reason": "Unknown"
}
],
"Carlos_Estevez": [
{
"start_injury": "September 27, 2022",
"start_rehab": "Unknown",
"end_injury": "October 6, 2022",
"reason": "Unknown"
},
{
"start_injury": "May 3, 2021",
"start_rehab": "May 20, 2021",
"end_injury": "May 22, 2021",
"reason": "Right middle finger strain"
},
{
"start_injury": "March 29, 2018",
"start_rehab": "April 5, 2018",
"end_injury": "June 25, 2018",
"reason": "Left oblique strain, then right elbow strain"
},
{
"start_injury": "May 19, 2014",
"start_rehab": "Unknown",
"end_injury": "June 23, 2014",
"reason": "Unknown"
}
]
}
*/
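Because the dates in this result are plain strings and some fields are "Unknown", computing how long a player was out takes a little care. The following is a minimal sketch, assuming the "Month D, YYYY" format shown above; parseLongDate and injuryDurations are hypothetical helpers, not part of the module.

```javascript
const MONTHS = ['January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December'];
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parse "May 3, 2021" into a UTC Date; return null for anything else
// (for example "Unknown").
function parseLongDate(text) {
  const match = /^([A-Za-z]+) (\d{1,2}), (\d{4})$/.exec(String(text).trim());
  if (!match) return null;
  const month = MONTHS.indexOf(match[1]);
  if (month === -1) return null;
  return new Date(Date.UTC(Number(match[3]), month, Number(match[2])));
}

// Add a days_out field to every injury stint; null when a date is missing.
function injuryDurations(injuryHistory) {
  const result = {};
  for (const [player, stints] of Object.entries(injuryHistory)) {
    result[player] = stints.map((stint) => {
      const start = parseLongDate(stint.start_injury);
      const end = parseLongDate(stint.end_injury);
      const daysOut = start && end ? Math.round((end - start) / MS_PER_DAY) : null;
      return { ...stint, days_out: daysOut };
    });
  }
  return result;
}
```

For the stint from May 3, 2021 to May 22, 2021 above, this yields a days_out of 19.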
How to Install
- If you already have npm private, use
npm install @datadasher/mlb-scrape
- Install Node
- In your terminal, create package.json using
npm init -y
- Install Puppeteer, a web scraper, using
npm install puppeteer
- If you are missing any libraries, try
sudo apt-get install libnss3 libxss1 libasound2 libatk-bridge2.0-0 libgtk-3-0 libgbm-dev
- You also need to set up Wikidata and OpenAI (see below). We call the Wikidata API to get the player ID from a player name. This player ID is then used to get the player's unique URL on MLB.com for scraping information on their transactions. Then we use an LLM to parse out the information from their transactions.
- Create a .env file. Make sure you have run
npm install dotenv
In the .env file, fill in the following values:
CLIENT_APP_KEY=''
CLIENT_APP_SECRET=''
ACCESS_TOKEN=''
OPENAI_API_KEY=''
- Start the application using
node index
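Since the steps above depend on four values in the .env file, it can help to fail early when one is missing. The sketch below is one possible check and is not part of the module; missingEnvVars is a hypothetical name. It assumes the variables have already been loaded into process.env, for example by calling require('dotenv').config() at the top of index.js.

```javascript
// The four variable names come from the .env step above.
const REQUIRED_ENV_VARS = [
  'CLIENT_APP_KEY',
  'CLIENT_APP_SECRET',
  'ACCESS_TOKEN',
  'OPENAI_API_KEY',
];

// Return the names of required variables that are unset or blank.
function missingEnvVars(env = process.env) {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}
```

At startup, a non-empty result from missingEnvVars() could be logged or turned into an error before any scraping begins.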
Setting Up Wikidata
https://www.wikidata.org/wiki/Wikidata:REST_API/Authentication
Setting Up OpenAI
https://platform.openai.com/docs/quickstart?context=node
Setting up OAuth 2.0 for Wikidata
- To make authenticated requests against the Wikibase REST API for Wikidata, you must first set up an OAuth 2.0 client (formerly known as "consumer").
- Log into meta.wikimedia.org using your unified login.
- Create an OAuth 2.0 client.
- (Get there by clicking on Special pages, then OAuth consumer registration, then Request a token for a new consumer.)
- Supply the following information to the form:
- Application name: Name it something informative. Example: "Wikibase REST API for Wikidata"
- Application description: Again, use some informative text that explains how you intend to use the API. Example: "Wikibase REST API access for maintaining my dataset about animal cookies"
- This consumer is for use only by (your name) (checkbox): Check this box under normal circumstances. See below for situations when you would leave this box unchecked.
- Applicable grants (checkboxes): Check each box that describes a kind of access you need for your task.
- By submitting this application... (checkbox): Read the user agreement and, if you agree to the terms, check the box.
- Submit the form by clicking the "Propose consumer" button.
- Save the three tokens provided on the next screen:
- Client application key: used to obtain bearer tokens
- Client application secret: used to obtain bearer tokens
- Access token: provides access to the API when included in the API request (length: ~1800 characters)
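The access token is sent with each API request as a bearer token in the Authorization header. As a rough sketch of that step, the hypothetical helper below builds the headers; how the module itself attaches the token is not shown in this README.

```javascript
// Hypothetical helper: build request headers carrying the Wikidata access token.
function withBearerToken(accessToken, extraHeaders = {}) {
  if (typeof accessToken !== 'string' || accessToken.trim() === '') {
    throw new Error('ACCESS_TOKEN is missing or empty');
  }
  return { ...extraHeaders, Authorization: `Bearer ${accessToken.trim()}` };
}

// Example use with the global fetch in Node 18+ (apiUrl is a placeholder
// for whichever Wikidata endpoint you are calling):
//   fetch(apiUrl, { headers: withBearerToken(process.env.ACCESS_TOKEN) });
```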
Tips: HTML Structure
If the div structure happens to change on MLB.com, you can easily amend the getTransactionsFromUrl function:
Open your browser's developer tools on the player's page, right-click the transactions markup in the Elements panel, choose Copy > Copy selector, and adjust the code to match that structure.
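One way to make that adjustment easier is to keep the selector in a single constant. The sketch below is an illustration only: the selector value, the two-column (date, event) row layout, and the extractTransactions name are all assumptions, and the real getTransactionsFromUrl may be organized differently. It uses Puppeteer's page.$$eval to run a function over every matching element.

```javascript
// Placeholder selector: replace it with the one copied from the developer tools.
const TRANSACTION_ROW_SELECTOR = 'table tbody tr';

// Read { date, event } pairs from each matching row on an open Puppeteer page.
// Assumes the first cell holds the date and the second holds the event text.
async function extractTransactions(page, rowSelector = TRANSACTION_ROW_SELECTOR) {
  return page.$$eval(rowSelector, (rows) =>
    rows.map((row) => {
      const cells = row.querySelectorAll('td');
      return {
        date: cells[0]?.textContent.trim() ?? '',
        event: cells[1]?.textContent.trim() ?? '',
      };
    })
  );
}
```

With this layout, a change on MLB.com would only require updating TRANSACTION_ROW_SELECTOR, or the cell indexes if the columns move.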
Tips: Transaction Prompting
We want to determine which days a player was injured. Examples would include:
- “placed on the 10-day injured list”: This indicates that the player has been injured and is expected to be out for at least 10 days. The injury could be longer depending on the player’s recovery.
- “placed on the 60-day injured list”: This indicates a more serious injury that will keep the player out for at least 60 days.
- “sent on a rehab assignment”: This indicates that the player is recovering from an injury and is starting to play in minor league games as part of their rehabilitation.
- “activated from the injured list”: This indicates that the player has recovered from their injury and is ready to return to the major league roster.
- “transferred to the 60-day injured list”: This usually indicates that the player’s injury is more serious than initially thought, or that the player has suffered a setback in their recovery.
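The phrases above can also be matched with plain keyword rules, which is handy for sanity-checking what the LLM returns. The sketch below is a simple rule-based alternative, not the module's own parsing; the category names and classifyTransaction are hypothetical, and older wording such as "disabled list" is not covered.

```javascript
// Ordered rules: the transfer rule comes first so it is never mistaken
// for a fresh placement on the injured list.
const INJURY_PATTERNS = [
  { type: 'injury_transfer', pattern: /transferred .*to the \d+-day injured list/i },
  { type: 'injury_start', pattern: /placed .*on the \d+-day injured list/i },
  { type: 'rehab_start', pattern: /sent .*on a rehab assignment/i },
  { type: 'injury_end', pattern: /activated .*from the (\d+-day )?injured list/i },
];

// Return the injury-related category of a transaction event, or null
// when the event is unrelated to injuries (for example a recall or signing).
function classifyTransaction(event) {
  const hit = INJURY_PATTERNS.find(({ pattern }) => pattern.test(event));
  return hit ? hit.type : null;
}
```

A recall such as "Los Angeles Angels recalled RHP José Soriano from Rocket City Trash Pandas." falls through every rule and returns null.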