wikipedia-airport-scraper

v0.4.0

Get airport codes and flight connections from Wikipedia airport pages

Wikipedia Airport Scraper

A small Node.js script that scrapes information about airports and their destinations from Wikipedia pages. When provided with the full HTML of any airport page on the mobile version of the English-language Wikipedia, it will extract:

  • IATA and ICAO codes for this airport
  • Coordinates
  • A list of all flights, with the destination airport, the airline and any start and end dates. Each entry also includes flags that indicate whether the destination is suspended, seasonal and/or operated as a charter flight. In short, everything from the 'Airlines and destinations' table in a consistent, formatted output.

It is left to any script that uses this to:

  • Set up requests to the en.m.wikipedia.org pages, grab responses and rate limit those requests where necessary
  • Store and process any output from the scraper
  • Make further requests to look up destination airports or link airline names to IATA/ICAO codes

Right now, this script doesn't provide any way to look up basic data found on airline pages, so it can't help you link airline names to codes. Such functionality might be added in the future.
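The first responsibility in the list above, requesting pages with some rate limiting, could be sketched as follows. This is a minimal sketch and not part of the package: `sleep` and `scrapeSequentially` are hypothetical helpers, the one second default delay is an arbitrary assumption, and the function that fetches the HTML is passed in so that any HTTP client can be used.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Hypothetical helper: fetches and scrapes pages one at a time,
// pausing `delayMs` between requests.
// `fetchHtml(url)` must resolve to the page HTML and
// `scrape(html)` is the scraper function.
async function scrapeSequentially(pageNames, fetchHtml, scrape, delayMs = 1000) {
  const results = []
  for (const [index, pageName] of pageNames.entries()) {
    // Wait between requests, but not before the first one
    if (index > 0) await sleep(delayMs)
    const url = `https://en.m.wikipedia.org/wiki/${encodeURIComponent(pageName)}`
    const html = await fetchHtml(url)
    // `await` works whether the scraper returns a value or a promise
    results.push({ pageName, data: await scrape(html) })
  }
  return results
}
```

With got, for example, the fetch function could be `(url) => got(url).text()`, and the page names could be `['Brussels_Airport', 'Kansai_International_Airport']`.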

How to use

Here's a very simple example that gets data for Brussels Airport from its Wikipedia URL:

import { fileURLToPath } from 'node:url'

import got from 'got' // Or any other package that requests an HTML page
import write from 'write' // Or any other package that writes data to a local file

import scrape from 'wikipedia-airport-scraper'

// Get the HTML from the page and pass it to the script
const response = await got('https://en.m.wikipedia.org/wiki/Brussels_Airport')
const data = await scrape(response.body)

// Write out the scraped data next to this file
// (fileURLToPath copes with spaces and Windows paths, unlike URL.pathname)
const outputPath = fileURLToPath(new URL('./data.json', import.meta.url))
await write(outputPath, JSON.stringify(data, null, 2))

The data (simplified to show only one airline and one destination) then looks like this:

{
  "name": "Brussels Airport",
  "iataCode": "BRU",
  "icaoCode": "EBBR",
  "coordinates": {
    "latitude": 50.901389,
    "longitude": 4.484444
  },
  "flights": [
    {
      "airline": {
        "name": "Aegean Airlines",
        "link": "Aegean_Airlines"
      },
      "destination": {
        "shortName": "Athens",
        "fullName": "Athens International Airport",
        "link": "Athens_International_Airport",
        "isCharter": false,
        "isSeasonal": false,
        "suspended": false,
        "startDate": null,
        "endDate": null
      }
    }
  ]
}
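As an illustration of how a calling script might process this output, here is a minimal sketch that keeps only the flights that are neither suspended nor operated as a charter, and prints one line per flight. The helpers `activeScheduledFlights` and `summarise` are hypothetical and not part of the package; they only rely on the field names shown in the output above.

```javascript
// Hypothetical helper: flights that are currently operated
// as regular services (not suspended, not charter)
function activeScheduledFlights(data) {
  return data.flights.filter(
    ({ destination }) => !destination.suspended && !destination.isCharter
  )
}

// Hypothetical helper: one readable line per flight
function summarise(flights) {
  return flights.map(
    ({ airline, destination }) => `${airline.name} -> ${destination.shortName}`
  )
}
```

Applied to the Brussels Airport output above, `summarise(activeScheduledFlights(data))` yields a single entry, `Aegean Airlines -> Athens`.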

Caveats

  • It's obvious but probably deserves to be said: the output of this script can only be as good as the Wikipedia page that it uses as input. YMMV.

  • Two different but related airlines might be mapped to the same link (and ultimately IATA code) by Wikipedia. Here's part of the output from Kansai International Airport that shows All Nippon Airways and ANA Wings with a different name but the same link, in this case serving the same route. For now, the script will not recognise this as a duplicate.

{
  "flights": [
    {
      "airline": {
        "name": "All Nippon Airways",
        "link": "All_Nippon_Airways"
      },
      "destination": {
        "shortName": "Naha",
        "fullName": "Naha Airport",
        "link": "Naha_Airport",
        "isCharter": false,
        "isSeasonal": false,
        "suspended": false,
        "startDate": null,
        "endDate": null
      }
    },
    {
      "airline": {
        "name": "ANA Wings",
        "link": "All_Nippon_Airways"
      },
      "destination": {
        "shortName": "Naha",
        "fullName": "Naha Airport",
        "link": "Naha_Airport",
        "isCharter": false,
        "isSeasonal": false,
        "suspended": false,
        "startDate": null,
        "endDate": null
      }
    }
  ]
}

  • Not every destination airport that the script picks up on airport pages will have an actual link. If the link leads to a Wikipedia edit page (the destination has no page of its own yet), it will appear in the JSON as null. Here's part of the output from Ignatyevo Airport that shows Zeya as a destination airport with no Wikipedia page to link to:

{
  "flights": [
    {
      "airline": {
        "name": "Angara Airlines",
        "link": "Angara_Airlines"
      },
      "destination": {
        "shortName": "Zeya",
        "fullName": null,
        "link": null,
        "isCharter": false,
        "isSeasonal": true,
        "suspended": false,
        "startDate": null,
        "endDate": null
      }
    }
  ]
}
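A script that consumes the output can work around the last two caveats itself. The sketch below is one possible approach, not something the package provides: `dedupeFlights` and `destinationLinks` are hypothetical helpers. The first treats flights with the same airline link and the same destination as duplicates and keeps the first one it sees; the second collects the destination links that can be requested next and skips destinations without a Wikipedia page.

```javascript
// Hypothetical helper: keeps the first flight for each
// airline link + destination pair. Destinations without a link
// are keyed on their shortName instead.
function dedupeFlights(flights) {
  const seen = new Set()
  return flights.filter(({ airline, destination }) => {
    const destinationKey = destination.link ?? `name:${destination.shortName}`
    const key = `${airline.link}|${destinationKey}`
    if (seen.has(key)) return false
    seen.add(key)
    return true
  })
}

// Hypothetical helper: unique destination links to request next,
// skipping destinations whose link is null
function destinationLinks(flights) {
  const links = flights
    .map(({ destination }) => destination.link)
    .filter((link) => link !== null)
  return [...new Set(links)]
}
```

Note that deduplicating this way drops the ANA Wings entry from the Kansai International Airport example, so it only makes sense when the distinction between related airlines doesn't matter for your use case.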