gtfs-utils

Utilities to process GTFS data sets.

  • ✅ supports frequencies.txt
  • ✅ works in the browser
  • ✅ fully asynchronous/streaming

Design goals

streaming/iterative on sorted data

As public transportation systems hopefully become more integrated over time, GTFS datasets will often be multiple gigabytes in size. GTFS processing should also work in memory-constrained environments such as Raspberry Pis or FaaS platforms.

Whenever possible, all gtfs-utils tools read as little data into memory as possible. For this, the individual files in a GTFS dataset need to be sorted in a way that allows iterative processing.

Read more in the performance section.
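
To illustrate the underlying idea: once a file is sorted, consecutive rows with the same ID can be processed as one group, and only that group needs to be held in memory. A minimal sketch of such a grouping helper (hypothetical code, not part of the gtfs-utils API):

// hypothetical helper: group consecutive rows of a sorted stream by a key
async function* groupBy (rows, key) {
	let currentKey = null
	let group = []
	for await (const row of rows) {
		if (group.length > 0 && row[key] !== currentKey) {
			yield group
			group = []
		}
		currentKey = row[key]
		group.push(row)
	}
	if (group.length > 0) yield group
}

// with stop_times.txt sorted by trip_id, each group (one trip's stop times)
// fits into memory, no matter how large the whole file is:
// for await (const stopTimes of groupBy(readFile('stop_times'), 'trip_id')) { … }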

data-source-agnostic

gtfs-utils does not make assumptions about where you read the GTFS data from. Although it has a built-in tool to read CSV from files on disk, anything is possible: .zip archives, HTTP requests, in-memory buffers, dat/IPFS, etc.

There are too many half-done, slightly opinionated GTFS processing tools out there, so gtfs-utils tries to be as universal as possible.

correct

Aside from bugs, and from new features of the ever-expanding GTFS spec that change the expected behavior of old ones, gtfs-utils tries to follow the spec closely.

For example, when computing the absolute timestamp/instant of an arrival at a stop, it will always take stop_timezone (or the user-supplied fallback timezone) into account, because stop_times.txt uses "wall clock time".
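
To make this concrete, here is a minimal sketch of such a conversion using luxon (the date, time and timezone values are made up for illustration):

const {DateTime} = require('luxon')

// a stop_times.txt arrival of 26:30:00 on service day 2019-05-15 means
// 02:30:00 on the *next* calendar day, in the stop's timezone
// (this sketch ignores the DST edge case where a service day doesn't start at midnight)
const serviceDay = DateTime.fromISO('2019-05-15T00:00:00', {zone: 'Europe/Berlin'})
const arrival = serviceDay.plus({hours: 26, minutes: 30})

console.log(arrival.toUTC().toISO()) // absolute instant, independent of any timezone
console.log(arrival.toSeconds()) // UNIX timestamp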

Installing

npm install gtfs-utils

Usage

API documentation

sorted GTFS files

gtfs-utils assumes that the files in your GTFS dataset are sorted in a particular way; this allows it to compute some data aggregations more memory-efficiently, which means that you can use it to process very large datasets. For example, if trips.txt and stop_times.txt are both sorted by trip_id, computeStopovers() can read each file incrementally, processing only the rows for one trip_id at a time.

Miller and sponge work very well for this:

mlr --csv sort -f agency_id agency.txt | sponge agency.txt
mlr --csv sort -f parent_station -nr location_type stops.txt | sponge stops.txt
mlr --csv sort -f route_id routes.txt | sponge routes.txt
mlr --csv sort -f trip_id trips.txt | sponge trips.txt
mlr --csv sort -f trip_id -n stop_sequence stop_times.txt | sponge stop_times.txt
mlr --csv sort -f service_id calendar.txt | sponge calendar.txt
mlr --csv sort -f service_id,date calendar_dates.txt | sponge calendar_dates.txt
mlr --csv sort -f trip_id,start_time frequencies.txt | sponge frequencies.txt

There's also a sort.sh script included in the npm package, which executes the commands above.

Note: For read-only sources (like HTTP requests), sorting the files is not an option. You can solve this by spawning mlr and piping data through it.
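
A rough sketch of that approach, using Node's child_process (assuming mlr is installed and on your PATH):

const {spawn} = require('child_process')

// sort a readable CSV stream by trip_id by piping it through mlr
const sortByTripId = (input) => {
	const mlr = spawn('mlr', ['--csv', 'sort', '-f', 'trip_id'])
	input.pipe(mlr.stdin)
	return mlr.stdout // readable stream of the sorted CSV
}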

Note: With a bit of extra code, you can also use gtfs-utils with a .zip archive or with a remote feed.
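
As an illustration of the remote case, a readFile implementation could fetch each file via HTTP and parse it on the fly; a sketch assuming the csv-parse package and a hypothetical feed URL:

const {parse} = require('csv-parse')
const https = require('https')

// hypothetical: read a GTFS file from a remote feed instead of from disk
const readFile = (file) => {
	const parser = parse({columns: true})
	https.get(`https://example.org/gtfs/${file}.txt`, (res) => {
		res.pipe(parser)
	})
	return parser // an async-iterable stream of row objects
}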

basic example

Given our sample GTFS dataset, we'll answer the following question: On a specific day, which vehicles of which lines stop at a specific station?

We define a function readFile that reads our GTFS data into a readable stream/async iterable. In this case we'll read CSV files from disk using the built-in readCsv helper:

const readCsv = require('gtfs-utils/read-csv')

const readFile = (file) => {
	return readCsv(require.resolve('sample-gtfs-feed/gtfs/' + file + '.txt'))
}

computeStopovers() will read calendar.txt, calendar_dates.txt, trips.txt, stop_times.txt & frequencies.txt and return all stopovers of all trips across the full time frame of the dataset.

It returns an async generator (which is thus async-iterable), so we can use for await.

In the following example, we're going to print all stopovers at airport on the 15th of May 2019:

const {DateTime} = require('luxon')
const computeStopovers = require('gtfs-utils/compute-stopovers')

const day = '2019-05-15'
const isOnDay = (t) => {
	const iso = DateTime.fromMillis(t * 1000, {zone: 'Europe/Berlin'}).toISO()
	return iso.slice(0, day.length) === day
}

const stopovers = await computeStopovers(readFile, 'Europe/Berlin')
for await (const stopover of stopovers) {
	if (stopover.stop_id !== 'airport') continue
	if (!isOnDay(stopover.arrival)) continue
	console.log(stopover)
}
{
	stop_id: 'airport',
	trip_id: 'a-downtown-all-day',
	service_id: 'all-day',
	route_id: 'A',
	start_of_trip: 1557871200,
	arrival: 1557926580,
	departure: 1557926640,
}
{
	stop_id: 'airport',
	trip_id: 'a-outbound-all-day',
	service_id: 'all-day',
	route_id: 'A',
	start_of_trip: 1557871200,
	arrival: 1557933900,
	departure: 1557933960,
}
// …
{
	stop_id: 'airport',
	trip_id: 'c-downtown-all-day',
	service_id: 'all-day',
	route_id: 'C',
	start_of_trip: 1557871200,
	arrival: 1557926820,
	departure: 1557926880,
}

For more examples, check the API documentation.

Performance

By default, gtfs-utils verifies that the input files are sorted correctly. You can disable this to improve performance slightly by running with the CHECK_GTFS_SORTING=false environment variable.
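
For example (the script name is just an illustration):

CHECK_GTFS_SORTING=false node compute-stopovers.js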

gtfs-utils should be fast enough for small to medium-sized GTFS datasets. It won't be as fast as other GTFS tools, because it processes the data in a streaming fashion and makes almost no assumptions about where it comes from (see the design goals above).

On my M1 MacBook Air, with the 180MB 2022-02-03 HVV GTFS dataset (17k stops.txt rows, 91k trips.txt rows, 2M stop_times.txt rows, ~500M stopovers), computeStopovers computes 18k stopovers per second and finishes in several hours.

Note: If you want a faster way to query and transform GTFS datasets, I suggest using gtfs-via-postgres to leverage PostgreSQL's query optimizer. Once you have imported the data, it is usually orders of magnitude faster.

Contributing

If you have a question or have difficulties using gtfs-utils, please double-check your code and setup first. If you think you have found a bug or want to propose a feature, refer to the issues page.