npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

node-bbsub

v0.0.3

Published

Corpus linguistics on BBC programmes' subtitles

Downloads

8

Readme

bbsub

love to nibble anything that comes into the shed, like our willies.

bbsub is a wrapper library to the get_iplayer tool, mainly aimed at downloading the programmes' subtitles and playing with corpus linguistics.

The library comes with a few examples for immediate use.

Where do the subtitles come from?

get_iplayer is an open source BBC iPlayer client. Among its many features - some of which are controversial, such as the capability to download to keep the programmes - get_iplayer can download the programmes' subtitles when available, in srt format.

bbsub wraps get_iplayer's subtitle download functionality and adds some of its own, as described below.

Usage

First search what programmes you are interested in by using get_iplayer, e.g.

$ get_iplayer "have i got news for you"
get_iplayer 2.85.0.0, Copyright (C) 2008-2010 Phil Lewis
  This program comes with ABSOLUTELY NO WARRANTY; for details use --warranty.
  This is free software, and you are welcome to redistribute it under certain
  conditions; use --conditions for details.

Matches:
373:	Have I Got News for You: Series 46 - Episode 6, BBC One, Comedy,Guidance,Satire,TV, default

INFO: 1 Matching Programmes
$ 

... then get the programme's words and word frequency:

$ node example1.js 373
{"programme":2,"contains":2,"strong":3,"language":2,"good":9,"evening":1,"welcome":3,"news":8,"alexander":2,"armstrong":1, ... }

Alternatively, you could directly do:

$ node example1.js "have i got news for you"
{"programme":2,"contains":2,"strong":3,"language":2,"good":9,"evening":1,"welcome":3,"news":8,"alexander":2,"armstrong":1, ... }

If the result of the search is more than one programme, the returned corpus will be merged across all programmes, e.g.:

$ node example1.js "Inspector Montalbano" 

will give you the corpus of all the episodes listed below:

$ get_iplayer "Inspector Montalbano"
get_iplayer 2.85.0.0, Copyright (C) 2008-2010 Phil Lewis
  This program comes with ABSOLUTELY NO WARRANTY; for details use --warranty.
  This is free software, and you are welcome to redistribute it under certain
  conditions; use --conditions for details.

Matches:
446:	Inspector Montalbano: Series 3 - 1. Angelica's Smile, BBC Four, Crime,Drama,Guidance,TV, default
447:	Inspector Montalbano: Series 3 - 2. Hall of Mirrors, BBC Four, Crime,Drama,Guidance,TV, default
448:	Inspector Montalbano: Series 3 - 3. A Voice in the Night, BBC Four, Crime,Drama,Guidance,TV, default
449:	Inspector Montalbano: Series 3 - 4. A Ray of Light, BBC Four, Crime,Drama,Guidance,TV, default

Run example.js without parameters to see which search options of the original get_iplayer are supported.

The library offers also additional functionality, such as the extraction of the time code for each occurrence of the words. See the code for more detail. An example is provided with the source:

$ node example2.js "have i got news for you"
Analysing 'Have I Got News for You: Series 46: Episode 6'
The top ten words are:
  1: said (32 occurrences at 00:01:52,000, 00:02:56,000, 00:05:20,000, ...)
  2: right (21 occurrences at 00:02:38,000, 00:02:46,000, 00:03:03,000, ...)
  3: just (20 occurrences at 00:03:53,000, 00:04:30,000, 00:04:37,000, ...)
  4: know (18 occurrences at 00:02:14,000, 00:02:27,000, 00:02:46,000, ...)
  5: going (17 occurrences at 00:07:11,000, 00:11:24,000, 00:13:05,000, ...)
  6: people (15 occurrences at 00:02:14,000, 00:02:38,000, 00:02:56,000, ...)
  7: think (14 occurrences at 00:02:33,000, 00:04:41,000, 00:06:06,000, ...)
  8: fire (13 occurrences at 00:23:58,000, 00:24:02,000, 00:24:41,000, ...)
  9: godfrey (13 occurrences at 00:01:41,000, 00:01:57,000, 00:02:09,000, ...)
  10: week (13 occurrences at 00:00:45,000, 00:02:09,000, 00:03:08,000, ...)

Notes

Words shorter than three characters and included in a pre-defined set of stopwords are ignored. The list of stopwords is sourced from package 'tm' in R.

Combinations of words that make sense as a whole, e.g. "big data", are unfortunately not recognised. There will be one entry for 'data' and none for 'big', as it is shorter than 4 characters.

The picture is courtesy of Charlie Swinbourne.com's blog.