@activediscourse/podcast-parser

v1.0.0

Published

3 years ago

Parse XML podcast feeds into objects

haltcase

citycide

podcast node browser rss feed parse parser xml

podcast-parser

Parse XML podcast RSS feeds into standardized objects.

installation

yarn add @activediscourse/podcast-parser

usage

Pass a string containing the feed's XML source. The function returns a promise that resolves to the parsed feed:

const parsePodcast = require("@activediscourse/podcast-parser")

parsePodcast("<podcast xml>")
  .then(feed => console.log(feed))
  .catch(e => console.error(e))

This library only handles parsing, so you'll need to fetch the feed separately first. For example, using node-fetch (or fetch in the browser):

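// node-fetch v2 supports require(); v3 is ESM-only and must be imported instead
const fetch = require("node-fetch")
const parsePodcast = require("@activediscourse/podcast-parser")

;(async () => {
  // Fetch the raw feed XML; the parser itself performs no network requests
  const response = await fetch("https://pinecast.com/feed/activediscourse")
  if (!response.ok) throw new Error(`Feed request failed: ${response.status}`)
  const xml = await response.text()

  // Parse the XML string into a normalized feed object
  return parsePodcast(xml)
})()
  .then(feed => console.log(feed))
  .catch(e => console.error(e))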

output format

The output is opinionated with the goal of normalizing results across feeds:

{
  "title": "<Podcast title>",
  "description": {
    "short": "<Podcast subtitle>",
    "long": "<Podcast description>"
  },
  "link": "<Podcast link (usually website for podcast)>",
  "image": "<Podcast image>",
  "language": "<ISO 639 language>",
  "copyright": "<Podcast copyright>",
  "updated": "<pubDate or latest episode pubDate>",
  "explicit": "<Podcast is explicit, true/false>",
  "categories": [
    "Category>Subcategory"
  ],
  "author": "<Author name>",
  "owner": {
    "name":  "<Owner name>",
    "email": "<Owner email>"
  },
  "episodes": [
    {
      "guid": "<Unique id>",
      "title": "<Episode title>",
      "subtitle": "<Episode subtitle>",
      "description": "<Episode description>",
      "rawDescription": "<Episode description stripped of HTML tags>",
      "explicit": "<Episode is explicit, true/false>",
      "image": "<Episode image>",
      "published": "<date>",
      "duration": 120,
      "categories": [
        "Category"
      ],
      "enclosure": {
        "filesize": 5650889,
        "type": "audio/mpeg",
        "url": "<mp3 file>"
      }
    }
  ]
}
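
Podcast-level category strings use > to separate a category from its subcategory, as in the Category>Subcategory placeholder above. A minimal sketch of splitting such a string for display follows. splitCategory is a hypothetical helper, not part of this library, and it assumes at most one > separator per string.

```javascript
// Hypothetical helper (not part of podcast-parser): split "Category>Subcategory"
// into its two parts. Assumes at most one ">" separator per string.
function splitCategory(value) {
  const index = value.indexOf(">")
  if (index === -1) {
    // No separator: the whole string is the top-level category
    return { category: value.trim(), subcategory: null }
  }
  return {
    category: value.slice(0, index).trim(),
    subcategory: value.slice(index + 1).trim()
  }
}

console.log(splitCategory("Arts>Design"))
console.log(splitCategory("Technology"))
```

Returning null for a missing subcategory keeps the result shape the same for both cases, which makes it easier to render in a template.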

notes

language

Many podcasts set the language to a bare code such as en. The parser makes a best-effort attempt to normalize language strings to an IETF language tag, so for example en is converted to en-us. Non-English languages are presented in the same form, for example de-DE.
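
Because the example values above differ in letter case (en-us versus de-DE), code that filters feeds by language may want to compare tags case-insensitively. The following is a small sketch under that assumption. primaryLanguage and sameLanguage are hypothetical helper names, not part of this library.

```javascript
// Hypothetical helpers (not part of podcast-parser).

// Return the lowercase primary subtag of a language tag, e.g. "de-DE" -> "de".
function primaryLanguage(tag) {
  return String(tag).trim().split("-")[0].toLowerCase()
}

// Compare two full tags without regard to letter case.
function sameLanguage(a, b) {
  return String(a).trim().toLowerCase() === String(b).trim().toLowerCase()
}

console.log(primaryLanguage("de-DE"))
console.log(sameLanguage("en-us", "en-US"))
```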

normalization

Not all feeds are guaranteed to contain every property; properties missing from a feed are simply omitted from the output.

Episode categories are included as an empty array if the podcast isn't assigned any categories.

Episodes are sorted in descending order by publish date.
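
Since missing properties are omitted from the output, consuming code should not assume every field exists. The following is a minimal sketch of reading an episode defensively. describeEpisode is a hypothetical helper, not part of this library, and the sample object is made up to match the output format above.

```javascript
// Hypothetical helper (not part of podcast-parser): build a one-line label for
// an episode, tolerating properties that were omitted from the output.
function describeEpisode(episode) {
  const title = episode.title ?? "(untitled)"
  // enclosure may be absent entirely, so use optional chaining
  const url = episode.enclosure?.url ?? "(no audio)"
  // categories is documented as an empty array when unset; the fallback is extra caution
  const categories = episode.categories ?? []
  const tags = categories.length > 0 ? categories.join(", ") : "none"
  return `${title} [${tags}] ${url}`
}

// Made-up sample: no enclosure, empty categories
const sample = { guid: "abc", title: "Pilot", categories: [] }
console.log(describeEpisode(sample))
```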

development

Clone the repo: git clone https://github.com/activediscourse/podcast-parser.git
Move into the new directory: cd podcast-parser
Install dependencies: yarn
Build the source: yarn build
Run tests: yarn test

license

See license
