@rxtk/linear16

v0.0.0

🎶 Converts a stream of audio chunks to LINEAR16 format (single-channel 16-bit PCM at a sample rate of 16 kHz)

yarn add @rxtk/linear16

API

toLinear16

import {fromFile} from '@rxtk/fs';
import {toLinear16} from '@rxtk/linear16';

const inputFile = './my-audio-file.mulaw';
const audioChunk$ = fromFile({filePath: inputFile});
const linear16$ = audioChunk$.pipe(toLinear16({
  mimeType: 'audio/x-mulaw',
  sampleRate: 8000,
  channels: 1,
  firstChunkContainsHeaders: false,
}));
linear16$.subscribe(); // outputs a stream of buffers, encoded as LINEAR16

Audio data formats

| encoding | bit-depth | rate (kHz) | channels | lossless | headers | compressed | comment | supported |
| :--------- | :-------: | :--: | :--: | :--: | :--: | :--: | :----- | :-----: |
| l16 | 16 | 16❓ | ❓ 1 | ✅ | ❌ | ❌ | Standard for STT | ✅ |
| flac | ❓ | ❓ | ❓ | ✅ | ✅ | ✅ | Compressed PCM | |
| 32-bit PCM | 32 | ❓ | ❓ | ✅ | ❌ | ❌ | Raw PCM (32-bit floats) | |
| basic | 8 | 8 | 1 | ❌ | ❌ | ✅ | Telephone calls (USA) | ✅ |
| mulaw | 8 | ❓ 8 | 1 | ❌ | ❌ | ✅ | Telephone calls (USA) | ✅ |
| mpeg/mp3 | 16 | 44.1 | 2❓ | ❌ | ✅ | ✅ | Music and video | |
| wav | ❓ | ❓ | ❓ | ❓ | ✅ | ❓ | Universal container | |
| webm (opus) | ❓ | 8-48❓ | ❓1-255 | ❌ | | ❓ | Browser/web standard | |
| webm (vorbis) | ❓ | | | | | | Older browser/web standard | |

❓ indicates the value is variable. ❓ with a number means that it is usually set to that value but not always.

For machine learning models (including speech-to-text), the standard is generally single-channel LINEAR16 at 16 kHz. This is what we use because it is the most portable and all speech-to-text pipelines support it.

These are the most common audio data formats but there are dozens of possible formats.
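To make the relationship between the `mulaw` and `l16` rows concrete, here is a minimal sketch of decoding 8-bit mu-law bytes into 16-bit linear PCM samples using the standard G.711 expansion. It is not taken from this package's source: `decodeMulawSample` and `mulawToPcm16` are hypothetical names, little-endian output is an assumption, and the sketch does not resample from 8 kHz to 16 kHz.

```javascript
// Bias constant used by G.711 mu-law encoding.
const MULAW_BIAS = 0x84;

// Decode one 8-bit mu-law byte into a signed 16-bit linear PCM sample.
function decodeMulawSample(byte) {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const exponent = (u >> 4) & 0x07;  // 3-bit segment number
  const mantissa = u & 0x0f;         // 4-bit step within the segment
  const magnitude = ((mantissa << 3) + MULAW_BIAS) << exponent;
  // The top bit carries the sign; the bias is removed in both cases.
  return (u & 0x80) ? MULAW_BIAS - magnitude : magnitude - MULAW_BIAS;
}

// Hypothetical helper: convert a Buffer of mu-law bytes into a Buffer of
// little-endian 16-bit PCM (assumption: little-endian output is wanted).
function mulawToPcm16(mulawBuffer) {
  const out = Buffer.alloc(mulawBuffer.length * 2);
  for (let i = 0; i < mulawBuffer.length; i++) {
    out.writeInt16LE(decodeMulawSample(mulawBuffer[i]), i * 2);
  }
  return out;
}
```

Each input byte becomes two output bytes, so the decoded buffer is twice the size of the mu-law input.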

Brief explanation of how audio data works

  • Raw audio data generally consists of samples of audio over time. The raw data can be represented simply as an array of numbers.
  • The sample rate describes how often audio samples are taken. For example, 16 kHz means there are 16,000 samples taken per second. So to represent one second of audio, you need 16,000 numbers (samples).
  • Each sample is represented by a number describing the height of the sound wave at a given point in time. Usually this number is a 16-bit integer or 32-bit float. This is the bit-depth of the audio data. For example, 16-bit encoded PCM data is represented by a series of 16-bit integers and has a bit depth of 16 bits.
  • Audio can have one or more channels: most commonly mono (1 channel) or stereo (2 channels).
  • Audio data can be fairly large so it is often compressed. Some compression formats (like MP3 and Mulaw) are lossy and others (like FLAC) are lossless (they preserve all of the original data).
  • Some audio formats (wav, mp3, flac) contain headers and metadata at the start of the file. Others (LINEAR16, PCM, Mulaw) are simply raw audio data with no headers.
  • Some multi-channel audio formats break data into frames. Each frame represents a window of time and contains the audio samples for all of the channels but only for that time frame.
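The points above can be sketched with a little arithmetic. The helper names below (`pcmByteLength`, `readInt16Samples`, `splitChannels`) are hypothetical and not part of this package. The sketch assumes uncompressed, little-endian 16-bit PCM with interleaved channels.

```javascript
// Bytes needed to store uncompressed PCM audio:
// samples per second * bytes per sample * channels * duration.
function pcmByteLength({sampleRate, bitDepth, channels, seconds}) {
  const bytesPerSample = bitDepth / 8;
  return sampleRate * bytesPerSample * channels * seconds;
}

// One second of LINEAR16 (16 kHz, 16-bit, mono): 16000 * 2 * 1 * 1 bytes.
const oneSecondOfLinear16 = pcmByteLength({
  sampleRate: 16000,
  bitDepth: 16,
  channels: 1,
  seconds: 1,
});

// Read raw 16-bit samples out of a Buffer as an array of numbers.
// Assumption: little-endian byte order; a trailing odd byte is ignored.
function readInt16Samples(buffer) {
  const samples = [];
  for (let offset = 0; offset + 2 <= buffer.length; offset += 2) {
    samples.push(buffer.readInt16LE(offset));
  }
  return samples;
}

// Split interleaved samples (one sample per channel in each frame)
// into one array per channel.
function splitChannels(samples, channels) {
  const perChannel = Array.from({length: channels}, () => []);
  samples.forEach((sample, i) => perChannel[i % channels].push(sample));
  return perChannel;
}
```

For stereo data, `splitChannels(samples, 2)` returns the left and right channels as separate arrays, assuming the samples alternate left, right, left, right.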

Stream processing of audio

Some unique considerations when processing audio in a streaming system:

  • Headers generally need to be read first and only for the first chunk or chunks in the stream. The easiest way to deal with this is to ensure that all of the header metadata is contained in the first chunk being analyzed.
  • In order to de-compress a compressed format (like FLAC, MP3, or Mulaw), it may be necessary to break the audio stream into complete units that can be de-compressed. Incomplete frames may need to be buffered until they can be read in their entirety.
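The buffering idea in the last point might look something like the sketch below. It assumes fixed-size units (for example 2 bytes per 16-bit sample), which is a simplification: formats like FLAC and MP3 use variable-size frames, so a real decoder has to parse frame headers to find the boundaries. `createUnitAligner` is a hypothetical helper, not part of this package.

```javascript
// Returns a function that accepts chunks of any size and returns only the
// bytes that form whole units. Leftover bytes are held back and prepended
// to the next chunk.
function createUnitAligner(unitSize) {
  let leftover = Buffer.alloc(0);
  return function align(chunk) {
    const data = Buffer.concat([leftover, chunk]);
    const usableLength = data.length - (data.length % unitSize);
    leftover = data.subarray(usableLength);  // incomplete unit, kept for later
    return data.subarray(0, usableLength);   // complete units, safe to decode
  };
}

// Usage: 16-bit samples arriving in chunks that split a sample in half.
const align = createUnitAligner(2);
const first = align(Buffer.from([1, 2, 3]));  // contains bytes 1, 2
const second = align(Buffer.from([4]));       // contains bytes 3, 4
```

In an RxJS pipeline the same state could live inside an operator, so that every chunk passed downstream contains only complete units.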

Audio data references

If you want to learn more, these web pages are helpful: