@bbc/stt-align-node
v1.4.3
Stt-align-node
See The alignment problem in the docs for more background on the problem this module set out to address.
Originally developed as a Node version of Python's stt-align by Chris Baume - BBC R&D.
Setup - development
git clone git@github.com:bbc/stt-align-node.git
cd stt-align-node
npm install
Setup - in production
npm install @bbc/stt-align-node
Usage
Other than realigning STT results with accurate text, this module can also be used to perform related operations in the same domain, such as benchmarking STT.
|Function|Description|Type|
|:------|------|----|
|alignSTT|Realign STT JSON with accurate text by transposing words from the accurate text onto the timecodes of the STT.|json|
|diffsList|Return a diff of STT vs accurate text as JSON.|json|
|diffsListAsHtml|Return a diff of STT vs accurate text as HTML.|html|
|diffsCount|Return a count of the differences between STT and accurate text.|json|
|calculateWordDuration|Calculate word duration.|Number|
See the README in the example-usage folder as well as the code examples for more.
System Architecture
Node version of stt-align by Chris Baume - R&D.
Pseudocode overview of alignSTT:
Input and output are as described in the example usage.
- Accurate base text transcription, string.
- Array of word objects transcription from STT service.
Align words
- Normalize words by removing capitalization and punctuation and converting numbers to words.
- Generate an array of words from the base text and an array of words from the STT transcript.
- Get opcodes using difflib to compare the two arrays. For equal matches, add the matched segment of STT word objects to the results array at the base text index position.
- Then iterate over the results array to replace the text of the STT word objects with the words from the base text.
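The align step above can be sketched roughly as follows. This is an illustration only, not the module's implementation: a plain longest-common-subsequence table stands in for difflib opcodes, number-to-word conversion is left out, and the `{ text, start, end }` word shape and the names `normalizeWord`, `matchingPairs` and `alignSketch` are assumptions made for this example.

```javascript
// Minimal sketch of the align step. Not the module's implementation:
// an LCS table stands in for difflib opcodes, and the
// { text, start, end } word shape is an assumption.

// Lowercase and strip punctuation (number-to-word conversion omitted).
function normalizeWord(word) {
  return word.toLowerCase().replace(/[^a-z0-9]/g, '');
}

// Return [baseIndex, sttIndex] pairs for words that match in both arrays.
function matchingPairs(baseWords, sttWords) {
  const n = baseWords.length;
  const m = sttWords.length;
  // lcs[i][j] = length of the longest common subsequence of
  // baseWords[i..] and sttWords[j..].
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = baseWords[i] === sttWords[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table from the start, collecting the equal matches.
  const pairs = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (baseWords[i] === sttWords[j]) {
      pairs.push([i, j]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      i++;
    } else {
      j++;
    }
  }
  return pairs;
}

// Build one result slot per base text word and copy timings from
// the STT word objects wherever the normalized words match.
function alignSketch(baseText, sttWordObjects) {
  const baseTokens = baseText.split(/\s+/).filter(Boolean);
  const baseNorm = baseTokens.map(normalizeWord);
  const sttNorm = sttWordObjects.map((w) => normalizeWord(w.text));
  // Unmatched words keep null timings until the interpolation step.
  const results = baseTokens.map((text) => ({ text, start: null, end: null }));
  for (const [baseIndex, sttIndex] of matchingPairs(baseNorm, sttNorm)) {
    results[baseIndex].start = sttWordObjects[sttIndex].start;
    results[baseIndex].end = sttWordObjects[sttIndex].end;
  }
  return results;
}
```

The result keeps the spelling and punctuation of the base text, with timings only on the words the STT got right. Words the STT misheard are left with null timings.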
Interpolate missing words
- Calculate missing timecodes.
- First optimization: use neighboring words to do a first pass at setting missing start and end times when present.
- Then missing word timings are interpolated using the interpolation library everpolate.
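As a rough illustration of the interpolation step, the sketch below fills missing timings by dividing the gap between the nearest timed neighbours evenly. It is a simplification under stated assumptions: plain linear arithmetic stands in for the everpolate library, the two passes described above are merged into one, and the `{ text, start, end }` word shape and the function name `interpolateSketch` are hypothetical.

```javascript
// Sketch of filling missing word timings. Plain linear interpolation
// stands in for the everpolate library; the { text, start, end }
// word shape (null meaning "missing") is an assumption.
function interpolateSketch(words) {
  // Indexes of words that already have both timings.
  const known = [];
  words.forEach((word, index) => {
    if (word.start !== null && word.end !== null) known.push(index);
  });
  // Nothing to anchor on: return copies unchanged.
  if (known.length === 0) return words.map((word) => ({ ...word }));

  return words.map((word, i) => {
    if (word.start !== null && word.end !== null) return { ...word };

    // Nearest timed neighbours on each side, if any.
    const prev = known.filter((k) => k < i).pop();
    const next = known.find((k) => k > i);

    // Before the first or after the last timed word there is no gap
    // to divide, so collapse onto the neighbour's boundary.
    if (prev === undefined) {
      const t = words[next].start;
      return { ...word, start: t, end: t };
    }
    if (next === undefined) {
      const t = words[prev].end;
      return { ...word, start: t, end: t };
    }

    // Divide the gap between the neighbours evenly among the
    // untimed words that sit inside it.
    const gapStart = words[prev].end;
    const gapEnd = words[next].start;
    const slots = next - prev - 1;
    const step = (gapEnd - gapStart) / slots;
    const offset = i - prev - 1;
    return {
      ...word,
      start: gapStart + offset * step,
      end: gapStart + (offset + 1) * step,
    };
  });
}
```

Giving each untimed word an equal share of the gap is the simplest choice. Weighting by word length would be another option, but that is outside what this README describes.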
Development env
- node 10
- npm 6.1.0
Build
npm run build
Bundles the code with react into a ./build folder.
Build demo
npm run build:demo
Demo is in docs folder
Publish demo to github pages
npm run deploy:ghpages
Tests
npm run test:watch
- [ ] add more tests
Deployment
Deploy to npm
npm run publish:public