@vatis-tech/asr-client-js
v2.0.9
JavaScript client for Vatis Tech ASR services.
JavaScript client implementation for Vatis Tech's live ASR service.
Contents
- Installation 📀
- Constructor 🦺
- Props 📦
- Methods 🖇
- Browser Support 🔮
- Contributing 🏗
- Getting Help ☎️
- Changelog 💾
- Further Reading 📚
Installation
Via NPM
Install the latest version
npm i @vatis-tech/asr-client-js
This will install the latest version of @vatis-tech/asr-client-js and save it in the package.json file with the caret (^) symbol in front of its version.
This means that when you run a later install in your project, it will pick up the latest minor or patch release of the same major version.
You can read more about this here: npm caret and tilde.
Install the exact latest version.
npm i -E @vatis-tech/asr-client-js
This will install the latest version of @vatis-tech/asr-client-js without the caret (^).
This means that on each new install, you will still have the initial installed version.
You can read more about this here: npm install --save-exact.
Via CDN
You can also use this plugin via a CDN inside an HTML & JavaScript project that runs in the browser. Just copy and paste the following script into your project:
<script src="https://unpkg.com/@vatis-tech/asr-client-js/umd/vatis-tech-asr-client.umd.js" charset="utf-8"></script>
Via Download
You can also download the plugin and use it locally instead of via a CDN. You can download it from the following link: download here. Or, download it from GitHub here. After that, copy and paste the following script into your app:
<script src="%path%/asr-client-js/dist/umd/vatis-tech-asr-client.umd.js" charset="utf-8"></script>
Then replace %path% with the path where you downloaded and unzipped the plugin.
Constructor
Via NPM
First you need to import the plugin:
import VTC from "@vatis-tech/asr-client-js";
After that, you can initialize it like so:
const vtc = new VTC({
service: "LIVE_ASR",
language: "ro_RO",
apiKey: "YOUR_API_KEY",
onData: (data) => { console.log(data); },
log: true,
});
Via CDN or Download
If you opted to use it via a download or a CDN (i.e. via a script tag inside a static HTML & JavaScript project), you can use the constructor as follows:
const vtc = new VatisTechClient.default({
service: "LIVE_ASR",
language: "ro_RO",
apiKey: "YOUR_API_KEY",
onData: (data) => { console.log(data); },
log: true,
});
Props
config
This is an Object with the following structure:
{
"spokenCommandsList": [
{
"command": "COMMAND_NAME",
"regex": [ "regex1", "regex2", "regex3", ... ]
},
...
],
"findReplaceList": [
{
"replacement": "REPLACEMENT",
"regex": [ "regex1", "regex2", "regex3", ... ]
}
]
}
spokenCommandsList
The value of spokenCommandsList is an array of objects that have two properties, command and regex.
The value of command, i.e. COMMAND_NAME, is a String.
The value of regex, i.e. [ "regex1", "regex2", "regex3", ... ], is an Array of Strings, i.e. regex1, regex2 and regex3 are Strings.
The idea behind spokenCommandsList is that each time one of the values from the regex array is matched in the transcript, the onCommandData callback is fired, with a special header on the data named SpokenCommand.
The value of the SpokenCommand header will be exactly the value of command, i.e. COMMAND_NAME.
For example, you can use spokenCommandsList to define when you want a new paragraph:
{
"spokenCommandsList": [
{
"command": "NEW_LINE",
"regex": ["new line", "new paragraph", "from the start", "start new line"]
}
]
}
So each time the back-end algorithm finds one of the phrases "new line", "new paragraph", "from the start" or "start new line" in the transcript, the VTC client will fire the onCommandData callback. This way, your application knows when to start a new paragraph.
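A minimal sketch of how an application might react to the NEW_LINE command from the example above. createParagraphBuilder is a hypothetical helper, not part of @vatis-tech/asr-client-js, and it assumes the command name is available either as data.spokenCommand (as described for onCommandData) or as the SpokenCommand header.

```javascript
// Hypothetical helper, not part of @vatis-tech/asr-client-js.
// Collects final transcript chunks into paragraphs and starts a new
// paragraph whenever the NEW_LINE spoken command is received.
function createParagraphBuilder(newParagraphCommand = "NEW_LINE") {
  const paragraphs = [""];

  // Assumption: the command name is exposed as data.spokenCommand
  // and/or as the SpokenCommand header.
  const commandOf = (data) =>
    data.spokenCommand ?? (data.headers ? data.headers.SpokenCommand : undefined);

  return {
    // Intended to be passed as the onFinalData prop.
    onFinalData(data) {
      const last = paragraphs.length - 1;
      paragraphs[last] = `${paragraphs[last]} ${data.transcript || ""}`.trim();
    },
    // Intended to be passed as the onCommandData prop.
    onCommandData(data) {
      const current = paragraphs[paragraphs.length - 1];
      // Avoid creating empty paragraphs for repeated commands.
      if (commandOf(data) === newParagraphCommand && current !== "") {
        paragraphs.push("");
      }
    },
    // Returns the non-empty paragraphs collected so far.
    getParagraphs() {
      return paragraphs.filter((p) => p.length > 0);
    },
  };
}
```

Whether the spoken phrase itself (e.g. "new line") also appears in the final transcript is not specified here, so you may need to strip it in your application.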
findReplaceList
The value of findReplaceList is an array of objects that have two properties, replacement and regex.
The value of replacement, i.e. REPLACEMENT, is a String.
The value of regex, i.e. [ "regex1", "regex2", "regex3", ... ], is an Array of Strings, i.e. regex1, regex2 and regex3 are Strings.
The idea behind findReplaceList is that each time one of the values from the regex array is matched in the transcript, it will be replaced with the value of replacement.
For example, you can use findReplaceList to correct misrecognized named entities:
{
"findReplaceList": [
{
"replacement": "SpongeBob",
"regex": ["Spange Bwab", "SpanBob", "Spwange Bob", "Sponge Boob"]
}
]
}
So each time the back-end algorithm finds one of the phrases "Spange Bwab", "SpanBob", "Spwange Bob" or "Sponge Boob" in the transcript, it will replace it with "SpongeBob".
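To preview locally what a findReplaceList does to a piece of text, a rough sketch is shown below. The real replacement happens on the Vatis Tech back-end, whose regex flavor and flags are not documented here; applyFindReplace is a hypothetical helper that uses JavaScript RegExp with the "g" flag only.

```javascript
// Hypothetical local preview of findReplaceList rules (not the back-end logic).
// Rules are applied in order, so an earlier replacement can affect later matches.
function applyFindReplace(text, findReplaceList) {
  let result = text;
  for (const { replacement, regex } of findReplaceList) {
    for (const pattern of regex) {
      // A function replacer avoids special handling of "$" in replacement strings.
      result = result.replace(new RegExp(pattern, "g"), () => replacement);
    }
  }
  return result;
}

const findReplaceList = [
  {
    replacement: "SpongeBob",
    regex: ["Spange Bwab", "SpanBob", "Spwange Bob", "Sponge Boob"],
  },
];

console.log(applyFindReplace("I watched Spange Bwab and SpanBob", findReplaceList));
// prints "I watched SpongeBob and SpongeBob"
```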
You can also have replacements as symbols and punctuation marks:
{
"findReplaceList": [
{
"replacement": "(",
"regex": ["open parentheses", "new parentheses"]
},
{
"replacement": ")",
"regex": ["close parentheses", "stop parentheses"]
},
{
"replacement": "[",
"regex": ["open square brackets", "new square brackets"]
},
{
"replacement": "]",
"regex": ["close square brackets", "stop square brackets"]
}
]
}
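Before passing a config to the constructor, you may want to sanity-check its shape locally. The sketch below is a hypothetical helper that checks the structure described above and compiles each pattern with JavaScript's RegExp; the back-end may use a different regex flavor, and treating both lists as optional is an assumption of this sketch.

```javascript
// Hypothetical config validator; returns a list of human-readable errors.
function validateConfig(config) {
  const errors = [];
  const lists = [
    ["spokenCommandsList", "command"],
    ["findReplaceList", "replacement"],
  ];
  for (const [listName, keyName] of lists) {
    const list = config[listName];
    if (list === undefined) continue; // assumption: both lists are optional
    if (!Array.isArray(list)) {
      errors.push(`${listName} must be an array`);
      continue;
    }
    list.forEach((entry, i) => {
      if (entry === null || typeof entry !== "object") {
        errors.push(`${listName}[${i}] must be an object`);
        return;
      }
      if (typeof entry[keyName] !== "string") {
        errors.push(`${listName}[${i}].${keyName} must be a string`);
      }
      if (!Array.isArray(entry.regex) || !entry.regex.every((r) => typeof r === "string")) {
        errors.push(`${listName}[${i}].regex must be an array of strings`);
        return;
      }
      for (const pattern of entry.regex) {
        try {
          new RegExp(pattern); // throws SyntaxError for invalid patterns
        } catch {
          errors.push(`${listName}[${i}].regex: invalid pattern "${pattern}"`);
        }
      }
    });
  }
  return errors;
}
```

For instance, a pattern such as "open (" fails to compile in JavaScript because the parenthesis is unescaped, which is easy to do by accident when writing rules for symbols.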
Notes
When you send a config to the client, the first callback to be fired will be the onConfig callback.
service
This is a String that refers to the service that you would like to use.
Vatis Tech offers two speech-to-text services: LIVE_ASR, where you receive the transcript while recording from your microphone, and STATIC_ASR, where you upload a file and receive the transcript at a given link (this plugin does not support STATIC_ASR at the moment).
Only LIVE_ASR can be used at the moment.
model
This is a String that represents the ID of the model you want to use.
If not specified, the default model of the selected language will be used.
language
This is a String for the language you want to transcribe from.
It must be in the following format: language_region.
At the moment, only ro_RO is available.
apiKey
This is a String of your API key.
To get one, please follow these instructions:
- If you do not have one, please create an account on https://vatis.tech/.
- Log in to your account on https://vatis.tech/login.
- Go to the API key page of your account, https://vatis.tech/account/api-key.
- Copy the API key from there and pass it to the @vatis-tech/asr-client-js constructor.
connectionConfig
This is an Object with the following structure:
{
"service_host": "service_host",
"use_same_service_host_on_ws_connection": true | false,
"auth_token": "auth_token"
}
Where service_host is a string whose value is the host where the Vatis Tech Transcription Service is located, and auth_token is a string containing the authentication token for connecting to the Vatis Tech Transcription Service.
use_same_service_host_on_ws_connection specifies whether the live service IP returned by the service should be ignored when making the WebSocket connection, so that service_host is used instead. It defaults to false.
NOTE
Use only one of the connectionConfig or apiKey methods to connect to the Vatis Tech Transcription Service.
Use apiKey when connecting to the Vatis Tech Cloud API, and use connectionConfig when using a Vatis Tech On Premise Installation, in which case you will be provided with the necessary connectionConfig object.
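The rule above (exactly one of apiKey or connectionConfig) can be enforced when building the constructor options. This is a minimal sketch; buildClientOptions is a hypothetical helper, and the host and token values are placeholders.

```javascript
// Hypothetical helper, not part of @vatis-tech/asr-client-js: builds
// constructor options with exactly one authentication method.
function buildClientOptions(baseOptions, { apiKey, connectionConfig } = {}) {
  const hasApiKey = typeof apiKey === "string" && apiKey.length > 0;
  const hasConnectionConfig = connectionConfig !== undefined && connectionConfig !== null;
  if (hasApiKey === hasConnectionConfig) {
    throw new Error("Provide exactly one of apiKey or connectionConfig");
  }
  return hasApiKey
    ? { ...baseOptions, apiKey }
    : { ...baseOptions, connectionConfig };
}

// On Premise example with placeholder values.
const options = buildClientOptions(
  { service: "LIVE_ASR", language: "ro_RO", onData: (data) => console.log(data) },
  {
    connectionConfig: {
      service_host: "YOUR_SERVICE_HOST",
      use_same_service_host_on_ws_connection: false,
      auth_token: "YOUR_AUTH_TOKEN",
    },
  }
);
// const vtc = new VTC(options);
```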
onData
This is a Function through which you receive the transcript chunks from the back-end. This callback is always fired, for both partial and final frames.
It has the following signature:
const onData = (data) => {
/* do something with data */
}
Or as a named function:
function onData(data) {
/* do something with data */
}
The data object that is received has the following structure:
General structure
{
"type": "<str>",
"headers": {
"key1": "value1",
"key2": "value2"
}
}
Timestamped transcription packet
{
"type": "TIMESTAMPED_TRANSCRIPTION",
"headers": {},
"transcript": "hello world",
"words": [
{
"word": "hello",
"start_time": 1350.39,
"end_time": 4600.5,
"speaker": "Speaker 1",
"confidence": 0.96,
"entity": null,
"entity_group_id": null
},
{
"word": "world",
"start_time": 6200.3,
"end_time": 8020.0,
"speaker": "Speaker 1",
"confidence": 0.98,
"entity": null,
"entity_group_id": null
}
]
}
Processed timestamped transcription packet
{
"type": "PROCESSED_TIMESTAMPED_TRANSCRIPTION",
"headers": {},
"transcript": "Hello, world!",
"words": [
{
"word": "hello",
"start_time": 1350.39,
"end_time": 4600.5,
"speaker": "Speaker 1",
"confidence": 0.96,
"entity": null,
"entity_group_id": null
},
{
"word": "world",
"start_time": 6200.3,
"end_time": 8020.0,
"speaker": "Speaker 1",
"confidence": 0.98,
"entity": null,
"entity_group_id": null
}
],
"processed_words": [
{
"word": "Hello,",
"start_time": 1350.39,
"end_time": 4600.5,
"speaker": "Speaker 1",
"confidence": 0.96,
"entity": null,
"entity_group_id": null
},
{
"word": "world!",
"start_time": 6200.3,
"end_time": 8020.0,
"speaker": "Speaker 1",
"confidence": 0.98,
"entity": null,
"entity_group_id": null
}
]
}
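Given the packet structures above, an application might, for example, highlight words the recognizer is unsure about. This sketch is a hypothetical helper; the threshold value is an arbitrary choice, not a documented recommendation.

```javascript
// Hypothetical helper: returns the words of a transcription packet whose
// confidence is below a threshold, preferring processed_words (present on
// PROCESSED_TIMESTAMPED_TRANSCRIPTION packets) over words.
function lowConfidenceWords(packet, threshold = 0.9) {
  const words = packet.processed_words || packet.words || [];
  return words
    .filter((w) => typeof w.confidence === "number" && w.confidence < threshold)
    .map((w) => ({
      word: w.word,
      start_time: w.start_time,
      end_time: w.end_time,
      confidence: w.confidence,
    }));
}
```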
Headers
| Name | Type | Description |
| --------------------- | ------- | ---------------------------------------------------------------------------------------------------------- |
| PacketNumber | int | Incremental packet number |
| Sid | string | Session id |
| FrameStartTime | double | Frame start time in milliseconds |
| FrameEndTime | double | Frame end time in milliseconds |
| FinalFrame | boolean | Flag marking that a segment of speech has ended and will not be updated |
| SilenceDetected | boolean | Flag indicating that silence was detected in the audio frame |
| ProcessingTimeSeconds | double | Inference time, in seconds |
| SplitPacket | boolean | Flag indicating that the response packet was split and this is one of the pieces |
| FinalSplitPacket | boolean | Flag indicating that this is the final piece of the split response |
| SplitId | string | Full packet id in the format <packet_number>.<split_id>.<sub-split-id>.<sub-sub-split-id> |
| RequestBytes | int | Additional bytes requested to produce a frame. This is just an estimation; any number of bytes can be sent |
| SpokenCommand | string | Command detected in the frame |
NOTE
So, data can be a final frame, i.e. the back-end has fully finalized the transcript for those words and their time intervals (start and end time).
Or it can be a partial frame, i.e. the back-end has not yet finalized the transcript for those words and time intervals, and it will most likely change until it is overlapped by a final frame.
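The partial/final distinction above suggests a simple way to render a live transcript from onData: keep final frames and let each partial frame replace the previous partial one. This is a minimal sketch assuming FinalFrame is a boolean header as listed in the table; createTranscriptAccumulator is hypothetical and does not handle split packets (SplitPacket, SplitId).

```javascript
// Hypothetical transcript accumulator for the onData callback.
function createTranscriptAccumulator() {
  const finals = []; // finalized transcript segments, in arrival order
  let partial = ""; // latest partial segment, replaced on every update
  return {
    onData(data) {
      const text = data.transcript || "";
      if (data.headers && data.headers.FinalFrame === true) {
        finals.push(text);
        partial = ""; // the final frame supersedes the pending partial frame
      } else {
        partial = text;
      }
    },
    // Current best transcript: all final segments plus the pending partial one.
    text() {
      return [...finals, partial].filter((t) => t.length > 0).join(" ");
    },
  };
}
```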
onPartialData
This is a Function through which you receive the partial transcript chunks from the back-end.
It works exactly like the onData callback, except that data will always represent partial frames.
It has the following signature:
const onPartialData = (data) => {
/* do something with data */
}
Or as a named function:
function onPartialData(data) {
/* do something with data */
}
NOTE
The data object received on the current onPartialData callback overrides the data object received on the previous onPartialData callback.
onFinalData
This is a Function through which you receive the final transcript chunks from the back-end.
It works exactly like the onData callback, except that data will always represent final frames.
It has the following signature:
const onFinalData = (data) => {
/* do something with data */
}
Or as a named function:
function onFinalData(data) {
/* do something with data */
}
NOTE
The data object received on the onFinalData callback overrides the data object received on the previous onPartialData callback.
onConfig
This is a Function through which you receive a message from the back-end saying whether the config was successfully applied or not.
It has the following signature:
const onConfig = (data) => {
/* do something with data */
}
Where the data object has the following structure:
Config applied packet
{
"type": "CONFIG_APPLIED",
"headers": {},
"config_packet": {
"type": "CONFIG",
"headers": {},
"spokenCommandsList": [
{
"command": "NEW_PARAGRAPH",
"regex": ["new line"]
}
]
}
}
onCommandData
This is a Function through which you receive the transcript chunks for specific commands from the back-end.
For example, if you initialize the plugin with a set of commands (e.g. {spokenCommandsList: [ { "command": "NEW_PARAGRAPH", "regex": ["start new paragraph", "new phrase", "new sentence"] } ] }), each time the back-end algorithm finds one of these commands, it will send the data to this function.
It has the following signature:
const onCommandData = (data) => {
/* do something with data */
}
Or as a named function:
function onCommandData(data) {
/* do something with data */
}
The data object from this callback is the same as the one from the onData callback, but it also has a new property, named spokenCommand, holding the actual command that triggered the callback.
log
This is a Boolean prop.
If set to true, the plugin will call the logger function with an object that has the following structure:
{
currentState: ...,
description: ....
}
This tells you the current state of the plugin.
The last state will be the following:
{
currentState: `@vatis-tech/asr-client-js: Initialized the "MicrophoneGenerator" plugin.`,
description: `@vatis-tech/asr-client-js: The MicrophoneGenerator was successful into getting user's microphone, and will start sending data each 1 second.`,
}
logger
This is a Function through which you receive data about the plugin state.
It has the following signature:
const logger = (info) => {
/* do something with info */
}
Or as a named function:
function logger(info) {
/* do something with info */
}
The info object that is received has the properties described above.
If the log prop is set to true and the logger prop is not set, or is not a function with the above signature, the plugin will default the logger to console.log.
onDestroyCallback
This is a Function that will be called upon successful destruction.
errorHandler
This is a Function that will be called upon errors.
host
This is the host for generating a key. It defaults to "https://vatis.tech/".
microphoneTimeslice
How often data is captured from the microphone, in milliseconds. Default is 250 milliseconds.
frameLength
The frame length of the audio captured from the microphone, in seconds. Default is 0.3 seconds (for a microphoneTimeslice of 250, the frameLength is 0.3).
frameOverlap
Default is 0.3 seconds.
bufferOffset
Default is 0.3 seconds.
waitingAfterMessages
This is a number that needs to be > 0. It represents the number of messages to be sent to the ASR service before waiting for a response. Default is 5.
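For reference, the documented defaults of the tuning props above can be written out explicitly and overridden individually. This is a sketch; spreading the object into the constructor options is an assumed usage pattern, and frameOverlap and bufferOffset are only documented by their defaults.

```javascript
// Documented default values of the tuning props, written out explicitly.
const tuningOptions = {
  microphoneTimeslice: 250, // milliseconds between captured microphone chunks
  frameLength: 0.3, // seconds
  frameOverlap: 0.3, // seconds
  bufferOffset: 0.3, // seconds
  waitingAfterMessages: 5, // messages sent before waiting for a response; must be > 0
};

// Hypothetical usage with the constructor shown earlier:
// const vtc = new VTC({ service: "LIVE_ASR", language: "ro_RO", apiKey: "YOUR_API_KEY", onData, ...tuningOptions });
```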
EnableOnCommandFinalFrame
This is a Boolean. If set to true, each time the transcription detects a command, a final frame will be triggered at that point.
Methods
destroy
This will destroy the instantiated @vatis-tech/asr-client-js client.
The destroy method is also invoked automatically if an error comes through the socket.io-client as a response from the Vatis Tech ASR service.
NOTE! If the VTC plugin has not yet sent all messages, or has not yet received all responses, the destruction will not happen instantly.
NOTE! The destruction of the VTC plugin happens only once all messages have been sent and received.
NOTE! If you wish to destroy the VTC plugin without waiting for all messages to be sent and received, pass { hard: true } as a parameter to the .destroy call, i.e. vtc.destroy({ hard: true }).
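The two destruction modes above can be wrapped in a small helper, for example to tear down immediately when the page is being closed. stopSession is hypothetical; it only relies on the destroy method and the { hard: true } parameter described above.

```javascript
// Hypothetical helper: stops a session either gracefully (waiting until all
// messages have been sent and received) or immediately with { hard: true }.
function stopSession(vtc, { immediate = false } = {}) {
  if (immediate) {
    vtc.destroy({ hard: true }); // do not wait for pending messages
  } else {
    vtc.destroy(); // destruction completes once all messages are exchanged
  }
}

// Example (browser): tear down immediately when the page is being closed.
// window.addEventListener("beforeunload", () => stopSession(vtc, { immediate: true }));
```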
pause
Call this method if you want to pause the recording for a while.
resume
After calling the pause method, call this one to resume recording.
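For a single Pause/Resume button, the paused state can be tracked alongside the client. This is a minimal sketch; createPauseToggle is a hypothetical helper that only calls the pause and resume methods described above.

```javascript
// Hypothetical helper: returns a function that alternates between
// vtc.pause() and vtc.resume() and reports the new paused state.
function createPauseToggle(vtc) {
  let paused = false;
  return function toggle() {
    if (paused) {
      vtc.resume();
    } else {
      vtc.pause();
    }
    paused = !paused;
    return paused; // true while recording is paused
  };
}

// Example (browser):
// const toggle = createPauseToggle(vtc);
// button.addEventListener("click", () => { button.textContent = toggle() ? "Resume" : "Pause"; });
```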
microphoneDeviceId
This specifies which audioinput device id should be used by the client. If it is undefined, or the browser does not have an audioinput device with that id, a default one will be selected.
You can read more on the following links:
- https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/enumerateDevices
- https://developer.mozilla.org/en-US/docs/Web/API/MediaDeviceInfo
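A device id can be picked from the list returned by navigator.mediaDevices.enumerateDevices(), using the kind, label and deviceId fields of MediaDeviceInfo. The sketch below is a hypothetical helper; passing microphoneDeviceId as a constructor option is an assumption, since this section does not say how the value is supplied. Device labels may be empty until the user has granted microphone permission.

```javascript
// Hypothetical helper: picks an audioinput deviceId whose label contains the
// given text, or returns undefined so the client falls back to a default device.
function pickMicrophoneDeviceId(devices, labelIncludes) {
  const needle = labelIncludes.toLowerCase();
  const match = devices.find(
    (d) => d.kind === "audioinput" && d.label.toLowerCase().includes(needle)
  );
  return match ? match.deviceId : undefined;
}

// Browser usage (inside an async function):
// const devices = await navigator.mediaDevices.enumerateDevices();
// const microphoneDeviceId = pickMicrophoneDeviceId(devices, "USB");
// const vtc = new VTC({ service: "LIVE_ASR", language: "ro_RO", apiKey: "YOUR_API_KEY", onData, microphoneDeviceId });
```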
onDownloadRecording
Call this method if you want to download the recording as an audio/webm file.
getRecordingAsBlobChunks
Call this method if you want to get all chunks recorded from your microphone as Blobs.
You can then use them to download the audio as you wish. Below is an example of downloading it as audio/webm.
// ... code
try {
  // Each chunk is a Blob captured from the microphone.
  const allBlobData = vtc.getRecordingAsBlobChunks();
  if (allBlobData && allBlobData.length) {
    // Combine the chunks into a single audio/webm Blob.
    const audioBlob = new Blob(allBlobData, {
      type: "audio/webm",
    });
    const audioUrl = URL.createObjectURL(audioBlob);
    // Trigger the download through a temporary, hidden anchor element.
    const anchor = document.createElement("a");
    anchor.style.display = "none";
    anchor.href = audioUrl;
    anchor.download = "audio.webm";
    document.body.appendChild(anchor);
    anchor.click();
    anchor.remove();
    // Release the object URL after the click has been handled.
    setTimeout(() => URL.revokeObjectURL(audioUrl), 0);
  }
} catch (error) {
  console.error(error);
}
// ... code
Browser Support
We officially support the latest versions of the following browsers:
- Chrome
- Firefox
- Safari
- Edge
Contributing
We love pull requests!
Our community is safe for all. Before submitting a pull request, please review and agree to our Code of Conduct, and then check the Contribution guidelines.
Getting Help
If you have questions, you need some help, you've found a bug, or you have an improvement idea, do not hesitate to open an issue here.
There are three types of issues:
Changelog
To keep the README a bit lighter, you can read the Changelog here.
Further Reading
Developers
If you are a developer, the following links might interest you:
- API documentation: https://vatis.tech/documentation/
- API status: https://vatistech.statuspage.io/
- Supported languages: https://vatis.tech/languages
- Accepted file formats: https://vatis.tech/formats
- Check the pricing: https://vatis.tech/pricing
- Join the team: https://vatis.tech/careers
About Vatis Tech
If you are just curious to learn more about Vatis Tech, please refer to these links:
- Landing page for Vatis Tech: https://vatis.tech/
- About Vatis Tech: https://vatis.tech/about
- Vatis Tech newsroom: https://vatis.tech/press
Social Media
- Message us on Facebook: https://www.facebook.com/VatisTech/
- Connect with us on LinkedIn: https://www.linkedin.com/company/vatis-tech/
- Chat with our Facebook community: https://www.facebook.com/groups/1630293847133624