s4-cli
v2.2.0
Command line interface for the s4-service
Overview
The s4-cli is a command line tool for the s4 service that allows users to send audio data to the service and obtain text or source separated audio as a response. Data can be sent to the service from multiple sources (pre recorded/real time), with different processing options (batch/stream) and different responses (text/cleaned up audio).
Data Sources
NOTE: Currently, all pre recorded data sources for ASR must be encoded as 16-bit, 16 kHz .wav or PCM. This applies to ASR only, in both batch and stream mode.
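The encoding requirement above can be checked before any files are sent. The following is a minimal sketch, not part of s4-cli: `wav_field` and `is_asr_ready` are hypothetical helper names, and the sketch assumes a canonical WAV header (the "fmt " chunk directly follows the RIFF header) and a little-endian host.

```shell
# Hypothetical helpers, not part of s4-cli.
# Assumptions: the file has a canonical WAV header (the "fmt " chunk directly
# follows the RIFF header) and the host is little-endian, since od decodes
# integers in host byte order.

# Print one unsigned integer field: $1 = file, $2 = byte offset, $3 = width (2 or 4).
wav_field() {
  od -An -t "u$3" -j "$2" -N "$3" "$1" | tr -d ' \t\n'
}

# Succeed only if the file declares a 16000 Hz sample rate and 16 bits per sample.
is_asr_ready() {
  [ "$(wav_field "$1" 24 4)" = "16000" ] && [ "$(wav_field "$1" 34 2)" = "16" ]
}
```

A loop such as `for f in ./data/*.wav; do is_asr_ready "$f" || echo "needs conversion: $f"; done` would then list the files that need re-encoding. If Sox is installed, `soxi -r` and `soxi -b` report the same two fields without relying on the header layout.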
Pre Recorded Files
The tool can be used to send one or more pre recorded files (in .wav format) to the service, either as individual files, or as a group of files in a directory. When sending a collection of files from a directory, regular expression matching patterns can be used to filter the files that are chosen for processing.
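Because the filter is a regular expression rather than a shell glob, it can help to preview which file names a pattern selects before sending anything. The sketch below uses `grep -E` for the preview; `preview_matches` is a hypothetical helper, and it is an assumption that the regular expression dialect used by the CLI agrees with extended regular expressions for simple patterns like these.

```shell
# Hypothetical helper, not part of s4-cli: lists the file names in a directory
# that match an extended regular expression, one per line.
# $1 = directory, $2 = regular expression
preview_matches() {
  ls -1 "$1" | grep -E -- "$2"
}
```

For example, `preview_matches ./data '\.wav$'` lists only names ending in `.wav`. Without the backslash, the dot matches any character, so a name such as `dwav` would also be selected.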
Real Time Audio
The tool also supports the capture of real time audio data from the default microphone on the device. It relies on Sox to access the microphone, and currently assumes a default microphone configuration (8 channels, 16 kHz sample rate, 32-bit signed integer).
Processing Options
Data can be processed by the service in one of two modes: batch or stream. The key difference between the two is that batch mode combines all of the input data into a single file before processing, while stream mode processes audio chunks as they are received by the server. Note that batch mode is currently not supported when using real time audio.
Response Types
The service returns one of two possible responses: text or audio. When text mode is chosen, the service performs ASR on the cleaned up audio (and raw input), returning the resultant text from the separated audio. When audio mode is chosen, the service does not attempt any ASR, but instead returns the cleaned up audio stream as a response.
The following table summarizes the different options available when using the CLI tool:
| | Stream Mode | Batch Mode |
|:----------------------:|:-----------:|:---------------:|
| Pre Recorded Audio | text/audio | text/audio* |
| Real Time Audio | text/audio | Not Supported |
*Batch mode does not provide cleaned up audio as a direct response. However, cleaned up streams are stored in the cloud, and can be downloaded using the request id.
Installation
Prerequisites
The following is a list of prerequisites required to run the s4-cli:
- NodeJS v0.12.0 - Required to run the tool
- Sox v14.4.1 - Required to capture real time audio
- Git - Required to download and install the tool from npm
NodeJS
Install version 0.12.0 of NodeJS.
Windows
An installation package for Node can be downloaded from the NodeJs downloads page. Download and install the appropriate installation package for your operating system.
Mac OSX
An installation package (.dmg file) for Mac OSX can be downloaded and installed from the NodeJs downloads page.
Alternate Approach:
NodeJs can be installed via HomeBrew by running the following in a command shell:
brew update
brew install node
Ubuntu
Instructions on installing NodeJS on Linux using a package manager can be found here https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager
Sox
This program is only required if the tool will be used with real time audio capture.
Windows
The simplest way to install this package is to download and run the installation package for your platform from the Sox downloads page.
The downloaded package is a .zip file that contains the sox executable and related files. These files can be extracted to any convenient location on the file system. Once extracted, ensure that the sox folder has been added to the PATH variable.
This can be done by updating the path variable within a terminal shell as follows:
PATH=%PATH%;c:\sox-14.4.1\;
This change only applies to the command shell that it is executed in. If a global setting is preferred, update your path variable under Environment Variables. This panel can be found here: My Computer --> Properties --> Advanced --> Environment Variables.
Mac OSX
An installation package (.dmg file) for Mac OSX can be downloaded and installed from the Sox downloads page.
Alternate Approach:
Sox can be installed via HomeBrew by running:
brew update
brew install sox
Ubuntu
Sox can be installed on Linux using a package manager. The following is an example that uses apt-get to install Sox on Ubuntu.
sudo apt-get update
sudo apt-get install sox libsox-fmt-all
Git
Git is a version control tool that is used, among other things, to create copies of code from a remote source control repository. In this case, node's package manager tool npm uses Git to download and install the CLI on the local computer.
Windows
The simplest way to install this program is to download and run the installation package for your platform from the Git downloads page.
Mac OSX
An installation package (.dmg file) for Mac OSX can be downloaded and installed from the Git downloads page.
Alternate Approach:
Git can be installed via HomeBrew by running:
brew update
brew install git
Ubuntu
Git can be installed on Linux using a package manager. The following is an example that uses apt-get to install Git on Ubuntu.
sudo apt-get update
sudo apt-get install git
Installation
Once all the prerequisites have been installed, s4-cli can be installed via npm by running the following in a command shell:
NOTE: The command shell refers to the terminal program in Mac OSX/Linux, or the cmd.exe program on Windows operating systems.
npm install -g s4-cli
NOTE: The above command may fail if elevated privileges are required (this depends on how nodejs/npm has been set up). If that is the case, the problem can be resolved by prefixing the above command with sudo (Linux/Mac OSX), or by running the command in a terminal window running with administrator privileges (Windows).
You can test if the CLI has been installed correctly by typing:
s4-cli --help
The above command should display the available command line options for the s4-cli tool.
Usage
This section outlines the common use cases for the s4-cli tool on the command line. The basic usage of the s4-cli tool is as follows:
NOTE: The command shell refers to the terminal program in Mac OSX/Linux, or the cmd.exe program on Windows operating systems.
s4-cli [ACTION] [OPTIONS]
Where:
[ACTION]: This argument specifies the type of action to perform on the input to the service. This argument can be one of the following values:
- asr-batch: Requests the service to clean up the data in batch mode, perform ASR on the cleaned up data, and return the text obtained from the separated audio.
- asr-stream: Requests the service to clean up the data in streaming mode, perform ASR on the cleaned up data, and return the text obtained from the separated audio.
- audio-stream: Requests the service to clean up the data in streaming mode, and return the cleaned up stream.
[OPTIONS]: Other arguments that can be passed to the command line tool. The following is a brief summary of supported options:
- --help: Displays help information that includes usage and options details.
- --url or -u: The base url of the service, including protocol type. If not specified, this parameter defaults to: http://s4front-end.elasticbeanstalk.com/
- --api-key or -a: The API key that uniquely identifies the entity making the request. Please contact your ADI representative if you do not have a key, and would like to obtain one.
- --mic-config: A microphone configuration parameter that is sent to the service. This parameter will be used by the service when processing the input. If not specified, this parameter defaults to the default microphone config.
- --algorithm: This parameter identifies the algorithm used to clean up the input data. If not specified, this parameter defaults to: ntf-v1
- --tag: A string parameter that will be used as the folder name under which input/output artifacts are stored in cloud storage. This value is especially useful when multiple files are being processed simultaneously, and it is desirable to tag the files so that they may be reviewed as a group.
- --input-file: When specified, this parameter identifies a single input file that will be sent to the service for processing.
- --input-dir: When specified, this parameter identifies a directory whose entire file contents will be sent to the service for processing. Files within the directory may be filtered using the --pattern option.
- --audio-device: When specified, this parameter indicates that real time audio will be captured from the default audio device, and sent to the service for processing.
- --pattern: This parameter can be used in conjunction with the --input-dir option to specify a regular expression filter that will be applied to the names of the files within the input directory. Only files that match the regular expression pattern will be selected for further processing.
- --output-dir: Specifies the directory in which output artifacts generated by the CLI will be stored. If not specified, this parameter defaults to ./out. Note that this directory must exist on the file system if raw audio is requested from the server.
- --output-summary: An optional file name that will contain a report of execution. The report will be stored in report.json if this parameter is omitted. The file will be created in the output directory, as specified by the --output-dir parameter.
Some things to remember:
- At least one action parameter has to be specified (asr-batch, asr-stream or audio-stream).
- At least one input source has to be specified (--input-file, --input-dir or --audio-device).
- If the action specified requires the server to return an audio stream, the output directory specified by --output-dir must exist on the file system.
- If a tag value (--tag) is specified, files will be stored under a directory with the same name as the tag value.
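These rules can be checked before the CLI is started. The wrapper below is a sketch under stated assumptions, not part of s4-cli: `run_s4`, `S4_CLI` and `S4_OUT_DIR` are hypothetical names introduced here, and options are assumed to be passed in the `--option=value` form. It validates the action, requires an input source, rejects batch mode with real time audio, and creates the output directory before handing over to the tool.

```shell
# Hypothetical wrapper, not part of s4-cli. S4_CLI and S4_OUT_DIR are
# illustrative environment variables introduced for this sketch.
run_s4() {
  if [ "$#" -lt 1 ]; then
    echo "usage: run_s4 ACTION [OPTIONS]" >&2
    return 2
  fi
  action="$1"
  shift

  # Only the three actions listed in this README are accepted.
  case "$action" in
    asr-batch|asr-stream|audio-stream) ;;
    *) echo "unknown action: $action" >&2; return 2 ;;
  esac

  # Require an input source, and reject batch mode with real time audio.
  has_input=no
  for arg in "$@"; do
    case "$arg" in
      --input-file=*|--input-dir=*) has_input=yes ;;
      --audio-device)
        has_input=yes
        if [ "$action" = "asr-batch" ]; then
          echo "batch mode is not supported for real time audio" >&2
          return 2
        fi
        ;;
    esac
  done
  if [ "$has_input" = no ]; then
    echo "no input source given" >&2
    return 2
  fi

  # Create the output directory up front, then hand over to the CLI.
  out_dir="${S4_OUT_DIR:-./out}"
  mkdir -p "$out_dir" || return 1
  "${S4_CLI:-s4-cli}" "$action" --output-dir="$out_dir" "$@"
}
```

For example, `run_s4 audio-stream --api-key=_apikey_ --audio-device` would create ./out if needed and then start the capture, while `run_s4 asr-batch --audio-device` would stop with an error before contacting the service.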
Considerations for Real Time Audio
Configuring the Default Microphone
When recording real time audio, the CLI attempts to capture data directly from the default microphone on the computer. It is important to ensure that the default microphone has been set correctly before using the CLI.
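For example, Sox reads the AUDIODEV environment variable to choose the default audio device. On Mac OSX or Linux it can be set by running the following in the terminal (the device name hw:1 is only an example; in cmd.exe on Windows, use set instead of export):
export AUDIODEV=hw:1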
Note that this is not a global setting, and applies only to s4-cli runs within that terminal session.
Waiting for Recording to Start
On some computers, there could be a slight delay between when the s4-cli starts execution, and when actual recording commences. It is recommended that the user wait until the following message is displayed before speaking:
Audio is being captured from the default audio device. Press <ESC> to stop
Recording can be stopped by pressing the ESC key.
Execution Behavior
This section provides an overview of the execution behavior of the service. While the documentation provided here is geared towards using the CLI, the behavior of the service remains the same irrespective of how it is accessed.
- When a request is received by the service, it generates a unique id for the request, called the requestId. This id is globally unique, and is used to tag all input to and output generated by the service.
- All input sent to the service will be stored in the cloud (AWS S3). The following are the rules used when storing data:
  - All files in S3 are partitioned by API key. This means that each API key has a separate S3 partition allocated to it.
  - Each request is assigned a separate folder that will in turn hold three artifacts for every input file: (1) the unprocessed input file, (2) the cleaned up audio data, (3) the noisy audio data.
  - The folder will have the same name as the requestId, unless a tag value (--tag) is specified. If the tag value is specified, it will be used to name the S3 folder.
  - Note that reusing the same tag value for multiple requests will result in previous results being overwritten by the latest request.
- The service will process the request data, and send responses back to the client (in this case the CLI).
- The CLI shows responses from the service on the terminal, and also optionally saves summary information in a .json file. For audio processing requests where the response is cleaned up audio, the service response will be saved in the output directory with the same name as the requestId.
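Since a reused tag overwrites earlier results, one way to keep runs apart is to derive a fresh tag for each invocation. The following is only a sketch: `unique_tag` is a hypothetical helper, and it is an assumption that letters, digits and hyphens are acceptable in a tag value.

```shell
# Hypothetical helper, not part of s4-cli: builds a tag that is unlikely to
# collide with an earlier run by appending a UTC timestamp and the shell's PID.
# $1 = optional prefix (defaults to "run")
unique_tag() {
  prefix="${1:-run}"
  printf '%s-%s-%s\n' "$prefix" "$(date -u +%Y%m%dT%H%M%SZ)" "$$"
}
```

The result can be passed straight to the documented option, for example `s4-cli asr-stream --api-key=_apikey_ --tag="$(unique_tag nightly)" --input-dir=./data`.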
Summary Report Format:
The summary report for requests is stored in a .json file. The following is an example of the summary report format:
[
  {
    "inputType":"file",
    "input":"data/SoundTest.wav",
    "message":"OK",
    "success":true,
    "output":{
      "0":{ "status":"success", "text": "this" },
      "1":{ "status":"success", "text": "this" },
      "2":{ "status":"success", "text": "this" },
      "3":{ "status":"success", "text": "this is" },
      "4":{ "status":"success", "text": "this is" },
      "5":{ "status":"success", "text": "this is" },
      "6":{ "status":"success", "text": "this is a" },
      "7":{ "status":"success", "text": "this is a" },
      "8":{ "status":"success", "text": "this is a" },
      "9":{ "status":"success", "text": "this is a test" },
      "metadata":{
        "requestId":"e31cd313-9eef-4d36-9938-a3936c19c9de",
        "s3Folder":"e31cd313-9eef-4d36-9938-a3936c19c9de",
        "micConfig":"az7",
        "algorithm":"ntf-v1"
      }
    }
  },
  {
    ...
  }
]
The file contains a JSON array with one object for every input received as a part of the request. Each object in turn contains information about the type of request, and also the response from the server, including metadata such as the id of the request, the folder name in S3, etc.
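The request ids in the report are what is needed to locate the stored artifacts later, so it can be handy to pull them out of the file. The sketch below treats the report as plain text, which is an assumption that holds only while the keys appear as in the example above; `report_request_ids` and `report_failures` are hypothetical helpers, and a JSON-aware tool would be more robust.

```shell
# Hypothetical helpers, not part of s4-cli. They treat the report as text, so
# they assume the "requestId" and "success" keys appear as in the example report.

# Print every requestId found in the report, one per line. $1 = report file.
report_request_ids() {
  grep -o '"requestId" *: *"[^"]*"' "$1" | sed 's/.*"\([^"]*\)"$/\1/'
}

# Print the number of entries whose "success" flag is false. $1 = report file.
report_failures() {
  grep -o '"success" *: *false' "$1" | wc -l | tr -d ' '
}
```

For example, `report_request_ids ./out/report.json` prints one id per processed input, assuming the default output directory and report name.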
Examples
Perform clean up and ASR in batch mode on a single pre recorded file:
s4-cli asr-batch --api-key=_apikey_ --mic-config=_micconfig_ --input-file=./data/sound-recording-1.wav
Perform clean up and ASR in stream mode on all .wav files in a given directory:
s4-cli asr-stream --api-key=_apikey_ --mic-config=_micconfig_ --input-dir=./data --pattern='.*\.wav$'
Perform clean up in stream mode on real time data captured from the default audio device:
s4-cli audio-stream --api-key=_apikey_ --mic-config=_micconfig_ --audio-device