tjd-dol-5500-sample-utterances
v0.0.2
Creates Amazon Echo Sample Utterances for Querying Department of Labor 5500 Reports
sampleutterances
Node.js program that creates a list of slot values for a custom slot type in the Amazon Echo service
Installation
npm install sampleutterances
Usage
This program connects to a MongoDB instance, queries a database for matching plans, and writes the resulting slot values to a file.
It accepts the following command line parameters:
--output
Provides an output file name that will contain the slot values. Any existing file with the same name will be deleted.
Default
./Sponsor Name Slot Values.txt
Example
node app.js --output="myoutputfile.txt"
--dburl
Provides the full URL of the MongoDB instance, including the database name.
Default
mongodb://10.0.0.27/LawyerServices
Example
node app.js --dburl=mongodb://mywebaddress/mymongodbname
--maxentries
Sets the maximum number of entries this program will produce. The maximum of 50,000 is set by Amazon, so there is no reason to specify this argument unless you are just testing, or Amazon changes its maximum and I have not yet updated the program to match.
Default
50000
Example
node app.js --maxentries=100
--maxcharacters
Sets the maximum number of characters for all the entries combined. The maximum of 600,000 is set by Amazon, so there is no reason to specify this argument unless you are just testing, or Amazon changes its maximum and I have not yet updated the program to match.
Default
600000
Example
node app.js --maxcharacters=32768
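As a rough illustration of how the two limits interact, here is a minimal sketch that keeps names until either limit would be exceeded. The function name `applyLimits` is hypothetical, and the sketch assumes only the characters of each name are counted (no separators); the actual program may count differently.

```javascript
// Hypothetical helper, not taken from the program's source.
// Assumption: only the characters of each name count toward the total.
function applyLimits(names, maxEntries = 50000, maxCharacters = 600000) {
  const kept = [];
  let characters = 0;
  for (const name of names) {
    // Stop as soon as either limit would be exceeded.
    if (kept.length >= maxEntries) break;
    if (characters + name.length > maxCharacters) break;
    kept.push(name);
    characters += name.length;
  }
  return { kept, characters };
}

const result = applyLimits(["ABX AIR", "ACME TOOL", "ZENITH"], 2, 100);
console.log(result.kept.length, result.characters); // 2 16
```

Whichever limit is reached first ends the run, which is why the tuning of --minparticipants described below matters.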
--minparticipants
Each document in the database describes an employer benefit plan as drawn from the U.S. Department of Labor. Each such document contains a property that discloses the number of participants in the plan. There are more plan sponsors than can be accommodated by Amazon's custom slot feature, so we have to limit the plans we pull. I limit it based on the number of participants, on the theory that if I extract the plans with the highest number of participants, I will have extracted the plans that users are most likely to request.
You have to tune this number carefully so that you squeeze out all that you can from the database into Amazon's custom slot. After the program runs, it will give you some idea about whether it thinks you should change this value and rerun the program.
Default
275
Example
node app.js --minparticipants=250
minParticipants = getArgument("--minparticipants", minParticipants) * 1;
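The line above shows how the program reads this option: a `getArgument` helper returns the value (or the current default), and `* 1` coerces the string to a number. The helper itself is not shown in this README, so the following is only a sketch of what it might look like, assuming the `--name=value` syntax used in the examples. The third parameter, `argv`, is a hypothetical addition that makes the sketch easy to try out.

```javascript
// Hypothetical sketch of a getArgument-style helper; the real one may differ.
function getArgument(name, defaultValue, argv = process.argv.slice(2)) {
  const prefix = name + "=";
  const match = argv.find((arg) => arg.startsWith(prefix));
  if (match === undefined) return defaultValue;
  // Drop the "--name=" prefix and any surrounding double quotes.
  return match.slice(prefix.length).replace(/^"(.*)"$/, "$1");
}

let minParticipants = 275;
minParticipants =
  getArgument("--minparticipants", minParticipants, ["--minparticipants=250"]) * 1;
console.log(minParticipants); // 250
```

When the option is absent, the default passes through unchanged, so the `* 1` coercion is harmless for values that are already numbers.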
Example Invocation
The following command line
node app.js --minparticipants=300
will produce output like this:
Connecting to MondoDb
Connected OK
Querying for plan names having more than 300 participants
Found 30426 plans. Normalizing and converting names . . .
Eliminated 4494 duplicates in this phase.
Sorting 25932 plan names to remove additional duplicates
Writing results.
Eliminated 131 duplicate entries.
Created 25801 sample utterances containing 598468 characters.
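The normalize, de-duplicate, and sort stages reported in the output above could be sketched roughly as follows. The normalization rules here (uppercase, punctuation to spaces, collapsed whitespace) and the function names `normalizeName` and `buildSlotValues` are assumptions for illustration; the program's actual rules are not documented in this README, and it reports two de-duplication passes where this sketch uses one.

```javascript
// Assumed normalization: uppercase, punctuation to spaces, single spaces.
function normalizeName(name) {
  return name
    .toUpperCase()
    .replace(/[^A-Z0-9 ]+/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Normalize every name, drop exact duplicates, then sort.
function buildSlotValues(rawNames) {
  const unique = [...new Set(rawNames.map(normalizeName))];
  unique.sort();
  return unique;
}

console.log(buildSlotValues(["Abx Air, Inc.", "ABX AIR INC", "Zenith  Corp", "Acme"]));
// prints the three names ABX AIR INC, ACME, ZENITH CORP
```

Names that differ only in case or punctuation collapse into one entry, which is where most of the savings in the first phase would come from under these assumptions.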
Notes
I can't imagine that this application has any general use. It is specifically designed to take a list of employee benefit plans and create a file that can be copied and pasted into Amazon's Echo service as part of a custom slot type relating to a custom skill. In particular, it has been developed for my use in implementing a custom "Alexa" skill using Amazon's Echo Voice Services.
To Do
The program works. A further refinement would be to eliminate near-duplicates. For example, currently it would pass through the following two entries:
ABX AIR
ABX AIR PROFIT SHARING
Those are two different plans, for sure, but they share the same sponsor, and a user is likely to ask for "ABX AIR" anyway and get both results. Collapsing these and similar names would increase the number of slot values we can provide Amazon and thereby improve the recognition rate for some more obscure plans.
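One possible approach to this refinement, sketched here and not implemented in the program, is a prefix check over the sorted list: an entry is dropped when it begins with the most recently kept entry followed by a space. The function name `collapseNearDuplicates` is hypothetical, and the sketch assumes the input is already sorted and free of exact duplicates.

```javascript
// Hypothetical refinement, not part of the current program.
// Assumes sortedNames is sorted and contains no exact duplicates.
function collapseNearDuplicates(sortedNames) {
  const kept = [];
  for (const name of sortedNames) {
    const previous = kept[kept.length - 1];
    // Skip names that merely extend the previously kept name.
    if (previous !== undefined && name.startsWith(previous + " ")) continue;
    kept.push(name);
  }
  return kept;
}

console.log(collapseNearDuplicates(["ABX AIR", "ABX AIR PROFIT SHARING", "ACME"]));
// keeps ABX AIR and ACME
```

A caveat with this approach is that a short sponsor name would absorb every longer name that starts with it, even when the sponsors are unrelated, so it would probably need a minimum-length rule or a manual review step.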
Copyright
Copyright (c) 2016 by Thomas J. Daley