@interzoid/data-matching
v1.2.7
Interzoid SDK for TypeScript: Generative-AI powered data matching, data quality, and data normalization for organization and individual name data
Interzoid Data Matching Node.js SDK
Version: 1.2.7
This is a Node.js SDK for Interzoid's Generative-AI powered data matching, data quality, data cleansing, and data normalization for organization and individual name data. Functions include the generation of similarity keys (also called match keys) for identifying and matching inconsistent name data, as well as comparing and scoring data for matching purposes.
The concept is that the same similarity key will be algorithmically generated for different permutations of the same data content, such as GE, Gen Elec, General Electric all generating the same similarity key. Then, these similarity keys can be used as the basis of matching data, identifying duplicates, and resolving inconsistencies that can otherwise degrade the usefulness and value of data-driven applications, processes, or anything else that makes use of data. These similarity keys form the basis of many of the different functions available in the SDK that make use of Generative AI, Machine Learning, specialized algorithms, and extensive knowledge bases - all in the Cloud - to provide its results. These include functions that generate similarity keys for custom use, functions that score matches for certain use cases, and functions that process and perform matching functions with entire database tables and datasets.
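As a rough illustration of how similarity keys can be used once they exist, the sketch below groups records by key and reports the groups with more than one member. The `KeyedRecord` type, the `findDuplicateGroups` helper, and the key values are all hypothetical placeholders, not part of the SDK and not real Interzoid output.

```typescript
// A record paired with a similarity key that was already generated for it.
interface KeyedRecord {
  name: string;
  simKey: string;
}

// Group records by similarity key and keep only the groups with more than
// one member, i.e. names that are likely variations of the same entity.
function findDuplicateGroups(records: KeyedRecord[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const record of records) {
    const names = groups.get(record.simKey) ?? [];
    names.push(record.name);
    groups.set(record.simKey, names);
  }
  return Array.from(groups.values()).filter((names) => names.length > 1);
}

// Made-up keys for illustration only.
const sampleRecords: KeyedRecord[] = [
  { name: 'GE', simKey: 'key-A' },
  { name: 'Gen Elec', simKey: 'key-A' },
  { name: 'General Electric', simKey: 'key-A' },
  { name: 'Acme Corp', simKey: 'key-B' },
];

const duplicateGroups = findDuplicateGroups(sampleRecords);
// duplicateGroups → [['GE', 'Gen Elec', 'General Electric']]
```

Because the comparison is an exact match on the key, the grouping itself needs no fuzzy logic; all of the similarity handling happens when the key is generated.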
Table of Contents
- API Key
- Installation
- Data Matching APIs
- Interzoid Cloud Data Connect
- Interzoid Account Information (Remaining Credits)
API Key
Please visit https://www.interzoid.com/register-api-account to register for an API key and receive free usage credits. This API key will be used as a parameter with each call to the API (via the SDK function) for authentication and usage tracking.
Installation
The Interzoid SDK requires Node.js v14 or greater.
npm install @interzoid/data-matching
Data Matching APIs
Interzoid uses algorithmically generated similarity keys leveraging Generative AI, Large Language Models (LLMs), Machine Learning, specialized algorithms, and extensive knowledge bases to intelligently match data within or across data sources. Match rates can increase significantly when similarity keys are used with important data.
To learn more about the technology behind these APIs and to better understand how to make use of similarity keys, please visit https://docs.interzoid.com/entries/understanding-data-matching
Match Key Functions
Full Name Match Key
This API provides a hashed similarity key from the input data used to match with other similar full name data. Use the generated similarity key, rather than the actual data itself, to match and/or sort individual name data by similarity as similar individual names will generate the same similarity key. This avoids the problems of data inconsistency, misspellings, and name variations when matching within a single dataset, and can also help matching across datasets or for more advanced searching.
Example
import { getFullNameMatchKey } from '@interzoid/data-matching';
async function fullNameMatch() {
  const result = await getFullNameMatchKey({ apiKey: 'your-interzoid-api-key', fullName: 'John Smith' });
  console.log(result);
}
Result
{
  "simKey": "N1Ai4RfV0SRJf2dJwDO0Cvzh4xCgQG",
  "code": "Success",
  "credits": "9999"
}
Additional documentation
https://interzoid.com/apis/individual-name-matching
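Before storing a key, it may be worth checking the `code` field of the result. The sketch below assumes the result shape shown above; the `MatchKeyResult` type and `requireSimKey` helper are hypothetical, and since only the `Success` code appears in this README, treating every other value as a failure is an assumption.

```typescript
// Shape of the match key result shown above; all fields arrive as strings.
interface MatchKeyResult {
  simKey: string;
  code: string;
  credits: string;
}

// Return the similarity key, or throw if the call did not report success.
// Assumption: any code other than "Success" indicates a failed request.
function requireSimKey(result: MatchKeyResult): string {
  if (result.code !== 'Success') {
    throw new Error(`Match key request failed with code: ${result.code}`);
  }
  return result.simKey;
}

const exampleKey = requireSimKey({
  simKey: 'N1Ai4RfV0SRJf2dJwDO0Cvzh4xCgQG',
  code: 'Success',
  credits: '9999',
});
```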
Company Name Match Key
This API provides a hashed similarity key from the input data used to match with other similar company name data. Use the generated similarity key, rather than the actual data itself, to match and/or sort company name data to identify inconsistently represented company and organization name data, as similar organization and company names will generate the same similarity key. This avoids the problems of data inconsistency, misspellings, and name variations when matching within a single dataset or across multiple data sources.
The optional algorithm parameter selects one of several matching algorithms:
| Algorithm | Description |
|---|---|
| wide | Very fast algorithm with a wider umbrella of similarity. Better when you have other data that will also serve as part of match criteria. |
| narrow | Very fast algorithm, more stringent than wide. Better if you are matching on company or organization name only. |
| ai-medium-wide | (Recommended) High matching accuracy with a wider similarity range, especially when additional matching criteria are available. Recommended for most use cases, but may have longer response times. |
| ai-medium-narrow | Best matching accuracy when matching solely on company or organization names without any other match criteria. Provides the most precise matching, but may have longer response times. |
| ai-plus-wide | Balances speed and precision with a wider similarity umbrella. Ideal for matching with additional data criteria. |
| ai-plus-narrow | Balances speed and precision with more stringent matching criteria. Better for matching solely on company or organization names. |
The default algorithm is ai-medium-wide.
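The guidance in the table above could be encoded in a small helper so the choice is made in one place. This is only a sketch: the `CompanyAlgorithm` type, the `MatchContext` fields, and `chooseCompanyAlgorithm` are hypothetical names, and the decision rule is one reading of the table rather than an official recommendation. The ai-plus variants are left out for brevity.

```typescript
// Algorithm names taken from the table above.
type CompanyAlgorithm =
  | 'wide'
  | 'narrow'
  | 'ai-medium-wide'
  | 'ai-medium-narrow'
  | 'ai-plus-wide'
  | 'ai-plus-narrow';

interface MatchContext {
  // True when other fields (for example an address) are also compared.
  hasOtherMatchCriteria: boolean;
  // True when response time matters more than accuracy.
  preferSpeed: boolean;
}

// Pick a wide algorithm when other criteria narrow the match, and a
// narrow one when the company name is the only criterion.
function chooseCompanyAlgorithm(ctx: MatchContext): CompanyAlgorithm {
  if (ctx.preferSpeed) {
    return ctx.hasOtherMatchCriteria ? 'wide' : 'narrow';
  }
  return ctx.hasOtherMatchCriteria ? 'ai-medium-wide' : 'ai-medium-narrow';
}

const nameOnlyChoice = chooseCompanyAlgorithm({
  hasOtherMatchCriteria: false,
  preferSpeed: false,
});
// nameOnlyChoice → 'ai-medium-narrow'
```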
Example
import { getCompanyNameMatchKey } from '@interzoid/data-matching';
async function companyNameMatch() {
  const result = await getCompanyNameMatchKey({
    apiKey: 'your-interzoid-api-key',
    company: 'Microsoft',
    algorithm: 'ai-medium-wide'
  });
  console.log(result);
}
Result
{
  "simKey": "cZdRqd6Ya6FBDPmFpn4_USiTu2DVoYO32ANw1Z5NYN0",
  "code": "Success",
  "credits": "9999"
}
Additional documentation
https://interzoid.com/apis/company-name-matching
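Matching across multiple data sources then reduces to an equality join on the similarity key. The sketch below assumes keys were already generated for both sources; the `KeyedCompany` type, the `matchAcrossSources` helper, and the key values are hypothetical.

```typescript
// A company record with a previously generated similarity key.
interface KeyedCompany {
  id: string;
  name: string;
  simKey: string;
}

// Pair every record in `left` with every record in `right` that shares
// its similarity key (a hash join keyed on simKey).
function matchAcrossSources(
  left: KeyedCompany[],
  right: KeyedCompany[],
): Array<[KeyedCompany, KeyedCompany]> {
  const index = new Map<string, KeyedCompany[]>();
  for (const row of right) {
    const bucket = index.get(row.simKey) ?? [];
    bucket.push(row);
    index.set(row.simKey, bucket);
  }
  const pairs: Array<[KeyedCompany, KeyedCompany]> = [];
  for (const row of left) {
    for (const candidate of index.get(row.simKey) ?? []) {
      pairs.push([row, candidate]);
    }
  }
  return pairs;
}

// Made-up keys for illustration only.
const crmRows: KeyedCompany[] = [
  { id: 'c1', name: 'Microsoft', simKey: 'key-M' },
  { id: 'c2', name: 'Acme', simKey: 'key-A' },
];
const billingRows: KeyedCompany[] = [
  { id: 'b7', name: 'Microsoft Corp.', simKey: 'key-M' },
  { id: 'b9', name: 'Initech', simKey: 'key-I' },
];

const crossMatches = matchAcrossSources(crmRows, billingRows);
// crossMatches contains one pair: the records with ids 'c1' and 'b7'
```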
Address Match Key
This API provides a hashed similarity key from the input data used to match with other similar address data. Use the generated similarity key, rather than the actual data itself, to match and/or sort address data by similarity, as similar addresses will generate the same similarity key. This avoids the problems of data inconsistency, misspellings, and address element variations when matching either within a single dataset or across datasets. It also provides for broader searching capabilities.
You can choose from the following matching algorithms:
| Algorithm | Description |
|---|---|
| narrow | Matches considering unit designators (suite, apt, etc.). Very fast but less accurate than AI options. |
| wide | Matches on street address only, ignoring unit designators. Very fast but less accurate than AI options. |
| ai-medium-narrow | (Recommended) Best accuracy, considering unit designators. Recommended algorithm, but API responses can take a little longer. |
| ai-medium-wide | Best accuracy, matching on street address only. API responses can take a little longer but provide high-quality results. |
| ai-plus-narrow | AI-enhanced matching considering unit designators. Second-best in performance and accuracy. |
| ai-plus-wide | AI-enhanced matching on street address only. Second-best in performance and accuracy. |
Example
import { getAddressMatchKey } from '@interzoid/data-matching';
async function addressMatch() {
  const result = await getAddressMatchKey({
    apiKey: 'your-interzoid-api-key',
    address: '500 main street',
    algorithm: 'ai-medium-narrow'
  });
  console.log(result);
}
Result
{
  "simKey": "T8O0ROaEgJIFhqcgg7SyCBjRryRVa43oMO2sMlq9r0s",
  "code": "Success",
  "credits": "9999"
}
Additional documentation
https://interzoid.com/apis/street-address-matching
Match Score Functions
We provide two match scoring operations: organization name and full name. Both take the same request parameters: provide two values, and the API returns a score from 0 to 100 indicating how similar the two values are and how likely they are to be a match. The score is computed using Generative AI, Machine Learning, specialized algorithms, and extensive knowledge bases. A best practice is to set a threshold, for example 50, 60, or 70, that indicates a potential match, and then handle the potential matches as desired.
Full Name Match Score
This API provides a match score (likelihood of matching) between two individual names on a scale of 0-100, where 100 is the highest possible match.
import { getFullNameMatchScore } from '@interzoid/data-matching';
async function fullNameMatchScore() {
  const result = await getFullNameMatchScore({
    apiKey: 'your-interzoid-api-key',
    value1: 'John Smith',
    value2: 'John Smyth'
  });
  console.log(result);
}
Result
{
  "score": "80",
  "code": "Success",
  "credits": "9999"
}
Organization Name Match Score
This API provides a match score (likelihood of matching) ranging from 0 to 100 between two organization names.
import { getOrganizationMatchScore } from '@interzoid/data-matching';
async function organizationNameMatchScore() {
  const result = await getOrganizationMatchScore({
    apiKey: 'your-interzoid-api-key',
    value1: 'Apple',
    value2: 'Apple Inc.'
  });
  console.log(result);
}
Result
{
  "score": "95",
  "code": "Success",
  "credits": "9998"
}
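The threshold practice described at the start of this section might be applied as follows. The results above return `score` as a string, so the sketch converts it before comparing. The `MatchScoreResult` type and `isPotentialMatch` helper are hypothetical, and the default threshold of 70 is simply one of the example values mentioned earlier, not a recommended setting.

```typescript
// Shape of the match score results shown above; fields arrive as strings.
interface MatchScoreResult {
  score: string;
  code: string;
  credits: string;
}

// Treat a pair as a potential match when its score meets the threshold.
// A score that cannot be parsed as a number is treated as "no match".
function isPotentialMatch(result: MatchScoreResult, threshold = 70): boolean {
  const score = Number(result.score);
  if (isNaN(score)) {
    return false;
  }
  return score >= threshold;
}

const smithVsSmyth: MatchScoreResult = { score: '80', code: 'Success', credits: '9999' };

const looseCheck = isPotentialMatch(smithVsSmyth);      // true: 80 >= 70
const strictCheck = isPotentialMatch(smithVsSmyth, 90); // false: 80 < 90
```

A lower threshold surfaces more candidate pairs for review, while a higher one keeps only the closest matches.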
Cloud Data Connect
Introduction
Interzoid's Cloud Data Connect is a set of functions that let you apply Interzoid's data matching algorithms to data in a cloud database or in a delimited text file such as CSV or TSV.
Matching Process
The process parameter determines the type of matching process to run. The package provides an enum called Process that contains the available options.
| Process | Description |
|---|---|
| Process.MATCH_REPORT | Generate a report of all found clusters of similar data that share the same generated similarity key. |
| Process.CREATE_TABLE | Create a new table in the source database with the similarity keys for each record in the source table, so they can be used for additional queries. |
| Process.GEN_SQL | Generate the SQL INSERT statements that store the similarity keys in a database, so they can be reviewed before execution. |
| Process.KEYS_ONLY | Output a generated similarity key for every record in the dataset. |
Source
The source parameter determines the type of data source containing the data you are performing matching functions with. The package provides an enum called Source that contains the available options.
Some commonly used examples are:
| Source | Description |
|---|---|
| Source.MYSQL | Match data in a MySQL database. |
| Source.POSTGRES | Match data in a PostgreSQL database. |
| Source.MARIADB | Match data in a MariaDB database. |
| Source.DATABRICKS | Match data in a Databricks table. |
| Source.CSV | Match data in a CSV file. |
Please see the source code for a complete list of available options.
Category
The category parameter determines the type of data you're matching. The package provides an enum called Category that contains the available options.
| Category | Description |
|---|---|
| Category.COMPANY | Match company names. |
| Category.INDIVIDUAL | Match individual names. |
| Category.ADDRESS | Match addresses. |
Connection Strings
The connection parameter is a connection string for your database. The format of the connection string depends on the database you're connecting to.
Please see this page for examples of connection strings for various databases.
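As a small illustration, the MySQL examples in this README use a connection string of the form db_user:db_password@tcp(db_host)/database. A helper like the one below could assemble that string from separate settings, which makes it easier to keep credentials out of source code. The `MySqlParts` type and `mysqlConnectionString` helper are hypothetical, and other databases use different formats that this sketch does not cover.

```typescript
// The pieces of the MySQL connection string used in this README's examples.
interface MySqlParts {
  user: string;
  password: string;
  host: string;
  database: string;
}

// Assemble a string of the form user:password@tcp(host)/database.
// Assumption: this mirrors only the MySQL format shown in this README.
function mysqlConnectionString(parts: MySqlParts): string {
  return `${parts.user}:${parts.password}@tcp(${parts.host})/${parts.database}`;
}

const exampleConnection = mysqlConnectionString({
  user: 'db_user',
  password: 'db_password',
  host: 'db_host',
  database: 'database',
});
// exampleConnection → 'db_user:db_password@tcp(db_host)/database'
```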
Match and write results to a new table
Set the process parameter to CREATE_TABLE to create a new table in your database with the match keys. The newTable parameter is the name of the new table to create. This table will be created by the process, and will contain the original data and the similarity key.
Do not create the table manually; the process will handle the creation.
You'll have to grant the user you're connecting with the ability to create a new table in the database in addition to the ability to read from the table you're matching.
import { getCloudDatabaseMatchKeyReport, Process, Category, Source } from '@interzoid/data-matching';
async function databaseMatchKeyReport() {
  const result = await getCloudDatabaseMatchKeyReport({
    apiKey: 'your-interzoid-api-key',
    process: Process.CREATE_TABLE,
    category: Category.COMPANY,
    source: Source.MYSQL,
    connection: 'db_user:db_password@tcp(db_host)/database',
    table: 'companies', // table to match
    column: 'companyname', // column to match
    reference: 'id', // optional reference column
    newTable: 'companies_match_keys' // new table to create
  });
  console.log(result);
}
Response
"Creating new table...Table companies_match_keys created successfully."
Match Key Report for a cloud database table
Response options
- Set json to true to return a JSON object with arrays of match clusters.
- Set html to true to return results in plain text with clusters separated by HTML <br> tags.
- Set neither to return results in plain text with clusters separated by newlines.
import { getCloudDatabaseMatchKeyReport, Source, Process, Category } from '@interzoid/data-matching';
async function databaseMatchKeyReport() {
  const result = await getCloudDatabaseMatchKeyReport({
    apiKey: 'your-interzoid-api-key',
    process: Process.MATCH_REPORT,
    category: Category.COMPANY,
    source: Source.MYSQL,
    connection: 'db_user:db_password@tcp(db_host)/database',
    table: 'companies',
    column: 'companyname',
    reference: 'id',
    json: true,
  });
  console.log(JSON.stringify(result, null, 2));
}
Sample Response
{
  "Status": "success",
  "Message": "",
  "MatchClusters": [
    [
      {
        "Data": "Cisco",
        "Reference": "",
        "SimKey": "3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ"
      },
      {
        "Data": "Cisco Systems",
        "Reference": "30",
        "SimKey": "3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ"
      }
    ],
    [
      {
        "Data": "Netflix",
        "Reference": "15",
        "SimKey": "8c6BY0KP9MYiDezQaKL3bH3iHfDU2wCMMTD9v0EeZJ8"
      },
      {
        "Data": "\"Netflix, Inc.\"",
        "Reference": "34",
        "SimKey": "8c6BY0KP9MYiDezQaKL3bH3iHfDU2wCMMTD9v0EeZJ8"
      }
    ]
  ]
}
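A JSON report like the sample above could be condensed into one line per cluster for quick review. The types below are inferred from the sample response only and are not taken from the SDK's own type definitions; `summarizeClusters` is a hypothetical helper, and treating any `Status` other than `success` as an error is an assumption.

```typescript
// Types inferred from the sample response above (hypothetical names).
interface ClusterEntry {
  Data: string;
  Reference: string;
  SimKey: string;
}

interface MatchReport {
  Status: string;
  Message: string;
  MatchClusters: ClusterEntry[][];
}

// Produce one summary line per cluster by joining the matched values.
// Assumption: a Status other than "success" means the report is unusable.
function summarizeClusters(report: MatchReport): string[] {
  if (report.Status !== 'success') {
    throw new Error(`Match report failed: ${report.Message}`);
  }
  return report.MatchClusters.map((cluster) =>
    cluster.map((entry) => entry.Data).join(' = '),
  );
}

// Data copied from the sample response above, with keys shortened.
const sampleReport: MatchReport = {
  Status: 'success',
  Message: '',
  MatchClusters: [
    [
      { Data: 'Cisco', Reference: '', SimKey: 'k1' },
      { Data: 'Cisco Systems', Reference: '30', SimKey: 'k1' },
    ],
    [
      { Data: 'Netflix', Reference: '15', SimKey: 'k2' },
      { Data: '"Netflix, Inc."', Reference: '34', SimKey: 'k2' },
    ],
  ],
};

const clusterSummaries = summarizeClusters(sampleReport);
// clusterSummaries[0] → 'Cisco = Cisco Systems'
```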
Text File Match Key Report
Provide a URL to a delimited file (CSV or TSV) and the API will return a match key report for the data in the file.
import { getDelimitedFileMatchKeyReport, Process, Source, Category } from '@interzoid/data-matching';
async function csvFileMatchReport() {
  const result = await getDelimitedFileMatchKeyReport({
    apiKey: 'your-interzoid-api-key',
    process: Process.MATCH_REPORT,
    category: Category.COMPANY,
    source: Source.CSV,
    table: Source.CSV,
    connection: 'https://dl.interzoid.com/csv/companies.csv',
    column: '1', // column number to match
    json: true,
  });
  console.log(JSON.stringify(result, null, 2));
}
Result
{
  "Status": "success",
  "Message": "",
  "MatchClusters": [
    [
      {
        "Data": "Good Year Tire & Rubber",
        "Reference": "",
        "SimKey": "140xAiUxvDysV56LZzogzDwLuYLd2U7E5sVAXd1nKd8"
      },
      {
        "Data": "Goodyear Tire Inc",
        "Reference": "Transportaions",
        "SimKey": "140xAiUxvDysV56LZzogzDwLuYLd2U7E5sVAXd1nKd8"
      }
    ],
    [
      {
        "Data": "Pederson Tooling Inc.",
        "Reference": "Transportaions",
        "SimKey": "7oOMieCdoyxjt7_oKbE2xGngnZGdG75CFU5pEfhU5z8"
      },
      {
        "Data": "Peterson Tools",
        "Reference": "Services",
        "SimKey": "7oOMieCdoyxjt7_oKbE2xGngnZGdG75CFU5pEfhU5z8"
      }
    ]
  ]
}
Account Information
This API retrieves the current amount of remaining purchased (or trial) credits for a license key.
Using this function does not deduct credits from your account.
import { getRemainingCredits } from '@interzoid/data-matching';
async function remainingCredits() {
  // await the call so the resolved result is logged rather than a pending Promise
  const result = await getRemainingCredits({ apiKey: 'your-interzoid-api-key' });
  console.log(result);
}
Result
{
  "credits": "9998",
  "code": "Success"
}