@interzoid/data-matching

v1.2.7

Published

a month ago

Interzoid SDK for Typescript, Generative-AI powered data matching, data quality, and data normalization for organization and individual name data

Downloads

137

0High
0Medium
0Low

dvause

izbob

Generative AI Data Quality Data Matching Data Cleansing Data Normalization

Interzoid Data Matching Node.js SDK

Version: 1.2.7

This is a Node.js SDK for Interzoid's Generative-AI powered data matching, data quality, data cleansing, and data normalization for organization and individual name data. Functions include the generation of similarity keys (also called match keys) for identifying and matching inconsistent name data, as well as comparing and scoring data for matching purposes.

The concept is that the same similarity key will be algorithmically generated for different permutations of the same data content, such as GE, Gen Elec, General Electric all generating the same similarity key. Then, these similarity keys can be used as the basis of matching data, identifying duplicates, and resolving inconsistencies that can otherwise degrade the usefulness and value of data-driven applications, processes, or anything else that makes use of data. These similarity keys form the basis of many of the different functions available in the SDK that make use of Generative AI, Machine Learning, specialized algorithms, and extensive knowledge bases - all in the Cloud - to provide its results. These include functions that generate similarity keys for custom use, functions that score matches for certain use cases, and functions that process and perform matching functions with entire database tables and datasets.

API Key

Please visit https://www.interzoid.com/register-api-account to register for an API key and receive free usage credits. This API key will be used as a parameter with each call to the API (via the SDK function) for authentication and usage tracking.

Installation

The Interzoid SDK requires Node.js v14 or greater.

npm install @interzoid/data-matching

Data Matching APIs

Interzoid uses algorithmically generated similarity keys leveraging Generative AI, Large Language Models (LLMs), Machine Learning, specialized algorithms, and extensive knowledge bases to intelligently match data within or across data sources. Match rates can increase significantly when similarity keys are used with important data.

To learn more about the technology behind these APIs and to better understand how to make use of similarity keys, please visit https://docs.interzoid.com/entries/understanding-data-matching

Match Key Functions

Full Name Match Key

This API provides a hashed similarity key from the input data used to match with other similar full name data. Use the generated similarity key, rather than the actual data itself, to match and/or sort individual name data by similarity as similar individual names will generate the same similarity key. This avoids the problems of data inconsistency, misspellings, and name variations when matching within a single dataset, and can also help matching across datasets or for more advanced searching.

Example

import { getFullNameMatchKey } from '@interzoid/data-matching';

async function fullNameMatch() {
  const result = await getFullNameMatchKey({ apiKey: 'your-interzoid-api-key', fullName: 'John Smith' });
  console.log(result);
}

Result

{
  "simKey": "N1Ai4RfV0SRJf2dJwDO0Cvzh4xCgQG",
  "code": "Success",
  "credits": "9999"
}

Additional documentation

https://interzoid.com/apis/individual-name-matching

Company Name Match Key

This API provides a hashed similarity key from the input data used to match with other similar company name data. Use the generated similarity key, rather than the actual data itself, to match and/or sort company name data to identify inconsistently represented company and organization name data, as similar organization and company names will generate the same similarity key. This avoids the problems of data inconsistency, misspellings, and name variations when matching within a single dataset or across multiple data sources.

The optional algorithm parameter provides multiple matching algorithms:

| Algorithm | Description | |--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | wide | Very fast algorithm with a wider umbrella of similarity. Better when you have other data that will also serve as part of match criteria. | | narrow | Very fast algorithm, more stringent than wide. Better if you are matching on company or organization name only. | | ai-medium-wide | (Recommended) High matching accuracy with a wider similarity range, especially for use with the availability of additional matching criteria. Recommended for most use cases, but may have longer response times. | | ai-medium-narrow | Best matching accuracy when matching solely on company or organization names without any other match criteria. Provides the most precise matching, but may have longer response times. | | ai-plus-wide | Balances speed and precision with a wider similarity umbrella. Ideal for matching with additional data criteria. | | ai-plus-narrow | Balances speed and precision with a more stringent matching criteria. Better for matching solely on company or organization names. |

The default algorithm is ai-medium-wide.

Example

import { getCompanyNameMatchKey } from '@interzoid/data-matching';

async function companyNameMatch() {
  const result = await getCompanyNameMatchKey({
    apiKey: 'your-interzoid-api-key',
    company: 'Microsoft',
    algorithm: 'ai-medium-wide'
  });
  console.log(result);
}

Result

{
  "simKey": "cZdRqd6Ya6FBDPmFpn4_USiTu2DVoYO32ANw1Z5NYN0",
  "code": "Success",
  "credits": "9999"
}

Additional documentation

https://interzoid.com/apis/company-name-matching

Address Match Key

This API provides a hashed similarity key from the input data used to match with other similar address data. Use the generated similarity key, rather than the actual data itself, to match and/or sort address data by similarity, as similar addresses will generate the same similarity key. This avoids the problems of data inconsistency, misspellings, and address element variations when matching either withing a single dataset, or across datasets. It also provides for broader searching capabilities.

You can choose from following matching algorithms:

| Algorithm | Description | |--------------------|--------------------------------------------------------------------------------------------------------------------------------------| | narrow | Matches considering unit designators (suite, apt, etc.). Very fast but less accurate than AI options. | | wide | Matches on street address only, ignoring unit designators. Very fast but less accurate than AI options. | | ai-medium-narrow | (Recommended) Best accuracy, considering unit designators. Recommended algorithm, but API response can take a little bit longer. | | ai-medium-wide | Best accuracy, matching on street address only. API can take a little longer but provides high-quality results. | | ai-plus-narrow | AI-enhanced matching considering unit designators. Second-best in performance and accuracy. | | ai-plus-wide | AI-enhanced matching on street address only. Second-best in performance and accuracy. |

Example

import { getAddressMatchKey } from '@interzoid/data-matching';

async function addressMatch() {
  const result = await getAddressMatchKey({
    apiKey: 'your-interzoid-api-key',
    address: '500 main street',
    algorithm: 'ai-medium-narrow'
  });
  console.log(result);
}

Result

{
  "simKey": "T8O0ROaEgJIFhqcgg7SyCBjRryRVa43oMO2sMlq9r0s",
  "code": "Success",
  "credits": "9999"
}

Additional documentation

https://interzoid.com/apis/street-address-matching

Match Score Functions

We provide two operations for match scoring: Organization name and Full name. The request parameters for these operations are identical--provide two values and the API returns a matching score on a scale of 0-100 indicating how similar the two values are and how close they are to potentially being a match. This score is determined through a series of logic that includes the use of Generative AI, Machine Learning, specialized algorithms, and extensive knowledge bases. Best practices include setting a threshold, for example 50, 60, or 70 as indicating a potential match and then dealing with the potential matches as desired.

Full Name Match Score

This API provides a match score (likelihood of matching) between two individual names on a scale of 0-100, where 100 is the highest possible match.

import { getFullNameMatchScore } from '@interzoid/data-matching';

async function fullNameMatchScore() {
  const result = await getFullNameMatchScore({
    apiKey: 'your-interzoid-api-key',
    value1: 'John Smith',
    value2: 'John Smyth'
  });
  console.log(result);
}

Result

{
  "score": "80",
  "code": "Success",
  "credits": "9999"
}

Organization Name Match Score

This API provides a match score (likelihood of matching) ranging from 0 to 100 between two organization names.

import { getOrganizationMatchScore } from '@interzoid/data-matching';

async function organizationNameMatchScore() {
  const result = await getOrganizationNameMatchScore({
    apiKey: 'your-interzoid-api-key',
    value1: 'Apple',
    value2: 'Apple Inc.'
  });
  console.log(result);
}

Result

{
  "score": "95",
  "code": "Success",
  "credits": "9998"
}

Cloud Data Connect

Introduction

Interzoid's Cloud Data Connect is a set of functions that allow you to match data in your cloud database or delimited text file such as CSV and TSV with Interzoid's data matching algorithms.

Matching Process

The process parameter determines the type of matching process to run. The package provides an enum called Process that contains the available options.

| Process | Description | |------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------| | Process.MATCH_REPORT | Generate a report of all found clusters of similar data that share the same generated similarity key. | | Process.CREATE_TABLE | Creates a new table in the source database with all the similarity keys for each record in the source table, so they can be used for additional queries. | | Process.GEN_SQL | Generate the SQL INSERT statements to store the similarity keys in a database for ability to review before execution. | | Process.KEYS_ONLY | Output a generated similarity key for every record in the dataset. |

Source

The source parameter determines the type of data source containing the data you are performing matching functions with. The package provides an enum called Source that contains the available options. Some commonly used examples are:

| Source | Description | |---------------------|--------------------------------------| | Source.MYSQL | Match data in a MySQL database. | | Source.POSTGRES | Match data in a PostgreSQL database. | | Source.MARIADB | Match data in a MariaDB database. | | Source.DATABRICKS | Match data in a Databricks table. | | Source.CSV | Match data in a CSV file. |

Please see the source code for a complete list of available options.

Connection Strings

The connection parameter is a connection string for your database. The format of the connection string depends on the database you're connecting to.

Please see this page for examples of connection strings for various databases.

Match and write results to a new table

Set the process parameter to CREATE_TABLE to create a new table in your database with the match keys. The newTable parameter is the name of the new table to create. This table will be created by the process, and will contain the original data and the similarity key.

Do not create the table manually; the process will handle the creation.

You'll have to grant the user you're connecting with the ability to create a new table in the database in addition to the ability to read from the table you're matching.

import { getCloudDatabaseMatchKeyReport, Process, Category, Source } from '@interzoid/data-matching';

async function databaseMatchKeyReport() {
  const result = await getCloudDatabaseMatchKeyReport({
    apiKey: 'your-interzoid-api-key',
    process: Process.CREATE_TABLE,
    category: Category.COMPANY,
    source: Source.MYSQL,
    connection: 'db_user:db_password@tcp(db_host)/database',
    table: 'companies',                 // table to match
    column: 'companyname',              // column to match
    reference: 'id',                    // optional reference column
    newTable: 'companies_match_keys'    // new table to create
  });
  console.log(result);
}

Response

"Creating new table...Table companies_match_keys created successfully."

Match Key Report for a cloud database table

Response options

Set json to true to return a JSON object with arrays of match clusters.
Set html to true to return results in plain text with clusters separated by html <br> tags.
Don't set either to return results in plain text with clusters separated by newlines.

import { getCloudDatabaseMatchKeyReport, Source, Process, Category } from '@interzoid/data-matching';

async function databaseMatchKeyReport() {
  const result = await getCloudDatabaseMatchKeyReport({
    apiKey: 'your-interzoid-api-key',
    process: Process.MATCH_REPORT,
    category: Category.COMPANY,
    source: Source.MYSQL,
    connection: 'db_user:db_password@tcp(db_host)/database',
    table: 'companies',
    column: 'companyname',
    reference: 'id',
    json: true,
  });
  console.log(JSON.stringify(result, null, 2));
}

Sample Response

{
  "Status": "success",
  "Message": "",
  "MatchClusters": [
    [
      {
        "Data": "Cisco",
        "Reference": "",
        "SimKey": "3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ"
      },
      {
        "Data": "Cisco Systems",
        "Reference": "30",
        "SimKey": "3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ"
      }
    ],
    [
      {
        "Data": "Netflix",
        "Reference": "15",
        "SimKey": "8c6BY0KP9MYiDezQaKL3bH3iHfDU2wCMMTD9v0EeZJ8"
      },
      {
        "Data": "\"Netflix, Inc.\"",
        "Reference": "34",
        "SimKey": "8c6BY0KP9MYiDezQaKL3bH3iHfDU2wCMMTD9v0EeZJ8"
      }
    ]
  ]
}

Text File Match Key Report

Provide a URL to a delimited file (CSV or TSV) and the API will return a match key report for the data in the file.

import { getDelimitedFileMatchKeyReport, Process, Source, Category } from '@interzoid/data-matching';

async function csvFileMatchReport() {
  const result = await getDelimitedFileMatchKeyReport({
    apiKey: 'your-interzoid-api-key',
    process: Process.MATCH_REPORT,
    category: Category.COMPANY,
    source: Source.CSV,
    table: Source.CSV,
    connection: 'https://dl.interzoid.com/csv/companies.csv',
    column: '1',          // column number to match
    json: true,
  });
  console.log(JSON.stringify(result, null, 2));
}

Result

{
  "Status": "success",
  "Message": "",
  "MatchClusters": [
    [
      {
        "Data": "Good Year Tire & Rubber",
        "Reference": "",
        "SimKey": "140xAiUxvDysV56LZzogzDwLuYLd2U7E5sVAXd1nKd8"
      },
      {
        "Data": "Goodyear Tire Inc",
        "Reference": "Transportaions",
        "SimKey": "140xAiUxvDysV56LZzogzDwLuYLd2U7E5sVAXd1nKd8"
      }
    ],
    [
      {
        "Data": "Pederson Tooling Inc.",
        "Reference": "Transportaions",
        "SimKey": "7oOMieCdoyxjt7_oKbE2xGngnZGdG75CFU5pEfhU5z8"
      },
      {
        "Data": "Peterson Tools",
        "Reference": "Services",
        "SimKey": "7oOMieCdoyxjt7_oKbE2xGngnZGdG75CFU5pEfhU5z8"
      }
    ]
  ]
}

Account Information

This API retrieves the current amount of remaining purchased (or trial) credits for a license key.

Using this function does not deduct credits from your account.

import { getRemainingCredits } from '@interzoid/data-matching';

async function remainingCredits() {
  const result = getRemainingCredits({ apiKey: 'your-interzoid-api-key' });
  console.log(result);
}

Result

{
  "credits": "9998",
  "code": "Success"
}

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

Interzoid Data Matching Node.js SDK

Table of Contents

API Key

Installation

Data Matching APIs

Match Key Functions

Full Name Match Key

Example

Result

Additional documentation

Company Name Match Key

Example

Result

Additional documentation

Address Match Key

Example

Result

Additional documentation

Match Score Functions

Full Name Match Score

Result

Organization Name Match Score

Result

Cloud Data Connect

Introduction

Matching Process

Source

Category

Connection Strings

Match and write results to a new table

Response

Match Key Report for a cloud database table

Response options

Sample Response

Text File Match Key Report

Result

Account Information

Result