Interzoid Data Matching Node.js SDK

Version: 1.2.7

This is a Node.js SDK for Interzoid's Generative-AI powered data matching, data quality, data cleansing, and data normalization for organization and individual name data. Functions include the generation of similarity keys (also called match keys) for identifying and matching inconsistent name data, as well as comparing and scoring data for matching purposes.

The concept is that the same similarity key will be algorithmically generated for different permutations of the same data content, such as GE, Gen Elec, General Electric all generating the same similarity key. Then, these similarity keys can be used as the basis of matching data, identifying duplicates, and resolving inconsistencies that can otherwise degrade the usefulness and value of data-driven applications, processes, or anything else that makes use of data. These similarity keys form the basis of many of the different functions available in the SDK that make use of Generative AI, Machine Learning, specialized algorithms, and extensive knowledge bases - all in the Cloud - to provide its results. These include functions that generate similarity keys for custom use, functions that score matches for certain use cases, and functions that process and perform matching functions with entire database tables and datasets.
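
As a rough illustration of how similarity keys can drive duplicate detection, the sketch below groups records that share a key. The `KeyedRecord` type, the helper names, and the `KEY_A`/`KEY_B` values are hypothetical placeholders for this example; real keys would come from the SDK functions described later.

```typescript
// A record paired with a similarity key. The keys used below are made-up
// placeholders, not real Interzoid similarity keys.
interface KeyedRecord {
  name: string;
  simKey: string;
}

// Group record names by the similarity key they share.
function groupBySimKey(records: KeyedRecord[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const record of records) {
    const names = groups[record.simKey] || [];
    names.push(record.name);
    groups[record.simKey] = names;
  }
  return groups;
}

// Keep only groups with more than one member: these are candidate duplicates.
function findDuplicateGroups(records: KeyedRecord[]): string[][] {
  const groups = groupBySimKey(records);
  return Object.keys(groups)
    .map((key) => groups[key])
    .filter((names) => names.length > 1);
}

const sampleRecords: KeyedRecord[] = [
  { name: 'GE', simKey: 'KEY_A' },
  { name: 'Gen Elec', simKey: 'KEY_A' },
  { name: 'General Electric', simKey: 'KEY_A' },
  { name: 'Microsoft', simKey: 'KEY_B' },
];

console.log(findDuplicateGroups(sampleRecords));
```

Grouping on the key rather than on the raw name is what lets differently spelled variants land in the same bucket.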

Table of Contents

  1. API Key
  2. Installation
  3. Data Matching APIs
    1. Match Key Functions
      1. Full Name Match Key
      2. Company Name Match Key
      3. Address Match Key
    2. Match Score Functions
      1. Full Name Match Score
      2. Organization Name Match Score
  4. Interzoid Cloud Data Connect
    1. Introduction
    2. Matching Process
    3. Sources
    4. Processing Categories
    5. Connection Strings
    6. Match and write keys to a new cloud database table
    7. Match Key Report for a cloud database table
    8. Text File Match Key Report
  5. Interzoid Account Information (Remaining Credits)

API Key

Please visit https://www.interzoid.com/register-api-account to register for an API key and receive free usage credits. This API key will be used as a parameter with each call to the API (via the SDK function) for authentication and usage tracking.
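
Since the key is passed on every call, it can help to load it from configuration rather than hard-coding it. The following is a minimal sketch under that assumption; `requireApiKey` and the `INTERZOID_API_KEY` variable name are choices made for this example and are not something the SDK reads on its own.

```typescript
// Minimal shape of an environment map such as Node's process.env.
type Env = Record<string, string | undefined>;

// Return the API key from the given environment map, or throw a descriptive
// error. INTERZOID_API_KEY is a hypothetical name chosen for this sketch.
function requireApiKey(env: Env, variableName = 'INTERZOID_API_KEY'): string {
  const value = env[variableName];
  if (!value || value.trim() === '') {
    throw new Error(`Missing API key: set the ${variableName} environment variable`);
  }
  return value.trim();
}

// Example with an in-memory map; in Node you would pass process.env instead.
const loadedApiKey = requireApiKey({ INTERZOID_API_KEY: 'your-interzoid-api-key' });
console.log(loadedApiKey.length > 0);
```

The returned string can then be supplied as the `apiKey` parameter in the examples that follow.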


Installation

The Interzoid SDK requires Node.js v14 or greater.

npm install @interzoid/data-matching

Data Matching APIs

Interzoid uses algorithmically generated similarity keys leveraging Generative AI, Large Language Models (LLMs), Machine Learning, specialized algorithms, and extensive knowledge bases to intelligently match data within or across data sources. Match rates can increase significantly when similarity keys are used with important data.

To learn more about the technology behind these APIs and to better understand how to make use of similarity keys, please visit https://docs.interzoid.com/entries/understanding-data-matching

Match Key Functions

Full Name Match Key

This API provides a hashed similarity key from the input data used to match with other similar full name data. Use the generated similarity key, rather than the actual data itself, to match and/or sort individual name data by similarity as similar individual names will generate the same similarity key. This avoids the problems of data inconsistency, misspellings, and name variations when matching within a single dataset, and can also help matching across datasets or for more advanced searching.

Example
import { getFullNameMatchKey } from '@interzoid/data-matching';

async function fullNameMatch() {
  const result = await getFullNameMatchKey({ apiKey: 'your-interzoid-api-key', fullName: 'John Smith' });
  console.log(result);
}
Result
{
  "simKey": "N1Ai4RfV0SRJf2dJwDO0Cvzh4xCgQG",
  "code": "Success",
  "credits": "9999"
}
Additional documentation

https://interzoid.com/apis/individual-name-matching
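
A caller will usually want to check the `code` field before using `simKey`. The sketch below assumes the result has the shape of the sample above; `MatchKeyResult` and `simKeyOrThrow` are names invented for this example, and only the `Success` code is documented here, so how failures are actually reported is an assumption.

```typescript
// Shape of the sample result shown above; all values arrive as strings.
interface MatchKeyResult {
  simKey: string;
  code: string;
  credits: string;
}

// Return the similarity key when the call reports success, otherwise throw.
// Treating any other code as a failure is an assumption of this sketch.
function simKeyOrThrow(matchResult: MatchKeyResult): string {
  if (matchResult.code !== 'Success') {
    throw new Error(`Match key request failed with code: ${matchResult.code}`);
  }
  return matchResult.simKey;
}

const sampleMatchKeyResult: MatchKeyResult = {
  simKey: 'N1Ai4RfV0SRJf2dJwDO0Cvzh4xCgQG',
  code: 'Success',
  credits: '9999',
};

console.log(simKeyOrThrow(sampleMatchKeyResult));
```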


Company Name Match Key

This API provides a hashed similarity key from the input data used to match with other similar company name data. Use the generated similarity key, rather than the actual data itself, to match and/or sort company name data to identify inconsistently represented company and organization name data, as similar organization and company names will generate the same similarity key. This avoids the problems of data inconsistency, misspellings, and name variations when matching within a single dataset or across multiple data sources.

The optional algorithm parameter provides multiple matching algorithms:

| Algorithm | Description |
|--------------------|-------------|
| wide | Very fast algorithm with a wider umbrella of similarity. Better when you have other data that will also serve as part of match criteria. |
| narrow | Very fast algorithm, more stringent than wide. Better if you are matching on company or organization name only. |
| ai-medium-wide | (Recommended) High matching accuracy with a wider similarity range, especially for use with the availability of additional matching criteria. Recommended for most use cases, but may have longer response times. |
| ai-medium-narrow | Best matching accuracy when matching solely on company or organization names without any other match criteria. Provides the most precise matching, but may have longer response times. |
| ai-plus-wide | Balances speed and precision with a wider similarity umbrella. Ideal for matching with additional data criteria. |
| ai-plus-narrow | Balances speed and precision with a more stringent matching criteria. Better for matching solely on company or organization names. |

The default algorithm is ai-medium-wide.
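
The guidance in the table can be captured in a small helper. This is only one possible reading of it: the `CompanyMatchNeeds` type and `chooseCompanyAlgorithm` function are hypothetical, and the sketch deliberately ignores the `ai-plus-*` options to keep the decision simple.

```typescript
// The algorithm names listed in the table above.
type CompanyAlgorithm =
  | 'wide'
  | 'narrow'
  | 'ai-medium-wide'
  | 'ai-medium-narrow'
  | 'ai-plus-wide'
  | 'ai-plus-narrow';

// Hypothetical description of what the caller cares about.
interface CompanyMatchNeeds {
  hasOtherMatchCriteria: boolean; // other fields are compared alongside the name
  preferSpeed: boolean; // favor the very fast non-AI algorithms
}

// Pick a "wide" variant when other criteria help disambiguate, and a "narrow"
// variant when the company name is the only match criterion.
function chooseCompanyAlgorithm(needs: CompanyMatchNeeds): CompanyAlgorithm {
  const breadth = needs.hasOtherMatchCriteria ? 'wide' : 'narrow';
  if (needs.preferSpeed) {
    return breadth;
  }
  return needs.hasOtherMatchCriteria ? 'ai-medium-wide' : 'ai-medium-narrow';
}

console.log(chooseCompanyAlgorithm({ hasOtherMatchCriteria: false, preferSpeed: false }));
```

The returned value can be passed as the optional `algorithm` parameter.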

Example
import { getCompanyNameMatchKey } from '@interzoid/data-matching';

async function companyNameMatch() {
  const result = await getCompanyNameMatchKey({
    apiKey: 'your-interzoid-api-key',
    company: 'Microsoft',
    algorithm: 'ai-medium-wide'
  });
  console.log(result);
}
Result
{
  "simKey": "cZdRqd6Ya6FBDPmFpn4_USiTu2DVoYO32ANw1Z5NYN0",
  "code": "Success",
  "credits": "9999"
}
Additional documentation

https://interzoid.com/apis/company-name-matching


Address Match Key

This API provides a hashed similarity key from the input data used to match with other similar address data. Use the generated similarity key, rather than the actual data itself, to match and/or sort address data by similarity, as similar addresses will generate the same similarity key. This avoids the problems of data inconsistency, misspellings, and address element variations when matching either within a single dataset or across datasets. It also provides for broader searching capabilities.

You can choose from the following matching algorithms:

| Algorithm | Description |
|--------------------|-------------|
| narrow | Matches considering unit designators (suite, apt, etc.). Very fast but less accurate than AI options. |
| wide | Matches on street address only, ignoring unit designators. Very fast but less accurate than AI options. |
| ai-medium-narrow | (Recommended) Best accuracy, considering unit designators. Recommended algorithm, but API response can take a little bit longer. |
| ai-medium-wide | Best accuracy, matching on street address only. API can take a little longer but provides high-quality results. |
| ai-plus-narrow | AI-enhanced matching considering unit designators. Second-best in performance and accuracy. |
| ai-plus-wide | AI-enhanced matching on street address only. Second-best in performance and accuracy. |
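
If the algorithm name comes from user input or configuration, it may be worth validating it before making a call. The sketch below shows one way to do that with a type guard; `isAddressAlgorithm` and `resolveAddressAlgorithm` are hypothetical helpers, and falling back to the recommended `ai-medium-narrow` is a choice made for this example rather than documented SDK behavior.

```typescript
// The address algorithm names listed in the table above.
const ADDRESS_ALGORITHMS = [
  'narrow',
  'wide',
  'ai-medium-narrow',
  'ai-medium-wide',
  'ai-plus-narrow',
  'ai-plus-wide',
] as const;

type AddressAlgorithm = typeof ADDRESS_ALGORITHMS[number];

// Type guard: true only for one of the listed algorithm names.
function isAddressAlgorithm(value: string): value is AddressAlgorithm {
  return (ADDRESS_ALGORITHMS as readonly string[]).indexOf(value) !== -1;
}

// Use the supplied name when it is valid, otherwise fall back to the
// recommended algorithm (a fallback chosen for this sketch).
function resolveAddressAlgorithm(value: string | undefined): AddressAlgorithm {
  return value !== undefined && isAddressAlgorithm(value) ? value : 'ai-medium-narrow';
}

console.log(resolveAddressAlgorithm('wide'));
console.log(resolveAddressAlgorithm('not-an-algorithm'));
```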

Example
import { getAddressMatchKey } from '@interzoid/data-matching';

async function addressMatch() {
  const result = await getAddressMatchKey({
    apiKey: 'your-interzoid-api-key',
    address: '500 main street',
    algorithm: 'ai-medium-narrow'
  });
  console.log(result);
}
Result
{
  "simKey": "T8O0ROaEgJIFhqcgg7SyCBjRryRVa43oMO2sMlq9r0s",
  "code": "Success",
  "credits": "9999"
}
Additional documentation

https://interzoid.com/apis/street-address-matching


Match Score Functions

We provide two operations for match scoring: organization name and full name. The request parameters for these operations are identical: provide two values, and the API returns a match score on a scale of 0-100 indicating how similar the two values are and how likely they are to be a match. The score is determined by logic that combines Generative AI, Machine Learning, specialized algorithms, and extensive knowledge bases. A common practice is to set a threshold (for example 50, 60, or 70), treat scores at or above it as potential matches, and then handle those potential matches as desired.
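
The threshold practice described above might look like the following sketch. It assumes the score arrives as a numeric string, as in the sample results in this section; `MatchScoreResult`, `isPotentialMatch`, and the default threshold of 70 are choices made for this example.

```typescript
// Shape of a match score result as shown in this section's samples.
interface MatchScoreResult {
  score: string;
  code: string;
  credits: string;
}

// Treat a pair as a potential match when its score meets the threshold.
// The default of 70 is just one of the example thresholds mentioned above.
function isPotentialMatch(scoreResult: MatchScoreResult, threshold = 70): boolean {
  const score = parseInt(scoreResult.score, 10);
  if (isNaN(score)) {
    throw new Error(`Unexpected score value: ${scoreResult.score}`);
  }
  return score >= threshold;
}

const sampleScoreResult: MatchScoreResult = { score: '80', code: 'Success', credits: '9999' };

console.log(isPotentialMatch(sampleScoreResult)); // threshold 70
console.log(isPotentialMatch(sampleScoreResult, 90)); // stricter threshold
```

A lower threshold catches more candidate pairs at the cost of more false positives, so the right value depends on how the potential matches are reviewed afterwards.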

Full Name Match Score

This API provides a match score (likelihood of matching) between two individual names on a scale of 0-100, where 100 is the highest possible match.

import { getFullNameMatchScore } from '@interzoid/data-matching';

async function fullNameMatchScore() {
  const result = await getFullNameMatchScore({
    apiKey: 'your-interzoid-api-key',
    value1: 'John Smith',
    value2: 'John Smyth'
  });
  console.log(result);
}
Result
{
  "score": "80",
  "code": "Success",
  "credits": "9999"
}

Organization Name Match Score

This API provides a match score (likelihood of matching) ranging from 0 to 100 between two organization names.

import { getOrganizationMatchScore } from '@interzoid/data-matching';

async function organizationNameMatchScore() {
  const result = await getOrganizationMatchScore({
    apiKey: 'your-interzoid-api-key',
    value1: 'Apple',
    value2: 'Apple Inc.'
  });
  console.log(result);
}
Result
{
  "score": "95",
  "code": "Success",
  "credits": "9998"
}

Cloud Data Connect

Introduction

Interzoid's Cloud Data Connect is a set of functions that let you apply Interzoid's data matching algorithms to data in a cloud database or in a delimited text file such as CSV or TSV.

Matching Process

The process parameter determines the type of matching process to run. The package provides an enum called Process that contains the available options.

| Process | Description |
|------------------------|-------------|
| Process.MATCH_REPORT | Generate a report of all found clusters of similar data that share the same generated similarity key. |
| Process.CREATE_TABLE | Creates a new table in the source database with all the similarity keys for each record in the source table, so they can be used for additional queries. |
| Process.GEN_SQL | Generate the SQL INSERT statements to store the similarity keys in a database for ability to review before execution. |
| Process.KEYS_ONLY | Output a generated similarity key for every record in the dataset. |

Source

The source parameter determines the type of data source containing the data you are performing matching functions with. The package provides an enum called Source that contains the available options. Some commonly used examples are:

| Source | Description |
|---------------------|--------------------------------------|
| Source.MYSQL | Match data in a MySQL database. |
| Source.POSTGRES | Match data in a PostgreSQL database. |
| Source.MARIADB | Match data in a MariaDB database. |
| Source.DATABRICKS | Match data in a Databricks table. |
| Source.CSV | Match data in a CSV file. |

Please see the source code for a complete list of available options.

Category

The category parameter determines the type of data you're matching. The package provides an enum called Category that contains the available options.

| Category | Description |
|-----------------------|-------------------------|
| Category.COMPANY | Match company names. |
| Category.INDIVIDUAL | Match individual names. |
| Category.ADDRESS | Match addresses. |

Connection Strings

The connection parameter is a connection string for your database. The format of the connection string depends on the database you're connecting to.

Please see this page for examples of connection strings for various databases.
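
The MySQL examples in this README use a connection string of the form `db_user:db_password@tcp(db_host)/database`. The sketch below assembles that form from its parts; `MySqlConnectionParts` and `mysqlConnectionString` are hypothetical names, the helper does no escaping of special characters, and other databases use different formats.

```typescript
// Pieces of a MySQL connection string in the form used by this README's examples.
interface MySqlConnectionParts {
  user: string;
  password: string;
  host: string;
  database: string;
}

// Assemble user:password@tcp(host)/database. No escaping is performed, so
// values containing characters such as '@' or '/' may need extra care.
function mysqlConnectionString(parts: MySqlConnectionParts): string {
  return `${parts.user}:${parts.password}@tcp(${parts.host})/${parts.database}`;
}

console.log(
  mysqlConnectionString({
    user: 'db_user',
    password: 'db_password',
    host: 'db_host',
    database: 'database',
  })
);
```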

Match and write results to a new table

Set the process parameter to CREATE_TABLE to create a new table in your database with the match keys. The newTable parameter is the name of the new table to create. This table will be created by the process, and will contain the original data and the similarity key.

Do not create the table manually; the process will handle the creation.

You'll have to grant the user you're connecting with the ability to create a new table in the database in addition to the ability to read from the table you're matching.

import { getCloudDatabaseMatchKeyReport, Process, Category, Source } from '@interzoid/data-matching';

async function databaseMatchKeyReport() {
  const result = await getCloudDatabaseMatchKeyReport({
    apiKey: 'your-interzoid-api-key',
    process: Process.CREATE_TABLE,
    category: Category.COMPANY,
    source: Source.MYSQL,
    connection: 'db_user:db_password@tcp(db_host)/database',
    table: 'companies',                 // table to match
    column: 'companyname',              // column to match
    reference: 'id',                    // optional reference column
    newTable: 'companies_match_keys'    // new table to create
  });
  console.log(result);
}

Response

"Creating new table...Table companies_match_keys created successfully."

Match Key Report for a cloud database table

Response options

  • Set json to true to return a JSON object with arrays of match clusters.
  • Set html to true to return results in plain text with clusters separated by HTML <br> tags.
  • Set neither option to return results in plain text with clusters separated by newlines.
import { getCloudDatabaseMatchKeyReport, Source, Process, Category } from '@interzoid/data-matching';

async function databaseMatchKeyReport() {
  const result = await getCloudDatabaseMatchKeyReport({
    apiKey: 'your-interzoid-api-key',
    process: Process.MATCH_REPORT,
    category: Category.COMPANY,
    source: Source.MYSQL,
    connection: 'db_user:db_password@tcp(db_host)/database',
    table: 'companies',
    column: 'companyname',
    reference: 'id',
    json: true,
  });
  console.log(JSON.stringify(result, null, 2));
}

Sample Response

{
  "Status": "success",
  "Message": "",
  "MatchClusters": [
    [
      {
        "Data": "Cisco",
        "Reference": "",
        "SimKey": "3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ"
      },
      {
        "Data": "Cisco Systems",
        "Reference": "30",
        "SimKey": "3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ"
      }
    ],
    [
      {
        "Data": "Netflix",
        "Reference": "15",
        "SimKey": "8c6BY0KP9MYiDezQaKL3bH3iHfDU2wCMMTD9v0EeZJ8"
      },
      {
        "Data": "\"Netflix, Inc.\"",
        "Reference": "34",
        "SimKey": "8c6BY0KP9MYiDezQaKL3bH3iHfDU2wCMMTD9v0EeZJ8"
      }
    ]
  ]
}
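
To work with the JSON form of the report, it can help to give it a type and reduce each cluster to the values it contains. The sketch below assumes the response has the shape of the sample above; the `ClusterEntry` and `MatchReport` types and the `summarizeClusters` helper are names invented for this example.

```typescript
// Types modeled on the sample response above.
interface ClusterEntry {
  Data: string;
  Reference: string;
  SimKey: string;
}

interface MatchReport {
  Status: string;
  Message: string;
  MatchClusters: ClusterEntry[][];
}

// Produce one line per cluster listing the values that share a similarity key.
function summarizeClusters(matchReport: MatchReport): string[] {
  if (matchReport.Status !== 'success') {
    throw new Error(`Report not successful: ${matchReport.Message}`);
  }
  return matchReport.MatchClusters.map((cluster) =>
    cluster.map((entry) => entry.Data).join(' | ')
  );
}

const sampleMatchReport: MatchReport = {
  Status: 'success',
  Message: '',
  MatchClusters: [
    [
      { Data: 'Cisco', Reference: '', SimKey: '3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ' },
      { Data: 'Cisco Systems', Reference: '30', SimKey: '3AmCGk2yvEJ7XUxUmB3dFHxRiVzy4Squ89J-4_lDrxQ' },
    ],
  ],
};

console.log(summarizeClusters(sampleMatchReport));
```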

Text File Match Key Report

Provide a URL to a delimited file (CSV or TSV) and the API will return a match key report for the data in the file.

import { getDelimitedFileMatchKeyReport, Process, Source, Category } from '@interzoid/data-matching';

async function csvFileMatchReport() {
  const result = await getDelimitedFileMatchKeyReport({
    apiKey: 'your-interzoid-api-key',
    process: Process.MATCH_REPORT,
    category: Category.COMPANY,
    source: Source.CSV,
    table: Source.CSV,
    connection: 'https://dl.interzoid.com/csv/companies.csv',
    column: '1',          // column number to match
    json: true,
  });
  console.log(JSON.stringify(result, null, 2));
}

Result

{
  "Status": "success",
  "Message": "",
  "MatchClusters": [
    [
      {
        "Data": "Good Year Tire & Rubber",
        "Reference": "",
        "SimKey": "140xAiUxvDysV56LZzogzDwLuYLd2U7E5sVAXd1nKd8"
      },
      {
        "Data": "Goodyear Tire Inc",
        "Reference": "Transportaions",
        "SimKey": "140xAiUxvDysV56LZzogzDwLuYLd2U7E5sVAXd1nKd8"
      }
    ],
    [
      {
        "Data": "Pederson Tooling Inc.",
        "Reference": "Transportaions",
        "SimKey": "7oOMieCdoyxjt7_oKbE2xGngnZGdG75CFU5pEfhU5z8"
      },
      {
        "Data": "Peterson Tools",
        "Reference": "Services",
        "SimKey": "7oOMieCdoyxjt7_oKbE2xGngnZGdG75CFU5pEfhU5z8"
      }
    ]
  ]
}
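
For review in a spreadsheet or similar tool, nested clusters are often easier to handle as flat rows. The following sketch flattens the `MatchClusters` array from a result like the one above and tags each row with its cluster number; the `FileClusterEntry` and `ClusterRow` types and the `flattenClusters` helper are hypothetical.

```typescript
// One entry of a cluster, modeled on the sample result above.
interface FileClusterEntry {
  Data: string;
  Reference: string;
  SimKey: string;
}

// A flat row: the 1-based cluster number plus the entry's data and reference.
interface ClusterRow {
  cluster: number;
  data: string;
  reference: string;
}

// Flatten nested clusters into rows, preserving cluster membership.
function flattenClusters(clusters: FileClusterEntry[][]): ClusterRow[] {
  const rows: ClusterRow[] = [];
  clusters.forEach((cluster, index) => {
    cluster.forEach((entry) => {
      rows.push({ cluster: index + 1, data: entry.Data, reference: entry.Reference });
    });
  });
  return rows;
}

// Two clusters copied verbatim from the sample result above.
const sampleFileClusters: FileClusterEntry[][] = [
  [
    { Data: 'Good Year Tire & Rubber', Reference: '', SimKey: '140xAiUxvDysV56LZzogzDwLuYLd2U7E5sVAXd1nKd8' },
    { Data: 'Goodyear Tire Inc', Reference: 'Transportaions', SimKey: '140xAiUxvDysV56LZzogzDwLuYLd2U7E5sVAXd1nKd8' },
  ],
  [
    { Data: 'Pederson Tooling Inc.', Reference: 'Transportaions', SimKey: '7oOMieCdoyxjt7_oKbE2xGngnZGdG75CFU5pEfhU5z8' },
    { Data: 'Peterson Tools', Reference: 'Services', SimKey: '7oOMieCdoyxjt7_oKbE2xGngnZGdG75CFU5pEfhU5z8' },
  ],
];

console.log(flattenClusters(sampleFileClusters));
```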

Account Information

This API retrieves the number of purchased (or trial) credits remaining for an API key.

Using this function does not deduct credits from your account.

import { getRemainingCredits } from '@interzoid/data-matching';

async function remainingCredits() {
  const result = await getRemainingCredits({ apiKey: 'your-interzoid-api-key' });
  console.log(result);
}

Result

{
  "credits": "9998",
  "code": "Success"
}
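
Before starting a large batch, the remaining credits can be compared against an estimate of what the job needs. This sketch assumes the result has the shape shown above, with `credits` as a numeric string; `CreditsResult` and `hasEnoughCredits` are hypothetical names, and the number of credits a job needs is the caller's own estimate.

```typescript
// Shape of the account information result shown above.
interface CreditsResult {
  credits: string;
  code: string;
}

// True when the remaining credits cover the caller's estimated need.
function hasEnoughCredits(creditsResult: CreditsResult, creditsNeeded: number): boolean {
  const remaining = parseInt(creditsResult.credits, 10);
  if (creditsResult.code !== 'Success' || isNaN(remaining)) {
    throw new Error(`Could not read remaining credits (code: ${creditsResult.code})`);
  }
  return remaining >= creditsNeeded;
}

const sampleCreditsResult: CreditsResult = { credits: '9998', code: 'Success' };

console.log(hasEnoughCredits(sampleCreditsResult, 500));
console.log(hasEnoughCredits(sampleCreditsResult, 20000));
```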