npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@datatables/type-detector

v1.1.6

Published

Raw data type detection.

Downloads

66

Readme

Type-Detection

This Library provieds raw data type detection for Numbers, HTML, Dates and Strings. It was created with ❤ by the CloudTables team.

Install

Install from npm by using

npm install @datatables/type-detector

Usage

ES3

var TypeDetect = require('@datatables/type-detector');

var detector = new TypeDetect["default"]();

var type = detector.typeDetect(dataArray);

ES6

import TypeDetect from '@datatables/type-detector';

let detector = new TypeDetect();

let type = detector.typeDetect(dataArray);

Where dataArray is a single dimensional array of data who's type is to be detected.

The constructor takes 2 optional parameters as follows. | Parameter | Default | Description | |:---------:|:-------:|:-----------:| | decimalCharacter | '.' | This is used when determining the number of decimal places that a number has. | | thousandsSeparator | ',' | This is removed from strings when attempting to determine a number type. |

API

There is a single API method i18n().

detector.i18n(langOpts)

The langOpts parameter is an object that is used to add internationalisation options used to detect dates. The default value of the internal object is as follows.

{
	"abbrDays": {
		"deDE": /^(mo\.|di\.|mi\.|do\.|fr\.|sa\.|so\.)$/gi,
		"en": /^(mon|tue|wed|thu|fri|sat|sun)$/gi,
		"esES": /^(lun\.|mar\.|mié\.|jue\.|vie\.|sáb\.|dom\.)$/gi,
		"frFR": /^(lun\.|mar\.|mer\.|jeu\.|ven\.|sam\.|dim\.)$/gi
	},
	"abbrMonths": {
		"deDE": /^(jan\.|feb\.|märz\.|apr\.|mai\.|juni\.|juli\.|aug\.|sep\.|okt\.|nov\.|dez\.)$/gi,
		"en": /^(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$/gi,
		"esES": /^(ene\.|feb\.|mar\.|abr\.|may\.|jun\.|jul\.|ago\.|sep\.|oct\.|nov\.|dic\.)$/gi,
		"frFR": /^(janv\.|févr\.|mars\.|avr\.|mai\.|juin\.|juil\.|août\.|sept\.|oct\.|nov\.|dec\.)$/gi
	},
	"days": {
		"deDE": /(montag|dienstag|mittwoch|donnerstag|freitag|samstag|sonntag)/gi,
		"en": /(monday|tuesday|wednesday|thursday|friday|saturday|sunday)/gi,
		"esES": /(lunes|martes|miércoles|jueves|viernes|sábado|domingo)/gi,
		"frFR": /(lundi|mardi|mercredi|jeudi|vendredi|samedi|dimanche)/gi
	},
	"months": {
		"deDE": /^(januar|februar|märz|april|mai|juni|juli|august|september|oktober|november|dezember)$/gi,
		"en": /^(january|february|march|april|may|june|july|august|september|october|november|december)$/gi,
		"esES": /^(enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)$/gi,
		"frFR": /^(janvier|février|mars|avril|mai|juin|juillet|août|septembre|octobre|novembre|décembre)$/gi
	},
	"postFixes": {
		"deDE": /^[0-9]+(st|nd|rd|th)$/gi,
		"en":   /^[0-9]+(st|nd|rd|th)$/gi,
		"esES": /^[0-9]+(st|nd|rd|th)$/gi,
		"frFR": /^[0-9]+(st|nd|rd|th)$/gi
	}
}

To add internationalisation options for a language(s) an object of this form must be passed in. The values within the RegExps need to be parseable by moment otherwise dates will not be identified correctly. The key of the object value pairs should match the locale for the language that is being added. For example to add english as a language you would make the following call.

detector.i18n({
	"abbrDays": {
		"en": /^(mon|tue|wed|thu|fri|sat|sun)$/gi,
	},
	"abbrMonths": {
		"en": /^(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$/gi,
	},
	"days": {
		"en": /(monday|tuesday|wednesday|thursday|friday|saturday|sunday)/gi,
	},
	"months": {
		"en": /^(january|february|march|april|may|june|july|august|september|october|november|december)$/gi,
	},
	"postFixes": {
		"en":   /^[0-9]+(st|nd|rd|th)$/gi,
	}
});

The method returns the detector instance so that chaining can occur.

Return Format

The following object is returned from the detector.typeDetect(dataArray) call mentioned above. If a given property is not relevant to the type that has been detected then it is set to null.

{
	type: null | string;
	format: null | string;
	locale: null | string;
	prefix: null | string;
	postfix: null | string;
	dp: null | number;
}

| Property | Description | |:--------:|:-----------:| | type | The type that has been determined for the data. | | format | If a date has been detected as the type then this property is set to the moment format of that date. | | locale | If a date has been detected then this property is set to store the locale of that date. | | prefix | If numeric data has been detected then any common prefix across the entire data set is stored on this property. This excludes any numbers that are at the end of the prefix. | | postfix| If numeric data has been detected then any common postfix across the entire data set is stored on this property. This excludes any numbers that are at the start of the postfix. | | dp | If numeric data has been detected then the largest number of decimal places present within the data set is stored on this property. |

Detectable Types

The following types can be detected by the Type-Detection library. They are listed here in the order that identification is attempted in.

Excel

CloudTables allows the importing of data from excel sheets. When this occurs the data that is read in has both a value and a format within an object.

[{ value: 1.1, excel: '#.#'}, { value: 2.5, excel: '#.#'}, { value: 3.6, excel: '#.#'}]

The data within this set is numeric, and therefore the type returned will be number. The dp, prefix and postfix properties are also determined using a combination of the value and excel properties. This page is useful for understanding the different formats that excel can create.

Excel imports can also include dates, in which case a format is also set in the excel property. In this case the date detection methods implemented within this library are used as the excel date formats are different to the moment formats that Cloudtables uses.

Strings and can also be determined from excel data.

Numbers

The Type-Detection library is capable of identifying a variety of formats of numbers.

  • JS numbers (100)
  • Numeric Strings ('100')
  • Numeric Strings with a prefix, such ascurrency ('$100')
  • Numeric Strings with a postfix, such as a unit ('100 cm')
  • Numeric Strings with both a prefix and a postfix ('$100 per month')
  • All of the above with decmial places, '.' by default (100.5, '100.5', '$100.50', '100.5 cm', '$100.50 per month')
  • All of the above (except JS Numbers) with a thousands separator, ',' by default ('100,000.5', '$100,000.50', '100,000.5 cm', '$100,000.50 per month')

There is no limit to the size of prefixes or postfixes.

Numbers

The Type-Detection library is capable of identifying sequences of numbers. To do this all of the numbers in the data must be integer values. The must also not include and prefixes or postfixes, but they can include thousands separators.

  • JS numbers (1, 2, 3)
  • Numeric Strings ('1', '2', '3')
  • Numeric Strings ('1,000', '2,000', '3,000')

HTML

To determine HTML the string inputs are checked against the following Regular Expression...

/<(“[^”]*”|'[^’]*’|[^'”>])*>/g

A data set does not need to be solely comprised of html code to get the html type. If there are also strings present then an html type can be returned.

Dates

Moment is used to aid in the detection of dates and their formats. Any date containing the following tokens can be detected, but each token should only be used once in the field.

Tokens

| Token | Description | Example | Extra Notes | |:-----:|:-----------:|:-------:|:-----------:| | YYYY | Full Year | 2020, 2021 | Cannot be used with YY | | YY | Short Year | 20, 21 | Cannot be used with YYYY | | MMMM | Full Month | January, February | Cannot be used with MMM, MM, M | | MMM | Abbreviated Month | Jan, Feb | Cannot be used with MMMM, MM, M | | MM | Month as Number | 01, 02 | Cannot be used with MMMM, MMM, M | | M | Month as Short Number | 1, 2 | Cannot be used with MMMM, MMM, MM | | DD | Day as Number | 01, 02 | Cannot be used with D, Do | | D | Day as Short Number | 1, 2 | Cannot be used with DD, Do | | Do | Day with Postfix | 1st, 2nd | Cannot be used with DD, D | | dddd | Day of the Week | Monday, Tuesday | Cannot be used with ddd | | ddd | Abbreviated Day of the Week | Mon, Tue | Cannot be used with dddd | | HH | 24 Hour | 01, 02, ..., 23 | Cannot be used with H, hh, h. If used with A or a will become h | | H | 24 Hour Short | 1, 2, ..., 23 | Cannot be used with HH, hh, h. If used with A or a will become h | | hh | 12 Hour | 01, 02, ..., 12 | Cannot be used with HH, H, h | | h | 12 Hour Short | 1, 2, ..., 12 | Cannot be used with HH, H, hh | | mm | Minutes | 01, 02, ..., 59 | Cannot be used with m | | m | Minutes Short | 1, 2, ..., 59 | Cannot be used with mm | | ss | Seconds | 01, 02, ..., 59 | Cannot be used with s | | s | Seconds Short | 1, 2, ..., 59 | Cannot be used with ss | | A | AM/PM in Capitals | AM, PM | Cannot be used with HH, H | | a | am/pm in lower case | am, pm | Cannot be used with HH, H |

Some Examples

The date detection segment will always return one of the following.

  • A moment format - in this case a date has been succesfully identified with a consistent format.
  • A mixed type - in this case multiple different date formats have been identified.
  • null - no dates could be identified and other types will be checked.

Consider the following array of data.

['Thursday 1st Oct, 2020 11:12:47 AM','Sunday 17th Oct, 1999 10:56:07 PM','Friday 3rd Jan, 2025 9:02:00 AM']

The moment format that is detected is dddd Do MMM, YYYY h:mm:ss A, which is correct. Some changes can be made to this data to create ambiguity in the format that is used. If the AM/PM part of the dates is removed as follows...

['Thursday 1st Oct, 2020 11:12:47','Sunday 17th Oct, 1999 10:56:07','Friday 3rd Jan, 2025 9:02:00']

Now the moment format that is returned is dddd Do MMM, YYYY H:mm:ss, Notice the change from 12 hours (h) to 24 hours (H). In this case either token is valid, H is selected as it allows more flexibility if more data is added in the future.

There are a number of cases such as the above where some ambiguity is present - this is why there isn't a pre-existing type detection library in common use. A common example that is used is...

['10/10/10', '11/11/11', '12/12/12']

There isn't a definitive way to derive a format for these tokens, instead a good guess is taken. The moment format that is produced in this case is M/D/Y which again is valid, but it could be MM/DD/YY, YY/MM/DD , YY/D/M or a whole variety of combinations of the Day, Month and Year. The best that can be done in situations like these is to take a guess with a format that is valid for the entire set.

When it comes to dates in the above format where certainty is not guaranteed the order of priority is Months > Days > Years.

When it comes to time the order of priority is Hours > Minutes > Seconds.

Locales

Differnet locales can also be detected where the tokens are strings rather than numbers (Do, dddd etc.). The languages that are currently integrated are

  • English (default)
  • German
  • French
  • Spanish

As with other elements of date detection, where a single locale cannot be determined, one will be selected that can apply to all of the data in the set.

The detection of locales is completed using a series of Regular Expressions. These are iterated over in an attempt to find matches.

Mixed

If multiple different types are detected across the entire dataset then a mixed type is returned. This type will also be returned if there are dates with multiple different formats.

Note. The only exception to the mixed rule is a combination of strings and html. In this case a type of html will be returned.

Strings

If no other type can be detected then the type defaults to string.

Performance

The library has been tested with a data set 10,000 elements in length for each type that can be detected. The response times achieved were around 10-20 ms for most types. The exception to this was complex date formats where the response time increased to around 400 ms.

Pull requests are welcome and encouraged to increase the performance of the library.