node-data-preprocessing
v1.3.0
A node package for data preprocessing.
The package exposes each step of the preprocessing pipeline individually, as well as a single function that runs the entire process.
Individual Steps
csvParser
var data = process.csvParser(options);
options
List of options with defaults -
var options = {
path: ''
};
path
String - The path to the CSV file containing the data.
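As a rough illustration of the shape of data this step might produce, the sketch below reads a file with Node's built-in fs module and splits it into an array of rows, each an array of field strings. The helpers parseCsvText and readCsv are hypothetical and are not the package's own parser; the naive comma split assumes no quoted fields containing commas or newlines.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of the package): split CSV text into rows of
// string fields. Naive: quoted fields, escaped quotes and embedded newlines
// are NOT handled.
function parseCsvText(text) {
  return text
    .split(/\r?\n/)                        // one entry per line, LF or CRLF
    .filter((line) => line.trim() !== '')  // drop blank lines such as a trailing newline
    .map((line) => line.split(','));       // split each line on commas
}

// Hypothetical wrapper mirroring the `path` option: read the file synchronously.
function readCsv(options) {
  const text = fs.readFileSync(options.path, 'utf8');
  return parseCsvText(text);
}
```

Note that every field comes back as a string; any numeric conversion would have to happen in a later step.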
extract
var extracted = process.extract(options, data);
options
List of options with defaults -
var options = {
useHeaders: true
};
useHeaders
Boolean - Indicates whether the first row of the data is a header row. Note: the header is not used anywhere else in the process; setting this option to true simply strips the first row from the data.
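The header-stripping behaviour described above can be sketched in a few lines. extractRows is a hypothetical name used only for illustration, and treating a missing useHeaders as true simply mirrors the documented default; this is not the package's source.

```javascript
// Hypothetical sketch of the extract step: optionally drop the first (header) row.
// A missing useHeaders falls back to true, mirroring the documented default.
function extractRows(options, data) {
  const useHeaders = options.useHeaders === undefined ? true : options.useHeaders;
  // slice() returns a new array, so the caller's data is left untouched.
  return useHeaders ? data.slice(1) : data.slice();
}
```

Passing a real boolean matters for a truthiness check like this one: in JavaScript the string 'false' is truthy, so it would still strip the first row.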
cleanse
var cleansed = process.cleanse(options, data);
options
List of options with defaults -
var options = {
formats: [],
ranges: []
};
formats
Array - of strings giving the expected type of each field. Each string should match the result of typeof applied to a value of the expected type, for example 'number' or 'string'.
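One way to read the formats option is as a positional typeof check on each row. The sketch below assumes that rows failing the check are dropped; the README does not say how the package handles mismatches, and both function names are hypothetical.

```javascript
// Hypothetical sketch: a row passes if each field's typeof matches the format
// string at the same position. Dropping failing rows is an assumption.
function rowMatchesFormats(row, formats) {
  return formats.every((format, i) => typeof row[i] === format);
}

function cleanseByFormats(options, data) {
  const formats = options.formats || [];
  // With the default empty array, every() is vacuously true, so all rows pass.
  return data.filter((row) => rowMatchesFormats(row, formats));
}
```

Values read straight from a CSV file are strings, so a 'number' format would only match after the fields have been converted.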
ranges
Array - of objects of the form { 'validatorName': 'validatorValue' }, where validatorName is one of the validators listed below.
validators
Available validators -
- greater - expects value, min, and returns value > min;
- greaterOrEqual - expects value, min, and returns value >= min;
- less - expects value, max, and returns value < max;
- lessOrEqual - expects value, max, and returns value <= max;
- between - expects value, range, where range is a string of the form 'min-max', and returns greater(value, min) && less(value, max);
- betweenOrEqual - expects value, range, where range is a string of the form 'min-max', and returns greaterOrEqual(value, min) && lessOrEqual(value, max).
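The validator definitions above translate directly into small predicates. In the sketch below, parseRange and rowInRanges are hypothetical helpers, and applying ranges[i] to column i is an assumption about how the ranges option lines up with the data.

```javascript
// Hypothetical helper: turn a 'min-max' string into [min, max] numbers.
// Naive split on '-': negative bounds such as '-5-5' are not supported.
function parseRange(range) {
  const [min, max] = range.split('-').map(Number);
  return [min, max];
}

// The six validators, following the definitions listed above.
const validators = {
  greater: (value, min) => value > min,
  greaterOrEqual: (value, min) => value >= min,
  less: (value, max) => value < max,
  lessOrEqual: (value, max) => value <= max,
  between: (value, range) => {
    const [min, max] = parseRange(range);
    return validators.greater(value, min) && validators.less(value, max);
  },
  betweenOrEqual: (value, range) => {
    const [min, max] = parseRange(range);
    return validators.greaterOrEqual(value, min) && validators.lessOrEqual(value, max);
  }
};

// Hypothetical: ranges[i] holds the checks for column i, e.g. { between: '0-10' }.
// A row passes only if every named validator returns true for its column.
function rowInRanges(row, ranges) {
  return ranges.every((checks, i) =>
    Object.keys(checks).every((name) => validators[name](row[i], checks[name]))
  );
}
```

Usage: rowInRanges([5, 0.5], [{ between: '1-10' }, { lessOrEqual: 1 }]) is true, while a first field of 10 fails because between excludes its endpoints.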
standardise
var standardised = process.standardise(options, data);
options
List of options with defaults -
var options = {
min: 0.1,
max: 0.9,
standardisationMethod: 'default'
};
min
number - The minimum value for the standardisation.
max
number - The maximum value for the standardisation.
standardisationMethod
string - Can be 'default', 'normal' or 'ss' (Sum of Squares).
ignore
Array - of integers representing columns of the data to ignore while standardising. They will retain their non-standardised values.
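The README does not define the three standardisation methods. Given the min and max options, one plausible reading of 'default' is a linear rescaling of each column into [min, max]; the sketch below shows that technique together with the ignore option. minMaxScale is a hypothetical name, and nothing here is taken from the package's implementation; 'normal' and 'ss' are not sketched.

```javascript
// Hypothetical sketch: min-max rescaling of each column into [min, max],
// ASSUMED to approximate the 'default' method. Columns listed in `ignore`
// are returned unchanged.
function minMaxScale({ min = 0.1, max = 0.9, ignore = [] } = {}, data) {
  if (data.length === 0) return [];
  const columnCount = data[0].length;
  const lows = [];
  const highs = [];
  for (let c = 0; c < columnCount; c++) {
    const values = data.map((row) => row[c]);
    // reduce avoids spreading very large arrays into Math.min / Math.max
    lows.push(values.reduce((a, b) => Math.min(a, b)));
    highs.push(values.reduce((a, b) => Math.max(a, b)));
  }
  return data.map((row) =>
    row.map((value, c) => {
      if (ignore.includes(c)) return value; // ignored columns keep their values
      const span = highs[c] - lows[c];
      // A constant column has no spread; place it at the midpoint of [min, max].
      if (span === 0) return (min + max) / 2;
      return min + ((value - lows[c]) / span) * (max - min);
    })
  );
}
```

Mapping into [0.1, 0.9] rather than [0, 1] keeps values away from the extremes, which is a common choice when the output feeds a sigmoid-style model.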
divide
var divided = process.divide(options, data);
options
List of options with defaults -
var options = {
split: [60, 20, 20]
};
split
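Array - The number of entries sets how many subsets the data is split into, and each entry gives that subset's weighting; the default [60, 20, 20] splits the data into three subsets weighted 60, 20 and 20.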
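A weighted split like this can be sketched by cutting the rows, in order, into consecutive slices. divideRows is a hypothetical name; whether the package shuffles the rows first, or how it rounds subset sizes, is not stated, so this sketch does neither and lets the last subset absorb any remainder.

```javascript
// Hypothetical sketch: cut the rows, in order, into consecutive subsets whose
// sizes are proportional to the weights in `split`. No shuffling is done, and
// the last subset absorbs any rows lost to rounding down.
function divideRows(options, data) {
  const split = options.split || [60, 20, 20];
  const total = split.reduce((sum, weight) => sum + weight, 0);
  const subsets = [];
  let start = 0;
  split.forEach((weight, i) => {
    const isLast = i === split.length - 1;
    // Multiply before dividing, then floor, so sizes never overshoot the data.
    const end = isLast ? data.length : start + Math.floor((weight * data.length) / total);
    subsets.push(data.slice(start, end));
    start = end;
  });
  return subsets;
}
```

For example, 10 rows with the default weights give subsets of 6, 2 and 2 rows.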
process (combined)
var result = process.process(options);
options
The combined process takes all the options that the individual steps take, in a single object.
var options = {
path: '',
useHeaders: true,
formats: [],
ranges: [],
min: 0.1,
max: 0.9,
standardisationMethod: 'default',
split: [60, 20, 20]
};
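Because the combined process takes one object, a caller may want to override only a few options and keep the documented defaults for the rest. The sketch below does that with a shallow merge; defaultOptions and withDefaults are hypothetical helpers, and whether the package itself fills in omitted keys is not stated in this README. The ignore option is left out because no default is documented for it.

```javascript
// Defaults gathered from the individual steps documented above.
// A function returns a fresh object each time, so callers cannot mutate shared arrays.
function defaultOptions() {
  return {
    path: '',
    useHeaders: true,
    formats: [],
    ranges: [],
    min: 0.1,
    max: 0.9,
    standardisationMethod: 'default',
    split: [60, 20, 20]
  };
}

// Hypothetical helper: fill in any option the caller omits with its default.
// Shallow merge: an array supplied by the caller replaces the default wholesale.
function withDefaults(options) {
  return Object.assign(defaultOptions(), options);
}
```

Usage: withDefaults({ path: './data.csv', split: [80, 20] }) keeps min, max and the other defaults while producing a two-way split.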