API

KMeans

The KMeans algorithm clusters data by trying to separate samples into k groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares.

\sum_{i=0}^{n}\min_{\mu_j \in C}(||x_i - \mu_j||^2)

Example

import { KMeans } from 'scikitjs'

let X = [
[1, 2],
[1, 4],
[4, 4],
[4, 0]
]
const kmean = new KMeans({ nClusters: 2 })
kmean.fit(X)
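
Once fitted, predict assigns each sample to its nearest cluster center (see predict below). A short sketch of the follow-up call; the exact labels depend on the random initialization, so the comment shows one possible labeling:

kmean.predict(X).print() // e.g. [0, 0, 1, 1]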

Constructor

new KMeans({ nClusters?, init?, nInit?, maxIter?, tol?, randomState?})

Object Parameters

nClusters (number): The number of clusters for the KMeans algorithm. default = 8
init ("random"): Initialization strategy for KMeans. Currently only 'random' is supported, which selects random points from the input to be the initial centers. Support for 'kmeans++', an alternative initialization strategy that speeds up convergence, is planned. default = "random"
nInit (number): The number of times to run KMeans. The solution with the smallest inertia is kept. default = 10
maxIter (number): Max number of iterations for the KMeans fit. default = 300
tol (number): Convergence tolerance; if an iteration fails to improve the solution by more than this amount, KMeans ceases execution. default = 1e-4
randomState (number): Because there is a random element to KMeans, set this seed if you need a deterministic, repeatable KMeans solution (for testing or other deterministic situations). default = undefined

Properties

All of the constructor arguments above are class properties as well as

clusterCenters: Tensor2D

The actual cluster centers found by KMeans

name: string

Useful for pipelines and column transformers to have a default name for transforms

tf: any

Methods

fit(X): KMeans

Runs the KMeans algo over your input.

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Matrix that you wish to cluster

predict(X): Tensor1D

Converts 2D input into a 1D Tensor which holds the KMeans cluster Class label

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Matrix that you wish to cluster

transform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fitPredict(X): Tensor1D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fitTransform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

score(X): Tensor1D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

ColumnTransformer

The ColumnTransformer transforms a 2D matrix of mixed types, possibly with missing values, into a 2D matrix that is ready to be put into a machine learning model. Usually this class does the heavy lifting associated with imputing missing data, one hot encoding categorical variables, and any other preprocessing steps that are deemed necessary (standard scaling, etc).

Example

import { ColumnTransformer, MinMaxScaler, SimpleImputer } from 'scikitjs'

const X = [
[2, 2],
[2, 3],
[0, NaN],
[2, 0]
]

const transformer = new ColumnTransformer({
transformers: [
['minmax', new MinMaxScaler(), [0]],
['simpleImpute', new SimpleImputer({ strategy: 'median' }), [1]]
]
})

let result = transformer.fitTransform(X)
const expected = [
[1, 2],
[1, 3],
[0, 2],
[1, 0]
]

Constructor

new ColumnTransformer({ transformers?, remainder?})

Object Parameters

transformers (TransformerTriple): A list of transformations. Every element is itself a list [name, Transformer, Selection]. default = []
remainder (Transformer | "drop" | "passthrough"): What to do with the remaining columns. If a Transformer is given, it is applied to all remaining columns. 'passthrough' passes the untransformed columns through untouched, and 'drop' drops all untransformed columns. default = "drop"

Properties

All of the constructor arguments above are class properties as well as

name: string

Useful for pipelines and column transformers to have a default name for transforms

tf: any

Methods

fit(X, y?): ColumnTransformer

Parameters

NameType
XTensor2D | DataFrameInterface
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

transform(X, y?): any

Parameters

NameType
XTensor2D | DataFrameInterface
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

fitTransform(X, y?): any

Parameters

NameType
XTensor2D | DataFrameInterface
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

makeRegression

Signature

makeRegression()

makeLowRankMatrix

Signature

makeLowRankMatrix(): Tensor2D

DummyClassifier

Creates a classifier that guesses a class label based on simple rules. By setting a strategy (i.e. 'mostFrequent', 'uniform', or 'constant'), you can create a simple classifier which can be helpful in determining if a more complicated classifier is actually more predictive.

Example

import { DummyClassifier } from 'scikitjs'

const clf = new DummyClassifier({ strategy: 'mostFrequent' })
const X = [
[-1, 5],
[-0.5, 5],
[0, 10]
]
const y = [10, 20, 20] // 20 is the most frequent class label
clf.fit(X, y) // always predicts 20

clf.predict([
[0, 0],
[1000, 1000]
]) // [20, 20]

Constructor

new DummyClassifier({ strategy?, constant?})

Object Parameters

strategy ("constant" | "mostFrequent" | "uniform"): If strategy is "mostFrequent" then the most frequent class label is chosen no matter the input. If "uniform" is chosen then a uniformly random class label is chosen for a given input. If "constant" is chosen then you must supply a constant number, and this classifier returns that number for any given input. default = "mostFrequent"
constant (number): If strategy is "constant" then this number is returned for every input. default = undefined

Properties

All of the constructor arguments above are class properties as well as

classes: number[] | string[]

These are the unique class labels that are seen during fit.

name: string

Useful for pipelines and column transformers to have a default name for transforms

tf: any

EstimatorType: string

Methods

fit(X, y): DummyClassifier

Fit a DummyClassifier to the data.

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

predictProba(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

predict(X): Tensor1D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

score(X, y): number

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
y(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe | number[] | string[] | boolean[] | TypedArray | Tensor1D | Series

DummyRegressor

Builds a regressor with simple rules.

Example

import { DummyRegressor } from 'scikitjs'
const reg = new DummyRegressor({ strategy: 'mean' })

const X = [
[-1, 5],
[-0.5, 5],
[0, 10]
]
const y = [10, 20, 30] // The mean is 20
reg.fit(X, y) // This regressor will return 20 for any input
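
As with DummyClassifier, the prediction ignores the input; here is a short sketch of the follow-up call, which should return the fitted mean of 20 for every row:

reg.predict([
[0, 0],
[1000, 1000]
]) // [20, 20]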

Constructor

new DummyRegressor({ strategy?, constant?, quantile?})

Object Parameters

strategy ("mean" | "median" | "constant" | "quantile"): The strategy that this DummyRegressor will use to make a prediction. If 'mean' is chosen then the DummyRegressor simply returns the mean of the target variable as its prediction. Likewise with 'median'. If "constant" is chosen, you will have to supply the constant number, and this regressor will always return that value. If "quantile" is chosen, you'll have to choose a quantile value between 0 and 1, and the corresponding value of the target is always returned. default = "mean"
constant (number): In the case where you chose 'constant' as your strategy, this number will be the number that is predicted for any input. Every constructor parameter is used as a class variable as well; if "mean", "median", or "quantile" is chosen, the class variable "constant" will be set to the "mean", "median", or "quantile" after fit.
quantile (number): The quantile to predict in the quantile strategy. 0.5 is the median, 0.0 is the min, and 1.0 is the max.

Properties

All of the constructor arguments above are class properties as well as

name: string

Useful for pipelines and column transformers to have a default name for transforms

tf: any

EstimatorType: string

Methods

fit(X, y): DummyRegressor

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

predict(X): Tensor1D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

score(X, y): number

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

VotingClassifier

A voting classifier is an ensemble meta-estimator that fits several base classifiers, each on the whole dataset, and then combines their individual predictions (by majority vote or averaged predicted probabilities) to form a final prediction.

Example

import {
VotingClassifier,
DummyClassifier,
LogisticRegression
} from 'scikitjs'

const X = [
[1, 2],
[2, 1],
[2, 2],
[3, 1],
[4, 4]
]
const y = [0, 0, 1, 1, 1]
const voter = new VotingClassifier({
estimators: [
['dummy1', new DummyClassifier()],
['dummy2', new DummyClassifier()],
['lr', new LogisticRegression({ penalty: 'none' })]
]
})

await voter.fit(X, y)
assert.deepEqual(voter.predict(X).arraySync(), [1, 1, 1, 1, 1])

Constructor

new VotingClassifier({ estimators?, weights?, voting?})

Object Parameters

estimators ([]): List of name, estimator pairs. Example: [['lr', new LinearRegression()], ['dt', new DecisionTree()]]
weights (number[]): The weights for the estimators. If not present, then there is a uniform weighting.
voting ("hard" | "soft"): If 'hard', uses predicted class labels for majority rule voting. Else if 'soft', predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers.

Properties

All of the constructor arguments above are class properties as well as

le: any

name: string

tf: any

EstimatorType: string

Methods

fit(X, y): Promise<VotingClassifier>

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

predictProba(X): Tensor1D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

predict(X): Tensor1D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

transform(X): Tensor2D[] | Tensor1D[]

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fitTransform(X, y): Promise<Tensor2D[] | Tensor1D[]>

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

score(X, y): number

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
y(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe | number[] | string[] | boolean[] | TypedArray | Tensor1D | Series

makeVotingClassifier

Signature

makeVotingClassifier(): VotingClassifier
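
Example

No example is given for makeVotingClassifier here; assuming it mirrors makeVotingRegressor (documented below) and takes the estimators as function arguments, usage would look roughly like this:

import {
  makeVotingClassifier,
  DummyClassifier,
  LogisticRegression
} from 'scikitjs'

const X = [
  [1, 2],
  [2, 1],
  [2, 2],
  [3, 1],
  [4, 4]
]
const y = [0, 0, 1, 1, 1]

// Assumption: estimators are passed as positional arguments, as with makeVotingRegressor
const voter = makeVotingClassifier(
  new DummyClassifier(),
  new LogisticRegression({ penalty: 'none' })
)

await voter.fit(X, y)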

VotingRegressor

A voting regressor is an ensemble meta-estimator that fits several base regressors, each on the whole dataset. Then it averages the individual predictions to form a final prediction.

Example

import {
VotingRegressor,
DecisionTreeRegressor,
LinearRegression
} from 'scikitjs'

const X = [
[2, 2],
[2, 3],
[5, 4],
[1, 0]
]
const y = [5, 3, 4, 1.5]
const voter = new VotingRegressor({
estimators: [
['dt', new DecisionTreeRegressor()],
['lr', new LinearRegression({ fitIntercept: false })]
]
})

await voter.fit(X, y)

Constructor

new VotingRegressor({ estimators?, weights?})

Object Parameters

estimators ([]): List of name, estimator pairs. Example: [['lr', new LinearRegression()], ['dt', new DecisionTree()]]
weights (number[]): The weights for the estimators. If not present, then there is a uniform weighting.

Properties

All of the constructor arguments above are class properties as well as

name: string

EstimatorType: string

Methods

fit(X, y): Promise<VotingRegressor>

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

predict(X): Tensor1D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

transform(X): Tensor1D[]

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fitTransform(X, y): Promise<Tensor1D[]>

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

score(X, y): number

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

makeVotingRegressor

Helper function for making a VotingRegressor. Just pass your Estimators as function arguments.

Example

import {
makeVotingRegressor,
DummyRegressor,
LinearRegression
} from 'scikitjs'
const X = [
[1, 2],
[2, 1],
[2, 2],
[3, 1]
]
const y = [3, 3, 4, 4]
const voter = makeVotingRegressor(
new DummyRegressor(),
new LinearRegression({ fitIntercept: true })
)

await voter.fit(X, y)

Signature

makeVotingRegressor(): VotingRegressor

SimpleImputer
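
Example

SimpleImputer has no usage example on this page, so here is a minimal sketch based only on the constructor options documented below; it fills the missing entry with the median of its column (the commented output follows from the median of 2, 3 and 0 being 2).

import { SimpleImputer } from 'scikitjs'

const X = [
  [2, 2],
  [2, 3],
  [0, NaN],
  [2, 0]
]

// Replace NaN in each column with that column's median
const imputer = new SimpleImputer({ strategy: 'median' })
imputer.fitTransform(X)
// [[2, 2], [2, 3], [0, 2], [2, 0]]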

Constructor

new SimpleImputer({ strategy?, fillValue?, missingValues?})

Object Parameters

strategy ("mean" | "median" | "constant" | "mostFrequent"): The strategy you'd use to impute missing values. "mean" means fill missing values with the mean. Likewise for "median" and "mostFrequent". Use "constant" if you'd like to pass in a "fillValue" and use that to fill missing values. default = "mean"
fillValue (string | number): If you choose "constant", pick a value that you'd like to use to fill the missing values. default = undefined
missingValues (string | number): This value is the actual missing value. default = NaN

Properties

All of the constructor arguments above are class properties as well as

statistics: Tensor1D

name: string

Useful for pipelines and column transformers to have a default name for transforms

tf: any

Methods

fit(X): SimpleImputer

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

transform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fitTransform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

ElasticNet

Linear regression with combined L1 and L2 priors as regularizer.
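
Example

ElasticNet has no example on this page; this is a minimal sketch that follows the usage of the other linear models documented here and relies only on the constructor options listed below (the data is illustrative).

import { ElasticNet } from 'scikitjs'

const X = [
  [1, 2],
  [2, 1],
  [2, 2],
  [3, 1]
]
const y = [3, 3, 4, 4]

// fit returns a Promise, so await it before predicting
const model = new ElasticNet({ alpha: 0.01, l1Ratio: 0.5 })
await model.fit(X, y)
model.predict(X).print()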

Constructor

new ElasticNet({ alpha?, l1Ratio?, fitIntercept?})

Object Parameters

alpha (number): Constant that multiplies the penalty terms. default = .01
l1Ratio (number): The ElasticNet mixing parameter. default = .5
fitIntercept (boolean): Whether or not the intercept should be estimated. default = true

Properties

All of the constructor arguments above are class properties as well as

model: any

modelFitArgs: ModelFitArgs

modelCompileArgs: ModelCompileArgs

denseLayerArgs: any

isMultiOutput: boolean

optimizerType: OptimizerTypes

lossType: LossTypes

EstimatorType: string

Methods

fit(X, y): Promise<SGDRegressor>

Similar to scikit-learn, this trains a model to predict y, from X. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to use as a training matrix
y ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe | number[] | string[] | boolean[] | TypedArray | Tensor1D | Series): Either a 1D or 2D array / Tensor that you wish to predict

predict(X): Tensor2D | Tensor1D

Similar to scikit-learn, this returns a Tensor2D (2D Matrix) of predictions. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to run through your model and make predictions.

score(X, y): number

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

LassoRegression

Linear Model trained with L1 prior as regularizer (aka the Lasso).
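
Example

There is no example for LassoRegression on this page; the sketch below mirrors the other linear models and uses only the documented alpha and fitIntercept options (the data is illustrative).

import { LassoRegression } from 'scikitjs'

const X = [
  [1, 2],
  [2, 1],
  [2, 2],
  [3, 1]
]
const y = [3, 3, 4, 4]

const lasso = new LassoRegression({ alpha: 1.0 })
await lasso.fit(X, y)
lasso.predict(X).print()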

Constructor

new LassoRegression({ fitIntercept?, alpha?})

Object Parameters

fitIntercept (boolean): Whether or not the intercept should be estimated. default = true
alpha (number): Constant that multiplies the L1 term. default = 1.0

Properties

All of the constructor arguments above are class properties as well as

model: any

modelFitArgs: ModelFitArgs

modelCompileArgs: ModelCompileArgs

denseLayerArgs: any

isMultiOutput: boolean

optimizerType: OptimizerTypes

lossType: LossTypes

EstimatorType: string

Methods

fit(X, y): Promise<SGDRegressor>

Similar to scikit-learn, this trains a model to predict y, from X. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to use as a training matrix
y ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe | number[] | string[] | boolean[] | TypedArray | Tensor1D | Series): Either a 1D or 2D array / Tensor that you wish to predict

predict(X): Tensor2D | Tensor1D

Similar to scikit-learn, this returns a Tensor2D (2D Matrix) of predictions. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to run through your model and make predictions.

score(X, y): number

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

LinearRegression
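
Example

LinearRegression has no example on this page; the sketch below mirrors the usage shown for the other regressors and relies only on the documented fitIntercept option (data and the comment are illustrative).

import { LinearRegression } from 'scikitjs'

const X = [
  [1, 1],
  [2, 2],
  [3, 3],
  [4, 4]
]
const y = [2, 4, 6, 8] // y = 2 * x1, so a zero intercept fits exactly

const lr = new LinearRegression({ fitIntercept: false })
await lr.fit(X, y)
lr.predict([[5, 5]]).print() // expected to be close to 10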

Constructor

new LinearRegression({ fitIntercept?, modelFitOptions?})

Object Parameters

fitIntercept (boolean): Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations. default = true
modelFitOptions (Partial)

Properties

All of the constructor arguments above are class properties as well as

model: any

modelFitArgs: ModelFitArgs

modelCompileArgs: ModelCompileArgs

denseLayerArgs: any

isMultiOutput: boolean

optimizerType: OptimizerTypes

lossType: LossTypes

EstimatorType: string

Methods

fit(X, y): Promise<SGDRegressor>

Similar to scikit-learn, this trains a model to predict y, from X. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to use as a training matrix
y ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe | number[] | string[] | boolean[] | TypedArray | Tensor1D | Series): Either a 1D or 2D array / Tensor that you wish to predict

predict(X): Tensor2D | Tensor1D

Similar to scikit-learn, this returns a Tensor2D (2D Matrix) of predictions. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to run through your model and make predictions.

score(X, y): number

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

LogisticRegression

Builds a linear classification model with associated penalty and regularization

Example

import { LogisticRegression } from 'scikitjs'

let X = [
[1, -1],
[2, 0],
[2, 1],
[2, -1],
[3, 2],
[0, 4],
[1, 3],
[1, 4],
[1, 5],
[2, 3]
]
let y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

let logreg = new LogisticRegression({ penalty: 'none' })
await logreg.fit(X, y)

Constructor

new LogisticRegression({ penalty?, C?, fitIntercept?, modelFitOptions?})

Object Parameters

penalty ("l1" | "l2" | "none"): Specify the norm of the penalty. default = "l2"
C (number): Inverse of the regularization strength. default = 1
fitIntercept (boolean): Whether or not the intercept should be estimated. default = true
modelFitOptions (Partial)

Properties

All of the constructor arguments above are class properties as well as

model: any

modelFitArgs: ModelFitArgs

modelCompileArgs: ModelCompileArgs

denseLayerArgs: any

optimizerType: OptimizerTypes

lossType: LossTypes

oneHot: OneHotEncoder

tf: any

isMultiOutput: boolean

EstimatorType: string

Methods

fit(X, y): Promise<SGDClassifier>

Similar to scikit-learn, this trains a model to predict y, from X. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to use as a training matrix
y ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe | number[] | string[] | boolean[] | TypedArray | Tensor1D | Series): Either a 1D or 2D array / Tensor that you wish to predict

predictProba(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

predict(X): Tensor1D

Similar to scikit-learn, this returns a Tensor1D containing the predicted class label for each input sample.

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to run through your model and make predictions.

score(X, y): number

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
y(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe | number[] | string[] | boolean[] | TypedArray | Tensor1D | Series

RidgeRegression

Linear least squares with l2 regularization.
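
Example

RidgeRegression has no example on this page; this sketch follows the other linear models and uses only the documented alpha and fitIntercept options (the data is illustrative).

import { RidgeRegression } from 'scikitjs'

const X = [
  [1, 2],
  [2, 1],
  [2, 2],
  [3, 1]
]
const y = [3, 3, 4, 4]

const ridge = new RidgeRegression({ alpha: 0.01 })
await ridge.fit(X, y)
ridge.predict(X).print()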

Constructor

new RidgeRegression({ fitIntercept?, alpha?})

Object Parameters

fitIntercept (boolean): Whether or not the intercept should be estimated. default = true
alpha (number): Constant that multiplies the penalty terms. default = .01

Properties

All of the constructor arguments above are class properties as well as

model: any

modelFitArgs: ModelFitArgs

modelCompileArgs: ModelCompileArgs

denseLayerArgs: any

isMultiOutput: boolean

optimizerType: OptimizerTypes

lossType: LossTypes

EstimatorType: string

Methods

fit(X, y): Promise<SGDRegressor>

Similar to scikit-learn, this trains a model to predict y, from X. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to use as a training matrix
y ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe | number[] | string[] | boolean[] | TypedArray | Tensor1D | Series): Either a 1D or 2D array / Tensor that you wish to predict

predict(X): Tensor2D | Tensor1D

Similar to scikit-learn, this returns a Tensor2D (2D Matrix) of predictions. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to run through your model and make predictions.

score(X, y): number

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

KFold

K-Fold cross-validator. Generates train and test indices to split data into train/test subsets. To generate these subsets, the dataset is split into k (approximately) evenly sized chunks of consecutive elements. Each split takes one chunk as the test data and combines the remaining chunks to form the training data.

Optionally, the indices can be shuffled before splitting into chunks (disabled by default).

Example

import * as tf from '@tensorflow/tfjs'
import { KFold } from 'scikitjs'

const kf = new KFold({ nSplits: 3 })

const X = tf.range(0, 7).reshape([7, 1]) as tf.Tensor2D

console.log('nSplits:', kf.getNumSplits(X))

for (const { trainIndex, testIndex } of kf.split(X)) {
  try {
    console.log('train:', trainIndex.toString())
    console.log('test:', testIndex.toString())
  } finally {
    trainIndex.dispose()
    testIndex.dispose()
  }
}

Constructor

new KFold({ object })
constructor(__namedParameters): KFold

Parameters

NameType
__namedParametersKFoldParams

Properties

All of the constructor arguments above are class properties as well as

nSplits: number

shuffle: boolean

randomState: number

name: string

tf: any

Methods

getNumSplits(undefined): number

split(X, y?, groups?): IterableIterator

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series
groupsnumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

crossValScore

Evaluates a score by cross-validation. This particular overload of the function uses the given scorer function to cross validate a supervised estimator.

Signature

crossValScore(): Promise<Tensor1D>

trainTestSplit

Helper function that can split training and testing data into different splits. This helps with cross validation and model selection.

Example

import { trainTestSplit } from 'scikitjs'

let X = [
[5, 6],
[8, 2],
[3, 4]
]
let y = [10, 20, 30]

let [XTrain, XTest, yTrain, yTest] = trainTestSplit(X, y, 0.3)

Signature

trainTestSplit(): any[]

GaussianNB

Gaussian Naive Bayes classifier

Example

import { GaussianNB } from 'scikitjs'

const clf = new GaussianNB({ priors: [0.5, 0.5] })
const X = [
[0.1, 0.9],
[0.3, 0.7],
[0.9, 0.1],
[0.8, 0.2],
[0.81, 0.19]
]
const y = [0, 0, 1, 1, 1]

await clf.fit(X, y)

clf.predict([
[0.1, 0.9],
[0.01, 0.99]
]) // [0, 1]

Constructor

new GaussianNB({ object })
constructor(params): GaussianNB

Parameters

NameType
paramsNaiveBayesParams

Properties

All of the constructor arguments above are class properties as well as

priors: Tensor1D

varSmoothing: number

classes: Tensor1D

means: Tensor1D[]

variances: Tensor1D[]

tf: any

name: string

Methods

fit(X, y): Promise<GaussianNB>

Train the model by calculating the mean and variance of sample distribution.

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

predictProba(X): any

Predict the probability of samples assigned to each observed label.

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

predict(X): any

Predict the labels assigned to each sample

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

KNeighborsClassifier

K-Nearest neighbors classifier.

Example

import { KNeighborsClassifier } from 'scikitjs'

let X = [[0], [1], [2], [3]]
let y = [0, 0, 1, 1]

let knn = new KNeighborsClassifier({ nNeighbors: 2 })

await knn.fit(X, y)

knn.predict([[1.5]]).print()

Constructor

new KNeighborsClassifier({ object })
constructor(params): KNeighborsClassifier

Parameters

NameType
paramsKNeighborsParams

Properties

All of the constructor arguments above are class properties as well as

SUPPORTED_ALGORITHMS: ("auto" | "kdTree" | "brute")[]

_neighborhood: undefined | Neighborhood

_y: undefined | Tensor1D

weights: undefined | "uniform" | "distance"

algorithm: undefined | "auto" | "kdTree" | "brute"

leafSize: undefined | number

p: undefined | number

metric: undefined | "euclidean" | "minkowski" | "manhattan" | "chebyshev"

nNeighbors: undefined | number

classes_: Tensor1D

score:

name: string

Methods

predictProba(X): Tensor2D

Applies this model to predict the class probabilities of each given sample.

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The samples for which the targets are to be predicted, where X[i,j] is the (j+1)-th feature of the (i+1)-th sample.

predict(X): Tensor1D

Applies this model to predict the class of each given sample.

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The samples for which the targets are to be predicted, where X[i,j] is the (j+1)-th feature of the (i+1)-th sample.

fit(X, labels): Promise<KNeighborsClassifier>

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
labelsnumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

KNeighborsRegressor

K-Nearest neighbor regressor.

Example

import { KNeighborsRegressor } from 'scikitjs'

let X = [[0], [1], [2], [3]]
let y = [0, 0, 1, 1]

let knn = new KNeighborsRegressor({ nNeighbors: 2 })

await knn.fit(X, y)

knn.predict([[1.5]]).print()

Constructor

new KNeighborsRegressor({ object })
constructor(params): KNeighborsRegressor

Parameters

NameType
paramsKNeighborsParams

Properties

All of the constructor arguments above are class properties as well as

SUPPORTED_ALGORITHMS: ("auto" | "kdTree" | "brute")[]

_neighborhood: undefined | Neighborhood

_y: undefined | Tensor1D

weights: undefined | "uniform" | "distance"

algorithm: undefined | "auto" | "kdTree" | "brute"

leafSize: undefined | number

p: undefined | number

metric: undefined | "euclidean" | "minkowski" | "manhattan" | "chebyshev"

nNeighbors: undefined | number

name: string

Methods

fit(X, y): Promise<KNeighborsRegressor>

Async function. Trains this model using the given features and targets.

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The features of each training sample, where X[i,j] is the (j+1)-th feature of the (i+1)-th sample.
y (number[] | string[] | boolean[] | TypedArray | Tensor1D | Series): The target of each training sample, where y[i] is the target of the (i+1)-th sample.

predict(X): any

Applies this model to predict the target of each given sample.

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

Pipeline

Construct a pipeline of transformations, with the final one being an estimator. Usually this is used to perform some cleaning of the data in the early stages of the pipeline (ie. StandardScaling, or SimpleImputer), and then ending with the fitted estimator.

Example

import { Pipeline, SimpleImputer, MinMaxScaler, LinearRegression } from 'scikitjs'

const X = [
[2, 2], // [1, .5]
[2, NaN], // [1, 0]
[NaN, 4], // [0, 1]
[1, 0] // [.5, 0]
]
const y = [5, 3, 4, 1.5]
const pipeline = new Pipeline({
steps: [
[
'simpleImputer',
new SimpleImputer({ strategy: 'constant', fillValue: 0 })
],
['minmax', new MinMaxScaler()],
['lr', new LinearRegression({ fitIntercept: false })]
]
})

await pipeline.fit(X, y)

Constructor

new Pipeline({ steps?})

Object Parameters

NameType
steps[]

Properties

All of the constructor arguments above are class properties as well as

name: string

Useful for pipelines and column transformers to have a default name for transforms

Methods

fit(X, y): Promise<Pipeline>

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

transform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fitTransform(X, y): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

predict(X): any

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fitPredict(X, y): Promise<any>

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

makePipeline

Shorthand for making a Pipeline class. Just pass your Estimators as function arguments.

Example

import {
makePipeline,
SimpleImputer,
MinMaxScaler,
LinearRegression
} from 'scikitjs'
const X = [
[2, 2],
[2, NaN],
[NaN, 4],
[1, 0]
]
const y = [5, 3, 4, 1.5]
const pipeline = makePipeline(
new SimpleImputer({ strategy: 'constant', fillValue: 0 }),
new MinMaxScaler(),
new LinearRegression({ fitIntercept: false })
)

await pipeline.fit(X, y)

Signature

makePipeline(): Pipeline

LabelEncoder

Encode target labels with value between 0 and n_classes-1.

Example

import { LabelEncoder } from 'scikitjs'

const sf = [1, 2, 2, 'boy', 'git', 'git']
const scaler = new LabelEncoder()
scaler.fit(sf)
console.log(scaler.classes) // [1, 2, "boy", "git"]
scaler.transform([2, 2, 'boy']) // [1, 1, 2]

Constructor

new LabelEncoder()

Properties

All of the constructor arguments above are class properties as well as

classes: (string | number | boolean)[]

Unique classes that we see in this single array of data

name: string

Useful for pipelines and column transformers to have a default name for transforms

tf: any

Methods

fit(X): LabelEncoder

Maps values to unique integer labels between 0 and n_classes-1.

Parameters

NameType
Xnumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

transform(X): Tensor1D

Encode labels with value between 0 and n_classes-1.

Parameters

NameType
Xnumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

fitTransform(X): Tensor1D

Parameters

NameType
Xnumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

inverseTransform(X): any[]

Inverse transform values back to original values.

Parameters

NameType
Xnumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

MaxAbsScaler

MaxAbsScaler scales the data by dividing by the max absolute value that it finds per feature. It's a useful scaling if you wish to keep sparsity in your dataset.

Example

import { MaxAbsScaler } from 'scikitjs'

const scaler = new MaxAbsScaler()
const data = [
[-1, 5],
[-0.5, 5],
[0, 10],
[1, 10]
]

const result = scaler.fitTransform(data)
const expected = [
[-1, 0.5],
[-0.5, 0.5],
[0, 1],
[1, 1]
]

Constructor

new MaxAbsScaler()

Properties

All of the constructor arguments above are class properties as well as

scale: Tensor1D

The per-feature scale that we see in the dataset. We divide by this number.

nFeaturesIn: number

The number of features seen during fit

nSamplesSeen: number

The number of samples processed by the Estimator. Will be reset on new calls to fit

featureNamesIn: string[]

Names of features seen during fit. Only stores feature names if input is a DataFrame

name: string

Useful for pipelines and column transformers to have a default name for transforms

Methods

fitTransform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fit(X): MaxAbsScaler

Fits a MaxAbsScaler to the data

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

transform(X): Tensor2D

Transform the data using the fitted scaler

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

inverseTransform(X): Tensor2D

Inverse transform the data using the fitted scaler

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

MinMaxScaler

Transform features by scaling each feature to a given range. This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between the maximum and minimum value.

Example

import { MinMaxScaler } from 'scikitjs'

const data = [
[-1, 2],
[-0.5, 6],
[0, 10],
[1, 18]
]
const scaler = new MinMaxScaler()
const expected = scaler.fitTransform(data)
// const expected = [
// [0, 0],
// [0.25, 0.25],
// [0.5, 0.5],
// [1, 1]
//]

Constructor

new MinMaxScaler({ featureRange?})

Object Parameters

featureRange: Desired range of transformed data. default = [0, 1]

Properties

All of the constructor arguments above are class properties as well as

scale: Tensor1D

The per-feature scale that we see in the dataset.

min: Tensor1D

dataMin: Tensor1D

The per-feature minimum that we see in the dataset.

dataMax: Tensor1D

The per-feature maximum that we see in the dataset.

dataRange: Tensor1D

The per-feature range that we see in the dataset.

nFeaturesIn: number

The number of features seen during fit

nSamplesSeen: number

The number of samples processed by the Estimator. Will be reset on new calls to fit

featureNamesIn: string[]

Names of features seen during fit. Only stores feature names if input is a DataFrame

name: string

Useful for pipelines and column transformers to have a default name for transforms

Methods

fitTransform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fit(X): MinMaxScaler

Fits a MinMaxScaler to the data

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

transform(X): Tensor2D

Transform the data using the fitted scaler

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

inverseTransform(X): Tensor2D

Inverse transform the data using the fitted scaler

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

Normalizer

A Normalizer scales each sample by its l1, l2, or max norm. If you imagine the input matrix as a 2D grid, then this is effectively a "horizontal" scaling (per-sample scaling), as opposed to a StandardScaler, which is a "vertical" scaling (per-feature scaling). The only input is the kind of norm you wish to scale by.

Example

import { Normalizer } from 'scikitjs'

const data = [
[-1, 1],
[-6, 6],
[0, 10],
[10, 20]
]
const scaler = new Normalizer({ norm: 'l1' })
const result = scaler.fitTransform(data)
const expected = [
[-0.5, 0.5],
[-0.5, 0.5],
[0, 1],
[0.33, 0.66]
]

Constructor

new Normalizer({ norm?})

Object Parameters

norm ("l1" | "l2" | "max"): What kind of norm we wish to scale by. default = "l2"

Properties

All of the constructor arguments above are class properties as well as

nFeaturesIn: number

The number of features seen during fit

featureNamesIn: string[]

Names of features seen during fit. Only stores feature names if input is a DataFrame

name: string

Useful for pipelines and column transformers to have a default name for transforms

Methods

fitTransform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fit(X): Normalizer

Fits a Normalizer to the data

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

transform(X): Tensor2D

Transform the data using the Normalizer

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

OneHotEncoder

One-hot encodes categorical features: each unique value of each feature becomes its own binary column.

Example

import { OneHotEncoder } from 'scikitjs'

const X = [
['Male', 1],
['Female', 2],
['Male', 4]
]
const encode = new OneHotEncoder()
encode.fitTransform(X) // returns the object below
const expected = [
[1, 0, 1, 0, 0],
[0, 1, 0, 1, 0],
[1, 0, 0, 0, 1]
]

Constructor

new OneHotEncoder({ categories?, handleUnknown?, drop?})

Object Parameters

categories (((string | number | boolean)[])[] | "auto"): Categories (unique values) per feature. 'auto': determine categories automatically from the training data. list: categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values, and should be sorted in the case of numeric values. default = "auto"
handleUnknown ("error" | "ignore"): When set to 'error', an error will be raised if an unknown categorical feature is present during transform. When set to 'ignore', the encoded value will be all zeros. In inverseTransform, an unknown category will be denoted as null. default = "error"
drop ("first"): Specifies a methodology to use to drop one of the categories per feature. This is useful in situations where perfectly collinear features cause problems, such as when feeding the resulting data into a neural network or an unregularized regression. However, dropping one category breaks the symmetry of the original representation and can therefore induce a bias in downstream models, for instance in penalized linear classification or regression models. Options: undefined retains all features (the default); 'first' drops the first category in each feature (if only one category is present, the feature will be dropped entirely). default = undefined

Properties

All of the constructor arguments above are class properties as well as

categoriesParam: ((string | number | boolean)[])[] | "auto"

This holds the categories parameter that is passed in the constructor. this.categories holds the actual learned categories or the ones passed in from the constructor

nFeaturesIn: number

The number of features seen during fit

featureNamesIn: string[]

Names of features seen during fit. Only stores feature names if input is a DataFrame

name: string

Useful for pipelines and column transformers to have a default name for transforms

Methods

fitTransform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fit(X, y?): OneHotEncoder

Fits a OneHotEncoder to the data.

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

transform(X, y?): Tensor2D

Encodes the data using the fitted OneHotEncoder.

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

inverseTransform(X): any[]

Only works for single column OneHotEncoding

Parameters

NameType
XTensor2D

OrdinalEncoder

Encode categorical features as an integer array. The input to this transformer should be an array-like of integers or strings, which represent categorical (discrete) features. The features are then converted to ordinal integers.

Example

import { OrdinalEncoder } from 'scikitjs'

const X = [
['Male', 1],
['Female', 2],
['Male', 4]
]
const encode = new OrdinalEncoder()
encode.fitTransform(X) // returns the expected object below
const expected = [
[0, 0],
[1, 1],
[0, 2]
]

Constructor

new OrdinalEncoder({ categories?, handleUnknown?, unknownValue?})

Object Parameters

categories (((string | number | boolean)[])[] | "auto"): Categories (unique values) per feature. 'auto': determine categories automatically from the training data. list: categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values, and should be sorted in the case of numeric values. default = "auto"
handleUnknown ("error" | "useEncodedValue"): When set to 'error', an error will be raised if an unknown categorical feature is present during transform. When set to 'useEncodedValue', the encoded value of unknown categories will be set to the value given for the unknownValue parameter. In inverseTransform, an unknown category will be denoted as null. default = "error"
unknownValue (number): When handleUnknown is set to 'useEncodedValue', this parameter is required and will set the encoded value of unknown categories. It has to be distinct from the values used to encode any of the categories in fit. Great choices for this number are NaN or -1. default = NaN

Properties

All of the constructor arguments above are class properties as well as

categoriesParam: ((string | number | boolean)[])[] | "auto"

This holds the categories parameter that is passed in the constructor. this.categories holds the actual learned categories or the ones passed in from the constructor

nFeaturesIn: number

The number of features seen during fit

featureNamesIn: string[]

Names of features seen during fit. Only stores feature names if input is a DataFrame

name: string

Useful for pipelines and column transformers to have a default name for transforms

Methods

fitTransform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fit(X, y?): OrdinalEncoder

Fits a OrdinalEncoder to the data.

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

transform(X, y?): Tensor2D

Encodes the data using the fitted OrdinalEncoder.

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

RobustScaler

Scales the data but is robust to outliers. While StandardScaler subtracts the mean and divides by the standard deviation, both of those measures are not robust to outliers. So instead of the mean we use the median, and instead of the standard deviation we use the interquartile range (the distance between the .25 and .75 quantiles).

Example

import { RobustScaler } from 'scikitjs'

const X = [
[1, -2, 2],
[-2, 1, 3],
[4, 1, -2]
]

const scaler = new RobustScaler()
scaler.fitTransform(X)

const result = [
[0, -2, 0],
[-1, 0, 0.4],
[1, 0, -1.6]
]

Constructor

new RobustScaler({ quantileRange?, withScaling?, withCentering?})

Object Parameters

quantileRange: Quantile range used to calculate scale_. By default this is equal to the IQR, i.e., q_min is the first quartile and q_max is the third quartile. Numbers must be between 0 and 100. default = [25.0, 75.0]
withScaling (boolean): Whether or not we should scale the data. default = true
withCentering (boolean): Whether or not we should center the data. default = true

Properties

All of the constructor arguments above are class properties as well as

scale: Tensor1D

The per-feature scale that we see in the dataset. We divide by this number.

center: Tensor1D

The per-feature median that we see in the dataset. We subtract this number.

nFeaturesIn: number

The number of features seen during fit

featureNamesIn: string[]

Names of features seen during fit. Only stores feature names if input is a DataFrame

name: string

Useful for pipelines and column transformers to have a default name for transforms

Methods

fitTransform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fit(X): RobustScaler

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

transform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

inverseTransform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

StandardScaler

Standardize features by removing the mean and scaling to unit variance. The standard score of a sample x is calculated as z = (x - u) / s, where u is the mean of the training samples and s is the standard deviation of the training samples.

Example

import { StandardScaler } from 'scikitjs'

const data = [
[0, 0],
[0, 0],
[1, 1],
[1, 1]
]

const scaler = new StandardScaler()
const expected = scaler.fitTransform(data)
// const expected = [
// [-1, -1],
// [-1, -1],
// [1, 1],
// [1, 1]
// ]

Constructor

new StandardScaler({ withMean?, withStd?})

Object Parameters

withMean (boolean): Whether or not we should subtract the mean. default = true
withStd (boolean): Whether or not we should divide by the standard deviation. default = true

Properties

All of the constructor arguments above are class properties as well as

scale: Tensor

The per-feature scale that we see in the dataset. We divide by this number.

mean: Tensor

The per-feature mean that we see in the dataset. We subtract this number.

nFeaturesIn: number

The number of features seen during fit

nSamplesSeen: number

The number of samples processed by the Estimator. Will be reset on new calls to fit

featureNamesIn: string[]

Names of features seen during fit. Only stores feature names if input is a DataFrame

name: string

Useful for pipelines and column transformers to have a default name for transforms

Methods

fitTransform(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fit(X): StandardScaler

Fit a StandardScaler to the data.

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

transform(X): Tensor2D

Transform the data using the fitted scaler

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

inverseTransform(X): Tensor2D

Inverse transform the data using the fitted scaler

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

fromObject

Signature

fromObject(): Promise<any>

fromJSON

Signature

fromJSON(): Promise<any>

Serialize

Constructor

new Serialize()

Properties

The only properties are the arguments defined above.

Methods

LinearSVC

Builds a linear classification model with associated penalty and regularization

Example

import { LinearSVC } from 'scikitjs'

let X = [
[1, -1],
[2, 0],
[2, 1],
[2, -1],
[3, 2],
[0, 4],
[1, 3],
[1, 4],
[1, 5],
[2, 3]
]
let y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

let svc = new LinearSVC()
await svc.fit(X, y)

Constructor

new LinearSVC({ penalty?, C?, fitIntercept?})

Object Parameters

penalty ("l1" | "l2" | "none"): Specify the norm of the penalty. default = "l2"
C (number): Inverse of the regularization strength. default = 1
fitIntercept (boolean): Whether or not the intercept should be estimated. default = true

Properties

All of the constructor arguments above are class properties as well as

model: any

modelFitArgs: ModelFitArgs

modelCompileArgs: ModelCompileArgs

denseLayerArgs: any

optimizerType: OptimizerTypes

lossType: LossTypes

oneHot: OneHotEncoder

tf: any

isMultiOutput: boolean

EstimatorType: string

Methods

fit(X, y): Promise<SGDClassifier>

Similar to scikit-learn, this trains a model to predict y, from X. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to use as a training matrix
y ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe | number[] | string[] | boolean[] | TypedArray | Tensor1D | Series): Either a 1D or 2D array / Tensor that you wish to predict

predictProba(X): Tensor2D

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

predict(X): Tensor1D

Similar to scikit-learn, this returns a Tensor1D containing the predicted class label for each input sample.

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to run through your model and make predictions.

score(X, y): number

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
y(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe | number[] | string[] | boolean[] | TypedArray | Tensor1D | Series

LinearSVR

Builds a linear regression model with an epsilon-insensitive loss and associated regularization (a linear support vector regressor).

Example

import { LinearSVR } from 'scikitjs'

let X = [
[1, -1],
[2, 0],
[2, 1],
[2, -1],
[3, 2],
[0, 4],
[1, 3],
[1, 4],
[1, 5],
[2, 3]
]
let y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

let svr = new LinearSVR()
await svr.fit(X, y)

Constructor

new LinearSVR({ epsilon?, C?, fitIntercept?})

Object Parameters

epsilon (number): Epsilon parameter in the epsilon-insensitive loss function. Note that the value of this parameter depends on the scale of the target variable y. If unsure, set epsilon=0.
C (number): Inverse of the regularization strength. default = 1
fitIntercept (boolean): Whether or not the intercept should be estimated. default = true

Properties

All of the constructor arguments above are class properties as well as

model: any

modelFitArgs: ModelFitArgs

modelCompileArgs: ModelCompileArgs

denseLayerArgs: any

isMultiOutput: boolean

optimizerType: OptimizerTypes

lossType: LossTypes

EstimatorType: string

Methods

fit(X, y): Promise<SGDRegressor>

Similar to scikit-learn, this trains a model to predict y, from X. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to use as a training matrix
y ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe | number[] | string[] | boolean[] | TypedArray | Tensor1D | Series): Either a 1D or 2D array / Tensor that you wish to predict

predict(X): Tensor2D | Tensor1D

Similar to scikit-learn, this returns a Tensor2D (2D Matrix) of predictions. Even in the case where we predict a single output vector, the predictions are a 2D matrix (albeit a single column in a 2D Matrix).

Parameters

X ((number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe): The 2D Tensor / 2D Array that you wish to run through your model and make predictions.

score(X, y): number

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

setBackend

Signature

setBackend(): void

getBackend

Signature

getBackend(): any
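
Example

Neither signature lists its parameters here; the sketch below assumes the common scikitjs pattern of registering the TensorFlow.js module before using any estimator, with getBackend returning whatever was registered.

import * as tf from '@tensorflow/tfjs'
import { setBackend, getBackend } from 'scikitjs'

// Assumption: setBackend takes the TensorFlow.js module that scikitjs should compute with
setBackend(tf)
console.log(getBackend() === tf) // true, under this assumption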

ClassificationCriterion

Constructor

new ClassificationCriterion({ object })
constructor(__namedParameters): ClassificationCriterion

Parameters

NameType
__namedParameters

Properties

All of the constructor arguments above are class properties as well as

y: number[]

impurityMeasure: ImpurityMeasure

start: number

end: number

pos: number

nLabels: number

labelFreqsTotal: number[]

labelFreqsLeft: number[]

labelFreqsRight: number[]

nSamples: number

nSamplesLeft: number

nSamplesRight: number

name: string

Methods

RegressionCriterion

Constructor

new RegressionCriterion({ object })
constructor(__namedParameters): RegressionCriterion

Parameters

NameType
__namedParameters

Properties

All of the constructor arguments above are class properties as well as

y: number[]

impurityMeasure: "squared_error"

start: number

end: number

pos: number

squaredSum: number

squaredSumLeft: number

squaredSumRight: number

sumTotal: number

sumTotalLeft: number

sumTotalRight: number

nSamples: number

nSamplesLeft: number

nSamplesRight: number

name: string

Methods

DecisionTree

Constructor

new DecisionTree()

Properties

All of the constructor arguments above are class properties as well as

nodes: Node[]

isBuilt: boolean

name: string

Methods

DecisionTreeBase

Constructor

new DecisionTreeBase({ object })
constructor(__namedParameters): DecisionTreeBase

Parameters

NameType
__namedParametersDecisionTreeBaseParams

Properties

All of the constructor arguments above are class properties as well as

splitter: Splitter

stack: NodeRecord[]

minSamplesLeaf: number

maxDepth: number

minSamplesSplit: number

minImpurityDecrease: number

tree: DecisionTree

criterion: ImpurityMeasure

maxFeatures: number | "auto" | "log2" | "sqrt"

maxFeaturesNumb: number

X: number[][]

y: number[]

labelEncoder: LabelEncoder

name: string

Methods

fit(X, y, samplesSubset?): void

Parameters

NameType
Xnumber[][]
ynumber[]
samplesSubsetnumber[]

DecisionTreeClassifier

Build a Decision Tree for Classification problems.

Example

import { DecisionTreeClassifier } from 'scikitjs'

const X = [
[0.1, 0.9],
[0.3, 0.7],
[0.9, 0.1],
[0.8, 0.2],
[0.81, 0.19]
]
const y = [0, 0, 1, 1, 1]

const clf = new DecisionTreeClassifier({ criterion: 'gini', maxDepth: 4 })
await clf.fit(X, y)

clf.predict([
[0.1, 0.9],
[0.01, 0.99]
]) // [0, 1]

Constructor

new DecisionTreeClassifier({ criterion?, maxDepth?, minSamplesSplit?, minSamplesLeaf?, maxFeatures?, minImpurityDecrease?})

Object Parameters

criterion ("gini" | "entropy"): The function to measure the quality of the split. Default is gini.
maxDepth (number): The maximum depth of the tree. Default is undefined.
minSamplesSplit (number): The minimum number of samples that you'd need before you can split on that node. Default is 2.
minSamplesLeaf (number): The minimum number of samples that every leaf must contain. Default is 1.
maxFeatures (number | "auto" | "log2" | "sqrt"): The number of features that you would consider. Default is undefined.
minImpurityDecrease (number): The amount of impurity that would need to exist before you could split.

Properties

All of the constructor arguments above are class properties as well as

splitter: Splitter

stack: NodeRecord[]

tree: DecisionTree

maxFeaturesNumb: number

X: number[][]

y: number[]

labelEncoder: LabelEncoder

name: string

Methods

fit(X, y): DecisionTreeClassifier

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

getNLeaves(undefined): number

predict(X): any[]

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe

predictProba(X): number[][]

Parameters

NameType
Xnumber[][]

score(X, y): number

Parameters

NameType
Xnumber[][]
ynumber[]

DecisionTreeRegressor
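
Build a Decision Tree for regression problems.

Example

No example is given for DecisionTreeRegressor on this page; this sketch mirrors the DecisionTreeClassifier example above and uses only the constructor options documented below (the data and the comment are illustrative).

import { DecisionTreeRegressor } from 'scikitjs'

const X = [
  [0.1],
  [0.3],
  [0.8],
  [0.9]
]
const y = [1.0, 1.1, 2.0, 2.1]

const reg = new DecisionTreeRegressor({ criterion: 'squared_error', maxDepth: 3 })
reg.fit(X, y)

reg.predict([[0.2], [0.85]]) // roughly [1, 2]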

Constructor

new DecisionTreeRegressor({ criterion?, maxDepth?, minSamplesSplit?, minSamplesLeaf?, maxFeatures?, minImpurityDecrease?})

Object Parameters

criterion ("squared_error"): The function to measure the quality of the split. Default is squared_error.
maxDepth (number): The maximum depth of the tree. Default is undefined.
minSamplesSplit (number): The minimum number of samples that you'd need before you can split on that node. Default is 2.
minSamplesLeaf (number): The minimum number of samples that every leaf must contain. Default is 1.
maxFeatures (number | "auto" | "log2" | "sqrt"): The number of features that you would consider. Default is undefined.
minImpurityDecrease (number): The amount of impurity that would need to exist before you could split.

Properties

All of the constructor arguments above are class properties as well as

splitter: Splitter

stack: NodeRecord[]

tree: DecisionTree

maxFeaturesNumb: number

X: number[][]

y: number[]

labelEncoder: LabelEncoder

name: string

Methods

fit(X, y): DecisionTreeRegressor

Parameters

NameType
X(number | string | boolean)[][] | TypedArray[] | Tensor2D | Dataframe
ynumber[] | string[] | boolean[] | TypedArray | Tensor1D | Series

getNLeaves(undefined): number

predict(X): number[]

Parameters

NameType
Xnumber[][]

score(X, y): number

Parameters

NameType
Xnumber[][]
ynumber[]

Splitter

Constructor

new Splitter({ object })
constructor(__namedParameters): Splitter

Parameters

NameType
__namedParameters

Properties

All of the constructor arguments above are class properties as well as

kMinSplitDiff: number

X: number[][]

y: number[]

criterion: ClassificationCriterion | RegressionCriterion

start: number

end: number

minSamplesLeaf: number

maxFeatures: number

featureOrder: number[]

shuffleFeatures: boolean

sampleMap: Int32Array

nSamplesTotal: number

nFeatures: number

name: string

Methods