Scikit-Learn Library

Data and Feature Processing


Modelling

3 Steps

  1. Instantiate model
  2. model.fit
  3. model.predict
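The three steps above can be sketched as follows. This is a minimal illustration, not a recommended workflow: the iris dataset and `LogisticRegression` are assumptions chosen only because they ship with scikit-learn, and for brevity the model predicts on rows it was trained on.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load a small built-in dataset (an assumption, used only for illustration).
X, y = load_iris(return_X_y=True)

# 1. Instantiate the model with its hyperparameters.
model = LogisticRegression(max_iter=1000)

# 2. Fit the model to the data.
model.fit(X, y)

# 3. Predict labels (here for the first five training rows).
predictions = model.predict(X[:5])
print(predictions)
```

Nearly every scikit-learn estimator follows this same instantiate, fit, predict pattern, so swapping in another model usually changes only the first step.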

Supervised Learning

Unsupervised Learning

sklearn.

  • neighbors
    • KNeighborsClassifier(3, weights='distance') (a supervised classifier; it needs labels to fit)
  • cluster
    • KMeans
  • decomposition
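A short sketch of the three modules listed above, using the iris dataset as an assumed example. Note that `KNeighborsClassifier` needs labels to fit, while `KMeans` and `PCA` work from the features alone. `PCA` is picked here as one representative of `decomposition`; the notes do not name a specific class.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# neighbors: 3 nearest neighbours, closer neighbours get larger votes.
knn = KNeighborsClassifier(3, weights='distance')
knn.fit(X, y)  # labels y are required
knn_pred = knn.predict(X[:5])

# cluster: group samples into 3 clusters without using y.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# decomposition: project the 4 features onto 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(knn_pred, cluster_labels[:5], X_2d.shape)
```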

Model Training and Model Selection

sklearn.

  • model_selection
    • train_test_split(X, y, test_size=..., [random_state=..., stratify=...])
    • GridSearchCV
      • tunes hyperparameters by cross-validated search over a parameter grid
  • metrics
    • roc_curve, auc
    • confusion_matrix, precision_score, recall_score, classification_report
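The sketch below strings these functions together: split, tune, then score. The dataset, the estimator, and the parameter grid are all assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (auc, classification_report, confusion_matrix,
                             precision_score, recall_score, roc_curve)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the rows; stratify keeps the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Try every combination in the (hypothetical) grid with 5-fold CV.
param_grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)

# After fitting, the search object predicts with the best estimator.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]

# ROC curve and area under it, computed from the positive-class scores.
fpr, tpr, _ = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

cm = confusion_matrix(y_test, y_pred)
print(search.best_params_, roc_auc)
print(cm)
print(precision_score(y_test, y_pred), recall_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

Scoring on the held-out split rather than the training rows keeps the metrics from being inflated by the tuning step.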

Patsy

Patsy is a Python package for describing statistical models and building design matrices. It is closely inspired by and compatible with the formula mini-language used in R and S.

patsy.

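  • dmatrices('y ~ x0 + x1 [+ 0]', data)
    • the optional '+ 0' removes the intercept column
    • returns the (y, X) design matrices: NumPy ndarray subclasses with additional info
      • X.design_info
    • can use standardize(x), center(x), C(x) inside the formula
      • C(x) - categorical data
        • treated like dummy variables automatically
  • build_design_matrices([<design_info>], new_data)
    • applies a stored design to new data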
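A minimal sketch of these calls follows. The data frame and its column names (`y`, `x0`, `g`) are hypothetical, made up for the example.

```python
import pandas as pd
from patsy import build_design_matrices, dmatrices

# Hypothetical data: one numeric predictor and one categorical predictor.
data = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x0": [0.5, 1.5, 2.0, 3.5, 4.0, 5.5],
    "g": ["a", "b", "a", "b", "a", "b"],
})

# Build outcome and predictor matrices; C(g) is dummy-coded automatically.
y, X = dmatrices("y ~ standardize(x0) + C(g)", data)
print(X.design_info.column_names)

# Adding "+ 0" drops the intercept column.
y0, X0 = dmatrices("y ~ x0 + 0", data)

# Reuse the stored design (including the mean and scale learned by
# standardize) on new rows. The function takes a list of design infos
# and returns a list of matrices.
new_data = pd.DataFrame({"x0": [1.0, 2.5], "g": ["b", "a"]})
(X_new,) = build_design_matrices([X.design_info], new_data)
print(X_new)
```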

Because the design matrices are ndarray subclasses, Patsy objects can be passed directly to array functions such as those in numpy.linalg.
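For instance, an ordinary least squares fit can be sketched with `numpy.linalg.lstsq`. The data here is synthetic and noise-free (an assumption for illustration), generated as y = 1 + 2x so the recovered coefficients are easy to check.

```python
import numpy as np
from patsy import dmatrices

# Synthetic data lying exactly on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
data = {"x": x, "y": 1.0 + 2.0 * x}

# X has an intercept column and an x column; y is a column vector.
y, X = dmatrices("y ~ x", data)

# Solve the least squares problem directly on the design matrices.
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
coef = np.asarray(coef).ravel()  # order: intercept, slope
print(coef)
```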


Statsmodels

Statsmodels includes classical frequentist statistical models such as:

  • Linear Models, generalized linear models
  • Linear Mixed Effects Models
  • Analysis of Variance methods
  • Time Series Processing and State Space Models
  • Generalized Method of Moments

Basic Usage

statsmodels.api - array based model api

sm.

  • add_constant()
  • OLS
    • yields a model
  • tsa

statsmodels.formula.api - formula-based (Patsy-style) model api

smf.

  • ols

model.

  • fit()
    • returns a results object
  • predict()
    • usually called on the fitted results

Additional Libraries

Boosting Trees

XGBoost

LightGBM

Examples

H2O

H2O is Java-based software for data modeling and general computing. It is many things, but its primary purpose is to serve as a distributed (many machines), parallel (many CPUs), in-memory (several hundred GB of JVM heap, set via Xmx) processing engine.

H2O algorithms

Vowpal Wabbit
