### Basic Questions

  1. What is Supervised Learning vs Unsupervised Learning? Supervised learning works on labelled data: we have both a response variable (output/dependent variable) and predictors (independent variables, covariates, features), so there is a "right answer" to learn (regression, classification). Unsupervised learning works on unlabelled data, with no response variable (e.g., clustering).
  2. Describe a Model you have built.

General Model Building Questions

  1. Describe the key steps of model building:
    1. Assumptions (at a minimum, the distribution of Y) - usually formed after data visualization
    2. Assumption validation (e.g., residual analysis) - diagnostic graphs, then transformations and adjustments:
      1. residuals versus fitted values
      2. normal probability plots
      3. plots of residuals versus time
    3. Model building:
      1. outliers
      2. transformations
      3. kernels
    4. Fit outside the training sample (overfitting/underfitting, model selection)
    5. Use the model to make predictions
  2. Describe some Diagnostic Plots
  3. Coefficient of Determination, R^2

adjusted R^2
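For reference, the two goodness-of-fit measures above are:

$$
R^2 = 1 - \frac{SSE}{SST}, \qquad R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
$$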

Detecting influential observations

Cook’s Distance
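Cook's distance for observation $i$ combines its residual with its leverage $h_{ii}$ from the hat matrix ($p$ = number of parameters, $\hat{\sigma}^2$ = MSE):

$$
D_i = \frac{e_i^2}{p\,\hat{\sigma}^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}
$$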

“Arms” Graph

Improvements: multiple models, robust regression, weighted least squares

Transformation of the response

Box-Cox Transformation

Yeo-Johnson Transformation
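A minimal sketch of both transformations in Python (the right-skewed data here is simulated purely for illustration):

```python
# scipy estimates the Box-Cox lambda by maximum likelihood; sklearn's
# PowerTransformer handles Yeo-Johnson, which also allows zero and
# negative responses.
import numpy as np
from scipy import stats
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # positive, skewed

y_bc, lam = stats.boxcox(y)                        # requires y > 0
print(f"estimated Box-Cox lambda: {lam:.3f}")

pt = PowerTransformer(method="yeo-johnson")        # standardizes by default
y_yj = pt.fit_transform(y.reshape(-1, 1))
```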

1. Regression Models

Generalized Linear Models (GLM):

Logistic Regression
Multinomial Logistic Regression
Poisson Regression
Binomial Regression
Linear Regression
  1. Derive the OLS/MLE estimator for linear regression
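A sketch of the derivation: minimize the residual sum of squares (equivalently, under Gaussian errors, maximize the likelihood); setting the gradient to zero gives the normal equations and the closed-form estimator:

$$
\hat{\beta} = \arg\min_{\beta} \|y - X\beta\|^2, \qquad X^\top X \hat{\beta} = X^\top y \;\Rightarrow\; \hat{\beta} = (X^\top X)^{-1} X^\top y
$$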

Assumptions for linear regression:

  1. Linearity: $E[Y \mid X]$ is a linear function of the predictors
  2. Independence of the errors
  3. Homoskedasticity: constant error variance
  4. Normality of the errors (needed for exact t- and F-inference)
  5. No perfect multicollinearity among the predictors

  2. t-test for slope:
  $$
  t = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}, \quad df = n - p - 1
  $$
  3. F-test for significance of the current model:
  $$
  F = \frac{MSR}{MSE} = \frac{SSR/p}{SSE/(n-p-1)}, \quad df = (p,\ n-p-1)
  $$
  4. Breusch-Pagan test (for conditional heteroskedasticity): a chi-square test based on regressing the squared residuals on the predictors
  $$
  \chi^2 = n \times R_{resid}^2, \quad df = p
  $$
  5. Durbin-Watson test (for autocorrelation of the residuals):

$$
DW = \frac{\sum_{t=2}^T (\hat{\epsilon}_t - \hat{\epsilon}_{t-1})^2}{\sum_{t=1}^T \hat{\epsilon}_t^2}
$$
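A minimal sketch of both residual tests with statsmodels, on a fitted OLS model (the data is simulated; in practice use your own design matrix):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=200)

fit = sm.OLS(y, X).fit()

# Breusch-Pagan: LM test from regressing squared residuals on X
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, fit.model.exog)

# Durbin-Watson: values near 2 suggest no first-order autocorrelation
dw = durbin_watson(fit.resid)
print(lm_pvalue, dw)
```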

  6. Test for multicollinearity:
    1. perfect collinearity: compare the rank of the design matrix with its number of columns
    2. imperfect collinearity: Variance Inflation Factor (see 3.1 below)

standard error of the estimate (simple linear regression): $s_e = \sqrt{SSE/(n-2)}$

Logistic Regression

logit function, logistic(sigmoid) function
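For reference, the two functions are inverses of each other:

$$
\mathrm{logit}(p) = \log\frac{p}{1-p}, \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$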

Hosmer-Lemeshow Test

Tree Models

  • Entropy
  • Information Gain
  • Information Gain Rate
  • Gini Index ( CART)
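For a node with class proportions $p_k$, and a split of dataset $D$ by attribute $A$ into subsets $D_v$:

$$
H(D) = -\sum_k p_k \log_2 p_k, \qquad \mathrm{Gini}(D) = 1 - \sum_k p_k^2
$$

$$
IG(D, A) = H(D) - \sum_v \frac{|D_v|}{|D|} H(D_v), \qquad \mathrm{GainRate}(D, A) = \frac{IG(D, A)}{-\sum_v \frac{|D_v|}{|D|} \log_2 \frac{|D_v|}{|D|}}
$$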

SVM (Support Vector Machine)

  1. What is the point of using information gain rate rather than information gain? (e.g., splitting on an ID number yields a short tree with maximal information gain but no predictive value; the gain rate penalizes such high-cardinality splits)

Regression Models

1. Linear Regression

Robust Regression (less weight on extreme observations)

Huber Loss Function
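The Huber loss is quadratic for small residuals and linear beyond a threshold $\delta$, which is what downweights extreme observations:

$$
L_\delta(r) = \begin{cases} \frac{1}{2} r^2 & |r| \le \delta \\ \delta\left(|r| - \frac{1}{2}\delta\right) & |r| > \delta \end{cases}
$$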

Predictors

continuous predictor

discrete predictor

transformed predictor

factors (categorical predictor): a factor with k levels adds (k-1) dummy-coded betas

  2. Generalized Linear Models (GLM)

Factors in the model (a factor with k levels adds (k-1) betas)

Interactions

additive model

linear predictor ($X\beta$)

family of the model (distribution of response)

covariates

link function (monotonic function)

Logistic Regression (see: classification)

Poisson Regression

Overdispersion, quasi-Poisson

Gamma Regression

Binomial Regression (quasi-binomial)
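A minimal statsmodels sketch of a Poisson GLM on simulated count data; the other families (Binomial, Gamma, and their quasi- variants) plug in the same way:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(500, 2)))
mu = np.exp(X @ np.array([0.3, 0.5, -0.2]))     # inverse of the log link
y = rng.poisson(mu)

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.summary())

# Crude overdispersion check: Pearson chi-square over residual df
# should be near 1 for a well-specified Poisson model.
print(fit.pearson_chi2 / fit.df_resid)
```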

  3. Model Selection

3.0 Basic Model Selection

  1. AIC
      • exhaustive search
      • Stepwise regression
  2. Cross Validation (PRESS)
      • leave-one-out cross-validation (LOOCV)
      • K-fold cross validation

    Hat Matrix (H) leverage

    PRESS ( prediction error sum of squares)
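For OLS, PRESS needs no refitting: the leave-one-out residual is the ordinary residual scaled by the leverage $h_{ii}$ from the hat matrix:

$$
H = X(X^\top X)^{-1} X^\top, \qquad PRESS = \sum_{i=1}^n \left( \frac{e_i}{1 - h_{ii}} \right)^2
$$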

3.1 High-Dimensional Variable Selection (Multicollinearity)

Variance Inflation Factor (VIF)
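VIF measures how much collinearity inflates the variance of $\hat{\beta}_j$, where $R_j^2$ is the coefficient of determination from regressing predictor $j$ on the other predictors; values above roughly 5-10 are commonly flagged:

$$
VIF_j = \frac{1}{1 - R_j^2}
$$

(statsmodels provides this as `variance_inflation_factor` in `statsmodels.stats.outliers_influence`.)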

Principal Component Analysis (PCA)

low-dimensional representation

SVD (singular value decomposition)

Principal Component Regression

Lasso

ridge regression

elastic net

shrinkage

group Lasso
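A minimal sklearn sketch of the three main shrinkage estimators; the alpha values are placeholders, tuned in practice by cross-validation (LassoCV, RidgeCV, ElasticNetCV):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 50))                        # p large relative to n
beta = np.zeros(50)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]                # sparse truth
y = X @ beta + rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)                    # L1: drives coefficients to 0
ridge = Ridge(alpha=1.0).fit(X, y)                    # L2: shrinks but keeps all
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mix of L1 and L2
print((lasso.coef_ != 0).sum(), "nonzero lasso coefficients")
```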

3.2 Overfitting

Model Selection Algorithm

Regularization - add penalty terms

  4. Non-parametric Regression

    smoothing parameter

    bandwidth

    (residual) deviance

    degree of freedom, effective degrees of freedom

    4.1 Local Regression

    local linear regression (a version of local polynomial regression)

  1. choose a target point $x_0$

  2. choose the span (the fraction of points used in each local fit)

  3. weight the points (default: tri-cube weight function)

  4. fit the regression line locally

    choosing the smoothing parameter (span / $\lambda$): cross-validation or AIC
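A minimal sketch using statsmodels' lowess smoother, where `frac` plays the role of the span:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, size=300))
y = np.sin(x) + rng.normal(scale=0.3, size=300)

# frac = fraction of points receiving tri-cube weight around each x_0
smoothed = lowess(y, x, frac=0.25)   # columns: sorted x, fitted values
x_fit, y_fit = smoothed[:, 0], smoothed[:, 1]
```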

4.2 Penalized Spline / Smoothing Spline
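The penalized criterion makes the role of $\lambda$ explicit: it trades off fit against roughness:

$$
\min_f \; \sum_{i=1}^n \big(y_i - f(x_i)\big)^2 + \lambda \int f''(t)^2\,dt
$$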

4.3 Generalized Additive Models

Projection Pursuit Regression

ridge function

**Curse of Dimensionality**

  5. Tree-based Models (CART)

Regression Trees

  1. Greedy Algorithm

  2. prune back (cost-complexity pruning)

    Classification Trees

    cross-entropy

    CART

Random Forest

bagging (Bootstrap Aggregation)

ensemble learning

Boosting Algorithm for Regression

a sequence of weak learners, each fit to correct the errors of the ensemble so far, combined into a strong model (see the sketch below)

Cross-validation folds
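A minimal sklearn sketch putting the pieces together: a pruned CART, a bagged ensemble (random forest), and boosting, each scored with cross-validation folds (R^2 for regressors):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

models = {
    "pruned CART": DecisionTreeRegressor(ccp_alpha=0.01),      # cost-complexity pruning
    "random forest": RandomForestRegressor(n_estimators=200),  # bagged, decorrelated trees
    "boosting": GradientBoostingRegressor(n_estimators=200, learning_rate=0.05),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```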

Support Vector Machine

margin

support vectors

slack parameters

“Kernel Trick”
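A minimal sklearn sketch: an RBF-kernel SVM, where `C` penalizes the slack variables and the kernel supplies the implicit feature map (the kernel trick):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
print("support vectors per class:", clf.n_support_)
```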

  6. Gradient Descent (another approach)

cost function

learning rate

Advanced optimization algorithms:

BFGS

Stochastic gradient descent

L-BFGS

Conjugate gradient descent
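A minimal sketch of batch gradient descent on the least-squares cost, to make the cost-function and learning-rate vocabulary concrete (simulated data only):

```python
# Minimize J(beta) = (1/2n) * ||X beta - y||^2 by batch gradient descent.
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iter=500):
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n   # gradient of J at the current beta
        beta -= lr * grad                 # step downhill, scaled by the learning rate
    return beta

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=200)
print(gradient_descent(X, y))   # approaches [1.0, 2.0, -0.5]
```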

  7. Classification

    binary logistic regression

    logit function, logistic function

    decision boundary

    Hosmer-Lemeshow Test

    Fit Logistic regression

    cost function + gradient descent
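The cost minimized is the negative average log-likelihood (cross-entropy), with $h_\theta(x) = \sigma(\theta^\top x)$:

$$
J(\theta) = -\frac{1}{n} \sum_{i=1}^n \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log\big(1 - h_\theta(x_i)\big) \right]
$$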

Multinomial logistic regression

  8. Neural Network

    single hidden-layer backpropagation network; single-layer perceptron

    activation function

    sigmoid function

    hidden layer, basis functions, weights (fit by gradient descent on the cost function)

forward propagation and backward propagation algorithm

* Gradient Checking (compute the gradient with a numerical method and compare against the backpropagated gradient)
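A minimal sketch of gradient checking with central finite differences; the quadratic cost here is just a stand-in for a network's cost function:

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-5):
    # Central difference for each partial derivative of J at theta.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return grad

theta = np.array([1.0, -2.0, 0.5])
J = lambda t: float(t @ t)               # J(theta) = ||theta||^2, gradient 2*theta
print(np.max(np.abs(numerical_gradient(J, theta) - 2 * theta)))  # ~1e-10
```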

To Memorize

  1. Linear Regression:
    1. assumptions
    2. regression formula
    3. AIC = -2 * log-likelihood + 2p
    4. error variance
    5. PRESS
    6. hat matrix
    7. VIF
    8. design matrix

  2. Tree-Based Models

  3. Ensemble Learning

  4. Neural Network and Deep Learning

