### Basic Questions
- What is Supervised Learning vs Unsupervised Learning? Supervised learning uses labelled data: we have both a response variable (output variable/dependent variable) and predictors (independent variables/covariates/features), so the "right answer" is known, as in regression and classification. Unsupervised learning uses unlabelled data, e.g. clustering.
- Describe a Model you have built.
### General Model Building Questions
- Describe the key steps of model building (a minimal end-to-end sketch follows this list):
    - Assumptions (at least, the distribution of Y), typically after visualizing the data
    - Assumption validation (e.g. residual analysis): diagnostic graphs, then transformations and adjustments
        - residuals versus fitted values
        - normal probability plots
        - plots of residuals versus time
    - Model building
        - outliers
        - transformations
        - kernels
    - Fit outside the training sample (overfitting/underfitting, model selection)
    - Use the model to make predictions
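A minimal end-to-end sketch of these steps in Python (numpy only; the data, coefficients, and holdout split are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Simulate data to stand in for "visualize, then assume a distribution for Y"
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)   # assumed: linear mean, Gaussian errors

# 2. Build the model: ordinary least squares on a design matrix with intercept
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# 3. Validate assumptions: residuals to plot against fitted values / time
fitted = X @ beta_hat
residuals = y - fitted

# 4. Check fit outside the training sample, then 5. predict
x_new = rng.uniform(0, 10, 20)
y_pred = beta_hat[0] + beta_hat[1] * x_new
```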
- Describe some diagnostic plots and measures:
    - Coefficient of determination, R^2
    - adjusted R^2
    - Detecting influential observations
        - Cook's Distance
        - "Arms" Graph
    - improvement: multiple models, robust regression, weighted least squares
    - Transformation of the response
        - Box-Cox Transformation
        - Yeo-Johnson Transformation
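These diagnostics are easy to compute by hand; a sketch with numpy/scipy on synthetic data (the Cook's distance formula assumes an OLS fit whose design matrix includes an intercept):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 2                                  # n observations, p predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
resid = y - H @ y
sse = resid @ resid
sst = ((y - y.mean()) ** 2).sum()

r2 = 1 - sse / sst                             # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # adjusted R^2

# Cook's distance: influence of each observation on the fitted coefficients
h = np.diag(H)                                 # leverages
mse = sse / (n - p - 1)
cooks_d = resid**2 / ((p + 1) * mse) * h / (1 - h) ** 2

# Box-Cox transformation of a positive response; scipy chooses lambda by MLE
y_pos = np.exp(rng.normal(size=n))
y_transformed, lam = stats.boxcox(y_pos)
```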
1. Regression Models
generalized linear models (GLM)
Logistic Regression
Multinomial Logistic Regression
Poisson Regression
Binomial Regression
Linear Regression
- Derive the OLS/MLE estimator for linear regression
Assumptions for linear regression: linearity of the mean, independent errors, constant error variance (homoscedasticity), normally distributed errors
t-test for slope:
$$
t = \frac{\hat{b}}{\mathrm{SE}(\hat{b})}, \quad df = n - p - 1
$$
- F-test for significance of the current model:
$$
F = \frac{MSR}{MSE} = \frac{SSR/p}{SSE/(n-p-1)}, \quad df = (p,\ n-p-1)
$$
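Both tests computed from scratch on a synthetic OLS fit (scipy is used only for the t and F reference distributions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 0.5, 0.0, -0.8]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sse = resid @ resid
mse = sse / (n - p - 1)

# t-test per coefficient: t = b_hat / SE(b_hat), df = n - p - 1
se = np.sqrt(mse * np.diag(XtX_inv))
t_stats = beta_hat / se
p_vals = 2 * stats.t.sf(np.abs(t_stats), df=n - p - 1)

# F-test for the whole model: F = (SSR/p) / (SSE/(n-p-1))
ssr = ((X @ beta_hat - y.mean()) ** 2).sum()
F = (ssr / p) / mse
p_val_F = stats.f.sf(F, p, n - p - 1)
```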
- Breusch-Pagan test (for (conditional) heteroskedasticity): a chi-square test
$$
\chi^2 = n \times R_{resid}^2
$$
where $R_{resid}^2$ is the R^2 from regressing the squared residuals on the predictors
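A sketch of the statistic (assumes X already carries an intercept column; compare the result to a chi-square with one df per predictor):

```python
import numpy as np

def breusch_pagan(X, resid):
    """chi^2 = n * R^2 from regressing the squared residuals on the predictors."""
    u2 = resid ** 2
    beta, *_ = np.linalg.lstsq(X, u2, rcond=None)
    fitted = X @ beta
    r2 = 1 - ((u2 - fitted) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()
    return len(resid) * r2
```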
- Durbin-Watson Test (for autocorrelation)
$$
DW = \frac{\sum_{t=2}^{T} (\hat{\epsilon}_t - \hat{\epsilon}_{t-1})^2}{\sum_{t=1}^{T} \hat{\epsilon}_t^2}
$$
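The statistic itself is a one-liner over the residual series (values near 2 suggest no first-order autocorrelation, near 0 positive autocorrelation):

```python
import numpy as np

def durbin_watson(resid):
    diff = np.diff(resid)                  # epsilon_t - epsilon_{t-1}
    return (diff @ diff) / (resid @ resid)
```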
standard error of the regression (simple linear regression): $s_e = \sqrt{SSE/(n-2)}$
- Test multicollinearity:
    - perfect case: the design matrix is rank-deficient, so compare its rank to the number of columns (see the snippet below)
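A quick illustration of the rank check (the third column is an exact multiple of the second, so the design matrix is rank-deficient):

```python
import numpy as np

X = np.column_stack([np.ones(5), np.arange(5.0), 2 * np.arange(5.0)])
print(np.linalg.matrix_rank(X), "<", X.shape[1])   # rank 2 < 3 columns
```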
Logistic Regression
logit function, logistic (sigmoid) function
Hosmer-Lemeshow Test
Tree Models
- Entropy
- Information Gain
- Information Gain Ratio
- Gini Index (CART)
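A small numpy sketch of entropy, information gain, and gain ratio; it also works the ID-number pathology raised in the question below:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gain_and_ratio(groups):
    """groups: list of label arrays produced by a candidate split."""
    n = sum(len(g) for g in groups)
    before = entropy(np.concatenate(groups))
    after = sum(len(g) / n * entropy(g) for g in groups)
    gain = before - after
    # split information = entropy of the partition sizes themselves
    split_info = entropy(np.concatenate(
        [np.full(len(g), i) for i, g in enumerate(groups)]))
    return gain, gain / split_info

y = np.array([0, 0, 1, 1, 0, 1])
id_split = [y[i:i + 1] for i in range(len(y))]   # one branch per ID
print(gain_and_ratio(id_split))                  # gain = 1.0, ratio ~ 0.39
```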
- what is the point of using information gain ratio rather than information gain? (e.g. splitting on an ID number gives maximal information gain and a short tree, but very little predictive use; the sketch above shows how the ratio penalizes such splits)
SVM (Support Vector Machine)
Regression Models
Linear Regression
Robust Regression (less weight on extreme observations)
Huber Loss Function
Predictors
continuous predictor
discrete predictor
transformed predictor
factors (categorical predictor): add (k-1) betas
- Generalized Linear Models (GLM)
Factors in the model (add (k-1) betas)
Interactions
additive model
linear predictor (Xβ)
family of the model (distribution of response)
covariates
link function (monotonic function)
Logistic Regression (see: classification)
Poisson Regression
Overdispersion, quasi-Poisson
Gamma Regression
Binomial Regression (quasi-binomial)
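A hedged GLM example with statsmodels (Poisson family with its default log link, on data simulated so the model holds; the overdispersion check at the end motivates the quasi-Poisson option):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
mu = np.exp(0.3 + 0.7 * x)                    # log link: log(mu) = X @ beta
y = rng.poisson(mu)

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)

# Overdispersion check: Pearson chi^2 / df should be near 1 for a true Poisson
print(fit.pearson_chi2 / fit.df_resid)
```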
- Model Selection
3.0 Basic Model selection
- AIC
- exhaustive search
- Stepwise regression
- Cross Validation (PRESS)
- leave-one-out cross-validation
- K-fold cross validation
Hat Matrix (H) leverage
PRESS (prediction error sum of squares)
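PRESS drops out of the hat matrix via the leave-one-out shortcut e_i / (1 - h_ii), so no model ever has to be refitted (numpy sketch; assumes X has full column rank):

```python
import numpy as np

def press(X, y):
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    h = np.diag(H)                         # leverages
    resid = y - H @ y
    return ((resid / (1 - h)) ** 2).sum()  # sum of leave-one-out squared errors
```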
3.1 High-Dimensional Variable Selection (Multicollinearity)
Variance Inflation Factor (VIF) (uses R^2, the coefficient of determination from regressing one predictor on the others)
Principal Component Analysis (PCA)
low-dimensional representation
singular value decomposition (SVD)
Principal Component Regression
Lasso
ridge regression
elastic net
shrinkage
group Lasso
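A short sklearn comparison of the three penalties on data where only the first predictor matters (the alpha values are arbitrary, not tuned):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 0] + rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)                    # L1: sparse coefficients
ridge = Ridge(alpha=1.0).fit(X, y)                    # L2: shrinks, keeps all
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # L1/L2 mix

print((lasso.coef_ != 0).sum(), "nonzero lasso coefficients out of 20")
```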
3.2 Overfitting
Model Selection Algorithm
Regularization - add penalty terms
Non-parametric regression
smoothing parameter
bandwidth
(residual) deviance
degree of freedom, effective degrees of freedom
4.1 Local regression
local linear regression (a version of local polynomial regression):
    1. pick the target point x_0
    2. choose the span
    3. weight the points (default: tri-cube weight function)
    4. fit the regression line locally
choosing lambda: cross-validation or AIC
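A minimal local linear fit at one target point, following the recipe above (numpy; the span-to-neighbourhood rule and other details are simplified):

```python
import numpy as np

def local_linear(x, y, x0, span=0.5):
    k = max(2, int(span * len(x)))        # 2. span -> size of local window
    d = np.abs(x - x0)                    # 1. distances from the target point
    idx = np.argsort(d)[:k]
    u = d[idx] / d[idx].max()
    w = (1 - u**3) ** 3                   # 3. tri-cube weights
    X = np.column_stack([np.ones(k), x[idx]])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0] + beta[1] * x0         # 4. local line evaluated at x0

x = np.linspace(0, 10, 200)
y = np.sin(x) + np.random.default_rng(5).normal(0, 0.2, 200)
y_hat = np.array([local_linear(x, y, x0) for x0 in x])
```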
4.2 Penalized Spline/ Smoothing Spline
4.3 Generalized Additive Models
Projection Pursuit Regression
ridge function
** Curse of Dimensionality
3. Tree-based models (CART)
Regression Trees
Greedy Algorithm
pruned back
Classification Trees
cross-entropy
CART
Random Forest
bagging: Bootstrap Aggregation
ensemble learning
Boosting Algorithm for Regression
a sequence of weak learners, each fitted to the errors of the model built so far
Cross-validation folds
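A bare-bones boosting loop for regression: each shallow tree fits the current residuals and its prediction is added with a small learning rate (sklearn trees; depth, rate, and rounds are illustrative, not tuned):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, (200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 200)

lr, n_rounds = 0.1, 100
pred = np.zeros_like(y)
trees = []
for _ in range(n_rounds):
    resid = y - pred                              # what the ensemble still misses
    tree = DecisionTreeRegressor(max_depth=2).fit(X, resid)
    pred += lr * tree.predict(X)                  # add the weak learner
    trees.append(tree)
```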
Support Vector Machine
margin
support vectors
slack parameters
“Kernel Trick”
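The kernel trick in a few lines of sklearn: on concentric circles a linear SVM fails while an RBF kernel separates them without ever forming the feature map explicitly (hyperparameters left near their defaults):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)

linear = SVC(kernel="linear").fit(X, y)   # no linear decision boundary works
rbf = SVC(kernel="rbf", C=1.0).fit(X, y)  # implicit lift to a higher dimension

print(linear.score(X, y), rbf.score(X, y))
print(len(rbf.support_), "support vectors define the margin")
```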
4. Gradient Descent (another approach)
cost function
learning rate
Advanced gradient descent algorithms
BFGS
Stochastic gradient descent
L-BFGS
Conjugate gradient descent
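Plain batch gradient descent on the least-squares cost (numpy; the fixed learning rate works here only because the single feature is already on a unit scale):

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.5, 100)

# Minimize J(beta) = (1/2n) * ||y - X beta||^2
beta = np.zeros(2)
alpha = 0.1                                # learning rate
for _ in range(1000):
    grad = X.T @ (X @ beta - y) / len(y)   # gradient of the cost
    beta -= alpha * grad
```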
Classification
binary logistic regression
logit function, logistic function
decision boundary
Hosmer-Lemeshow Test
Fit Logistic regression
cost function + gradient descent
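The same gradient-descent loop fits logistic regression once the cost is the cross-entropy (numpy sketch on simulated Bernoulli labels):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(8)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = (rng.random(200) < sigmoid(X @ np.array([-0.5, 2.0, -1.0]))).astype(float)

beta = np.zeros(3)
for _ in range(2000):
    p = sigmoid(X @ beta)                 # predicted probabilities
    grad = X.T @ (p - y) / len(y)         # gradient of the cross-entropy cost
    beta -= 0.5 * grad                    # 0.5 = learning rate
```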
Multinomial logistic regression
Neural Network
single hidden-layer back-propagation network (single-layer perceptron)
activation function
sigmoid function
hidden layer, basis, weights (cost function of gradient descent)
forward propagation and backward propagation algorithm
* Gradient Checking (use Numerical Method to calculate gradient and check)
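Gradient checking by central differences, here against the analytic gradient of a squared-error cost (the same check applies coordinate-wise to backpropagation gradients):

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    """(f(w + eps e_i) - f(w - eps e_i)) / (2 eps) for each coordinate i."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])
y = np.array([1.0, 2.0, 4.0])
cost = lambda w: 0.5 * ((X @ w - y) ** 2).sum()
w = np.array([0.1, -0.2])

analytic = X.T @ (X @ w - y)              # closed-form gradient
assert np.allclose(analytic, numerical_grad(cost, w), atol=1e-6)
```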
Memorize:
linear regression assumptions
linear regression formula
AIC = -2*loglikelihood + 2p
error variance of linear regression
PRESS
HAT
VIF
design matrix
2. Tree Based Models
3. Ensemble Learning
4. Neural Network and Deep Learning