### Basic Questions
- What is Supervised Learning vs Unsupervised Learning? Supervised learning uses labelled data: we have both a response variable (output variable/dependent variable) and predictors (independent variables/covariates/features), so the "right answer" is known, as in regression and classification. Unsupervised learning uses unlabelled data, e.g. clustering.
- Describe a Model you have built.
### General Model Building Questions
- Describe the key steps of model building (a minimal end-to-end sketch follows this list):
    - Assumptions (at least, the distribution of Y), typically after visualizing the data
    - Assumption validation (e.g. residual analysis): diagnostic graphs, then transformations and adjustments
        - residuals versus fitted values
        - normal probability plots
        - plots of residuals versus time
    - Model building
        - outliers
        - transformations
        - kernels
    - Fit outside the training sample (overfitting/underfitting, model selection)
    - Use the model to make predictions
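A minimal end-to-end sketch of these steps in Python (numpy only; the data, coefficients, and holdout split are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Simulate data to stand in for "visualize, then assume a distribution for Y"
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)   # assumed: linear mean, Gaussian errors

# 2. Build the model: ordinary least squares on a design matrix with intercept
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# 3. Validate assumptions: residuals to plot against fitted values / time
fitted = X @ beta_hat
residuals = y - fitted

# 4. Check fit outside the training sample, then 5. predict
x_new = rng.uniform(0, 10, 20)
y_pred = beta_hat[0] + beta_hat[1] * x_new
```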
- Describe some diagnostic plots and measures:
    - Coefficient of determination, R^2
    - adjusted R^2
    - Detecting influential observations
        - Cook's Distance
        - "Arms" Graph
    - improvement: multiple models, robust regression, weighted least squares
    - Transformation of the response
        - Box-Cox Transformation
        - Yeo-Johnson Transformation
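These diagnostics are easy to compute by hand; a sketch with numpy/scipy on synthetic data (the Cook's distance formula assumes an OLS fit whose design matrix includes an intercept):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 2                                  # n observations, p predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
resid = y - H @ y
sse = resid @ resid
sst = ((y - y.mean()) ** 2).sum()

r2 = 1 - sse / sst                             # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # adjusted R^2

# Cook's distance: influence of each observation on the fitted coefficients
h = np.diag(H)                                 # leverages
mse = sse / (n - p - 1)
cooks_d = resid**2 / ((p + 1) * mse) * h / (1 - h) ** 2

# Box-Cox transformation of a positive response; scipy chooses lambda by MLE
y_pos = np.exp(rng.normal(size=n))
y_transformed, lam = stats.boxcox(y_pos)
```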
1. Regression Models
generalized linear models (GLM)
Logistic Regression
Multinomial Logistic Regression
Poisson Regression
Binomial Regression
Linear Regression
- Derive the OLS/MLE estimator for linear regression
Assumptions for linear regression: linearity of the mean, independent errors, constant error variance (homoscedasticity), normally distributed errors
t-test for slope:
$$
t = \frac{\hat{b}}{\mathrm{SE}(\hat{b})}, \quad df = n - p - 1
$$
- F-test for significance of the current model:
$$
F = \frac{MSR}{MSE} = \frac{SSR/p}{SSE/(n-p-1)}, \quad df = (p,\ n-p-1)
$$
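Both tests computed from scratch on a synthetic OLS fit (scipy is used only for the t and F reference distributions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 0.5, 0.0, -0.8]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sse = resid @ resid
mse = sse / (n - p - 1)

# t-test per coefficient: t = b_hat / SE(b_hat), df = n - p - 1
se = np.sqrt(mse * np.diag(XtX_inv))
t_stats = beta_hat / se
p_vals = 2 * stats.t.sf(np.abs(t_stats), df=n - p - 1)

# F-test for the whole model: F = (SSR/p) / (SSE/(n-p-1))
ssr = ((X @ beta_hat - y.mean()) ** 2).sum()
F = (ssr / p) / mse
p_val_F = stats.f.sf(F, p, n - p - 1)
```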
- Breusch-Pagan test (for (conditional) heteroskedasticity): a chi-square test
$$
\chi^2 = n \times R_{resid}^2
$$
where $R_{resid}^2$ is the R^2 from regressing the squared residuals on the predictors
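A sketch of the statistic (assumes X already carries an intercept column; compare the result to a chi-square with one df per predictor):

```python
import numpy as np

def breusch_pagan(X, resid):
    """chi^2 = n * R^2 from regressing the squared residuals on the predictors."""
    u2 = resid ** 2
    beta, *_ = np.linalg.lstsq(X, u2, rcond=None)
    fitted = X @ beta
    r2 = 1 - ((u2 - fitted) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()
    return len(resid) * r2
```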
- Durbin-Watson Test (for autocorrelation)
$$
DW = \frac{\sum_{t=2}^{T} (\hat{\epsilon}_t - \hat{\epsilon}_{t-1})^2}{\sum_{t=1}^{T} \hat{\epsilon}_t^2}
$$
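The statistic itself is a one-liner over the residual series (values near 2 suggest no first-order autocorrelation, near 0 positive autocorrelation):

```python
import numpy as np

def durbin_watson(resid):
    diff = np.diff(resid)                  # epsilon_t - epsilon_{t-1}
    return (diff @ diff) / (resid @ resid)
```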
standard error of the regression (simple linear regression): $s_e = \sqrt{SSE/(n-2)}$
- Test multicollinearity:
    - perfect case: the design matrix is rank-deficient, so compare its rank to the number of columns (see the snippet below)
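A quick illustration of the rank check (the third column is an exact multiple of the second, so the design matrix is rank-deficient):

```python
import numpy as np

X = np.column_stack([np.ones(5), np.arange(5.0), 2 * np.arange(5.0)])
print(np.linalg.matrix_rank(X), "<", X.shape[1])   # rank 2 < 3 columns
```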
Logistic Regression
logit function, logistic (sigmoid) function
Hosmer-Lemeshow Test
Tree Models
- Entropy
- Information Gain
- Information Gain Ratio
- Gini Index (CART)
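A small numpy sketch of entropy, information gain, and gain ratio; it also works the ID-number pathology raised in the question below:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gain_and_ratio(groups):
    """groups: list of label arrays produced by a candidate split."""
    n = sum(len(g) for g in groups)
    before = entropy(np.concatenate(groups))
    after = sum(len(g) / n * entropy(g) for g in groups)
    gain = before - after
    # split information = entropy of the partition sizes themselves
    split_info = entropy(np.concatenate(
        [np.full(len(g), i) for i, g in enumerate(groups)]))
    return gain, gain / split_info

y = np.array([0, 0, 1, 1, 0, 1])
id_split = [y[i:i + 1] for i in range(len(y))]   # one branch per ID
print(gain_and_ratio(id_split))                  # gain = 1.0, ratio ~ 0.39
```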
- what is the point of using information gain ratio rather than information gain? (e.g. splitting on an ID number gives maximal information gain and a short tree, but very little predictive use; the sketch above shows how the ratio penalizes such splits)
SVM (Support Vector Machine)
Regression Models
Linear Regression
Robust Regression (less weight on extreme observations)
Huber Loss Function
Predictors
continuous predictor
discrete predictor
transformed predictor
factors (categorical predictor): add (k-1) betas
- Generalized Linear Models (GLM)
Factors in the model (add (k-1) betas)
Interactions
additive model
linear predictor (Xβ)
family of the model (distribution of response)
covariates
link function (monotonic function)
Logistic Regression (see: classification)
Poisson Regression
Overdispersion, quasi-Poisson
Gamma Regression
Binomial Regression (quasi-binomial)
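A hedged GLM example with statsmodels (Poisson family with its default log link, on data simulated so the model holds; the overdispersion check at the end motivates the quasi-Poisson option):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
mu = np.exp(0.3 + 0.7 * x)                    # log link: log(mu) = X @ beta
y = rng.poisson(mu)

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)

# Overdispersion check: Pearson chi^2 / df should be near 1 for a true Poisson
print(fit.pearson_chi2 / fit.df_resid)
```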
- Model Selection
3.0 Basic Model selection
- AIC
- exhaustive search
- Stepwise regression
- Cross Validation (PRESS)
- leave-one-out cross-validation
- K-fold cross validation
Hat Matrix (H) leverage
PRESS (prediction error sum of squares)
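PRESS drops out of the hat matrix via the leave-one-out shortcut e_i / (1 - h_ii), so no model ever has to be refitted (numpy sketch; assumes X has full column rank):

```python
import numpy as np

def press(X, y):
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    h = np.diag(H)                         # leverages
    resid = y - H @ y
    return ((resid / (1 - h)) ** 2).sum()  # sum of leave-one-out squared errors
```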
3.1 High-Dimensional Variable Selection (Multicollinearity)
Variance Inflation Factor (VIF) (uses R^2, the coefficient of determination from regressing one predictor on the others)
Principal Component Analysis (PCA)
low-dimensional representation
singular value decomposition (SVD)
Principal Component Regression
Lasso
ridge regression
elastic net
shrinkage
group Lasso
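A short sklearn comparison of the three penalties on data where only the first predictor matters (the alpha values are arbitrary, not tuned):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 0] + rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)                    # L1: sparse coefficients
ridge = Ridge(alpha=1.0).fit(X, y)                    # L2: shrinks, keeps all
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # L1/L2 mix

print((lasso.coef_ != 0).sum(), "nonzero lasso coefficients out of 20")
```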
3.2 Overfitting
Model Selection Algorithm
Regularization - add penalty terms
Non-parametric regression
smoothing parameter
bandwidth
(residual) deviance
degree of freedom, effective degrees of freedom
4.1 Local regression
local linear regression (a version of local polynomial regression):
    1. pick the target point x_0
    2. choose the span
    3. weight the points (default: tri-cube weight function)
    4. fit the regression line locally
choosing lambda: cross-validation or AIC
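A minimal local linear fit at one target point, following the recipe above (numpy; the span-to-neighbourhood rule and other details are simplified):

```python
import numpy as np

def local_linear(x, y, x0, span=0.5):
    k = max(2, int(span * len(x)))        # 2. span -> size of local window
    d = np.abs(x - x0)                    # 1. distances from the target point
    idx = np.argsort(d)[:k]
    u = d[idx] / d[idx].max()
    w = (1 - u**3) ** 3                   # 3. tri-cube weights
    X = np.column_stack([np.ones(k), x[idx]])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0] + beta[1] * x0         # 4. local line evaluated at x0

x = np.linspace(0, 10, 200)
y = np.sin(x) + np.random.default_rng(5).normal(0, 0.2, 200)
y_hat = np.array([local_linear(x, y, x0) for x0 in x])
```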
4.2 Penalized Spline/ Smoothing Spline
4.3 Generalized Additive Models
Projection Pursuit Regression
ridge function
** Curse of Dimensionality
3. Tree-based models (CART)
Regression Trees
Greedy Algorithm
pruned back
Classification Trees
cross-entropy
CART
Random Forest
bagging: Bootstrap Aggregation
ensemble learning
Boosting Algorithm for Regression
a sequence of weak learners, each fitted to the errors of the model built so far
Cross-validation folds
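A bare-bones boosting loop for regression: each shallow tree fits the current residuals and its prediction is added with a small learning rate (sklearn trees; depth, rate, and rounds are illustrative, not tuned):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, (200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 200)

lr, n_rounds = 0.1, 100
pred = np.zeros_like(y)
trees = []
for _ in range(n_rounds):
    resid = y - pred                              # what the ensemble still misses
    tree = DecisionTreeRegressor(max_depth=2).fit(X, resid)
    pred += lr * tree.predict(X)                  # add the weak learner
    trees.append(tree)
```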
Support Vector Machine
margin
support vectors
slack parameters
“Kernel Trick”
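The kernel trick in a few lines of sklearn: on concentric circles a linear SVM fails while an RBF kernel separates them without ever forming the feature map explicitly (hyperparameters left near their defaults):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)

linear = SVC(kernel="linear").fit(X, y)   # no linear decision boundary works
rbf = SVC(kernel="rbf", C=1.0).fit(X, y)  # implicit lift to a higher dimension

print(linear.score(X, y), rbf.score(X, y))
print(len(rbf.support_), "support vectors define the margin")
```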
4. Gradient Descent (another approach)
cost function
learning rate
Advanced gradient descent algorithms
BFGS
Stochastic gradient descent
L-BFGS
Conjugate gradient descent
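Plain batch gradient descent on the least-squares cost (numpy; the fixed learning rate works here only because the single feature is already on a unit scale):

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.5, 100)

# Minimize J(beta) = (1/2n) * ||y - X beta||^2
beta = np.zeros(2)
alpha = 0.1                                # learning rate
for _ in range(1000):
    grad = X.T @ (X @ beta - y) / len(y)   # gradient of the cost
    beta -= alpha * grad
```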
Classification
binary logistic regression
logit function, logistic function
decision boundary
Hosmer-Lemeshow Test
Fit Logistic regression
cost function + gradient descent
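The same gradient-descent loop fits logistic regression once the cost is the cross-entropy (numpy sketch on simulated Bernoulli labels):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(8)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = (rng.random(200) < sigmoid(X @ np.array([-0.5, 2.0, -1.0]))).astype(float)

beta = np.zeros(3)
for _ in range(2000):
    p = sigmoid(X @ beta)                 # predicted probabilities
    grad = X.T @ (p - y) / len(y)         # gradient of the cross-entropy cost
    beta -= 0.5 * grad                    # 0.5 = learning rate
```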
Multinomial logistic regression
Neural Network
single hidden-layer back-propagation network (single-layer perceptron)
activation function
sigmoid function
hidden layer, basis, weights (cost function of gradient descent)
forward propagation and backward propagation algorithm
* Gradient Checking (use Numerical Method to calculate gradient and check)
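Gradient checking by central differences, here against the analytic gradient of a squared-error cost (the same check applies coordinate-wise to backpropagation gradients):

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    """(f(w + eps e_i) - f(w - eps e_i)) / (2 eps) for each coordinate i."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])
y = np.array([1.0, 2.0, 4.0])
cost = lambda w: 0.5 * ((X @ w - y) ** 2).sum()
w = np.array([0.1, -0.2])

analytic = X.T @ (X @ w - y)              # closed-form gradient
assert np.allclose(analytic, numerical_grad(cost, w), atol=1e-6)
```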
Memorize:
linear regression assumptions
linear regression formula
AIC = -2*loglikelihood + 2p
error variance of linear regression
PRESS
HAT
VIF
design matrix
2. Tree Based Models
3. Ensemble Learning
4. Neural Network and Deep Learning