Prediction Model: Regression

This post is a preview for week 2 of the educational data mining class. Regression analysis itself covers much more and takes more effort to learn.

Usage of Prediction Models

  • predict a single variable in the dataset using other variables.
  • predict the future.
  • make inferences from the present.

Components of regression

  • the dataset used to build a regression model: each value of the predicted variable in the dataset is called a training label, and the set of variables associated with a training label is called a set of features. Once built, the model still exists without the training labels.
  • regressors (predictors) + predicted variable
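
To make the pieces above concrete, here is a minimal sketch of building a one-regressor model from features and training labels and then using it where no label exists. The data (hours studied and quiz scores) and the function names are made up for illustration, and ordinary least squares is assumed as the fitting method.

```python
def fit_simple_regression(features, labels):
    """Ordinary least squares with one regressor: returns (intercept, slope)."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(labels) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, labels))
    sxx = sum((x - mean_x) ** 2 for x in features)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) model to a new feature value."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training data: one feature and one training label per row.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]        # regressor (predictor)
scores = [52.0, 54.0, 56.0, 58.0, 60.0]  # predicted variable (training labels)

model = fit_simple_regression(hours, scores)

# The fitted model is just two numbers, so it can be used on a case
# that has a feature value but no training label.
new_prediction = predict(model, 6.0)
```

After fitting, the model is only the pair (intercept, slope), which is one way to read the point that the model still exists without the training labels.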


  • transform
    • unitization
    • using other transform functions, e.g., sqrt(x)
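
The two transforms can be sketched as follows. This assumes that unitization means rescaling a variable to mean 0 and standard deviation 1 (a z-score), using the population standard deviation; the function names and sample values are made up for illustration.

```python
import math


def unitize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores).

    Assumption: the population standard deviation (divide by n) is used.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def sqrt_transform(values):
    """Apply sqrt(x) to every value; values must be non-negative."""
    return [math.sqrt(v) for v in values]


raw = [1.0, 4.0, 9.0, 16.0]  # hypothetical feature values
z = unitize(raw)             # same ordering, but mean 0 and SD 1
roots = sqrt_transform(raw)  # compresses large values
```

Either transformed list can then be used as a regressor in place of the raw variable.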

caveat (warning)

a regression model: y = 4 + 2x - 0.1x^2

In this case, the negative coefficient does not mean that x^2 is negatively correlated with y. We need to look at the bigger picture: it means that once x is included in the model, the relationship between x^2 and y becomes negative.

When the regressors in the model are not independent, we need to be more careful when interpreting the coefficients.
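
A quick numerical check of this caveat, as a sketch: generate y from the model above for x between 0 and 10 (the range is an assumption chosen for illustration) and compute the plain Pearson correlation between x^2 and y. The helper `pearson` is written out by hand here and is not taken from any library.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Assumed range for illustration: x = 0.0, 0.5, ..., 10.0
xs = [i * 0.5 for i in range(21)]
x_squared = [x ** 2 for x in xs]
ys = [4 + 2 * x - 0.1 * x ** 2 for x in xs]

# The coefficient on x^2 in the model is -0.1, yet on this range
# y rises as x^2 rises, so the simple correlation is positive.
r = pearson(x_squared, ys)
```

On this range, `r` comes out positive even though the coefficient on x^2 is negative. The coefficient only describes x^2 once x is already in the model, and x and x^2 are clearly not independent.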

regression trees

  • non-linear (RepTree): if x > 4, y = 5, else y = 2
  • linear (M5 prime, M5’): if x > 4, y = 2A + B, else y = 3A + 3B
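
The two rules above can be written out as plain functions to show the difference between the leaves. This is only a sketch of what a fitted tree predicts, not of how RepTree or M5' learn the split; the function names are made up, and A and B are treated as two further feature values.

```python
def reptree_style(x):
    """Constant leaves: each branch of the split predicts a fixed number."""
    if x > 4:
        return 5
    return 2


def m5_style(x, a, b):
    """Linear-model leaves: each branch applies its own linear formula."""
    if x > 4:
        return 2 * a + b
    return 3 * a + 3 * b


# Same split on x in both trees; only the leaf contents differ.
constant_high = reptree_style(5)   # takes the x > 4 branch
constant_low = reptree_style(4)    # takes the else branch
linear_high = m5_style(5, 1, 2)    # 2*1 + 2
linear_low = m5_style(3, 1, 2)     # 3*1 + 3*2
```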
Written on January 28, 2017