Data Perspective: Statistics

Wednesday, October 7, 2015

Introduction to Logistic Regression with R

In my previous post I explained linear regression. In today's post I will explain logistic regression.
Consider a scenario where we need to predict a patient's medical condition (HBP), either HAVE HIGH BP or NO HIGH BP, based on some observed symptoms: Age, weight, Issmoking, Systolic value, Diastolic value, RACE, etc. In this scenario we have to build a model that takes these symptoms as input values and HBP as the response variable. Note that the response variable (HBP) takes a value from a fixed set of classes: HAVE HIGH BP or NO HIGH BP.

Logistic regression – a classification problem, not a prediction problem:

In my previous post I said that we use linear regression for scenarios that involve prediction. But there is a catch: linear regression cannot be applied when the response variable is not continuous. In our case the response variable is not a continuous variable but a value from a fixed set of classes. We call such scenarios classification problems rather than prediction problems. When the response variable is qualitative rather than continuous, we have to apply a more suitable model, namely logistic regression, for classification.
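To make the idea concrete, here is a minimal sketch in plain Python of a one-predictor logistic regression fitted by gradient descent. It is only an illustration under stated assumptions: the toy data (standardized systolic readings with HBP labels) is made up, and the helper names `sigmoid`, `fit_logistic` and `predict_proba` are hypothetical, not taken from the post or from any library.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Hypothetical toy data: standardized systolic values and HBP labels
# (1 = HAVE HIGH BP, 0 = NO HIGH BP). The classes overlap slightly on purpose.
systolic = [-1.5, -1.0, -0.5, -0.2, 0.1, 0.4, 0.9, 1.4]
hbp = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(systolic, hbp)

def predict_proba(x):
    """Estimated probability that a patient with standardized systolic value x has high BP."""
    return sigmoid(b0 + b1 * x)

def predict_class(x, threshold=0.5):
    """Turn the probability into one of the two classes."""
    return "HAVE HIGH BP" if predict_proba(x) >= threshold else "NO HIGH BP"

print(predict_class(1.4), predict_class(-1.5))
```

The model outputs a probability between 0 and 1, and the class is obtained by thresholding it. This is what makes logistic regression suitable for a response with a fixed set of classes.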

Assessing Model Accuracy - Part 2

In my last post I explained MSE. Today I will explain the bias-variance trade-off and the precision-recall trade-off in assessing model accuracy.

What are the variance and bias of a statistical learning method?
Variance refers to the amount by which the estimated function (f) would change if we estimated it using a different training dataset. Since the training data is used to fit the statistical learning method, different training sets will result in different estimates (f).
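The definition can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the post's own code: it invents a true function `true_f`, draws many training sets from it, fits two methods on each (a straight line and a 1-nearest-neighbour rule), and measures how much each method's prediction at a fixed point `x0` changes from one training set to the next.

```python
import random
import statistics

def true_f(x):
    """Hypothetical true relationship, chosen only for this illustration."""
    return 2.0 * x

def make_training_set(rng, n=30, noise_sd=1.0):
    """Draw a fresh training set: x uniform on [0, 1], y = true_f(x) + Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def predict_linear(xs, ys, x0):
    """Least-squares straight line fitted to (xs, ys), evaluated at x0."""
    xbar = statistics.mean(xs)
    ybar = statistics.mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar + slope * (x0 - xbar)

def predict_1nn(xs, ys, x0):
    """1-nearest-neighbour prediction: the response of the closest training point."""
    nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[nearest]

rng = random.Random(0)  # fixed seed so the run is repeatable
x0 = 0.5
linear_preds, nn_preds = [], []
for _ in range(200):  # 200 different training sets
    xs, ys = make_training_set(rng)
    linear_preds.append(predict_linear(xs, ys, x0))
    nn_preds.append(predict_1nn(xs, ys, x0))

# Variance of the fitted value at x0 across training sets
var_linear = statistics.pvariance(linear_preds)
var_1nn = statistics.pvariance(nn_preds)
print("variance of linear fit at x0:", var_linear)
print("variance of 1-NN fit at x0:  ", var_1nn)
```

The more flexible method (1-NN) follows the noise in each individual training set, so its prediction at `x0` should vary much more across training sets than the straight line's does.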

Assessing Model Accuracy - Part 1

Recently I started reading the book "An Introduction to Statistical Learning", which has a good introduction to assessing model accuracy. This post contains excerpts from that chapter:

Often we take different statistical approaches to build a solution for a data analysis problem. Why is it necessary to introduce so many different approaches, rather than a single best method? The answer is that in statistics no single method dominates all other methods over all possible datasets. One statistical method may work well with a specific dataset, and some other method may work better on a similar but different dataset. So it is important to decide, for a particular dataset, which method produces the best results.

Thursday, July 31, 2014

Assessing Model Accuracy - Part 2

Saturday, June 21, 2014

Assessing Model Accuracy - Part 1