Blog posts on Data Science, Machine Learning, Data Mining, Artificial Intelligence, Spark Machine Learning

Sunday, October 5, 2014

Regression Analysis using R

What is a Prediction Problem?
A prediction problem is a business problem that involves predicting future events by extracting patterns from historical data. Prediction problems are solved using statistical techniques, mathematical models, or machine learning techniques.
For example: forecasting the stock price for the next week, predicting which football team will win the World Cup, etc.

Thursday, July 31, 2014

Assessing Model Accuracy - Part 2

In my last post, I explained MSE. Today I will explain the bias-variance trade-off and the precision-recall trade-off that arise when assessing model accuracy.

What are the variance and bias of a statistical learning method?
Variance refers to the amount by which our estimate of f would change if we estimated it using a different training dataset. Since the training data is used to fit the statistical learning method, different training sets will result in different estimates of f.
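The definition above can be illustrated with a small simulation. The sketch below is written in plain Python rather than R, and everything in it is an assumption made for illustration: the "true" function, the noise level, the choice of a straight-line fit, and the helper names. It repeatedly draws a fresh training set, refits the line, and records the prediction at one fixed point; the spread of those predictions is the variance being described.

```python
import random
import statistics


def true_f(x):
    # Hypothetical "true" relationship, chosen only for illustration.
    return 2.0 * x + 1.0


def sample_training_set(rng, n):
    # Draw n noisy observations of true_f on the interval [0, 3].
    xs = [rng.uniform(0.0, 3.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least squares for a single predictor (closed form).
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def prediction_variance(n_train, n_repeats=500, x0=1.5, seed=0):
    # Refit on many independent training sets and measure how much
    # the prediction at x0 moves from one fit to the next.
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_repeats):
        xs, ys = sample_training_set(rng, n_train)
        slope, intercept = fit_line(xs, ys)
        predictions.append(slope * x0 + intercept)
    return statistics.pvariance(predictions)


print("variance with 10 training points: ", prediction_variance(10))
print("variance with 200 training points:", prediction_variance(200))
```

In this setup the estimate moves around far more when each training set is small, which is one simple way to see that the fitted model depends on the particular training data it was given.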

Saturday, June 21, 2014

Assessing Model Accuracy - Part 1

Recently, I started reading the book "An Introduction to Statistical Learning", which has a good introduction to assessing model accuracy. This post contains excerpts from that chapter:

Often we take different statistical approaches to build a solution for a data analysis problem. Why is it necessary to introduce so many different approaches, rather than a single best method? The answer is that in statistics no single method dominates all other methods over all possible datasets. One statistical method may work well on a specific dataset, while some other method may work better on a similar but different dataset. So it is important to decide, for a particular dataset, which method produces the best results.

Sunday, May 25, 2014

Basic recommendation engine using R

In our day-to-day life, we come across a large number of recommendation engines, such as Facebook's friend suggestions and its suggestions of similar pages to like, or YouTube's recommendation engine suggesting videos similar to our previous searches and preferences. In today’s blog post I will explain how to build a basic recommender system.

Thursday, April 17, 2014

Time Series Analysis using R - forecast package

In today’s blog post, we shall look into time series analysis using the R package forecast. The objective of the post is to explain the different methods available in the forecast package that can be applied to time series analysis and forecasting.

Thursday, March 20, 2014

Build Web applications using Shiny R

Ever since I started working with R, I have wondered how I could present
the results of my statistical models as web applications. After doing some
research on the internet I came across Shiny, a new package
from RStudio which can be used to develop interactive web applications with R.
Before going into how to build web apps using R, let me give you an overview
of Shiny.

Monday, March 3, 2014

Exploratory data analysis techniques

In my previous blog post I explained the steps needed to solve a data analysis problem. Going further, I will discuss each step of data analysis in detail. In this post, we shall discuss exploratory analysis.

Monday, February 3, 2014

Data Analysis Steps

After going through an overview of the tools and technologies needed to become a data scientist in my previous blog post, in this post we shall look at how to tackle a data analysis problem.
Any data analysis project starts with identifying a business problem for which historical data exists. A business problem can be anything, including prediction problems, analyzing customer behavior, identifying new patterns from past events, building recommendation engines, etc.

Tuesday, January 7, 2014

Data Analysis Tools

As mentioned in my previous post, in this post I will list the tools, blogs, forums, and online courses that I have gathered over the past year, which I found necessary in my own journey and which should be helpful to my fellow data science aspirants.