Blog posts on Data Science, Machine Learning, Data Mining, Artificial Intelligence, Spark Machine Learning

Tuesday, December 31, 2013

Data Scientist. The Path I Chose

As we all march into the New Year, I would like to post about my plan to become a Data Scientist, my 2014 resolution on the professional front. I first came across the term Data Science about a year ago. Since then I have been researching and gathering the necessary information, and I have decided to become a Data Scientist. One year on, I want to look back to understand where I stand now and what still needs to be done.
Power of Data – possible Use Cases:

Tuesday, December 17, 2013

Cluster Analysis using R

In this post, I will explain Cluster Analysis: the process of grouping objects or individuals so that those in the same group are more similar to each other than to those in other groups.
For example, from a ticket booking engine database we can identify clients with similar booking activities and group them together (into clusters). These clusters can later be targeted for business improvement, for example by issuing special offers.
Cluster Analysis is an Unsupervised Learning technique: the data to be analyzed is given to a clustering algorithm, which identifies hidden patterns within it, as shown in the figure below.
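To make the idea concrete, here is a minimal sketch in R using k-means, which is only one of several clustering algorithms and is chosen here purely for illustration. The booking data is randomly generated, and the column names (trips_per_year, avg_ticket_price) are hypothetical, not taken from any real booking database.

```r
# Hypothetical data: each row is a client; both columns are invented for illustration.
set.seed(42)  # make the random data and the k-means starts reproducible
bookings <- data.frame(
  trips_per_year   = c(rnorm(20, mean = 2,  sd = 0.5), rnorm(20, mean = 12,  sd = 1)),
  avg_ticket_price = c(rnorm(20, mean = 50, sd = 5),   rnorm(20, mean = 200, sd = 10))
)

# Scale the columns so that neither dominates the distance calculation.
scaled <- scale(bookings)

# k-means with 2 clusters; nstart tries several random starts and keeps the best fit.
fit <- kmeans(scaled, centers = 2, nstart = 10)

# Attach the cluster label to each client and show the size of each cluster.
bookings$cluster <- fit$cluster
print(table(bookings$cluster))
```

Note that the number of clusters (2) is known in advance here only because the data was generated that way; with real data, choosing it is part of the analysis.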

Thursday, October 24, 2013

Fetch Twitter data using R


This short post explains how to fetch Twitter data using the twitteR and streamR packages available in R. To connect to the Twitter API, we need to go through an authentication process known as OAuth, explained in my previous post.
Twitter data can be fetched in two ways: a) the REST API and b) the Streaming API.
In today's blog post we shall go through both the REST API and the Streaming API.


Sunday, October 6, 2013

Topic Modeling in R

As part of Twitter Data Analysis, I have so far completed Movie review using R and Document Classification using R. Today we will deal with discovering topics in tweets, i.e. mining the tweet data to uncover the underlying topics, an approach known as Topic Modeling.
What is Topic Modeling?

Tuesday, August 20, 2013

Sentiment Analysis using R

September 23, 2013


Today I will explain how to create a basic movie review engine in R, based on tweets posted by people.
The implementation of the Review Engine will be as follows:
  • Get tweets from Twitter
  • Clean the data
  • Create a word cloud
  • Create a data dictionary
  • Score each tweet
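The cleaning, dictionary and scoring steps above can be sketched in base R as follows. This is an illustration under assumptions, not the engine from the full post: the tweets are hard-coded instead of being fetched from Twitter, the word cloud step is skipped, the two word lists are tiny made-up dictionaries, and the helper names clean_tweet and score_tweet are hypothetical.

```r
# Hypothetical sample tweets standing in for data fetched from Twitter.
tweets <- c(
  "Loved the movie, great acting! http://example.com",
  "Terrible plot and boring scenes",
  "@friend it was an okay film"
)

# Tiny illustrative dictionaries; a real one would be far larger.
positive_words <- c("loved", "great", "good", "awesome")
negative_words <- c("terrible", "boring", "bad", "awful")

# Clean one tweet: lower-case it and strip URLs, mentions and punctuation.
clean_tweet <- function(text) {
  text <- tolower(text)
  text <- gsub("http\\S+", " ", text, perl = TRUE)  # drop URLs
  text <- gsub("@\\w+", " ", text, perl = TRUE)     # drop user mentions
  text <- gsub("[^a-z ]", " ", text)                # keep letters and spaces only
  trimws(gsub(" +", " ", text))                     # collapse repeated spaces
}

# Score = number of positive words minus number of negative words.
score_tweet <- function(text) {
  words <- strsplit(clean_tweet(text), " ", fixed = TRUE)[[1]]
  sum(words %in% positive_words) - sum(words %in% negative_words)
}

scores <- vapply(tweets, score_tweet, integer(1), USE.NAMES = FALSE)
print(data.frame(tweet = tweets, score = scores))
```

A positive score suggests a favourable tweet, a negative score an unfavourable one, and zero means no dictionary word was found. Simple word counting like this ignores negation and sarcasm, so treat the scores as a rough signal.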

Thursday, July 25, 2013

Document Classification using R

September 23, 2013


Recently I have developed an interest in analyzing data to find trends, predict future events, and so on, and I have started working on a few POCs in Data Analytics, such as predictive analysis and text mining. My next blog post is on Data Mining, more specifically document classification using the R programming language, a powerful language for statistical analysis.
What is Document classification?