Blog posts on Data Science, Machine Learning, Data Mining, Artificial Intelligence, Spark Machine Learning

Friday, March 18, 2016

apply lapply rapply sapply functions in R

As part of Data Science with R, this is third tutorial after basic data types,control structures in r.

One of the issues with for loop is its memory consumption and its slowness in executing a repetitive task at hand. Often dealing with large data and iterating it, for loop is not advised. R provides many few alternatives to be applied on vectors for looping operations. In this section, we deal with apply function and its variants:

Saturday, February 27, 2016

Control Structures Loops in R

As part of Data Science tutorial Series in my previous post I posted on basic data types in R. I have kept the tutorial very simple so that beginners of R programming  may takeoff immediately.
Please find the online R editor at the end of the post so that you can execute the code on the page itself.
In this section we learn about control structures loops used in R. Control strcutures in R contains conditionals, loop statements like any other programming languages.

Principal Component Analysis using R

Curse of Dimensionality:
One of the most commonly faced problems while dealing with data analytics problem such as recommendation engines, text analytics is high-dimensional and sparse data. At many times, we face a situation where we have a large set of features and fewer data points, or we have data with very high feature vectors. In such scenarios, fitting a model to the dataset, results in lower predictive power of the model. This scenario is often termed as the curse of dimensionality. In general, adding more data points or decreasing the feature space, also known as dimensionality reduction, often reduces the effects of the curse of dimensionality.
In this blog, we will discuss about principal component analysis, a popular dimensionality reduction technique. PCA is a useful statistical method that has found application in a variety of fields and is a common technique for finding patterns in data of high dimension.


Principal component analysis:


Tuesday, February 16, 2016

Basic Data Types in R

As part of tutorial series on Data Science with R from Data Perspective, this first tutorial introduces the very basics of R programming language about basic data types in R.

What we learn:
After the end of the chapter, you are provided with R console so that you can practice what you have learnt in this chapter.