Blog posts on Data Science, Machine Learning, Data Mining, Artificial Intelligence, Spark Machine Learning

Saturday, September 8, 2018

Getting started with Google Colaboratory for running deep learning applications

What is Google Colab:


We all know that deep learning algorithms improve the accuracy of AI applications to a great extent. But this accuracy comes at the cost of heavy computational processing units such as GPUs for developing deep learning models. Many machine learning developers cannot afford GPUs, as they are very costly, and find this a roadblock for learning and developing deep learning applications. To help AI and machine learning developers, Google has released a free cloud-based service, Google Colaboratory: a Jupyter notebook environment with free GPU processing capabilities and no strings attached. It is a ready-to-use service which requires no setup at all.

Any AI developer can use this free service to develop deep learning applications using popular AI libraries like TensorFlow, PyTorch, Keras, etc.

Setting up Colab:


Go to Google Drive → New → More → Colaboratory


This opens up a Python Jupyter notebook in the browser.


By default, the Jupyter notebook runs on Python 2.7 with a CPU. We can change the Python version to 3.6 and the processing hardware to GPU by changing the settings as shown below:

Go to Runtime → Change runtime type


This opens up a Notebook settings pop-up where we can change the runtime type to Python 3.6 and the hardware accelerator to GPU.


Bingo! Your Python environment with the processing power of a GPU is ready to use.

Important things to remember:
  • The supported browsers are Chrome and Firefox
  • Currently only Python is supported
  • We can use up to 12 hours of processing time in one go
Let’s check whether our newly created Jupyter notebook works properly. Run the commands below and see if we get the expected results.
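For example, a quick sanity check like the following (a minimal sketch, since the original screenshots are not reproduced here) confirms the Python version and whether a GPU is attached to the runtime:

import sys
import tensorflow as tf

# Print the Python version the runtime is using
print(sys.version)

# Prints the GPU device name, e.g. '/device:GPU:0',
# or an empty string if no GPU is attached
print(tf.test.gpu_device_name())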


By default, the most frequently used Python libraries, such as NumPy, Pandas, SciPy, scikit-learn, Matplotlib, etc., are pre-installed when we create a notebook. Below we can see a plotting example.
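The original plot screenshot is not reproduced here; a minimal Matplotlib sketch of the same idea might look like this:

import numpy as np
import matplotlib.pyplot as plt

# Plot one period of a sine wave to confirm plotting works
x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x))
plt.title('Sine wave')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.show()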

Running a Machine Learning example:


The Python notebook code below shows an example of a multi-layer neural network using Python and the scikit-learn library.

Full code implementation:
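The original notebook screenshot is not preserved, so the following is a representative sketch rather than the post's exact code: a multi-layer neural network built with scikit-learn's MLPClassifier on the built-in Iris dataset (the dataset choice and hyperparameters are illustrative assumptions).

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Load a small benchmark dataset (illustrative choice)
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# A multi-layer neural network with two hidden layers of 10 neurons each
mlp = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000,
                    random_state=42)
mlp.fit(X_train, y_train)

# Evaluate on the held-out test set
predictions = mlp.predict(X_test)
print('Test accuracy:', accuracy_score(y_test, predictions))

Since scikit-learn comes pre-installed, this runs in a fresh Colab notebook with no installation steps.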



In the next post, we will see how to import data into the Colab environment.

Saturday, December 2, 2017

Getting Started with R

In this post, we get familiar with RStudio and the basic syntax of the R programming language.

RStudio Overview


We have 4 panes:
1) Script pane - to write and save the programming script
2) Console pane - where all the code gets executed
3) Environment/History pane - displays all the variables and functions created within the current session
4) Helper pane - contains multiple tabs to install/display packages, view visualization plots, and locate files within the workspace

help(mean)

Saturday, November 4, 2017

Information retrieval: document search using the vector space model in R


Introduction:

In this post, we learn about building a basic search engine, or document retrieval system, using the vector space model. This technique is widely used in information retrieval systems. Given a set of documents and a search term/query, we need to retrieve the relevant documents that are similar to the search query.

Problem statement:

The problem statement explained above is represented in the image below.
Document retrieval system
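The post implements this in R; as a quick language-agnostic illustration of the same idea, here is a minimal Python sketch using scikit-learn's TfidfVectorizer and cosine similarity (the toy documents and query are assumptions for illustration, not the post's data):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document collection and a search query
docs = ['the cat sat on the mat',
        'dogs and cats living together',
        'the quick brown fox jumps over the lazy dog']
query = ['cat on a mat']

# Represent documents and the query as TF-IDF vectors in the same space
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)
query_vector = vectorizer.transform(query)

# Rank documents by cosine similarity to the query
scores = cosine_similarity(query_vector, doc_vectors).ravel()
for idx in scores.argsort()[::-1]:
    print(round(scores[idx], 3), '-', docs[idx])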


Friday, March 18, 2016

apply lapply rapply sapply functions in R

As part of Data Science with R, this is the third tutorial, following basic data types and control structures in R.

One of the issues with the for loop is its memory consumption and its slowness in executing a repetitive task. When dealing with large data and iterating over it, a for loop is often not advised. R provides a few alternatives that can be applied to vectors for looping operations. In this section, we deal with the apply function and its variants:

Saturday, February 27, 2016

Control Structures Loops in R

As part of the Data Science tutorial series, in my previous post I wrote about basic data types in R. I have kept the tutorial very simple so that beginners in R programming may take off immediately.
Please find the online R editor at the end of the post so that you can execute the code on the page itself.
In this section, we learn about the control structures and loops used in R. Control structures in R include conditionals and loop statements, like any other programming language.

Principal Component Analysis using R

Curse of Dimensionality:
One of the most commonly faced problems while dealing with data analytics tasks such as recommendation engines and text analytics is high-dimensional and sparse data. Many times, we face a situation where we have a large set of features and fewer data points, or we have data with very high-dimensional feature vectors. In such scenarios, fitting a model to the dataset results in lower predictive power of the model. This scenario is often termed the curse of dimensionality. In general, adding more data points or decreasing the feature space, also known as dimensionality reduction, often reduces the effects of the curse of dimensionality.
In this blog, we will discuss principal component analysis (PCA), a popular dimensionality reduction technique. PCA is a useful statistical method that has found application in a variety of fields and is a common technique for finding patterns in high-dimensional data.
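The post develops PCA in R; as a quick cross-language sketch of the idea, scikit-learn's PCA in Python projects a dataset onto its leading components (the Iris dataset and the choice of two components are illustrative assumptions, not the post's code):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Iris has 4 features; project onto the first 2 principal components
X, y = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance captured by each component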


Principal component analysis:


Tuesday, February 16, 2016

Basic Data Types in R

As part of the tutorial series on Data Science with R from Data Perspective, this first tutorial introduces the very basics of the R programming language: basic data types in R.

What we learn:
At the end of the chapter, you are provided with an R console so that you can practice what you have learned in this chapter.



Friday, December 25, 2015

Data Science with R

As the R programming language becomes more and more popular among the data science community, with industries, researchers, and companies embracing R, going forward I will be writing posts on learning data science using R. The tutorial course will include topics on R data types, handling data using R, probability theory, machine learning, supervised and unsupervised learning, data visualization using R, etc. Before going further, let’s look at some stats and tidbits on data science and R.

"A data scientist is simply someone who is highly adept at studying large amounts of often unorganized/undigested data"


Wednesday, November 18, 2015

Item Based Collaborative Filtering Recommender Systems in R

In the series on implementing recommendation engines, in my previous blog about recommendation systems in R, I explained how to implement the user-based collaborative filtering approach using R. In this post, I will explain a basic implementation of item-based collaborative filtering recommender systems in R.
Intuition:


Monday, October 19, 2015

Data Mining Standard Process across Organizations

Recently I came across the term CRISP-DM, a data mining standard. Though this process is not a new one, I felt every analyst should know about this commonly used industry-wide process. In this post I will explain the different phases involved in creating a data mining solution.

CRISP-DM, an acronym for Cross-Industry Standard Process for Data Mining, is a data mining process model that includes commonly used approaches that data analytics organizations use to tackle business problems related to data mining. Polls conducted on the same website (KDnuggets) in 2002, 2004, 2007 and 2014 show that it was the leading methodology used by industry data miners who chose to respond to the survey.