Blog posts on Data Science, Machine Learning, Data Mining, Artificial Intelligence, Spark Machine Learning

Saturday, February 23, 2019

How to import data into Google Colab Jupyter Noteook

Accesing data is one of the first step that we need when performing any data analysis. In this tutorial, we will see two ways of loading data into the google colab environment.

Uploading csv from local machine and loading into colab

Loading data from google drive to colab

Uploading CSV from local machine using IMPORT functionality.

Load import files library from google colab

upload file using the upload button control

Running below commands will allow us to upload data files into the colab environment. Once the Choose Files button is visible, after executing the below listed python commands, we can easily upload files from local directory.

from google.colab import files
uploaded = files.upload()

Saving DOLPHIN.csv to DOLPHIN.csv

To view the uploaded files

Below command allows us to verify if the file is uploaded correctly.

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))

User uploaded file "DOLPHIN.csv" with length 117269 bytes

Getting started with google laboratory for running deep learning applications

What is Google Colab:

We all know that deep learning algorithms improve the accuracy of AI applications to great extent. But this accuracy comes with requiring heavy computational processing units such as GPU for developing deep learning models. Many of the machine learning developers cannot afford GPU as they are very costly and find this as a roadblock for learning and developing Deep learning applications. To help the AI, machine learning developers Google has released a free cloud based service Google Colaboratory - Jupyter notebook environment with free GPU processing capabilities with no strings attached for using this service. It is a ready to use service which requires no set at all.

Any AI developers can use this free service to develop deep learning applications using popular AI libraries like Tensorflow, Pytorch, Keras, etc.

Getting Started with R

In this post we get familiar with R Studio and basic syntax of R programming language

Rstudio OverView

we have 4 panes
1) script pane - to write and save the programing script
2) Console pane - where all the code will get executed
3) Environment/history pane - displays all the variables created,functions
used with in the current session
4) Helper pane - contains multiple tabs to install/display pacakges,
view visualization plots,
locate files within the workspace

In [1]:

help(mean)

information retrieval document search using vector space model in R

Introduction:

In this post, we learn about building a basic search engine or document retrieval system using Vector space model. This use case is widely used in information retrieval systems. Given a set of documents and search term(s)/query we need to retrieve relevant documents that are similar to the search query.

Problem statement:

The problem statement explained above is represented as in below image.

Document retrieval system

apply lapply rapply sapply functions in R

As part of Data Science with R, this is third tutorial after basic data types,control structures in r.

One of the issues with for loop is its memory consumption and its slowness in executing a repetitive task at hand. Often dealing with large data and iterating it, for loop is not advised. R provides many few alternatives to be applied on vectors for looping operations. In this section, we deal with apply function and its variants:

Control Structures Loops in R

As part of Data Science tutorial Series in my previous post I posted on basic data types in R. I have kept the tutorial very simple so that beginners of R programming may takeoff immediately.
Please find the online R editor at the end of the post so that you can execute the code on the page itself.
In this section we learn about control structures loops used in R. Control strcutures in R contains conditionals, loop statements like any other programming languages.

Principal Component Analysis using R

Curse of Dimensionality:
One of the most commonly faced problems while dealing with data analytics problem such as recommendation engines, text analytics is high-dimensional and sparse data. At many times, we face a situation where we have a large set of features and fewer data points, or we have data with very high feature vectors. In such scenarios, fitting a model to the dataset, results in lower predictive power of the model. This scenario is often termed as the curse of dimensionality. In general, adding more data points or decreasing the feature space, also known as dimensionality reduction, often reduces the effects of the curse of dimensionality.
In this blog, we will discuss about principal component analysis, a popular dimensionality reduction technique. PCA is a useful statistical method that has found application in a variety of fields and is a common technique for finding patterns in data of high dimension.

Principal component analysis:

Basic Data Types in R

As part of tutorial series on Data Science with R from Data Perspective, this first tutorial introduces the very basics of R programming language about basic data types in R.

What we learn:
After the end of the chapter, you are provided with R console so that you can practice what you have learnt in this chapter.

Data Science with R

As R programming language becoming popular more and more among data science group, industries, researchers, companies embracing R, going forward I will be writing posts on learning Data science using R. The tutorial course will include topics on data types of R, handling data using R, probability theory, Machine Learning, Supervised – unSupervised learning, Data Visualization using R, etc. Before going further, let’s just see some stats and tidbits on data science and R.

"A data scientist is simply someone who is highly adept at studying large amounts of often unorganized/undigested data"

Item Based Collaborative Filtering Recommender Systems in R

In the series of implementing Recommendation engines, in my previous blog about recommendation system in R, I have explained about implementing user based collaborative filtering approach using R. In this post, I will be explaining about basic implementation of Item based collaborative filtering recommender systems in r.
Intuition:

Blog posts on Data Science, Machine Learning, Data Mining, Artificial Intelligence, Spark Machine Learning

Saturday, February 23, 2019

How to import data into Google Colab Jupyter Noteook

Saturday, September 8, 2018

Getting started with google laboratory for running deep learning applications

What is Google Colab:

Saturday, December 2, 2017

Getting Started with R

Rstudio OverView

Saturday, November 4, 2017

information retrieval document search using vector space model in R

Introduction:

Problem statement:

Friday, March 18, 2016

apply lapply rapply sapply functions in R

Saturday, February 27, 2016

Control Structures Loops in R

Principal Component Analysis using R

Tuesday, February 16, 2016

Basic Data Types in R

Friday, December 25, 2015

Data Science with R

Wednesday, November 18, 2015

Item Based Collaborative Filtering Recommender Systems in R