Blog posts on Data Science, Machine Learning, Data Mining, Artificial Intelligence, Spark Machine Learning

Saturday, February 23, 2019

How to import data into a Google Colab Jupyter Notebook

Accessing data is one of the first steps in any data analysis. In this tutorial, we will see two ways of loading data into the Google Colab environment.

  • Uploading a CSV from the local machine into Colab
  • Loading data from Google Drive into Colab

  • Uploading a CSV from the local machine using the upload functionality:

  • Import the files library from google.colab
  • Upload the file using the Choose Files button

  • Running the commands below allows us to upload data files into the Colab environment. Once the Choose Files button is visible, after executing the Python commands listed below, we can easily upload files from the local directory.
    from google.colab import files
    uploaded = files.upload()

    Example output after choosing a file:
    Saving DOLPHIN.csv to DOLPHIN.csv

    To view the uploaded files

    The command below lets us verify that the file was uploaded correctly.
    for fn in uploaded.keys():
      print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))

    Example output:
    User uploaded file "DOLPHIN.csv" with length 117269 bytes
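Since files.upload() returns a dict mapping each filename to its raw bytes, the uploaded CSV can be parsed straight from memory with pandas, without writing it to disk first. A minimal sketch — the dict below simulates the upload result with a tiny inline CSV, because files.upload() itself only works inside Colab, and the column names are illustrative:

```python
import io
import pandas as pd

# files.upload() returns a dict mapping each filename to its raw bytes.
# Outside Colab we simulate that return value with a small inline CSV;
# in Colab, `uploaded` would come straight from files.upload().
uploaded = {'DOLPHIN.csv': b'id,species\n1,bottlenose\n2,spinner\n'}

# Wrap the bytes in a file-like object and parse with pandas
df = pd.read_csv(io.BytesIO(uploaded['DOLPHIN.csv']))
print(df.shape)
```

The same io.BytesIO pattern works for any of pandas' readers (read_excel, read_json, etc.) applied to the uploaded bytes.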

    Saturday, September 8, 2018

    Getting started with Google Colaboratory for running deep learning applications

    What is Google Colab:

    We all know that deep learning algorithms improve the accuracy of AI applications to a great extent. But this accuracy comes at the cost of heavy computational requirements, such as GPUs, for developing deep learning models. Many machine learning developers cannot afford GPUs, as they are very costly, and find this a roadblock to learning and developing deep learning applications. To help AI and machine learning developers, Google has released a free cloud-based service, Google Colaboratory: a Jupyter notebook environment with free GPU processing capabilities and no strings attached. It is a ready-to-use service that requires no setup at all.

    Any AI developer can use this free service to develop deep learning applications using popular AI libraries like TensorFlow, PyTorch, Keras, etc.

    Saturday, December 2, 2017

    Getting Started with R

    In this post we get familiar with RStudio and the basic syntax of the R programming language.

    RStudio Overview

    We have four panes:
    1) Script pane - to write and save programming scripts
    2) Console pane - where all the code gets executed
    3) Environment/History pane - displays all the variables created and functions used within the current session
    4) Helper pane - contains multiple tabs to install/display packages, view visualization plots, and locate files within the workspace


    Saturday, November 4, 2017

    Information retrieval: document search using the vector space model in R


    In this post, we learn about building a basic search engine, or document retrieval system, using the vector space model. This model is widely used in information retrieval systems. Given a set of documents and a search query, we need to retrieve the relevant documents that are most similar to the query.

    Problem statement:

    The problem statement explained above is illustrated in the image below.
    [Image: Document retrieval system]
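The full post implements this in R; as a language-agnostic sketch of the idea, each document and the query can be represented as a term-frequency vector, and documents ranked by their cosine similarity to the query. The toy documents below are purely illustrative:

```python
import math
from collections import Counter

def tf_vector(text):
    # Term-frequency vector: word -> count
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = {
    'd1': 'dolphins are marine mammals',
    'd2': 'the stock market fell today',
    'd3': 'marine biology studies dolphins and whales',
}
query = tf_vector('marine dolphins')

# Rank documents by similarity to the query, most relevant first
ranked = sorted(docs, key=lambda d: cosine(query, tf_vector(docs[d])), reverse=True)
print(ranked)
```

Real systems weight terms with TF-IDF rather than raw counts, but the ranking machinery is the same.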

    Friday, March 18, 2016

    apply lapply rapply sapply functions in R

    As part of Data Science with R, this is the third tutorial, following the posts on basic data types and control structures in R.

    One of the issues with the for loop is its memory consumption and its slowness in executing a repetitive task. When dealing with large data, iterating over it with a for loop is not advised. R provides several alternatives that can be applied to vectors for looping operations. In this section, we deal with the apply function and its variants:
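The post itself covers R's apply family; purely as an illustration of the same loop-versus-apply contrast, here is the pattern in Python (the other language used on this blog), where map plays a role analogous in spirit to R's sapply:

```python
# Explicit loop: accumulate squares one element at a time
values = [1, 2, 3, 4, 5]
squares_loop = []
for v in values:
    squares_loop.append(v * v)

# Functional alternative: apply the function over the whole sequence
# at once, roughly analogous to R's sapply(values, function(x) x^2)
squares_apply = list(map(lambda x: x * x, values))

print(squares_apply)
```

Both produce the same result; the functional form expresses the intent (apply this function to every element) without loop bookkeeping.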

    Saturday, February 27, 2016

    Control Structures Loops in R

    As part of the Data Science tutorial series, in my previous post I wrote about basic data types in R. I have kept the tutorial very simple so that beginners of R programming can take off immediately.
    Please find the online R editor at the end of the post so that you can execute the code on the page itself.
    In this section we learn about the control structures and loops used in R. Control structures in R include conditionals and loop statements, like any other programming language.

    Principal Component Analysis using R

    Curse of Dimensionality:
    One of the most commonly faced problems while dealing with data analytics problems, such as recommendation engines and text analytics, is high-dimensional and sparse data. Many times, we face a situation where we have a large set of features and fewer data points, or data with very high-dimensional feature vectors. In such scenarios, fitting a model to the dataset results in lower predictive power. This scenario is often termed the curse of dimensionality. In general, adding more data points or decreasing the feature space, also known as dimensionality reduction, often reduces its effects.
    In this blog, we will discuss principal component analysis (PCA), a popular dimensionality reduction technique. PCA is a useful statistical method that has found application in a variety of fields and is a common technique for finding patterns in high-dimensional data.
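The post works the example in R; as a minimal language-agnostic sketch of the computation underneath PCA — center the data, take the eigenvectors of the covariance matrix, project onto the top components — here is the procedure in Python with made-up toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 100 points in 3 dimensions; the third feature is nearly
# a copy of the first, so the data is effectively ~2-dimensional
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)

# 1) Center each feature at zero mean
Xc = X - X.mean(axis=0)

# 2) Eigendecomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3) Project onto the top-2 principal components
scores = Xc @ eigvecs[:, :2]
explained = eigvals[:2].sum() / eigvals.sum()
print(scores.shape, round(explained, 3))
```

Because the third feature is redundant, the first two components capture almost all of the variance, which is exactly the situation where dropping dimensions costs little predictive information.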

    Principal component analysis:

    Tuesday, February 16, 2016

    Basic Data Types in R

    As part of the tutorial series on Data Science with R from Data Perspective, this first tutorial introduces the very basics of the R programming language, starting with its basic data types.

    What we learn:
    At the end of the chapter, you are provided with an R console so that you can practice what you have learnt in this chapter.

    Friday, December 25, 2015

    Data Science with R

    As the R programming language becomes more and more popular among data science groups, with industries, researchers, and companies embracing R, going forward I will be writing posts on learning data science using R. The tutorial course will include topics on R data types, handling data using R, probability theory, machine learning, supervised and unsupervised learning, data visualization using R, etc. Before going further, let's just see some stats and tidbits on data science and R.

    "A data scientist is simply someone who is highly adept at studying large amounts of often unorganized/undigested data"

    Wednesday, November 18, 2015

    Item Based Collaborative Filtering Recommender Systems in R

    In the series on implementing recommendation engines, in my previous blog post about recommendation systems in R, I explained how to implement the user-based collaborative filtering approach using R. In this post, I will explain a basic implementation of item-based collaborative filtering recommender systems in R.
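The post builds this in R with a real ratings dataset; as a minimal sketch of the item-based idea (the tiny ratings table below is made up), we compute item-item cosine similarities over the columns of the user-item matrix, then predict a user's rating for an unseen item as a similarity-weighted average of that user's own ratings:

```python
import math

# Toy user -> {item: rating} table; purely illustrative
ratings = {
    'alice': {'A': 5, 'B': 3, 'C': 4},
    'bob':   {'A': 4, 'B': 2, 'C': 5},
    'carol': {'A': 1, 'B': 5},           # carol has not rated item C
}

def item_vector(item):
    # The item's column of the user-item matrix: who rated it, and how
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    # Cosine similarity over the users both items share
    common = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(user, item):
    # Similarity-weighted average of the user's ratings of other items
    num = den = 0.0
    for other, rating in ratings[user].items():
        if other != item:
            s = cosine(item_vector(item), item_vector(other))
            num += s * rating
            den += s
    return num / den if den else 0.0

print(round(predict('carol', 'C'), 2))
```

Item-based filtering precomputes these item-item similarities offline, which is why it scales better than the user-based variant when there are many more users than items.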