Blog posts on Data Science, Machine Learning, Data Mining, Artificial Intelligence, Spark Machine Learning

Saturday, February 23, 2019

How to import data into Google Colab Jupyter Notebook

Accessing data is one of the first steps in any data analysis. In this tutorial, we will see two ways of loading data into the Google Colab environment.


  • Uploading a CSV from the local machine and loading it into Colab
  • Loading data from Google Drive into Colab

  • Uploading CSV from local machine using IMPORT functionality.



  • Import the files module from google.colab
  • Upload the file using the upload button control

  • Running the commands below lets us upload data files into the Colab environment. After the commands execute, a Choose Files button appears, and we can use it to upload files from a local directory.
    from google.colab import files
    uploaded = files.upload()
    
    Saving DOLPHIN.csv to DOLPHIN.csv

    To view the uploaded files

    The command below lets us verify that the file was uploaded correctly.
    for fn in uploaded.keys():
      print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))
    
    User uploaded file "DOLPHIN.csv" with length 117269 bytes
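The same check can be wrapped in a small helper. The sketch below assumes, as the loop above implies, that `uploaded` behaves like a dictionary mapping each file name to its contents as bytes. The helper name `summarize_uploads` and the sample dictionary are made up for illustration, so the snippet also runs outside Colab.

```python
def summarize_uploads(uploaded):
    """Return one summary line per uploaded file.

    `uploaded` is assumed to map file names to bytes, like the value
    returned by files.upload() in the tutorial.
    """
    return [
        'User uploaded file "{name}" with length {length} bytes'.format(
            name=name, length=len(content))
        for name, content in uploaded.items()
    ]


# Hypothetical stand-in for the dictionary returned by files.upload()
sample_upload = {"example.csv": b"a,b\n1,2\n"}
for line in summarize_uploads(sample_upload):
    print(line)
```

In Colab, you would pass the real `uploaded` dictionary instead of `sample_upload`.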



    Reading the uploaded file into a pandas DataFrame and displaying the results

    After the data file is uploaded to Colab, we can use pandas to load it into a DataFrame and continue the analysis.
    import pandas as pd
    import io
    df = pd.read_csv(io.StringIO(uploaded['DOLPHIN.csv'].decode('utf-8')))
    
    print(df.head(2))
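To see what the `decode` and `io.StringIO` steps do on their own, here is a small sketch that needs neither Colab nor pandas. The sample bytes and the helper name `rows_from_bytes` are invented for illustration, and Python's built-in `csv` module stands in for `pd.read_csv` purely to show the mechanics.

```python
import csv
import io


def rows_from_bytes(raw_bytes, encoding="utf-8"):
    """Decode CSV bytes and return the rows as a list of dicts."""
    text = raw_bytes.decode(encoding)   # bytes -> str
    buffer = io.StringIO(text)          # str -> in-memory file-like object
    return list(csv.DictReader(buffer))


# Made-up sample content, standing in for uploaded['DOLPHIN.csv']
sample_bytes = b"name,length\nalpha,1\nbeta,2\n"
rows = rows_from_bytes(sample_bytes)
print(rows[:2])
```

pandas can also read directly from a bytes buffer, so `pd.read_csv(io.BytesIO(uploaded['DOLPHIN.csv']))` is an equivalent alternative to decoding first.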
    
    Load data from Google Drive:

    Sometimes we need to load data from Google Drive. The commands below read data from Google Drive. Here we assume that the data file to be loaded into the Python environment is already uploaded to Google Drive.



  • Step 1: Mount Google Drive
  • Step 2: Provide authorization when the mount command prompts for it
  • Step 3: View the list of files available at the mounted location
  • Step 4: Load the data using the pandas read_csv function
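The mount command itself does not appear in this post's listings. A minimal sketch of step 1, assuming the standard `drive.mount` call from `google.colab` and the `/content/gdrive` mount point that appears in the "Mounted at" message. The wrapper function `mount_drive` is hypothetical and exists only so the snippet does nothing harmful outside Colab.

```python
MOUNT_POINT = "/content/gdrive"  # taken from the "Mounted at" message


def mount_drive(mount_point=MOUNT_POINT):
    """Mount Google Drive when running in Colab; return True on success.

    Outside Colab the google.colab package is not available, so the
    function returns False instead of raising.
    """
    try:
        from google.colab import drive
    except ImportError:
        return False
    drive.mount(mount_point)  # prompts for authorization in Colab
    return True


if mount_drive():
    print("Drive is available under", MOUNT_POINT)
else:
    print("Not running in Colab; nothing mounted")
```

In a notebook cell, the two essential lines are the import and the `drive.mount(...)` call; the rest is only a guard.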

  • Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code
    Enter your authorization code:
    ··········
    Mounted at /content/gdrive


    Note: My data files are located in the My Drive/Colab Notebooks folder of my Google Drive account. Please change the paths accordingly.
    After clicking the link and entering the authorization code, you can access your Drive as follows:

    !ls -la /content/gdrive/My\ Drive/Colab\ Notebooks/
    
    total 230
    -rw------- 1 root root 117269 Feb 23 06:43 DOLPHINOFFALLN.csv
    -rw------- 1 root root 12104 Feb 23 06:46 ImportDatatoColab.ipynb
    -rw------- 1 root root 8935 Sep 8 17:23 'Running first neural network model on google colaboratory'
    -rw------- 1 root root 13498 Aug 25 17:19 SettingupDrive_GSPGC.ipynb
    -rw------- 1 root root 81691 Nov 26 06:47 'Upload data to colab from google drive'
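A similar listing can be produced from Python rather than a shell command. The sketch below uses `pathlib`; the helper name `list_csv_files` is made up, and the demonstration runs against a temporary folder with invented file names so it works without a mounted Drive. In Colab you would pass the Drive folder path instead.

```python
import tempfile
from pathlib import Path


def list_csv_files(folder):
    """Return the sorted names of the .csv files directly inside `folder`."""
    return sorted(p.name for p in Path(folder).glob("*.csv"))


# Demonstration on a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "data.csv").write_text("a,b\n1,2\n")
    Path(tmp, "notes.ipynb").write_text("{}")
    found = list_csv_files(tmp)
    print(found)
```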

    df2 = pd.read_csv('/content/gdrive/My Drive/Colab Notebooks/DOLPHINOFFALLN.csv')
    
    print(df2.head(2))
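Since the folder location differs between accounts, it can help to build the path from its parts instead of hard-coding it. This is a sketch with a hypothetical helper, `drive_csv_path`; the default folder names mirror the path used above and are assumptions about your own Drive layout.

```python
from pathlib import PurePosixPath


def drive_csv_path(filename,
                   folder="Colab Notebooks",
                   mount_point="/content/gdrive"):
    """Build the path of a file stored under My Drive/<folder>."""
    return str(PurePosixPath(mount_point) / "My Drive" / folder / filename)


path = drive_csv_path("DOLPHINOFFALLN.csv")
print(path)
```

The resulting string can be passed straight to `pd.read_csv`.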
    
    
    
    TIP: We can install new libraries into the Python environment inline using the command below.
    Note the ! prefixed to the pip command, which runs it as a shell command from the notebook cell.
    !pip install matplotlib

    Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (3.0.2)
    Requirement already satisfied: numpy>=1.10.0 in /usr/local/lib/python3.6/dist-packages (from matplotlib) (1.14.6)
    Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib) (0.10.0)
    Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib) (2.3.1)
    Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib) (1.0.1)
    Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib) (2.5.3)
    Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from cycler>=0.10->matplotlib) (1.11.0)
    Requirement already satisfied: setuptools in /usr/local/lib/python3.6/dist-packages (from kiwisolver>=1.0.1->matplotlib) (40.8.0)
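Before installing, you can check from Python whether a package is already importable, which avoids re-running pip unnecessarily. This is a sketch using the standard library's `importlib.util.find_spec`; the helper name `is_installed` is made up for illustration. Keep in mind that a package's import name can differ from its pip name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python, so it serves as a safe demonstration
print(is_installed("json"))
```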



