Blog posts on Data Science, Machine Learning, Data Mining, Artificial Intelligence, Spark Machine Learning

Tuesday, January 7, 2014

Data Analysis Tools

As mentioned in my previous post, in this post I will list the tools, blogs, forums, and online courses that I have gathered over the past year. I found them necessary in my own journey, and I hope they will be helpful to my fellow data science aspirants.

 Skillset Required:
  • Knowledge of statistics – exploratory analysis, i.e. doing an initial analysis of the data and understanding it well enough to decide which techniques need to be applied. I feel this is a must-know subject.
  • Mathematics – basics of calculus, algebra, etc., for the mathematical formulation of the problem statement.
  • Understanding of machine learning algorithms for predictive modeling, recommendation engines, classification models, cluster analysis, and social network analysis.
  • Data mining skills such as data cleaning/data munging, and applying machine learning techniques to the data.
  • Visualization skills to display the results and to understand intermediate results while building models.
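To make the first and fourth skills concrete, here is a minimal sketch of a first look at a dataset in Python: drop the missing values, then compute a few summary statistics. The readings are made up purely for illustration, and the helper names `clean` and `summarize` are my own, not part of any library. It uses only the standard `statistics` module.

```python
import statistics

# Made-up raw readings; None marks a missing value (illustrative data only).
raw = [12.0, 15.5, None, 14.2, 13.8, None, 40.0, 14.9]


def clean(values):
    """Drop missing entries (a very basic data-munging step)."""
    return [v for v in values if v is not None]


def summarize(values):
    """Return a few summary statistics used in a first look at the data."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


data = clean(raw)
summary = summarize(data)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this toy sample the mean sits well above the median, which hints that the reading of 40.0 may be an outlier worth checking before choosing a modeling technique.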
Tools required: 
Programming Languages: Proficiency in any two of the languages below is advisable:
  •  R 
  • Python 
  • Java – comes in handy when we work on Hadoop 
  • C, C++
Software tools: Since I use open-source tools, I will limit the list to them:
  • RStudio
  • NLTK (Natural Language Toolkit)
  • RapidMiner
  • Weka
Important Point: 
Most machine learning algorithms have already been implemented as packages in the above languages/tools. We just need to download them and make use of them.
Big data Tools: 
  • Hadoop setup from Cloudera/Hortonworks
  • MongoDB – NoSQL database
Visualization tools: 
Though I have not explored much in this area, so far I am happy with the R packages for visualization.
  • Data exploration in R/Python 
A few books I have referred to:
  • The Elements of Statistical Learning - 2nd Edition 
  • Simon Sheather, A Modern Approach to Regression
  •  Data Mining 3rd Edition by Ian H. Witten, Eibe Frank, Mark A. Hall
Online Courses: 
Though a lot of courses are available online, I have stuck to a very few sites, as below:
For Data Analysis, Stats, Maths:
For Big Data: 
Blogs and forums: 
Online forums are one place where I have gotten a lot of information. In LinkedIn groups I could get answers to even my most trivial questions: you post a query and get elaborate answers from everyone from research scholars to industry experts. I really love this place. Below are a few LinkedIn groups I follow:
Blogs I follow:
I will add more as I come across new tools. I hope this serves as a starting point for your journey. All the best, and Happy New Year. Please do suggest any new tools and technologies to add to the above list.
In my next post, I will write about how to tackle a data analysis problem.

