My 10-month-old daughter took her first baby steps, and watching her take them was one of the most beautiful moments of my life. A baby, brimming with excitement to reach her father, trying to balance while exploring her newly acquired skill and trying to speak in her own way at the same time – it was a moment to cherish! I wish I could have recorded it!
What followed the first little walk was equally exciting – Jenika (my daughter) enjoyed practicing her new skill in spite of all the limitations she faced. She kept faltering every 3-4 steps, but it didn’t seem to affect her. A few things stood out in the way she practiced her new skill:
After spending the weekend with Jenika, both of us are still not over with the fun! She is still enjoying her day in the same innocent manner and I can’t stop thinking about it.
This is when this thought came to my mind – why not create a series of articles like baby steps – articles which enable users to take small steps towards a new skill, keep the excitement up and provide a refreshing start. I knew I was doing it, the second this thought crossed my mind!
The next question was: which area should these articles be written on? There were 2 areas I considered – Python for data analysis and Big data. Why them? For a simple reason: we have not spent a lot of time learning these skills till now. I will start with Python, and we will take up Big data at a later date.
In this series of articles, we will take bite-sized information about how to use Python for data analysis, chew on it till we are comfortable and practice it at our own end.
In today’s article, we will cover:
Python has gathered a lot of interest recently as a choice of language for data analysis. I had compared it against SAS & R some time back. Here are some reasons which go in favour of learning Python:
Needless to say, it still has a few drawbacks:
There are 2 approaches to install Python:
The second method provides a hassle-free installation, and hence I’ll recommend it to beginners. The limitation of this approach is that you have to wait for the entire package to be upgraded, even if you are interested in the latest version of a single library. It should not matter unless you are doing cutting-edge statistical research.
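Whichever method you pick, it can help to know which library versions you actually ended up with. Here is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`). The helper name `installed_versions` and the package names in the list are illustrative assumptions, not something this article prescribes:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


# Hypothetical list of data analysis libraries; swap in the ones you care about.
for name, ver in installed_versions(["numpy", "pandas", "scikit-learn"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

If one library lags behind the version you need, that is the point at which installing individual components yourself starts to pay off.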
Once you have installed Python, there are various options for choosing an environment. Here are the 3 most common options:
While the right environment depends on your need, I personally prefer iPython Notebooks. They provide a lot of good features for documenting while writing the code itself, and you can choose to run the code in blocks (rather than line by line).
We will use the iPython environment for our future articles.
You can use Python as a simple calculator to start with:
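As a small sketch of what this looks like, here are a few arithmetic expressions you could type into a cell. It assumes Python 3, where `/` always returns a float; the variable names are only for illustration:

```python
# Basic arithmetic, the way you would type it into an interactive session.
total = 2 + 3          # addition
difference = 10 - 4    # subtraction
product = 6 * 7        # multiplication
quotient = 7 / 2       # true division: gives a float in Python 3
whole = 7 // 2         # floor division: drops the fractional part
remainder = 7 % 2      # remainder after division
power = 2 ** 10        # exponentiation

print(total, difference, product, quotient, whole, remainder, power)
# prints: 5 6 42 3.5 3 1 1024
```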
A few things to note:
A few additional things you can try are:
If you feel it is overwhelming, don’t worry; we will do these step by step in the coming days.
Following are the steps required to create the first Logistic Regression model:
Since the purpose of this model is just to illustrate how to build a model in Python, I will take a clean dataset (remember the iris dataset, which Tavish used in his previous article?) and go ahead and build a model on the entire dataset. We will look at ways to split the data into training and test sets at a later point.
P.S. If you don’t understand all the steps today, don’t worry. I would suggest that you download and install iPython and just run this program as is to get used to the environment.
Here are the libraries and the dataset you will need:
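A minimal sketch of this step, assuming scikit-learn is installed and using the copy of the iris dataset bundled with it (the library choice is an assumption here, as are the variable names):

```python
from sklearn import datasets

# Load the iris dataset that ships with scikit-learn (assumed to be installed).
iris = datasets.load_iris()

X = iris.data      # measurements: one row per flower, one column per feature
y = iris.target    # species of each flower, encoded as an integer class label

print(X.shape)             # number of rows and columns
print(iris.feature_names)  # what each column measures
print(iris.target_names)   # what each class label stands for
```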
Once you have read the dataset, you can print the dataset to explore it and build a Logistic Regression model on it:
And finally, you can compare the expected and predicted values:
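The last two steps can be sketched together as follows. This is an illustration under stated assumptions rather than a definitive implementation: it assumes scikit-learn, reloads the bundled iris dataset so the block runs on its own, and raises `max_iter` (my choice) so the solver has room to converge:

```python
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression

# Reload the dataset so this block is self-contained.
dataset = datasets.load_iris()

# Print the first few rows to get a feel for the data.
print(dataset.data[:5])

# Fit a Logistic Regression model on the entire dataset (no train/test split yet).
model = LogisticRegression(max_iter=1000)
model.fit(dataset.data, dataset.target)

# Compare the expected labels with the labels the model predicts.
expected = dataset.target
predicted = model.predict(dataset.data)

print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
```

Because the model is scored on the same rows it was trained on, these numbers will look better than they would on unseen data; that is what splitting into training and test sets addresses.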
That’s it. Your first model in Python is ready. You can now use model.predict(‘input data’) to make classifications based on the data provided. In the next article, we will show how to share these notebooks with other people, accept contributions from others, and version control them through GitHub.
What do you think about using Python for data analysis? Did you find this tutorial and the idea behind this series useful? Do let me know through the comments below.
Reading through the post reminded me of a similar post for excel learning when i started a few years ago... http://chandoo.org/wp/2010/08/11/excel-for-beginners/ the interlinking of baby steps & your work is a mark of passion & commitment... great kick to the month of july...
Hi Kunal, This is awesome. I have browsed through many websites for Python. But didn't get this much clarity and simplicity of the Program. Thank you for posting. Regards, S.S.Pradeep