Twitter started the trend of ‘People to Follow’. Other platforms such as Facebook, LinkedIn, Quora and GitHub later replicated it. This cool feature lets you connect with the rockstars of various domains and see what they are working on without bothering them much. For influencers, it has become an effective way to communicate with their followers.
Life on GitHub may not appear as tempting as it does on other platforms, but if you love coding, programming and data science, you’ll surely enjoy the company of the 9 million users on this platform!
Following influencers is usually a good practice. It has helped me in multiple ways:
If you haven’t tried this yet, it’s your turn now. I have compiled a list of some awesome data scientists on GitHub. In addition, you’ll find a list of the best data science tutorials available on GitHub. For beginners who don’t know about GitHub, here’s a quick introduction in simple words.
You can best understand GitHub as a social network for coders across the globe, who can share their code and work on it collaboratively. GitHub started in 2008 and is a web-based platform that provides online project hosting using Git.
Git is a version control system that saves each version of your project in its original form and lets you retrieve any of them later. Git was created by Linus Torvalds (who also created Linux) and has been a boon to programmers across the world. Git itself is free and open source; GitHub is the platform where programmers from all over the world can save and display their code. GitHub has not only made finding code easy, but has also given immense support to programmers worldwide.
To be honest, it is difficult to imagine the programming world without Git today!
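To make the idea of "saving versions and retrieving them later" concrete, here is a minimal sketch in Python. It is a toy model, not how Git is actually implemented: the class `TinyVersionStore` and its methods are hypothetical names invented for this illustration. The one detail borrowed from Git is that each saved version is identified by a hash of its content.

```python
import hashlib


class TinyVersionStore:
    """Toy snapshot store (hypothetical, for illustration only).

    It keeps every saved version of a text and can return any of them later.
    """

    def __init__(self):
        self._snapshots = {}  # version id -> saved content
        self._history = []    # version ids, in the order they were saved

    def commit(self, content: str) -> str:
        # Derive a short version id from the content itself.
        version_id = hashlib.sha1(content.encode("utf-8")).hexdigest()[:8]
        self._snapshots[version_id] = content
        self._history.append(version_id)
        return version_id

    def checkout(self, version_id: str) -> str:
        # Return the content exactly as it was when it was saved.
        return self._snapshots[version_id]

    def log(self) -> list:
        # Return the saved version ids, oldest first.
        return list(self._history)


store = TinyVersionStore()
first = store.commit("print('hello')\n")
second = store.commit("print('hello, world')\n")

# The older version is still available after the newer one was saved.
old_content = store.checkout(first)
```

Real Git does far more (it tracks whole directory trees, authors, branches and merges), but the core promise is the same: an older version stays retrievable after you save a newer one.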
Now, if you are new to GitHub, you might ask where tutorials come in on a platform meant for version control and code sharing. Well, because of its niche community, a lot of people have started creating resource repositories on GitHub. Since programmers spend a lot of time on GitHub anyway, why not keep a list of the resources they use regularly there?
Here’s a compiled list of tutorials on various topics in data science. These resources can be very handy. I suggest you bookmark them (or watch them on GitHub).
Awesome Data Science: This is an awesome repository if you are beginning with data science. Here you’ll find every step you need to take, from the start to the end of your journey.
Data Science Resources: This is another repository of data science tutorials to help you conquer this skill set. You are free to choose either of these; both are equally good.
Text Books in Data Science: If you like to read and refer to books, here is a compiled list of the best books on machine learning, data mining, statistics, data visualization etc.
Data Science Algorithms: Here’s a comprehensive overview and explanation of algorithms such as Linear Regression, Logistic Regression, K-Means Clustering and Random Forest. You’ll also find worksheets for practice.
Statistics and ML: Here’s a list of tutorials to help you become efficient in your day-to-day programming. It covers Python pandas, machine learning algorithms, statistics and data visualization.
Scikit Learn: scikit-learn is a Python library for machine learning. This repository has everything you need to learn about machine learning in Python. (Hint: Dig deeper)
Awesome Machine Learning: Here is an ultimate list of tutorials, resources and guides for machine learning, data analysis, natural language processing and data visualization in programming languages such as Python, R, Java, Go, C++ and Swift. Choose accordingly.
Complete Machine Learning: Here’s a collection of tutorials and examples for solving problems using machine learning. It covers the beginning-to-end steps of ML, including model evaluation, implementation of ML algorithms, data visualization etc.
Parallel Machine Learning: This tutorial is on using scikit-learn and IPython for parallel machine learning. Here you’ll find a 2-hour-long video from PyCon 2013 with lecture notes and other useful resources.
Machine Learning Courses: Here’s a list of the best machine learning courses in the world.
Caffe: Caffe is a deep learning framework made with expression, speed, and modularity in mind. This repository consists of installation instructions and other recommended tutorials to help you learn this framework properly.
Awesome Deep Learning: Here’s a curated list of tutorials on Deep Learning which includes deep learning courses, free books, videos and lectures, papers and other useful resources to follow.
Deep Learning in Python: Here’s a complete tutorial on implementing Deep Learning in Python.
Deep Learning in Julia: Mocha is a Deep Learning framework for Julia. This tutorial follows a step-by-step approach to introduce the framework.
Recurrent Neural Networks: Here’s an awesome list of dedicated resources for RNNs. If you have longed for a curated set of RNN resources, you’ll want to stop here and take a look. This guide consists of code, lectures, books and resources on multiple applications of RNNs.
Here’s a compiled list of the most influential data scientists to follow on GitHub. These data scientists are experts in their respective fields, which range from Python, machine learning and neural nets to data visualization, deep learning and data science.
1. Sebastian Raschka (Machine Learning, Data Visualization)
2. Randy Olson (Python – Data Analysis, Matplotlib, Bokeh)
3. Hilary Mason (Chief Data Scientist at Bitly)
4. Mike Bostock (D3, Data Visualisation)
5. Prakhar Srivastav (Python, Algorithms)
6. Andreas Mueller (Machine Learning, Python)
7. Wes McKinney (Author of Python for Data Analysis)
8. Jake Vanderplas (Machine Learning, Data Visualization)
9. Mathieu Blondel (Machine Learning, Neural Networks)
10. Gael Varoquaux (Machine Learning, Statistics, Python)
11. Olivier Grisel (Machine Learning, Deep Learning)
12. Andrej (Deep Learning, Neural Network, SVM)
13. Michael Nielsen (Neural Networks, Deep Learning)
14. Heather Arthur (Neural Network, Javascript)
15. Allen Downey (Python, Algorithms)
16. Davies Liu (Apache Spark, Python)
17. Julia Evans (Machine Learning, Python)
18. Jeff L (R Programming, Data Analysis)
19. John Myles White (Julia, Machine Learning)
20. Thomas Wiecki (Python, Bayesian Analysis)
21. Brian Caffo (Johns Hopkins University)
22. Roger D Peng (Johns Hopkins University)
23. Stefan Karpinski (Julia)
24. Pete Skomoroch (Machine Learning, Big Data, Python)
25. Mike Dewar (Python, D3, Javascript)
26. Hadley Wickham (Statistics, Data Analysis, Data Visualisation)
27. Romain Francois (R Programming)
28. Justin Palmer (D3, Data Visualisation)
29. Jason Davies (D3, Data Visualization)
30. Cameron Davidson-Pilon (Python, Algorithms)
GitHub is not just about writing and sharing code. Its utility extends to connecting with experts and learning from them. The intent behind this article is to give you an overview of GitHub and its uses.
In this article, I have listed the top 30 data scientists to follow on GitHub. I have also listed some of the tutorials I think are awesome. I hope these repositories turn out to be useful for you!
If you think I’ve missed any useful tutorial or data scientist, feel free to add them in the comments section below.