Key Takeaways from Andrew Ng and Adam Coates AMA on Reddit

Karthe 24 Apr, 2015

‘At Baidu, our goal is to develop hard AI technologies that impact hundreds of millions of users across the world.’ – Andrew Ng

In case you missed it, here is the context: this discussion took place on Reddit on 14 April 2015. It was an AMA with Andrew Ng (Chief Scientist at Baidu Research, Coursera co-founder and Stanford professor) and Adam Coates (Director of Baidu Silicon Valley AI Labs).

The thread was full of appreciation for Andrew Ng’s Machine Learning course on Coursera. The questions were answered thoroughly, and a few of the answers were real eye-openers; those are included below.

Below are a few key takeaways from this AMA. We have quoted the answers from the AMA for each of the sections directly:


(Note: The answers have been given together by Andrew and Adam.)


1. Advice on career in Machine Learning for people trying to learn Machine Learning

Building a strong portfolio of projects done through independent research is valued a lot in industry. For example, at Baidu Research we hire machine learning researchers and machine learning engineers based only on their skills and abilities, rather than on their degrees; past experience (such as that demonstrated in a portfolio of projects) helps a lot in evaluating those skills.

I think mastering the basics of machine learning should be the best first step. After that, I’d encourage you to find projects to work on and to use these to keep learning as well as to build up your portfolio. If you don’t know where to start, Kaggle is a reasonable starting place, though eventually you can identify and work on your own projects. In the meantime, offline engagement helps a lot: reach out to professors, attend local meetups, and try to find a community.

This is often enough to find you a position to do machine learning work in a company, which then further accelerates your learning.



2. Whether a PhD is mandatory for a career in ML

Doing a PhD is one great way to learn about machine learning. But the irony is that many top machine learning researchers do not have a PhD.

Given my (Andrew’s) background in education and in Coursera, I believe a lot in employee development. Thus at most of the teams I’ve led (at Baidu, and previously when I was leading Google’s Deep Learning team/Google Brain) I invested a lot in training people to become experts in machine learning. I think that some of these organizations can be extremely good at training people to become great at machine learning.

I think independent learning through Coursera is a great step. Many other software skills that you may already have are also highly relevant to ML research. I’d encourage you to keep taking MOOCs and using free online resources. With sufficient self-study, that can be enough to get you a great position at a machine learning group in industry, which would then help further accelerate your learning.


3. Best follow-up courses and self-directed projects after the Coursera ML course

1. Many people are applying ML to projects by themselves at home, or in their companies. This helps with your learning and also builds up a portfolio of ML projects for your resume (if that is your goal). If you’re not sure what projects to work on, Kaggle competitions can be a great way to start, though if you have your own ideas I’d encourage you to pursue those as well. If you’re looking for ideas, also check out the machine learning projects my Stanford class did last year; I’m always blown away by the creativity and diversity of the students’ ideas. I hope this also helps inspire ideas in others!

2. If you’re interested in a career in data science, many people go on from the machine learning MOOC to take the Data Science specialization. Many students are successfully using this combination to start off data science careers.


4. Use of Machine Learning concepts:

A lot of deep learning progress is driven by computational scale, and by data. For example, I think the bleeding edge of deep learning is shifting to HPC (high performance computing, aka supercomputers), which is what we’re working on at Baidu. I’ve found it easier to build new HPC technologies and access huge amounts of data in a corporate context. I hope that governments will increase funding of basic research, so as to make these resources easier for universities all around the world to get.

5. Important set of skills for Machine Learning

The skillset needed for different problems is different. But broadly, the two sources of “knowledge” a program can have about a problem are (i) what you hand-engineer, and (ii) what it learns by itself from data. In some fields (such as computer vision; and I predict increasingly so speech recognition and NLP in the future), the rapidly rising flood of data means that (ii) is now the dominant force, and thus the domain knowledge and the ability to hand-engineer little features is becoming less and less important.

Five years ago, it was really difficult to get involved in computer vision or speech recognition research, because there was a lot of domain knowledge you had to acquire. But thanks to the rise of deep learning and the rise of data, I think the learning curve is now easier/shallower, because what’s driving progress is machine learning plus data, and it’s now less critical to know about and be able to hand-engineer as many corner cases for these domains. I’m probably over-simplifying a bit, but now the winning approach is increasingly to code up a learning algorithm, using only a modest amount of domain knowledge, and then to give it a ton of data, and let the algorithm figure things out from the data.
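To make the contrast between (i) and (ii) concrete, here is a minimal sketch on a toy one-dimensional problem. It is an illustration only and not something taken from the AMA: the data points, the hand-picked threshold of 5.0 and the helper names are all hypothetical. The first function encodes knowledge the programmer supplies by hand; the second picks its decision threshold purely from labelled examples.

```python
def hand_engineered_rule(x):
    """(i) Knowledge supplied by hand: a fixed, hypothetical threshold."""
    return x > 5.0


def count_errors(predict, samples):
    """Number of samples where the prediction disagrees with the label."""
    return sum(predict(x) != label for x, label in samples)


def learn_threshold(samples):
    """(ii) Knowledge taken from data: choose the midpoint between
    neighbouring values that misclassifies the fewest samples."""
    values = sorted(x for x, _ in samples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda th: count_errors(lambda x: x > th, samples))


# Toy labelled data (made up for this sketch): (feature value, label)
data = [(1.0, False), (2.0, False), (3.0, False),
        (3.5, True), (4.0, True), (6.0, True)]

learned = learn_threshold(data)
hand_errors = count_errors(hand_engineered_rule, data)
learned_errors = count_errors(lambda x: x > learned, data)
print(learned, hand_errors, learned_errors)
```

On this toy data the hand-picked rule misclassifies two points while the learned threshold fits all six. Real systems learn far richer functions than a single threshold, but the division of labour is the same.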

6. Excitement of work 

One of the things both of us (Adam & Andrew) talk about frequently is the impact of research. At Baidu, our goal is to develop hard AI technologies that impact hundreds of millions of users. Over time, I think we’ve both learned to be more strategic, and to learn to see more steps out ahead–beyond just writing a paper–to plot a path to seeing our technology benefit huge numbers of people. These days, this is one of the things that really excites us about our work!

7. Single Layer Networks vs Deep Learning Networks

One of the reasons we looked at single layer networks was so that we could rapidly explore a lot of characteristics that we felt could influence how these models performed without a lot of the complexity that deep networks brought at the time (e.g., needing to train layer-by-layer). There is lots of evidence (empirical and theoretical) today, however, that deep networks can represent far more complex functions than shallow ones and, thus, to make use of the very large training datasets available, it is probably important to continue using large/deep networks for these problems.

Thankfully, while deep networks can be tricky to get working compared to some of the simplest models in 2011, today we have the benefit of much better tools and faster computers — this lets us iterate quickly and explore in a way that we couldn’t do in 2011. In some sense, building better systems for DL has enabled us to explore large, deep models at a pace similar to what we could do in 2011 only for very simple models. This is one of the reasons we invest a lot in systems research for deep learning here in the AI Lab: the faster we are able to run experiments, the more rapidly we can learn, and the easier it is to find models that are successful and understand all of the trade-offs.

Sometimes the “best” model ends up being a bit more complex than we want, but the good news is that the process of finding these models has been simplified a lot!


8. Deep Learning vs Recurrent Neural Networks (RNNs)

I think RNNs are an exciting class of models for temporal data! In fact, our recent breakthrough in speech recognition used bi-directional RNNs. We also considered LSTMs. For our particular application, we found that the simplicity of RNNs (compared to LSTMs) allowed us to scale up to larger models, and thus we were able to get RNNs to perform better. But at Baidu we are also applying LSTMs to a few problems where there are longer-range dependencies in the temporal data.
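For readers new to the term, the idea behind a bi-directional RNN can be sketched in a few lines. This is a toy illustration under stated assumptions, not the model used at Baidu: the hidden state is a single number, the weights are arbitrary values chosen for the example, and the function names are hypothetical. One simple recurrent pass reads the sequence left to right, a second pass reads it right to left, and each time step keeps both hidden states, so it sees context from the past and from the future.

```python
import math


def rnn_pass(inputs, w_in, w_rec, reverse=False):
    """Simple recurrent pass with a scalar hidden state:
    h_t = tanh(w_in * x_t + w_rec * h_prev)."""
    steps = range(len(inputs))
    order = reversed(steps) if reverse else steps
    hidden = 0.0
    states = [0.0] * len(inputs)
    for t in order:
        hidden = math.tanh(w_in * inputs[t] + w_rec * hidden)
        states[t] = hidden  # store by position, whatever the reading order
    return states


def bidirectional_rnn(inputs, fwd_weights, bwd_weights):
    """Pair each time step's forward state with its backward state."""
    forward = rnn_pass(inputs, *fwd_weights)
    backward = rnn_pass(inputs, *bwd_weights, reverse=True)
    return list(zip(forward, backward))


# Arbitrary toy sequence and weights (assumptions for illustration only)
sequence = [0.5, -1.0, 0.25, 2.0]
features = bidirectional_rnn(sequence, (0.8, 0.5), (0.6, -0.3))
for t, (fwd, bwd) in enumerate(features):
    print(t, round(fwd, 3), round(bwd, 3))
```

An LSTM would replace the single tanh update with gated updates to a separate cell state, which helps with longer-range dependencies at the cost of more parameters and computation per step.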


To read the complete discussion, you can visit the thread here.


