“Well, we kind of have 10-times more computation …”
In 2013 I had a long interview with Peter Lee, corporate vice president of Microsoft Research, about advances in machine learning and neural networks and how language would be the focal point of artificial intelligence in the coming years.
At the time, artificial intelligence and machine learning seemed like a “blue sky” researcher’s fantasy. Artificial intelligence was something coming down the road … but not soon.
I wish I had taken the talk more seriously.
Language is, and will continue to be, the most important tool for the advancement of artificial intelligence. In 2017, natural language understanding engines drive the advancement of bots and voice-activated personal assistants like Microsoft’s Cortana, Google Assistant, Amazon’s Alexa and Apple’s Siri. Language was the starting point for, and the locus of, most of the new machine learning capabilities that have come out in recent years.
Language, both text and spoken, is giving rise to a whole new era of human-computer interaction. While people struggled to imagine what could possibly come after smartphone apps as the pinnacle of user experience, researchers were building the tools for a new generation of interfaces based on language.
“We believe that over the years if you build software, everything will want to learn language,” said Lili Cheng, general manager of Microsoft’s FUSE Labs, in a briefing with reporters in Seattle ahead of Microsoft Build 2017. “I think over the year we have seen so much happen over conversational AI and bots.”
The Commercial Breakthrough Of Neural Networks
One of the reasons Lee and Microsoft Research focused on language when developing machine learning was that it fit into several different buckets of artificial intelligence research. Language gave researchers a way to run theoretical, open-ended experiments with no goal of practical deployment, creating knowledge for its own sake. Language, as we have seen since, also presented the opportunity for distinct commercial applications. Lee put it this way:
“One is that there has been, there is right now for us, a resurgence of hope and optimism in being able to solve some of the longest standing problems in core artificial intelligence. To get machines that see and hear and understand reason at levels that understand or match human capabilities.
“I think we are seeing that first in dealing with language. I think language is coming first because it is a little bit of a simpler problem but one that has commercial implications. So, that is moving really fast.
“The application of those ideas to computer vision, to finding patterns and signals on things you wear all day. From looking at all the instrumentation and logging out of factories. From looking at all the electronic health records that hospitals are working with. The applications for deep learning from all of that are pretty impressive.”
The focus on language has given us the first commercial taste of artificial intelligence in the real world. In 2014, Microsoft added real-time translation to Skype. Virtual assistants like Cortana, Siri, Google Assistant and Alexa are creating new avenues of human-computer interaction.
But, more importantly, the focus on language (and images) has given rise to the deployment of neural networks, the engines behind machine learning and deep learning and the harbinger of artificial intelligence.
Where Neural Networks Came From And Where They Are Going
The concept of neural networks is not new.
The idea has been around for more than 70 years. Some of the first attempts to build computers were modeled after human brains. But logic engines proved to be much more efficient, giving us the binary machine code that all of our software runs on today. The idea of neural networks resurfaced in the 1980s, when researchers made breakthroughs in decision-making algorithms that veered away from strict logic engines. Artificial intelligence research was hot until the early 1990s, when DARPA pulled funding for AI research and researchers realized that the sheer computing power needed to make it work did not yet exist.
This period has been called the “AI Winter.”
“Speech recognition was one of our first areas of research. We have 25-plus years of experience. In the early 90s, it actually didn’t work,” said Rico Malvar, distinguished engineer and chief scientist for Microsoft Research, in a briefing at Microsoft’s campus in Redmond. “We got to the early 2000s and we got some interesting results. We started getting error rates below 30%. From 2000 to almost 2010, we had very little progress.”
2009 is seen in the artificial intelligence community as the year when deep learning networks really started to take off. Li Deng of Microsoft applied deep learning networks to speech recognition and was astonished by the results. Fei-Fei Li of Stanford (and now a chief scientist at Google) released ImageNet, a large database of labeled images that became a standard benchmark for image recognition.
Around 2012 into 2013, deep learning networks proved to be the future of artificial intelligence. Microsoft had a seminal breakthrough in natural language understanding. Google began buying up artificial intelligence and robotics companies at a rapid clip. In that time frame, Facebook began applying its massive data set to artificial intelligence problems. In 2014, Chinese search engine Baidu hired Andrew Ng, formerly head of the Google Brain project.
Power Plus Software: The Maturation Of Neural Networks
The enabling factors for deep learning networks are tied to the rise of computing as a whole. The advent and maturity of the Internet required computing to expand at a massive scale. Outside of consumer electronics, this meant scaling the data center to handle the computation and storage of massive amounts of information. Much of this information is stored as text and images, which happen to be the ingredients needed to train neural networks. Technology companies began constructing massive data centers (to build what we now call the cloud), creating more potential computing capability than current demand required.
An obvious marriage was formed.
“The deep neural network guys come up and they invent the stuff. Then the speech guys come along and say, ‘what if I use that?’” said Malvar. “‘That is going to take 10-times more computation’ … well, we kind of have 10-times more computation.”
Recognition accuracy for speech, text and images became much, much better. In speech recognition, Google and Microsoft have each reported word error rates between 4.9% and 5.9%, roughly on par with human transcribers.
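The 4.9% and 5.9% figures are word error rates: the number of word substitutions, deletions and insertions needed to turn a system’s transcript into a human reference transcript, divided by the number of words in the reference. A minimal sketch of that standard calculation follows; the function name and sample sentences are illustrative assumptions, not Google’s or Microsoft’s evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the minimum number of edits needed to turn the first i-1
    # reference words into the first j hypothesis words (one row of the
    # Levenshtein edit-distance table, computed over words instead of letters).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or a match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "the cat sat on the mat"
print(word_error_rate(reference, "the cat sat on a mat"))    # one substitution in six words
print(word_error_rate(reference, "the cat sat on the mat"))  # perfect transcript
```

Because insertions count as errors, a very noisy transcript can score above 100%. Real evaluations usually normalize casing and punctuation before scoring; this sketch simply splits on whitespace.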
The combination of neural networks and massively powerful cloud computing can lead to some stunning results. For instance, Microsoft has been adding field programmable gate arrays (FPGAs) to the servers in its Azure cloud; these reconfigurable chips can be programmed to accelerate machine learning workloads. Microsoft says the combination of its cloud and neural networks is so powerful that it can translate the entirety of Wikipedia from English to Spanish … in a tenth of a second.
This is the immediate future for deep learning networks and machine learning. Frameworks like Google’s TensorFlow, Microsoft’s CNTK, Facebook’s Caffe2, Torch and Theano will grow more sophisticated as recurrent and convolutional neural networks mature. The cloud will grow to handle more and more data as new data centers are built, as Moore’s Law enters its final stages and as acceleration hardware like GPUs, FPGAs and Google’s Tensor Processing Units (TPUs) evolves.
Practical Application: Coming To A Computer Near You
Imagine being at the dentist. The first thing the dentist does when you come in is take X-rays of your teeth to look for cavities that need to be filled. If a dentist works a typical American number of hours per year, she will work about 224 days per year. If she sees six patients per day and X-rays all of them, she will see about 1,342 X-rays per year. Over 20 years, a dentist will see 26,850 X-rays of patients’ teeth.
Twenty years and 26,850 X-rays would represent the pinnacle of a career, the culmination of one human’s knowledge in one subject. There are 1.8 million dentists in the world. If all of them see a typical number of X-rays, that’s 48.33 billion images in 20 years.
And it would take today’s neural networks hours to train on, and seconds to process, all 48.33 billion images.
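The dentist arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions: the article’s totals line up exactly if a working year is taken as 223.75 days (which rounds to the “about 224 days” quoted), with one X-ray per patient, six patients a day and 1.8 million dentists. The function names are hypothetical, chosen for illustration.

```python
# Assumption (not stated in the article): 223.75 working days per year,
# which rounds to "about 224 days" and reproduces the article's totals exactly.
DAYS_PER_YEAR = 223.75
PATIENTS_PER_DAY = 6       # one X-ray per patient, as in the example
CAREER_YEARS = 20
DENTISTS_WORLDWIDE = 1_800_000


def xrays_per_dentist(days_per_year: float, patients_per_day: int, years: int) -> float:
    """X-rays one dentist reviews over the given number of years."""
    return days_per_year * patients_per_day * years


def xrays_worldwide(per_dentist: float, dentists: int) -> float:
    """Total X-rays if every dentist reviews the same number."""
    return per_dentist * dentists


per_year = xrays_per_dentist(DAYS_PER_YEAR, PATIENTS_PER_DAY, 1)
career = xrays_per_dentist(DAYS_PER_YEAR, PATIENTS_PER_DAY, CAREER_YEARS)
total = xrays_worldwide(career, DENTISTS_WORLDWIDE)

print(f"{per_year:,.1f} per year, {career:,.0f} per career, "
      f"{total / 1e9:.2f} billion worldwide")
```

Rounding the working year to a flat 224 days instead gives 1,344 X-rays a year, 26,880 over 20 years and about 48.38 billion worldwide, so the figures are best read as order-of-magnitude estimates.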
Healthcare is one of the obvious industries for machine learning advancement. A doctor’s work often depends on images such as MRIs, X-rays, CT scans and so forth. One doctor can only do so much. But neural networks can be trained to recognize images, find patterns and draw analytical conclusions based on the entirety of accumulated human knowledge. Neural networks can take the grunt work and guesswork out of the process and make human jobs easier and more efficient.
Since the beginning of civilization, this is exactly what technology has been for: making human work more efficient. Onerous to turn a field with a hoe? How about a plough. Hard to drag all those rocks from the quarry to the city? Have some wheels. Horses too fickle and not fast enough? Try this steam engine. Logic and calculation too difficult to do by hand? I have this computer.
Have so much data that it is impossible to manually organize and understand? Let’s start up this neural network. The answer to accumulating, understanding, analyzing and predicting all of human data, knowledge and behavior will be the maturation of neural networks.
It can be difficult to fully wrap one’s head around just how big machine learning is going to be. Consider that almost every human behavior can now be tracked, counted and digitized through smartphones, the Web, cameras and sensors: where you go, how you get there, what you eat, how you spend money, how our industries work, all the environmental data around you … everything. The technology industry will soon put its neural networks to work on all of it.