
How Language Led To The Artificial Intelligence Revolution

In 2013 I had a long interview with Peter Lee, corporate vice president of Microsoft Research, about advances in machine learning and neural networks and how language would be the focal point of artificial intelligence in the coming years.

At the time the notion of artificial intelligence and machine learning seemed like a “blue sky” researcher’s fantasy. Artificial intelligence was something coming down the road … but not soon.

I wish I had taken the talk more seriously.


Language is, and will continue to be, the most important tool for the advancement of artificial intelligence. In 2017, natural language understanding engines are what drive the advancement of bots and voice-activated personal assistants like Microsoft’s Cortana, Google Assistant, Amazon’s Alexa and Apple’s Siri. Language was the starting point and the locus of all new machine learning capabilities that have come out in recent years.

Language, both written and spoken, is giving rise to a whole new era of human-computer interaction. When people had trouble imagining what could possibly come after smartphone apps as the pinnacle of user experience, researchers were building the tools for a whole new generation of interfaces based on language.

“We believe that over the years if you build software, everything will want to learn language,” said Lili Cheng, general manager of Microsoft’s FUSE Labs, in a briefing with reporters in Seattle ahead of Microsoft Build 2017. “I think over the year we have seen so much happen over conversational AI and bots.”

The Commercial Breakthrough Of Neural Networks

One of the reasons that Lee and Microsoft Research focused on language when developing machine learning was that it fit into several different buckets of artificial intelligence research. Language gave researchers a way to perform theoretical, open-field experiments with no intention of practical deployment beyond creating knowledge for its own sake. Language, as we have seen since, also presented the opportunity for distinct commercial applications.

Lee said at the time:

One is that there has been, there is right now for us, a resurgence of hope and optimism in being able to solve some of the longest-standing problems in core artificial intelligence. To get machines that see and hear and understand and reason at levels that approach or match human capabilities.

I think we are seeing that first in dealing with language. I think language is coming first because it is a little bit of a simpler problem but one that has commercial implications. So, that is moving really fast.

The application of those ideas to computer vision, to finding patterns and signals on things you wear all day. From looking at all the instrumentation and logging out of factories. From looking at all the electronic health records that hospitals are working with. The applications for deep learning from all of that are pretty impressive.

The focus on language has given us the first commercial taste of artificial intelligence in the real world. In 2014, Microsoft added real-time translation to Skype. Virtual assistants like Cortana, Siri, Google Assistant and Alexa are creating new avenues of human-computer interaction.

But, more importantly, the focus on language (and images) has given rise to the deployment of neural networks, the engines behind machine learning and deep learning and the harbingers of artificial intelligence.

Where Neural Networks Came From And Where They Are Going

The concept of neural networks is not new.

The idea has been around for more than 70 years. Some of the first attempts to build computers were modeled after human brains. But logic engines proved to be much more efficient, creating the binary machine code that we use in all of our software today. The idea of neural networks resurfaced in the 1980s, when researchers made breakthroughs in decision-making algorithms that veered away from strict logic engines. Artificial intelligence research stayed hot until the early 1990s, when DARPA pulled funding for AI research and researchers realized that the sheer volume of computing power needed to make it happen did not yet exist.

This period has been called the “AI Winter.”

“Speech recognition was one of our first areas of research. We have 25-plus years of experience. In the early 90s, it actually didn’t work,” said Rico Malvar, distinguished engineer and chief scientist for Microsoft Research, in a briefing at Microsoft’s campus in Redmond. “We got to the early 2000s and we got some interesting results. We started getting error rates below 30%. From 2000 to almost 2010, we had very little progress.”

Power Plus Software: The Maturation Of Neural Networks

The enabling factors for deep learning networks are correlated with the rise of computing as a whole. The advent and maturation of the Internet required computing to expand at a massive scale. Outside of consumer electronics, this meant scaling the data center to handle the computation and storage of massive amounts of information. Most of this information is stored as text and images, which just happen to be the ingredients needed to train neural networks. Technology companies began constructing massive data centers (to build what we now call the cloud), creating more potential computing capability than current demand would dictate.

An obvious marriage was formed.

“The deep neural network guys come up and they invent the stuff. Then the speech guys come along and say, ‘what if I use that?’” said Malvar. “’That is going to take 10-times more computation’ … well, we kind of have 10-times more computation.”

The accuracy of speech, text and image recognition became much, much better. For speech recognition, both Google and Microsoft claim word error rates between 4.9% and 5.9%, which is on par with human levels of recognition.
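
For context, those figures are word error rates, the standard yardstick for speech recognition: the number of word substitutions, deletions and insertions divided by the length of the reference transcript. As a rough illustration only (the function name and example sentences below are made up, and this is not any vendor's benchmark code), the metric can be computed as a word-level edit distance:

# Word error rate (WER): (substitutions + deletions + insertions) / reference length,
# computed here as a word-level Levenshtein distance. Illustrative sketch only.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# Hypothetical transcripts: one wrong word in twenty is a 5% word error rate.
ref = "the quick brown fox jumps over the lazy dog near the quiet red barn by the old mill pond today"
hyp = "the quick brown fox jumps over the lazy dog near the quiet red barn by the old mill fond today"
print(f"WER: {word_error_rate(ref, hyp):.1%}")  # -> WER: 5.0%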

Practical Application: Coming To A Computer Near You

Imagine being at the dentist. The first thing the dentist does when you come in is take X-rays of your teeth to help look for cavities that need to be filled. If a dentist works a typical American number of hours, she will work about 224 days per year. If she sees six patients per day and performs X-rays on all of them, she will see about 1,344 X-rays per year. Over 20 years, a dentist will see about 26,880 X-rays of patients' teeth.

Twenty years and 26,880 X-rays would represent the pinnacle of the profession, the culmination of one human's knowledge in one subject. There are 1.8 million dentists in the world. If all of them see a typical number of X-rays, that's roughly 48.4 billion images in 20 years.

And it would take today's neural networks hours to train on and seconds to process all 48.4 billion of those images.
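
As a sanity check, that back-of-the-envelope arithmetic is easy to reproduce; the inputs below are the rough assumptions from the paragraphs above, not measured data:

# Back-of-the-envelope estimate of how many dental X-rays exist to learn from.
# All inputs are rough assumptions from the text, not measured data.
working_days_per_year = 224      # approximate working days for a U.S. dentist
patients_per_day = 6             # assumed patients per day, each getting an X-ray
career_years = 20
dentists_worldwide = 1_800_000   # rough global estimate

xrays_per_year = working_days_per_year * patients_per_day   # 1,344
xrays_per_career = xrays_per_year * career_years             # 26,880
xrays_worldwide = xrays_per_career * dentists_worldwide      # ~48.4 billion

print(f"{xrays_per_year:,} X-rays per dentist per year")
print(f"{xrays_per_career:,} X-rays over a {career_years}-year career")
print(f"{xrays_worldwide:,} X-rays across {dentists_worldwide:,} dentists")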

Healthcare is one of the obvious industries for machine learning advancement. A doctor's work is often predicated on images such as MRIs, X-rays, CT scans and so forth. One doctor can only do so much. But neural networks can be trained to recognize images, find patterns and draw analytical conclusions based on the entirety of accumulated human knowledge. Neural networks can take out the grunt work and guesswork and make human jobs easier and more efficient.
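
To make that concrete, here is a minimal, purely illustrative sketch of the kind of model involved: a tiny convolutional network that maps a grayscale image (an X-ray, say) to a cavity/no-cavity score. It assumes PyTorch and uses random stand-in data; the class name and sizes are invented for illustration, and a real medical model would need vastly more data, care and validation.

import torch
import torch.nn as nn

# A deliberately tiny convolutional classifier: grayscale image in, two-class score out.
class TinyXRayNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 16 * 16, num_classes)  # assumes 64x64 input

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Stand-in data: 32 random 64x64 "X-rays" with random cavity/no-cavity labels.
images = torch.randn(32, 1, 64, 64)
labels = torch.randint(0, 2, (32,))

model = TinyXRayNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step: predict, measure the error, nudge the weights to reduce it.
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
print(f"training loss after one step: {loss.item():.3f}")

The point is not the particular layers but the workflow: show the network labeled examples, measure its error, and adjust its weights, repeated at a scale no single practitioner could ever match.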

From the beginning of time, this is exactly what technology has been for: making human jobs more efficient. Onerous to turn a field with a hoe? How about a plough. Hard to drag all those rocks from the quarry to the city? Have some wheels. Horses too fickle and not fast enough? Try this steam engine. Computation of logic difficult to do by hand? I have this computer.

Published: June 2, 2017