How Does Speech Recognition Work?

Technology never ceases to amaze us. Every year, more and more products and services reach the market, and one of the most fascinating of them is speech recognition. Have you ever wondered how it works?

This article is written for readers who are not tech-savvy. So, in very simple words, how does speech recognition work?

Speech recognition is made possible by advanced software that takes an audio recording as input, analyzes the recorded speech piece by piece, draws on what it has learned from a large collection of example speech and text to predict which words are being spoken, and then outputs those words in the form you want, usually as text.
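To give a feel for what "comparing speech against a database" can mean, here is a toy Python sketch of template matching with dynamic time warping (DTW), a classic technique from early systems that recognized single spoken words. Modern recognizers use neural networks instead, and everything here is an assumption made for illustration: each word in the hypothetical `TEMPLATES` database is stored as a short list of made-up numbers standing in for the sound features a real system would measure in each slice of audio. DTW lets a word still match when it is spoken faster or slower than the stored version.

```python
import math

# Hypothetical "database": each known word is stored as a sequence of
# made-up feature values (a real system measures many numbers per slice
# of audio, not just one).
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [2.0, 2.0, 0.5, 0.5, 2.0],
}

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    Finds the cheapest way to line the two sequences up, allowing either
    one to be stretched in time relative to the other.
    """
    n, m = len(a), len(b)
    # cost[i][j] = lowest total cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a moves on, b stays on the same slice
                cost[i][j - 1],      # b moves on, a stays on the same slice
                cost[i - 1][j - 1],  # both move on together
            )
    return cost[n][m]

def recognize(features):
    """Return the word whose stored template is closest to the input."""
    return min(TEMPLATES, key=lambda word: dtw_distance(features, TEMPLATES[word]))

# A slower, slightly noisy "yes": more slices than the stored template.
spoken = [1.1, 1.0, 2.9, 3.1, 4.0, 3.9, 3.0, 1.2]
print(recognize(spoken))  # prints "yes"
```

Even though the spoken input is longer than the stored "yes", DTW lines up the matching parts, so its distance to "yes" is far smaller than its distance to "no".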

This is much more complicated than it sounds. Each of the steps above is a complex process in its own right. However, to keep things simple, we won’t dive into them here.

What makes speech recognition possible?

Speech recognition became possible thanks to decades of scientific research in both software and hardware.

The hardware’s job (a microphone, for example) is to capture the sound around it. The software’s job is to turn that sound into a digital audio file, pass it to a speech recognition algorithm to do its work, and then display or print the recognized speech as text.

But what does a speech recognition algorithm actually do to understand the speech it is given?

One branch of artificial intelligence (AI) that is especially important for speech recognition is called Natural Language Processing (NLP).

NLP is a field of AI that draws heavily on machine learning. It is used to process and understand human language, most often in the form of written text.

NLP is very important for speech recognition because the algorithm has to work out which words make sense together, not just which sounds it heard, in order to give the best possible output.

Like many AI fields, NLP relies on algorithms that learn from the data sets they are trained on. They then use what they have learned to predict the words in new speech they are given.
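Here is a toy Python sketch of "learning from data, then guessing", using a bigram language model: it counts which words follow which in some training text, then prefers the candidate transcription whose word pairs were seen more often. "We ate ice cream" and "we ate I scream" sound nearly identical, so the sound alone cannot settle which one was said. The training sentences are made up for illustration, everything is lowercased (so "I" becomes "i"), and add-one smoothing is just one simple choice among many; real systems learn from vastly more text and typically use neural language models.

```python
from collections import Counter

# Made-up training sentences (lowercase); real systems learn from vastly more text.
TRAINING_TEXT = [
    "we ate ice cream after lunch",
    "the kids love ice cream",
    "i scream when i see a spider",
    "we ate pizza for dinner",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs in the training text."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # "<s>" marks the start of a sentence
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def score(candidate, unigrams, bigrams):
    """Probability of a word sequence under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams)
    words = ["<s>"] + candidate.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing gives word pairs never seen in training a small chance.
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

unigrams, bigrams = train_bigrams(TRAINING_TEXT)
# Two transcriptions that sound almost identical when spoken aloud.
candidates = ["we ate ice cream", "we ate i scream"]
best = max(candidates, key=lambda c: score(c, unigrams, bigrams))
print(best)  # prints "we ate ice cream"
```

Because "ate ice" and "ice cream" appear in the training text while "ate i" never does, the model rates "we ate ice cream" about three times as likely as "we ate i scream".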

Many algorithms are used for speech recognition, including deep neural networks. However, we won’t dive into them, since this is just a general overview meant for readers who are not software developers.

Why are there so many speech recognition algorithms?

You may be wondering why so many different speech recognition algorithms are in use today. Why not just use the best one?

The answer is that there is no single ‘best speech recognition algorithm’. Each of today’s advanced algorithms tends to deliver its best results under certain conditions.

For example, if you need speech recognition in real time, such as for live captions, you may have to use a faster but less accurate algorithm than you would when transcribing a recording with no time pressure.
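One concrete knob behind this speed-versus-accuracy trade-off is how many alternative interpretations the recognizer keeps in play while it works. The toy Python sketch below uses beam search over a tiny hypothetical model (the `NEXT_WORD` table and its probabilities are made up for illustration). With a beam width of 1, known as greedy decoding, it does the least work but locks onto the early favorite "wreck"; with a width of 2 it does more work and finds the more probable "recognize speech". Many real decoders expose a similar beam setting, alongside others, to balance speed against accuracy.

```python
import heapq

# Hypothetical model: made-up probabilities of the next word given the previous one.
NEXT_WORD = {
    "<s>": {"wreck": 0.55, "recognize": 0.45},
    "wreck": {"a": 0.6, "the": 0.4},
    "recognize": {"speech": 0.9, "this": 0.1},
}

def beam_search(steps, beam_width):
    """Return the most probable word sequence found and its probability.

    Only the `beam_width` most probable partial transcriptions survive each
    step. beam_width=1 is greedy decoding: the least work, but it can commit
    early to a choice that turns out badly. Wider beams keep more
    alternatives alive at the cost of more computation.
    """
    beams = [(1.0, ["<s>"])]  # (probability so far, words so far)
    for _ in range(steps):
        expanded = [
            (prob * p, words + [word])
            for prob, words in beams
            for word, p in NEXT_WORD.get(words[-1], {}).items()
        ]
        if not expanded:  # no hypothesis can be extended any further
            break
        beams = heapq.nlargest(beam_width, expanded, key=lambda b: b[0])
    best_prob, best_words = max(beams, key=lambda b: b[0])
    return " ".join(best_words[1:]), best_prob

print(beam_search(steps=2, beam_width=1)[0])  # prints "wreck a"
print(beam_search(steps=2, beam_width=2)[0])  # prints "recognize speech"
```

The greedy run scores "wreck a" at 0.55 × 0.6 = 0.33, while the wider beam keeps "recognize" alive long enough to reach "recognize speech" at 0.45 × 0.9 = 0.405.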

That’s why we have many different speech recognition algorithms: different scenarios call for different trade-offs.

What are the applications of speech recognition?

Speech recognition software is used almost everywhere. Here are some examples of where you can find it:

1 – YouTube’s Automated Subtitles

Many YouTube creators rely on YouTube’s algorithms to generate subtitles for their videos. These speech recognition algorithms process the video’s audio and generate the subtitles automatically.

2 – Virtual Assistants

There are many virtual assistants nowadays. Google Assistant, Siri, Cortana, and Alexa all depend heavily on speech recognition to operate. They listen to what you say, analyze it, and then carry out your request.

3 – Google Docs

Google Docs has a very cool voice-to-text feature that can help you write faster. Just tell Google Docs what to write, and it will type it for you.

4 – Robots

Many robots need to take spoken orders from humans, and some can even hold a conversation with you.

Ever heard of Sophia the robot? Sophia is a state-of-the-art humanoid robot. It can hold surprisingly convincing conversations, and speech recognition is one of the reasons it can do that.

Here’s a clip of Sophia and Will Smith having a funny conversation.

There are many other applications of speech recognition. Here’s a list for you.

To summarize

Speech recognition uses a series of algorithms to decode an audio recording, process it, and deliver the output you want. Its applications are already transforming the way we interact with our devices.