Google's new AI knows who is speaking and what they are saying, even in a noisy crowd

Google has created an artificial intelligence that can tell whom a voice belongs to by looking at the faces of the people speaking.

Google is already quite good at recognizing what users say (on Android, for example), but can the company tell exactly who is speaking? As it turns out, it now can. Google has developed a deep learning system that can isolate individual voices. How? By literally looking at people's faces while they speak.

How does the system tell different people's speech apart? First, the researchers trained it on recordings of individual people speaking on their own. Then they mixed in the voices of other speakers as synthetic background noise, creating an artificial crowd. In this way, the system learned to split the soundtrack into separate parts and to recognize which part belongs to whom.
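The "artificial crowd" idea can be sketched roughly as follows. This is a minimal illustration, not Google's code: it assumes each clean clip is a plain list of audio samples recorded at the same sampling rate, and the function names and the random gain range are hypothetical choices made for the example. It covers only how a training pair is built (a noisy mixture as input, the clean voice as the target); the face-based model that learns to undo the mixing is not shown.

```python
import random
from typing import List, Tuple

Signal = List[float]  # hypothetical representation: one audio clip as a list of samples


def make_mixture(target: Signal, interferers: List[Signal], gains: List[float]) -> Signal:
    """Add scaled interfering clips onto a copy of the clean target, sample by sample."""
    mix = list(target)
    for clip, gain in zip(interferers, gains):
        # Only overlapping samples are mixed; the output keeps the target's length.
        for i in range(min(len(mix), len(clip))):
            mix[i] += gain * clip[i]
    return mix


def make_training_example(
    clean_clips: List[Signal], target_idx: int, n_interferers: int, rng: random.Random
) -> Tuple[Signal, Signal]:
    """Return (mixture, clean_target): the model would learn to recover the target from the mixture."""
    target = clean_clips[target_idx]
    others = [clip for j, clip in enumerate(clean_clips) if j != target_idx]
    chosen = rng.sample(others, n_interferers)  # other speakers form the "crowd"
    gains = [rng.uniform(0.3, 1.0) for _ in chosen]  # assumed loudness range for background voices
    return make_mixture(target, chosen, gains), target


if __name__ == "__main__":
    rng = random.Random(0)
    clips = [[0.1, 0.2, 0.3], [0.5, -0.5, 0.5], [1.0, 0.0, -1.0]]
    mixture, clean = make_training_example(clips, target_idx=0, n_interferers=2, rng=rng)
    print(mixture, clean)
```

Because the clean version of every mixture is known, each synthetic example comes with its own "correct answer", which is what makes this kind of supervised training possible without hand-labeling real crowd recordings.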

The results are impressive. As you can see in the video below, the AI can separate the voices of two different comedians even when they talk over each other. The system tells them apart by looking at their faces. Interestingly, the method works even when the comedians' faces are only partially visible, for example when they are partly covered by microphones.

Google described the work in detail in a paper titled “Looking to Listen at the Cocktail Party.” The name refers to the so-called cocktail party effect, which describes the human ability to focus on a single sound source despite the surrounding noise.

The researchers are still exploring how the technology could be used in Google's products. What is already clear, however, is that its applications could have both positive and negative consequences.

