AI Separates Human's Voice from Background Noise to Measure VNR

VNR – a Metric to Determine Intelligibility of Voice Recordings

The Dodd-Frank Wall Street Reform and Consumer Protection Act is a sweeping piece of financial reform legislation passed in 2010 in response to the financial crisis of 2008. Part of the act has compelled many banks in the U.S., as well as in Europe and Asia, to look for ways to verify the audibility and intelligibility of voice recordings. Such recordings contain the calls of traders, wealth brokers, and contact center workers.

One of the main obstacles to verifying the audibility and intelligibility of voice recordings is background noise, such as music, babble, street noise, car noise, and white noise.

A few months ago, we proposed IR's own deep learning-based method to address the audibility of a human voice in the presence of background noise. VNR (Voice-to-Noise Ratio) is our more recent proposal, and it addresses the intelligibility of voice recordings. Simply put, VNR is a metric for how easily a human voice can be understood over various types of background noise. It is computed as the ratio of the power of the voice to the power of the background noise. The lower the VNR, the less likely it is that the voice is intelligible.
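The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not our production implementation: it assumes the separated voice and noise signals are already available as sequences of samples, takes power to be the mean of the squared samples, and expresses the ratio in decibels. The decibel scale, the function names `mean_power` and `vnr_db`, and the `eps` guard are assumptions made for this example.

```python
import math


def mean_power(samples):
    """Average power of a signal: the mean of its squared samples."""
    if not samples:
        raise ValueError("signal must contain at least one sample")
    return sum(s * s for s in samples) / len(samples)


def vnr_db(voice, noise, eps=1e-12):
    """Voice-to-noise ratio in decibels (the dB scale is an assumption).

    `voice` and `noise` are the separated signals as sequences of floats.
    `eps` is a small constant that avoids division by zero and log(0)
    when one of the signals is silent.
    """
    p_voice = mean_power(voice)
    p_noise = mean_power(noise)
    return 10.0 * math.log10((p_voice + eps) / (p_noise + eps))
```

With this definition, a voice that carries 100 times the power of the noise yields a VNR of about 20, equal powers yield about 0, and a voice weaker than the noise yields a negative value.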

To provide the VNR metric, I have trained a deep learning model that separates a human voice from the above-mentioned types of background noise. The figures below show some examples of voice and background noise extracted from original recordings using the trained deep learning model:

The extracted voice and extracted background noise are then used to measure the power of the voice and the power of the background noise, respectively. The table below shows the mean and standard deviation of VNR under various conditions:

Condition	Case	VNR (Mean)	VNR (Std. dev.)
Intelligible	Human's Voice with Background Noise (Music)	19.1147	4.2948
Intelligible	Human's Voice with Background Noise (Babble)	19.2605	4.1948
Intelligible	Human's Voice Only	26.7468	2.7451
Unintelligible	Human's Voice with Background Noise (Music)	-4.5900	4.3836
Unintelligible	Human's Voice with Background Noise (Babble)	-9.2067	4.4564
Unintelligible	Background Noise Only (Music)	-12.1703	4.0922
Unintelligible	Background Noise Only (Babble)	-12.5714	4.0340
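For readers who want to build a similar summary from their own recordings, the sketch below shows one way to compute a mean and standard deviation from a list of per-recording VNR values. It is an illustration only: the helper name `summarize_vnr` is hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the table does not say which was used.

```python
import statistics


def summarize_vnr(values):
    """Return (mean, sample standard deviation) for a list of VNR values.

    At least two values are needed for a sample standard deviation.
    """
    if len(values) < 2:
        raise ValueError("need at least two VNR values")
    return statistics.mean(values), statistics.stdev(values)
```

Calling `summarize_vnr` once per condition and case, on the VNR values measured for that group of recordings, produces one row of a table like the one above.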

VNR, along with our previous solution, can dramatically reduce the risks, and therefore the costs, associated with non-compliance. We are very excited about the results achieved on the test data and optimistic about their real-world impact.
