VNR – a Metric to Determine Intelligibility of Voice Recordings
The “Dodd-Frank” Wall Street Reform and Consumer Protection Act is a massive piece of financial reform legislation passed in 2010 in response to the financial crisis of 2008. Part of the act has required many banks in the U.S., as well as in Europe and Asia, to look at ways to verify the audibility and intelligibility of voice recordings. Such recordings contain the calls of traders, wealth brokers, and contact center workers.
One of the main obstacles to verifying the audibility and intelligibility of voice recordings is the presence of background noise such as music, babble, street noise, car noise, and white noise.
A few months ago, we proposed IR's own deep learning-based method to address the audibility of the human voice in the presence of background noise. VNR (Voice-to-Noise Ratio) is our recent proposal to address the intelligibility of voice recordings. Simply put, VNR provides a metric to determine how easily a human voice can be understood over various types of background noise. It does this by measuring the ratio of the power of the voice to the power of the background noise. The lower the VNR, the less likely the voice is to be intelligible.
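The power-ratio idea can be sketched in a few lines, assuming the voice and the background noise have already been separated into two signals. This is only an illustration, not the implementation behind VNR: the function name `vnr_db`, the use of mean squared amplitude as "power", and the decibel scale are all assumptions (the article does not state a unit, although negative VNR values do suggest a logarithmic scale).

```python
import numpy as np

def vnr_db(voice, noise, eps=1e-12):
    """Ratio of voice power to noise power, in decibels (assumed unit).

    `voice` and `noise` are 1-D sample arrays, e.g. the two outputs of a
    source-separation model. `eps` guards against division by zero and
    log(0) when either signal is silent.
    """
    voice = np.asarray(voice, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    p_voice = np.mean(voice ** 2)  # average power of the voice signal
    p_noise = np.mean(noise ** 2)  # average power of the noise signal
    return 10.0 * np.log10((p_voice + eps) / (p_noise + eps))

# Synthetic stand-ins for the separated signals (not real recordings).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
voice = np.sin(2 * np.pi * 220 * t)        # amplitude 1.0
noise = 0.1 * np.sin(2 * np.pi * 50 * t)   # amplitude 0.1, so 1/100 of the power

loud_voice = vnr_db(voice, noise)   # close to 20 dB, since the power ratio is 100
loud_noise = vnr_db(noise, voice)   # close to -20 dB, the roles are swapped
```

Swapping the two signals flips the sign, which matches the intuition that a recording dominated by background noise gets a low (here negative) score.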
To provide the VNR metric, I have trained a deep learning model that separates a human voice from the above-mentioned types of background noise. The figures below show some examples of voice and background noise extracted from original recordings by the trained deep learning model:
The extracted voice and extracted background noise are then used to measure the power of the voice and the power of the background noise, respectively. The table below shows the mean and standard deviation of VNR for various conditions:
| Condition | Case | VNR (Mean) | VNR (Std. dev.) |
|---|---|---|---|
| Intelligible | Human's Voice with Background Noise (Music) | 19.1147 | 4.2948 |
| Intelligible | Human's Voice with Background Noise (Babble) | 19.2605 | 4.1948 |
| Intelligible | Human's Voice Only | 26.7468 | 2.7451 |
| Unintelligible | Human's Voice with Background Noise (Music) | -4.5900 | 4.3836 |
| Unintelligible | Human's Voice with Background Noise (Babble) | -9.2067 | 4.4564 |
| Unintelligible | Background Noise Only (Music) | -12.1703 | 4.0922 |
| Unintelligible | Background Noise Only (Babble) | -12.5714 | 4.0340 |
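One way to read these figures is to compare the two groups using a band of one standard deviation around each mean. The short sketch below does that arithmetic with the values copied from the table. The one-standard-deviation band is an assumption chosen for illustration; the article does not state a decision threshold, and this is not presented as the rule used in practice.

```python
# (condition, case, mean VNR, std. dev.) copied from the table above.
rows = [
    ("Intelligible", "Voice + music", 19.1147, 4.2948),
    ("Intelligible", "Voice + babble", 19.2605, 4.1948),
    ("Intelligible", "Voice only", 26.7468, 2.7451),
    ("Unintelligible", "Voice + music", -4.5900, 4.3836),
    ("Unintelligible", "Voice + babble", -9.2067, 4.4564),
    ("Unintelligible", "Music only", -12.1703, 4.0922),
    ("Unintelligible", "Babble only", -12.5714, 4.0340),
]

# Lowest "mean minus one std. dev." among the intelligible cases.
intelligible_low = min(m - s for cond, _, m, s in rows if cond == "Intelligible")

# Highest "mean plus one std. dev." among the unintelligible cases.
unintelligible_high = max(m + s for cond, _, m, s in rows if cond == "Unintelligible")

# Distance between the two bands; a positive value means they do not overlap.
gap = intelligible_low - unintelligible_high

print(f"intelligible band starts at   {intelligible_low:.4f}")
print(f"unintelligible band ends at   {unintelligible_high:.4f}")
print(f"gap between the two bands     {gap:.4f}")
```

Under this assumption the two bands are separated by roughly 15 VNR units, which is consistent with the metric distinguishing the two conditions on the test data.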
VNR, along with our previous solution, can dramatically reduce the risks, and therefore the costs, associated with non-compliance. We are very excited about the results achieved on the test data and optimistic about their real-world impact.