The testing hierarchy


Testing is an integral part of a software engineer’s work, both during development and when their code is integrated into a product. But the nature of audio presents unique testing challenges, and these need to be considered when planning a successful testing process.

So how do you assess audio software to give yourself the best chance of producing good audio in your end product?

Different versions of code can be compared side by side, but different audio outputs have to be listened to one after another, which makes comparisons both harder and more time-consuming. This will affect your project planning. Further, the subjective assessment of sound varies between individuals, so the definition of ‘good audio’ will differ from one person to another.

Some subjective assessment, with human listeners giving either a quantitative or a qualitative response to your audio, is essential. But should you use ‘expert listeners’ or the general public? Expert listeners have been trained to listen critically and assess audio against industry-standard criteria. They can detect problems such as artefacts and distortions that untrained listeners would almost certainly not notice. But the final users of your product will be untrained listeners, who may not like the sound ‘approved’ by a panel of expert listeners.

So using only an expert panel could give you a product that no one actually wants, which is why testing with a panel drawn from the general public is also important.

Listening panel composition

The composition of your listening panel is important. Will you include people who have some level of hearing impairment? If so, to what degree? If you’re developing a hearing assistive product, having people with hearing impairment on your panel will be essential.

The panel should be asked to assess both the quality of the audio and its intelligibility. Audio quality is about the overall user experience, while intelligibility refers to how much of the speech the user can understand correctly without straining to hear. Tuning your algorithm to balance the two will almost certainly involve trade-offs. For example, adjustments that improve intelligibility by reducing noise can make the audio sound more processed: the more aggressively you reduce noise, the more side effects, such as ‘pumping’, you will introduce into the sound you actually want.

What environment will you test in? Will it be an online or in-person test? If it’s in-person, will you test in a controlled environment in a lab, or in the real world, letting end users hear the results in the scenarios for which the product was designed? Both have their place, and they are not mutually exclusive; the optimum is to do the controlled testing first, before taking the product out and about.

Objective testing

Objective tests solve some of these problems, but they have issues of their own. They can be divided into performance metrics, such as signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR), and perceptual measures. Of the latter, short-time objective intelligibility (STOI) estimates intelligibility, while perceptual evaluation of speech quality (PESQ) and perceptual objective listening quality analysis (POLQA) estimate perceived speech quality.
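To make the idea of a performance metric concrete, here is a minimal Python sketch of a simplified SDR: the energy of the ground truth divided by the energy of everything in the output that differs from it, expressed in decibels. This is an illustrative assumption, not the full BSS Eval definition used in source-separation research, which first allows for a gain or short filter applied to the reference before counting the remainder as distortion (and splits that remainder further to obtain SIR). The function name `simple_sdr_db` is hypothetical.

```python
import math


def simple_sdr_db(reference, estimate):
    """Simplified signal-to-distortion ratio (SDR) in decibels.

    Everything in `estimate` that differs from `reference` is counted as
    distortion: SDR = 10 * log10(sum(s^2) / sum((s - s_hat)^2)).
    This is NOT the full BSS Eval SDR, which first projects the estimate
    onto the reference so that an allowed gain/filter is not penalised.
    Both signals must already be time aligned and of equal length.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must be the same length (align them first)")
    signal_energy = sum(s * s for s in reference)
    if signal_energy == 0.0:
        raise ValueError("reference signal is silent")
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf  # output identical to the ground truth
    return 10.0 * math.log10(signal_energy / error_energy)


ground_truth = [1.0, -1.0, 1.0, -1.0]
output = [0.9 * v for v in ground_truth]  # output at 90% of the original level
print(round(simple_sdr_db(ground_truth, output), 2))  # 20.0
```

Because this simplified version counts a pure level change as distortion (halving the output level scores only about 6 dB), it is stricter than the standard metric; for reported results, a maintained implementation of the standard definitions is the safer choice.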

But beware: a change to your algorithm that improves one of these metrics can worsen another. Consider the relative salience of the different metrics and how applicable each one is to the environment in which your product will be used.

All these tests are ‘intrusive’, which means they require a ‘ground truth’ recording as well as the processed result. So, the procedure is:

Create your ground truth: record each sound source separately on your array in a low-noise environment, keeping all other environmental factors as similar as possible.

Artificially mix the individual source images to create a mixture: microphones are generally very linear over their dynamic range, so this is a safe approximation as long as you avoid clipping and very small values, where quantisation dominates.

Process the mixture using your algorithms.

Calculate the desired performance metric by comparing the algorithm output with the ground truth recordings.

But bear in mind:

  • The ground truth needs to be time aligned with the processed version. Even a discrepancy of a few milliseconds can corrupt the measure.
  • Noises like cafe babble are diffuse and should be simulated on multiple loudspeakers, ideally using about twice as many loudspeakers as there are microphones.
  • Real-world babble recorded in places such as busy cafes can be used to validate the performance you are getting with simulated diffuse noises.
  • Use real human beings for at least some of the source image recordings, as people have different characteristics from loudspeakers and are generally less like a point source.
  • You can vary the mixing ratios so that one set of recordings covers a wider range of acoustic scenes. To check that the results hold up in the real world, also process some full mixes recorded in the field and run listening tests on them.
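The time-alignment point in the first bullet above can be handled by estimating the delay between the ground truth and the processed output before computing any metric. Below is a minimal Python sketch that does this with brute-force cross-correlation on a single channel of float samples. The function names are hypothetical, it assumes the processing does not invert polarity, and for real recordings an FFT-based correlation would be much faster. At a 48 kHz sample rate, 1 ms is 48 samples, so even a small timing error spans many samples.

```python
def ms_to_samples(ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * sample_rate / 1000.0))


def estimate_lag(reference, processed, max_lag):
    """Lag (in samples) that best aligns `processed` with `reference`.

    A positive result means `processed` trails the reference. Every lag in
    [-max_lag, max_lag] is tried by direct cross-correlation, which costs
    O(len(reference) * max_lag): fine for a sketch, slow for long files.
    Assumes the processing has not inverted the signal's polarity.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(processed):
                score += r * processed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(reference, processed, max_lag):
    """Shift `processed` to line up with `reference`; output has the same length."""
    lag = estimate_lag(reference, processed, max_lag)
    if lag >= 0:
        shifted = processed[lag:]  # drop the leading delay
    else:
        shifted = [0.0] * (-lag) + processed  # pad the start to undo the lead
    shifted = shifted[: len(reference)]
    shifted += [0.0] * (len(reference) - len(shifted))  # pad the tail if short
    return shifted, lag
```

For example, `align(ground_truth, output, ms_to_samples(10, 48000))` would search for offsets of up to ±10 ms in a 48 kHz recording before the two signals are passed to a metric.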

A testing hierarchy

To help you make appropriate decisions about testing your audio, it is useful to think of audio assessment as a hierarchy of different tests in different environments.

As you develop your product, your testing progresses from internal testing to working with end users in the real world. Choose the tests which you deem most appropriate to your product, bearing in mind the considerations outlined above.

To begin with, test internally, using an available automatic quality assessment algorithm. Then take recordings on your array and process them using your software. Ensure different people in your team listen to the results and give feedback. Then repeat this test using your first prototypes.

External tests progress from sending recordings to trained listeners, to having trained listeners test in a controlled environment, and finally to testing with real people in a dedicated test environment. Depending on your product, however, it may be enough to skip the middle stage and go straight from sending recordings to trained listeners to testing with real people in a test environment.

Arguably the most important test of all is taking your product out into the world to be tested by real consumers. But you cannot develop an audio product by moving straight to real-world tests because, in the early stages of development, you need to have repeatable tests – and that is only possible where you can control the environment.

Remember your final objective: producing a sound that the end users of your product will think is good enough to ensure the product’s commercial success. Plan your testing methods and decide what is appropriate for your needs well before you start development. Include plenty of testing time at each stage of your project because it will all take longer than you think.

Author details: Dave Betts is Chief Science Officer at audio software specialist AudioTelligence.