Top tips for great audio

It’s fairly obvious that, for a software project to be successful, the project team needs to have a good understanding of their customer’s requirements – even before they start work.

But how does the team fulfil customer expectations when the criteria for measuring success are subjective, varying from individual to individual?

This is the challenge faced by teams working with signal processing techniques to develop software for audio products. So, what’s the best way to achieve a result that makes the customer happy and, at the same time, avoids some of the problems inherent in developing audio systems?

With audio, the customer doesn’t just need a system that works well – they want a system which outputs what they consider to be great audio. The trouble with that requirement is that everyone hears audio differently – for example, hearing changes as we age, with older people becoming less able to distinguish higher-pitched sounds – and we all have individual preferences for what sounds ‘good’ to us. You could find you have spent a long time developing a product that, in the end, the customer simply doesn’t like.

So, as well as the usual challenges of shifting customer requirements, tight deadlines, integration and customer support, you need to overcome the problem presented by sound itself. By its very nature, assessment of sound is more difficult than assessment of visual algorithms.

Although it’s possible to put pictures or sections of code side by side and compare them against each other at the same time, it’s not possible with audio. You cannot do simultaneous audio comparisons – you cannot listen to two things at once. So, A/B testing of audio results can only be sequential, not concurrent. Therefore, it takes much longer to test audio – you might need to listen to two hours of test recordings just to assess a small tweak to an algorithm. If you don’t take this into account, your deadlines will slip. Make sure your project plans include longer testing times than you think you will need.

Audio testing metrics

Of course, you could avoid this subjectivity by agreeing to use one of the commonly accepted metrics for testing audio, such as MOS (Mean Opinion Score). You feed in the audio and the output is a prediction of how the desired audience would rate the results. This does help with assessing quality, but it won’t help you work out how to improve. Furthermore, many of the common tests and metrics have been developed for – and are skewed toward – traditional applications such as fixed-wire telephony. So, using these tests can help, but it’s not the total answer.
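
One way to do this programmatically is with PESQ (ITU-T P.862), which predicts a MOS-style score. The sketch below assumes the third-party Python ‘pesq’ package and two hypothetical recordings – a clean reference and your algorithm’s output:

```python
from scipy.io import wavfile
from pesq import pesq  # third-party package implementing ITU-T P.862

fs_ref, reference = wavfile.read("reference.wav")  # clean reference (hypothetical file)
fs_deg, processed = wavfile.read("processed.wav")  # algorithm output (hypothetical file)
assert fs_ref == fs_deg == 16000, "wideband P.862 expects 16 kHz audio"

# Returns a MOS-LQO prediction on roughly a 1 (bad) to 4.5 (excellent) scale.
score = pesq(fs_ref, reference, processed, 'wb')
print(f"Predicted MOS-LQO: {score:.2f}")
```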

Making sure your customer shares their vision for what they want the audio to sound like – before you start work – is critical. Time spent exploring what they mean by ‘great audio’ before development starts will prevent the team wasting time on developing something which is going to be rejected as soon as the customer hears it.

Another reason for ensuring you have properly understood your customer’s vision is because your audio doesn’t work in isolation – it will be part of a system. You must think carefully about integration. Early integration is crucial for success, since all the parts have to work together. However, the rest of the system is limited by how much processing power is being used by the audio, and your audio is itself limited by the consumption of the rest of the system. There is no point – and a lot of wasted resources – in developing something which works brilliantly on an empty system if the audio then stutters on a real system. But, as developers know, integration is expensive.

To make sure you don’t waste time integrating something that will not be acceptable, you need first of all to talk to your customers. And, before you start development, get some examples of recordings in your chosen use case, and preview them or work offline to predict what you will be able to achieve – and make sure it meets the customer’s vision.

Once you know what your customer wants, development can start.

Understanding signal processing

Understanding signal processing is helpful for a software engineer working in the audio field – but it is not essential. There are, however, four areas where some understanding of audio really helps.

The first of these is to do with gain structure – knowing that volume controls add a gain to your system. This applies to the internals of your software and will affect any gadgets you plug into your prototype. The result could be that the sound output is not satisfactory, and you may assume this is due to a bug in the code, while in reality it’s a matter of the gain structure. Knowing this could save a great deal of time in unnecessary debugging.
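
As a minimal sketch of why this matters – the signal and gain values below are invented for illustration – remember that gains add in decibels, multiply in the linear domain, and can quietly eat all of your headroom:

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs
signal = 0.5 * np.sin(2 * np.pi * 440 * t)  # a -6 dBFS sine: 6 dB of headroom

# Hypothetical gain stages in the chain: input trim, algorithm make-up gain
# and an output volume control. Gains add in dB, multiply in linear terms.
stages_db = [3.0, 4.0, 2.0]
linear_gain = 10 ** (sum(stages_db) / 20)   # +9 dB is roughly 2.8x

out = signal * linear_gain
if np.max(np.abs(out)) > 1.0:
    print("Clipping: the gain structure, not the algorithm, is the bug")
```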

Secondly, software developers can sometimes forget that audio filtering adds a group delay. If you forget this essential fact, you may tend to over-promise the performance of your algorithm – you will think that it will act more quickly than it can, and the system will fail to meet the promised performance specification.
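
To make that delay concrete, measure it. Here’s a sketch using SciPy – the filter is an arbitrary linear-phase FIR chosen for illustration, not any particular product’s filter:

```python
import numpy as np
from scipy.signal import firwin, group_delay

fs = 16_000
b = firwin(257, cutoff=4_000, fs=fs)   # an arbitrary 257-tap low-pass FIR

# Group delay across the passband; a linear-phase FIR delays every
# frequency by a constant (numtaps - 1) / 2 samples.
w, gd = group_delay((b, [1.0]), w=np.linspace(100, 3_000, 50), fs=fs)
delay = gd.mean()
print(f"group delay: {delay:.0f} samples = {delay / fs * 1e3:.1f} ms")
```

Here that is 128 samples – 8 ms at 16 kHz, added to every latency figure you promise.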

Thirdly, a small but important point you must bear in mind is the difference between real and theoretical data. Algorithm design, in maths, uses theoretical data. There is a zero in maths, but audio does not have a zero. Your algorithm may work fine in theory but, when it comes to testing your system with real data, there may be a signal that appears to be silent. If that happens, zoom in on it – it might be a bit of a hiss, or it may well be full of zeros.
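
A small helper along these lines – hypothetical, but close to what you might write during bring-up – makes the distinction explicit:

```python
import numpy as np

def describe_quiet(block: np.ndarray) -> str:
    """Distinguish true digital silence from a low-level noise floor."""
    if not np.any(block):
        return "all zeros: digital silence, which a real microphone never produces"
    rms = np.sqrt(np.mean(block.astype(np.float64) ** 2))
    level_db = 20 * np.log10(rms)            # safe: rms > 0 on this branch
    return f"noise floor at {level_db:.1f} dBFS: quiet, but not zero"

print(describe_quiet(np.zeros(1024)))                 # theoretical 'silence'
print(describe_quiet(np.random.randn(1024) * 1e-4))   # realistic hiss
```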

Finally, you cannot take an algorithm and just deploy it on any device. You need to consider the deployment constraints that your algorithm will have as early as possible in the development process. Some DSPs are very efficient and low power, but they may have limited memory. Others may be very good for AI-based processing, but they will introduce higher latency. If you can design an algorithm that trades space and time, that would be great. But, in reality, most algorithms can’t do that, so you may find yourself excluded from the lowest power embedded devices.
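
One classic example of such a space-for-time trade, sketched in Python for clarity (a real DSP port would use the vendor’s libraries): direct convolution with a long impulse response is memory-light but cycle-hungry, while FFT-based convolution reverses the trade by spending memory on work buffers.

```python
import numpy as np
from scipy.signal import fftconvolve

x = np.random.randn(16_000)   # one second of audio at 16 kHz (made-up data)
h = np.random.randn(4_096)    # a hypothetical long impulse response

y_direct = np.convolve(x, h)  # O(N*M) multiplies, almost no scratch memory
y_fft = fftconvolve(x, h)     # far fewer cycles, but needs FFT work buffers

print(np.allclose(y_direct, y_fft, atol=1e-9))   # same result either way
```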

There are other errors which developers who are not experienced with audio commonly make, and technical issues which must be considered carefully.

One example is not getting the software streaming early enough in the development process. This is important because, if you’re not streaming early, you could be working with files which lead you to over-promise on results. If you’re writing an algorithm that adds an element to a data structure for each block of audio it sees, the size of the data structure is proportional to the size of the file you’re working with. However, once the file is replaced by an audio stream, the data structure could grow indefinitely while the device is running. By streaming early, you de-risk the development and you can be more confident that your algorithm is ready for mass production.
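
As a sketch of the fix – the block size and history budget here are arbitrary – a bounded structure such as a fixed-length deque keeps memory constant no matter how long the stream runs:

```python
from collections import deque
import numpy as np

BLOCK = 256           # samples per processing block (arbitrary)
HISTORY_BLOCKS = 200  # a hard budget of recent blocks to remember

# A file-based prototype might append one entry per block: fine for a
# three-minute test file, fatal on a device that streams for weeks.
history = deque(maxlen=HISTORY_BLOCKS)   # old entries fall off the far end

def process_block(block: np.ndarray) -> float:
    history.append(float(np.mean(block ** 2)))   # per-block energy
    return float(np.mean(history))               # rolling average energy

for _ in range(10_000):                  # simulate an endless stream
    process_block(np.random.randn(BLOCK))
print(f"history holds {len(history)} blocks, not 10,000")
```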

As well as thinking about the time that’s needed to test audio, you must also consider your testing process right from the start of development. Testing via the audio output only is difficult because it’s a real-valued signal. Make sure you’re doing as much unit testing as possible – not relying on the audio output, which can vary between different processors and platforms.
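
In practice that means asserting numerical properties within tolerances, which absorb those small cross-platform differences. A sketch – the gain stage here is a stand-in for whichever block you are testing:

```python
import numpy as np

def apply_gain(x: np.ndarray, gain_db: float) -> np.ndarray:
    """Hypothetical stage under test: a simple gain block."""
    return x * 10 ** (gain_db / 20)

def test_unity_gain_is_transparent():
    x = np.random.default_rng(0).standard_normal(1_024)
    np.testing.assert_allclose(apply_gain(x, 0.0), x, rtol=1e-7)

def test_plus_6_db_roughly_doubles():
    x = np.ones(16)
    np.testing.assert_allclose(apply_gain(x, 6.0), x * 1.9953, rtol=1e-4)

test_unity_gain_is_transparent()
test_plus_6_db_roughly_doubles()
print("all assertions passed")
```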

The coding process

You also need to think about the coding process itself. An important decision is whether you will use fixed point or floating point. Fixed point used to be the ‘go to’ approach for representing audio samples for storage and computation. A fixed-point calculation uses the same parts of the ALU as integer calculation – a simple mathematical trick lets integers approximate continuously varying quantities, subject to trade-offs between precision and the range of the quantities.
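
For instance, here is the trick sketched in Q15, a common 16-bit convention with one sign bit and 15 fractional bits; production code would do this in C on the DSP, but the arithmetic is identical:

```python
Q = 15
SCALE = 1 << Q   # 32768: the integer that stands in for 1.0 (almost)

def to_q15(x: float) -> int:
    return int(round(x * SCALE))

def q15_mul(a: int, b: int) -> int:
    # A 16x16 multiply yields a product with 30 fractional bits; shifting
    # right by 15 renormalises to Q15. Saturation guards against overflow.
    p = (a * b) >> Q
    return max(-SCALE, min(SCALE - 1, p))

a, b = to_q15(0.5), to_q15(-0.25)
print(q15_mul(a, b) / SCALE)   # -0.125, within Q15 precision
```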

Floating point is more complex to implement in an ALU but, in modern CPUs such as the ones found in mobile phones, there is little or no penalty in using it. What small penalty may exist is offset by the reduction in engineering time requirements and the increase in the amount of time available to optimise the algorithms. Audio algorithms are often large and complex, and floating point enables them to be implemented with fewer engineering resources because it simplifies the mathematics. Choosing floating point means you don’t need to worry about integer overflows or underflows.

Also bear in mind that the CPUs typically used in mobile phones don’t just do floating point – they will vectorise it. So, if this is your use case, make sure you design your code to be able to vectorise.
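
An illustration in NumPy – the same principle applies to NEON or SSE intrinsics in C – is to keep samples in flat arrays and express the work as whole-array operations, so the vector units are actually used:

```python
import numpy as np

x = np.random.randn(1 << 16).astype(np.float32)   # a block of samples

def rms_loop(x):
    # Scalar style: one multiply-accumulate per interpreter iteration.
    acc = 0.0
    for s in x:
        acc += float(s) * float(s)
    return (acc / len(x)) ** 0.5

def rms_vec(x):
    # Vectorised: the whole array goes to SIMD-capable kernels in one call,
    # much as hand-written NEON code would process it on a phone CPU.
    return float(np.sqrt(np.mean(x * x)))

assert abs(rms_loop(x) - rms_vec(x)) < 1e-3   # same answer, far fewer cycles
```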

Another tip is to write audio out to files generously while you’re investigating system behaviour. Depending on the medium you’re writing to – an SD card, for example – you may need a worker thread. This worker thread is like a software butler: you hand it audio data, and it patiently waits and writes the data out for you. This means your core algorithm doesn’t have to wait, blocking runtime behaviour. If multiple files are being written to, check they all start at the same place. If you leave a few milliseconds off the beginning of one of the files, you will find an unexplained latency of the same number of milliseconds in the system.
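
A sketch of such a butler using only Python’s standard library – the file name, block size and format are placeholders:

```python
import queue
import threading
import wave
import numpy as np

audio_q: "queue.Queue[bytes | None]" = queue.Queue()

def writer(path: str, fs: int = 48_000) -> None:
    """The software butler: drains the queue and blocks on the slow
    medium so that the real-time audio thread never has to."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(fs)
        while (chunk := audio_q.get()) is not None:
            wf.writeframes(chunk)

t = threading.Thread(target=writer, args=("debug_tap.wav",))
t.start()

for _ in range(100):             # the real-time side: enqueue and move on
    block = (np.random.randn(256) * 1000).astype(np.int16)
    audio_q.put(block.tobytes())

audio_q.put(None)                # sentinel: flush and stop the butler
t.join()
```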

Audio signal processing is definitely a specialist area and there are many issues which could trip up an inexperienced development team. But, with careful preparation and some awareness of the technical challenges, you can ensure a successful product development. 

Author details: Dave Betts, Chief science officer at audio software specialist AudioTelligence