Intelligent Audio

As more and more consumers look to film their experiences and share short-form videos through their mobiles, it’s fair to say that while camera technology has continued to advance, the audio quality of many videos has tended to remain poor.

"Audio technology on consumer devices just hasn’t kept pace with the innovation we’ve seen in video and image capture," suggests Paul Melin, VP of Digital Media at Nokia, "and we think that it’s about time that changed."

Melin believes that customers are keen to ‘elevate’ the quality of the audio experience, and to that end Nokia has developed ‘intelligent audio’ that "can dynamically target and track the desired source of sound. It provides a tremendous opportunity to enhance the user experience and to enable phone manufacturers to differentiate their products in a crowded market."

Originally developed for use with Nokia’s virtual reality technology, OZO Audio is now focused on the mobile space.

"When the VR market’s growth wasn’t as quick as originally hoped, we started to focus on new use cases," says Melin.

"OZO Audio is intended to provide an immersive audio experience where you can hear the sound from the right direction. We want to improve the quality of user generated content by focusing on the audio direction.

"Our software has been designed to be flexible so that it can tune our algorithms to the microphone placements that the device manufacturer has in their device," Melin explains, which means you can, "zoom and focus on a particular person in a scene where people are talking or playing an instrument.

"The technology is able to capture and deliver a natural sound experience within one degree of accuracy, similar to how the human ear works. It can capture sound while reducing distracting background noise, adjust the audio to a specific part of the screen and select what matters. OZO also allows users to maintain audio focus on moving people or objects and automatically follow a sound source with audio focus parameters controlled by object recognition.

"AI has a big role to play in mobile audio and with OZO Audio it is possible to teach the software to understand the scene being shot and make more intelligent decisions as to how to control and direct the audio."

OZO Audio is the company’s first licensed technology.

"Spatial audio enables consumers to capture true-to-life sound that accurately reflects the original event by using multiple microphones to record the depth, direction and detail of sound," Melin contends. "It is an industry-leading solution for capturing high quality audio on consumer devices and can work with as few as two microphones."
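To give a feel for how even two microphones can reveal the direction of a sound, the sketch below uses a textbook time-difference-of-arrival estimate: it finds the delay at which the two recordings line up best and converts that delay into an angle. This is not Nokia's algorithm, whose details are not described here; the function names, the 343 m/s speed of sound and the far-field geometry are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air (assumption)


def delayed(signal, samples):
    """Return a copy of `signal` delayed by `samples`, padded with silence."""
    return [0.0] * samples + list(signal[: len(signal) - samples])


def best_lag(left, right, max_lag):
    """Return the lag in samples at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation score for this candidate lag.
        score = 0.0
        for i, sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += sample * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle(left, right, sample_rate, mic_spacing):
    """Estimate the arrival angle in degrees from two microphone signals.

    0 means straight ahead; a positive angle means the source sits on the
    left microphone's side. Assumes a distant source (far-field model).
    """
    # No physical delay can exceed the time sound needs to cross the spacing.
    max_lag = math.ceil(mic_spacing / SPEED_OF_SOUND * sample_rate)
    delay = best_lag(left, right, max_lag) / sample_rate
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    left = [rng.uniform(-1, 1) for _ in range(400)]
    right = delayed(left, 10)  # sound arrives 10 samples later on the right
    print(arrival_angle(left, right, sample_rate=48000, mic_spacing=0.15))
```

With microphones 15 cm apart and a 48 kHz sample rate, the largest possible delay is only about 21 samples, which hints at why fine angular accuracy in a real product would need far more sophisticated processing than this.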

Melin sees bringing spatial audio to a broad range of smartphones as a great opportunity and, crucially, the company’s intelligent audio does not require hardware or design adjustments.

"Because OZO is a software-based technology and not reliant on specific microphone configurations or placements, there are very few limits when it comes to what cell phone or camera hardware can use it, which means that virtually any manufacturer can license the software from the company for improved audio," he explains.

Melin believes that the use of intelligent audio, imaging and video technologies will completely transform the way in which people capture and share their experiences.

"This includes 360 video and other immersive formats. Using AI and machine learning to automatically create richer experiences in familiar formats will now be possible without having to put hours into editing images or videos," he suggests.

"Consumers want the sound in their videos to be just right, and they want to be able not only to focus on the relevant audio sources but also to suppress unwanted noise."

OZO Audio recordings support the most common audio formats in use, like stereo AAC, so it will be possible for users to share content or post videos on social media and, crucially, no special playback equipment is required.

"OZO Audio is a much more immersive and engaging experience and I believe it takes user generated content to a new level where the audio finally stands side-by-side with the pictures," Melin concludes.