VIT Speech to Intent Engine enables smarter interactions with edge devices

NXP Semiconductors has announced the VIT Speech to Intent engine, a natural language understanding engine that leverages edge computing to enable local voice control.

Designed to rival the performance of cloud-based systems, VIT Speech to Intent does not require a cloud connection, which NXP says improves user privacy.

VIT Speech to Intent is part of NXP’s Voice Intelligent Technology (VIT) software suite and allows people to speak naturally to smart machines across IoT, industrial and automotive applications, rather than having to memorise precise commands or phrases to operate the devices.

Voice has become one of the primary user interfaces for smart devices. However, many of these devices require either precise phrasing to execute the desired action or a cloud connection to translate the user’s speech into device actions.

The natural language understanding delivered by VIT Speech to Intent has been developed to allow devices to understand users’ intent, without requiring exact phrasing or cloud connectivity. This opens new possibilities for innovation, particularly in the smart home and in places where users’ hands may not be free, such as hospitals or factory floors.

“As we move towards smart devices that can better anticipate and automate based on our needs, particularly in the smart home, voice has emerged as one of the preferred ways to communicate our preferences to devices,” said Rafael Sotomayor, Executive Vice President and General Manager, Secure Connected Edge, NXP. “VIT Speech to Intent allows people to interact with smart devices seamlessly, without needing to rely on specific keywords, delivering convenience and ease of use, reducing complexity and alleviating user frustration. This further enables the transition from a smart home to an autonomous one.” 

With a small memory footprint and limited computational requirements, the VIT Speech to Intent engine is suitable for use on NXP devices including i.MX RT crossover MCUs, RW61x MCUs, and i.MX 8M Mini, i.MX 8M Plus and i.MX 9x applications processors.

The VIT Speech to Intent engine runs locally, eliminating the need for a cloud connection, which supports improved user privacy, lower latency, reduced power consumption and reduced costs. The engine can support natural speech interfaces in a wide variety of applications, including smart watches, smart HVAC and more.

At present, VIT Speech to Intent supports English language interactions, with support for Mandarin coming later this year. Additional support is planned for Spanish, German, Korean, French and Japanese in 2024.

VIT Speech to Intent is part of NXP’s Voice Intelligent Technology (VIT) software suite, a comprehensive, local voice control software package.

Based on advanced deep learning, VIT comprises an always-on Wake Word engine, a Voice Command engine and a Speech to Intent engine. Developers can use NXP’s free, ready-to-use Wake Word and Voice Command engines, which are available through the MCUXpresso SDK and supported by an online model creation tool.

In addition, developers can upgrade to the newly released Speech to Intent engine.