Viettel AI won both categories it entered: speech recognition combined with speech emotion recognition, and Vietnamese-Lao machine translation.
Although this was the fourth time Viettel AI had taken part in the contest, and it had won three times before, Viettel engineers still faced many difficulties because the structure of the contest categories had changed.
Specifically, this year's speech recognition and emotion recognition tasks were merged into a single category. Competing teams had to solve both problems at once, recognizing both the text and the emotion of each sentence, which doubled the workload and the difficulty.
Making use of every piece of data, whether low or high quality
Beyond the change in category structure, this year's competition also focused on building models from scratch with limited data, including raw, unlabeled and low-quality data. In total, the organizers provided four groups of data of differing quality and form.
Each team had less than two months to develop and submit its solution, and in practice the time actually spent researching solutions was even shorter because of limited resources. "This year, Viettel AI devoted much of its computing infrastructure to researching new technologies and developing products, while speech recognition is a technology that requires huge hardware resources," said Dang Dinh Son, AI engineer in the Virtual Assistant Platform Division, Viettel AI.
The AI engineering team of the Virtual Assistant Platform Division represented Viettel AI in the speech recognition and speech emotion recognition category at VLSP 2023.
Faced with limited data volume and quality, the research team immediately adopted the principle that it "must take advantage of all data, no matter low or high quality".
Results built on early mastery of the technology
Short on both data and resources, the research team decided to build a simple processing pipeline: not a large one, but one carefully tuned down to the smallest detail.
Combining this with effective data processing methods for training the model, the team built a training cycle that makes use of all the available data.
As a result, Viettel AI not only won first prize in the speech recognition and speech emotion recognition category but also achieved an impressive score of 89.18% (the next two teams scored 83.4% and 78.45%, respectively).
Explaining the team's outstanding accuracy, Mr. Son said the key factor was the speech processing model built specifically for Vietnamese that Viettel AI has been developing for a long time. "Instead of using models and guidance from available research results, Viettel AI chose to build and develop its own model for Vietnamese speech processing. This model is continuously updated and optimized, and is becoming more and more effective," Mr. Son said.
Bui Tien Dat, Virtual Assistant Platform engineer at Viettel AI, presented the research results at the seminar on behalf of the competition team.
Beyond the competition itself, this solution will serve as the basis for upgrading Viettel's virtual switchboard and virtual assistant products, helping them identify customers' emotions in conversations more accurately and, from there, respond or choose an appropriate tone.
As a result, conversations between humans and AI will become more natural, improving the user experience. The solution also opens up new applications in customer care, such as a system that automatically identifies complaint calls to the switchboard so that they can be handled promptly or mined for information.