The next generation of access tools for children must account for their specific needs and capabilities. To this end, the Second ASR Challenge for Non-native Children’s Speech is proposed as a Special Session at Interspeech 2021.
The main goal of this challenge is to advance research on speech recognition technology for non-native children’s speech. It follows the successful first challenge held at Interspeech 2020.
In this study, a non-native children’s English speech recognition system is trained using a feature-discriminative training set. Two types of children’s speech data were used: read speech and spontaneous speech. The first type is a 1.82 h training set recorded from 15 children aged 7–12 years.
Speech technology built for adult voices performs poorly on children’s speech.
The second type is a planned corpus of children speaking in a natural setting. This corpus will include everyday background sounds, such as doors closing and bells ringing. Among the factors that affect system performance is the children’s age: recognition accuracy degrades for younger children.
Children’s speech differs from adult speech in many ways. One of the most notable differences is acoustic: children’s vocal tracts are shorter and still developing, so their pitch and formant frequencies are higher than those of adults. Consequently, speech recognition systems trained on adult data often exhibit higher error rates for children.
To mitigate this mismatch, studies have attempted to adapt the acoustic features of children’s speech to better match adult acoustic models. These methods have produced promising results, but further study is needed to clarify their effects.
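One widely used adaptation of this kind is vocal tract length normalization (VTLN), which warps the frequency axis of the acoustic features by a per-speaker factor. The sketch below is only an illustration of the general technique, not the specific method used in this study; the piecewise-linear form, the cutoff ratio, and the monotonicity safeguard are simplifying assumptions.

```python
import numpy as np

def vtln_warp(freqs_hz, alpha, f_max=8000.0, cut_ratio=0.85):
    """Simplified piecewise-linear VTLN-style frequency warp.

    Below a cutoff, frequencies are scaled by the warp factor alpha;
    above it, a linear segment maps the remaining range back onto
    [alpha * f_cut, f_max] so the warped axis still ends at f_max.
    (Illustrative sketch; real toolkits use slightly different forms.)
    """
    freqs = np.asarray(freqs_hz, dtype=float)
    # Shrink the cutoff for alpha > 1 so alpha * f_cut stays below
    # f_max, keeping the warp monotonic for typical alpha in [0.8, 1.2].
    f_cut = cut_ratio * f_max / max(alpha, 1.0)
    low = freqs <= f_cut
    warped = np.empty_like(freqs)
    warped[low] = alpha * freqs[low]
    # Linear segment from (f_cut, alpha * f_cut) to (f_max, f_max).
    slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    warped[~low] = alpha * f_cut + slope * (freqs[~low] - f_cut)
    return warped

# Example: children's formants sit higher than adults', so a factor
# alpha < 1 compresses the frequency axis toward adult-trained models.
centers = np.linspace(0.0, 8000.0, 9)   # e.g. filterbank center freqs
warped = vtln_warp(centers, alpha=0.9)
```

In practice the warp factor is chosen per speaker, typically by a grid search over alpha maximizing the likelihood of the warped features under the adult acoustic model.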