Background

Aurora was the name of a working group of ETSI (European Telecommunications Standards Institute) within the technical committee “Speech processing, Transmission and Quality aspects (STQ)”.

The working group was mainly active from about 1997 until 2002. After that, further evaluation and the definition of fixed-point DSR standards were carried out within 3GPP. The goal was to define and standardize two schemes for extracting acoustic features from a speech signal for automatic speech recognition. The target scenario is distributed speech recognition (DSR): the acoustic features are extracted in a terminal of any type in a fixed or mobile network and are transmitted to a recognition system at a remote location in the network. For both schemes, ETSI provides the standard documents as well as an example software implementation in floating-point C code. The first scheme is a conventional cepstral analysis. The second extends the first by two additional processing blocks that extract features which are robust against background noise and unknown frequency characteristics, as they occur in real applications.

During the definition of the second standard, several databases were created to allow different approaches to be compared in terms of their performance on distorted speech. The first database is referred to as “Aurora-2”. It is based on the well-known TIDigits database, which contains sequences of English digits, and distortions were added to the data artificially. The second database, “Aurora-3”, consists of five subsets. Each subset contains digit sequences recorded in noisy car environments as part of the “SpeechDatCar” project, and each covers one European language; Finnish, Italian, German, Spanish and Danish data were available at the time. Unlike Aurora-2, where the noise was added artificially, Aurora-3 reflects real speech input in a noisy car. Aurora-2 and Aurora-3 target connected-digit recognition as an example of a small-vocabulary task.

To enable comparisons on a large-vocabulary task as well, the Aurora-4 database was created. It is based on the “Wall Street Journal (WSJ)” data that were also used in the DARPA evaluations in the early 1990s. Different noise signals were added artificially to the original data.
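The text does not say how the artificial noise was scaled before being added. A common way to do this is to scale the noise so that the mixture reaches a chosen signal-to-noise ratio (SNR). The Python sketch below illustrates that generic technique; the names `power` and `add_noise_at_snr` are hypothetical, and the actual Aurora tools may measure signal levels differently (for example, only over speech-active segments).

```python
import math
import random


def power(x):
    """Mean power of a signal: the average of its squared samples."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + gain * noise, with gain chosen so that
    10 * log10(power(speech) / power(gain * noise)) equals snr_db.

    Simplifying assumptions (not taken from the Aurora tools): signal power
    is the mean over all samples, and the noise is simply truncated to the
    length of the speech signal.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")
    # The required noise power is p_speech / 10^(snr_db / 10); the gain is the
    # square root of the ratio between that target and the current noise power.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Usage: a synthetic tone stands in for speech, Gaussian samples for noise.
random.seed(0)
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]
noisy = add_noise_at_snr(speech, noise, snr_db=10.0)
added = [y - s for y, s in zip(noisy, speech)]
print(round(10 * math.log10(power(speech) / power(added)), 6))  # 10.0
```

Running the same clean signal through several target SNRs in this way yields a family of test conditions with increasingly severe distortion.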

In addition to the speech data, recognition experiments were defined for comparative studies. For Aurora-2 and Aurora-3, these experiments use the HTK (Hidden Markov Model Toolkit) software package. For the large-vocabulary task of Aurora-4, the recognition system of Mississippi State University was used; an HTK-based implementation has since become available as well.

Another database, “Aurora-5”, was created in 2006. It is intended to investigate the effect of hands-free speech input in rooms or inside a car as an additional distortion, in combination with additive background noise. It also allows studying the influence of transmitting speech over a cellular network, including speech coding and the effects of an error-prone cellular channel. Aurora-5 is likewise based on TIDigits and uses HTK.