The Aurora-4 database and experiments were set up to compare the recognition performance of different front-ends on a large vocabulary task. The experiments are based on the so-called “Wall Street Journal” database, which is available from the Linguistic Data Consortium (LDC) under the designation CSR-I (WSJ0). As with Aurora-2, noise signals have been artificially added to the fairly clean speech data. Versions exist at sampling rates of 8 kHz and 16 kHz. A set of recognition experiments has been defined. These experiments were implemented by the Institute for Signal and Information Processing at Mississippi State University using its own freely available recognition system. Details about the database and the experiments are available in a report.
Later, an HTK-based implementation was provided as an alternative recognizer.