CU2C and CUMIX were developed at the DSP and Speech Technology Laboratory, Department of Electronic Engineering, CUHK.
CU2C is a dual-condition, task-oriented Cantonese speech database for speaker recognition research. The speech content includes Hong Kong ID numbers, Cantonese digit strings, and sentences. CU2C is distinctive in that it contains parallel data collected under two acoustic conditions: a public fixed-line telephone channel and a wideband desktop microphone. This makes the data useful for studying channel effects in speaker recognition. A total of 84 target speakers and 23 impostors were recorded. Each speaker has 18 recording sessions, collected over a period of 4 to 9 months.
More information about CU2C can be found at:
CUMIX is a database developed specifically for code-mixing speech recognition. The spoken content in CUMIX consists mainly of everyday conversation and jargon used by university students in Hong Kong. There are three types of utterances in CUMIX: (1) Cantonese-English code-mixing utterances, (2) monolingual colloquial Cantonese utterances, and (3) monolingual English words and phrases. CUMIX contains 16 hours of speech data from 74 speakers.
More information about CUMIX can be found at: