Dante - Di Michelino 150° sponsors

Corporate & Society Sponsors
Loquendo diamond package
Nuance gold package
ATT bronze package
Google silver package
Appen bronze package
Interactive Media bronze package
Microsoft bronze package
SpeechOcean bronze package
Avios logo package
NDI logo package


Université d'Avignon
Speech Cycle
Università di Firenze
Univ. Trento
Univ. Napoli
Univ. Tuscia
Univ. Calabria
Univ. Venezia


Comune di Firenze
Firenze Fiera
Florence Convention Bureau


12th Annual Conference of the
International Speech Communication Association


Interspeech 2011 Florence

Special Events

Speaker State Challenge
Intoxication and Sleepiness

Wed-Ses1-S1 - oral I
Wed-Ses2-S1 - oral II

Web Site

The Challenge
While the first open comparative challenges in the field of paralinguistics targeted more "conventional" phenomena such as emotion, age, and gender, many highly relevant speaker states and traits have not yet been covered. Thus, the INTERSPEECH 2011 Speaker State Challenge broadens the scope by addressing two less researched speaker states while focusing on the crucial application domain of security and safety: the computational analysis of intoxication and sleepiness in speech. Apart from intelligent and socially competent future agents and robots, the main applications are found in the medical domain and in surveillance in high-risk environments such as driving, steering or controlling. The INTERSPEECH 2011 theme “Speech science and technology for real life” is reflected not only in these everyday application scenarios in general, but also in the conditions of the Challenge in particular, such as naturalistic paralinguistic phenomena and no pre-selection of instances.

For these Challenge tasks, the ALCOHOL LANGUAGE CORPUS (ALC) and the SLEEPY LANGUAGE CORPUS (SLC) with genuine intoxicated and sleepy speech will be provided by the organisers. The first consists of 39 hours of speech from 154 gender-balanced speakers and will serve to evaluate features and algorithms for estimating speaker intoxication as a graded blood alcohol concentration. The second features 10 hours of speech recordings of 50 subjects, annotated in 10 different levels of sleepiness. The verbal material varies in complexity, ranging from sustained vowel phonation to natural communication. The corpora further feature detailed speaker metadata, orthographic and phonemic transcripts, segmentation, and multiple annotation tracks. Both come with distinct definitions of test, development, and training partitions that are speaker-independent, as needed in most real-life settings. Benchmark results of the most popular approaches will be provided.
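
As an illustration of what speaker independence between the partitions means in practice, the following minimal sketch checks that no speaker occurs in more than one partition. The input format (pairs of speaker ID and partition name) and the names `overlapping_speakers`, `train`, `devel` and `test` are assumptions made for this example; they are not taken from the corpus documentation.

```python
from typing import Dict, Iterable, Set, Tuple


def overlapping_speakers(rows: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the speakers that occur in more than one partition.

    `rows` holds (speaker_id, partition) pairs, one per recording.
    This layout is hypothetical and chosen only for illustration.
    """
    first_partition: Dict[str, str] = {}
    overlap: Set[str] = set()
    for speaker_id, partition in rows:
        # Remember the first partition each speaker was seen in.
        seen_in = first_partition.setdefault(speaker_id, partition)
        if seen_in != partition:
            overlap.add(speaker_id)
    return overlap


# Hypothetical recordings: each speaker stays within a single partition.
recordings = [
    ("spk01", "train"),
    ("spk01", "train"),
    ("spk02", "devel"),
    ("spk03", "test"),
]
print(overlapping_speakers(recordings))  # an empty set means the split is speaker-independent
```

An empty result indicates that the split is speaker-independent in this sense; any returned ID names a speaker whose recordings leak across partitions.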

Two Sub-Challenges are addressed:

  • In the Intoxication Sub-Challenge, the degree of speakers' intoxication by alcohol consumption has to be determined by regression, covering blood alcohol concentration from 0 to 1.6 per mill. The measures of competition will thus be cross-correlation and mean linear error.
  • In the Sleepiness Sub-Challenge, the sleepiness of a speaker on an ordinal scale from 1 to 10 has to be determined by a suitable regression algorithm and

Transcription of the train and development sets will be known. Contextual knowledge may be used, as the sequence of chunks will be given.
All Sub-Challenges allow contributors to find their own features with their own machine learning algorithm. However, a standard feature set will be provided per corpus that may be used. Participants will have to stick to the definition of training, development, and test sets. They may report on results obtained on the development set, but have only three trials to upload their results on the test sets, whose labels are unknown to them. Each participation will be accompanied by a paper presenting the results that undergoes peer-review and has to be accepted for the conference in order to participate in the Challenge.
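
For participants who want to score their development-set predictions themselves, the two competition measures named for the Intoxication Sub-Challenge could be sketched as follows. This sketch assumes that "cross-correlation" refers to the Pearson correlation coefficient and "mean linear error" to the mean absolute error between gold and predicted values; the official scoring may differ. The function names and the example values are hypothetical.

```python
import math
from typing import Sequence


def pearson_correlation(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between gold and predicted values."""
    if len(gold) != len(pred) or len(gold) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_g * var_p)


def mean_linear_error(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute difference between gold and predicted values."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


# Hypothetical blood alcohol concentrations in per mill (gold vs. predicted).
gold_bac = [0.0, 0.4, 0.8, 1.2, 1.6]
pred_bac = [0.1, 0.3, 0.9, 1.1, 1.7]
print(pearson_correlation(gold_bac, pred_bac))
print(mean_linear_error(gold_bac, pred_bac))
```

A higher correlation and a lower mean linear error both indicate better regression performance, and reporting the two together shows whether a system ranks speakers correctly as well as how far its estimates are off in absolute terms.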

The organisers reserve the right to re-evaluate the findings, but will not participate in the Challenge themselves.

Overall, contributions using the described databases are sought in (but not limited to) the following areas:

  • Participation in the Intoxication Sub-Challenge
  • Participation in the Sleepiness Sub-Challenge
  • Novel features and algorithms for the analysis of speaker state
  • Cross-corpus and cross-task feature genericity analysis
  • Exploitation of speaker trait meta-information in speaker state analysis

The results of the Challenge will be presented in a Special Session at INTERSPEECH 2011 in Florence, Italy.

Literature on the Predecessor Events and the Corpora used for the Challenge

  • B. Schuller, S. Steidl, A. Batliner, F. Burkhardt, L. Devillers, C. Müller, S. Narayanan: "The INTERSPEECH 2010 Paralinguistic Challenge", Proc. INTERSPEECH 2010, ISCA, Makuhari, Japan, pp. 2794-2797, 2010.
  • B. Schuller, S. Steidl, A. Batliner, F. Jurcicek: "The INTERSPEECH 2009 Emotion Challenge - Results and Lessons Learnt", Speech and Language Processing Technical Committee (SLTC) Newsletter, IEEE Signal Processing Society, October 2009. http://www.signalprocessingsociety.org/technical-committees/list/sl-tc/spl-nl/2009-10/interspeech-emotion-challenge/
  • B. Schuller, S. Steidl, A. Batliner: "The INTERSPEECH 2009 Emotion Challenge", Proc. INTERSPEECH 2009, ISCA, Brighton, UK, pp. 312-315, 2009.
  • B. Schuller, A. Batliner, S. Steidl, D. Seppi: "Recognising Realistic Emotions and Affect in Speech: State of the Art and Lessons Learnt from the First Challenge", to appear in Speech Communication, Special Issue on "Sensing Emotion and Affect – Facing Realism in Speech Processing", ELSEVIER, 2010.
  • F. Schiel, C. Heinrich, V. Neumeyer: "Rhythm and Formant Features for Automatic Alcohol Detection", Proc. INTERSPEECH 2010, ISCA, Makuhari, Japan, pp. 458-461, 2010.
  • F. Schiel, C. Heinrich: "Laying the Foundation for In-Car Alcohol Detection by Speech", Proc. INTERSPEECH 2009, ISCA, Brighton, UK, pp. 983-986, 2009.
  • J. Krajewski, A. Batliner, M. Golz: "Acoustic sleepiness detection – Framework and validation of a speech adapted pattern recognition approach", Behavior Research Methods, 41, pp. 795-804, 2009.
  • J. Krajewski, A. Batliner, R. Wieland: "Multiple classifier applied on predicting microsleep from speech", Proc. 19th Conference on Pattern Recognition (ICPR), Tampa, Florida, IEEE Computer Society Press, no pagination, 2009.
  • J. Krajewski, B. Kröger: "Using prosodic and spectral characteristics for sleepiness detection", Proc. INTERSPEECH 2007, vol. 8, pp. 1841-1844, Antwerp, Belgium, 2007.


Björn Schuller - Senior Researcher and Lecturer, Technische Universität München, Munich, Germany (email: schuller@IEEE.org). He received his diploma in 1999 and his doctoral degree, for his study on Automatic Speech and Emotion Recognition, in 2006, both in electrical engineering and information technology from TUM (Munich University of Technology), one of Germany's repeatedly highest-ranked universities and among its first three Excellence Universities. He has been tenured as Senior Researcher and Lecturer in Speech Processing, heading the Intelligent Audio Analysis Group at TUM’s Institute for Human-Machine Communication, since 2006. From 2009 to 2010 he lived in Paris, France, and was with the CNRS-LIMSI Spoken Language Processing Group in Orsay, France, dealing with affective and social signals in speech. In 2010 he was also a visiting scientist at Imperial College London's Department of Computing in London, UK, working on audiovisual behaviour recognition. He is best known for his work advancing Human-Machine Interaction, Cognitive Systems, Audiovisual Processing, and Affective Computing. Dr. Schuller is a member of the ISCA, ACM, HUMAINE Association, and IEEE and has authored more than 180 publications in peer-reviewed books, journals, and conference proceedings in the fields of signal processing and machine learning, leading to more than 1,000 citations; his current H-index is 17. He serves as member of the steering committee and guest editor of the IEEE Transactions on Affective Computing, as guest editor for Computer Speech and Language, Speech Communication, and the EURASIP Journal on Advances in Signal Processing, as reviewer for more than 20 further leading journals and several conferences in the field, as invited speaker, as session and challenge organizer (including the first-of-their-kind INTERSPEECH 2009 Emotion and INTERSPEECH 2010 Paralinguistic Challenges), and as chairman and programme committee member of numerous international workshops and conferences.
Project steering board activity and involvement in current and past research projects include European, national and industry funded projects. Advisory board activities comprise his membership as invited expert in W3C Incubator Groups, and his repeated election into the Executive Committee of the HUMAINE Association where he chairs the Special Interest Group Speech.

Stefan Steidl - Senior Researcher - ICSI Berkeley, CA, USA (email: steidl@icsi.berkeley.edu). He received his diploma degree in Computer Science in 2002 at the Friedrich-Alexander University of Erlangen-Nuremberg in Germany, where he also received his doctoral degree in 2008 for his work on Automatic Classification of Emotion-Related User States in Spontaneous Children’s Speech. He is currently a research scholar at the International Computer Science Institute (ICSI) at Berkeley, CA, USA. His primary research interests are the automatic classification of naturally occurring emotion-related states of users in human-machine interaction and the recognition of atypical speech (children's speech, speech of elderly people, pathological voices). He has (co-)authored more than 50 publications in journals and peer-reviewed conference proceedings. Dr. Steidl has co-organized the special sessions ‘INTERSPEECH 2009 Emotion Challenge’ and ‘INTERSPEECH 2010 Paralinguistic Challenge’ and was guest editor for special issues of the ISCA journals Computer Speech and Language and Speech Communication. He has served as reviewer for several journals and conferences in this area of research and has been a member of the Network-of-Excellence HUMAINE (Human Machine Interaction Network on Emotion) in the 7th framework programme of the European Community.

Anton Batliner - Senior Researcher - Friedrich-Alexander-University, CS Dept. 5, Erlangen-Nuremberg, Germany (email: batliner@informatik.uni-erlangen.de). He has been a member of the research staff of the Institute for Pattern Recognition since 1997. He is co-editor of one book and author/co-author of more than 200 technical articles, with a current H-index of 26 and more than 2500 citations. His present research interests are the modelling and automatic recognition of emotional user states, all aspects of prosody in speech processing, uni- and multi-modal focus of attention, pronunciation assessment, and spontaneous speech phenomena such as disfluencies and irregular phonation. He has worked on the automatic classification of emotion in the national and European projects Verbmobil, SmartKom, and Pf-Star. He was one of the key contributors on speech analysis and emotion recognition from speech in the NoE HUMAINE, and the originator of the CEICES initiative. He served as Workshop/Session (co-)organizer for Emotional Corpora I, II, III (LREC), Paralinguistics (ICPhS 07), Non-prototypical Emotions (ACCI 09), Emotion Challenge (INTERSPEECH 09), Paralinguistic Challenge (INTERSPEECH 2010), Computer Aided Pronunciation Training (Prosody 2010); he was guest editor for AHCI, Computer Speech and Language, and Speech Communication, and is Associate Editor for the IEEE Transactions on Affective Computing. Reviewing activities comprise JASA, IEEE Transactions (div.), Speech Communication, Computer Speech and Language, Language and Speech, AHCI, JMUI, etc. as well as INTERSPEECH, ICASSP, ICPhS, ACL, ASRU, ICMI, ACCI, etc.

Florian Schiel - Senior Researcher / CEO - University of Munich / Bavarian Archive for Speech Signals Services, Munich, Germany (email: schiel@phonetik.uni-muenchen.de). He received his Dipl.-Ing. and Dr.-Ing. degrees from the Technical University in Munich in 1990 and 1993 respectively, both in electrical engineering. His doctoral thesis deals with automatic speaker adaptation in ASR. Since 1993 he has been mainly affiliated with the Institute of Phonetics, Ludwig-Maximilians-Universität Munich (LMU), leading the German VERBMOBIL, SmartKom, BITS and SmartWeb project groups. In 1994 and 1997 he spent 6 months each as a research fellow at the International Computer Science Institute (ICSI), Berkeley, California. In 2001 Florian Schiel earned the German 'Habilitation' on the relation of speech technology and phonology at the philosophical faculty of the LMU and has since held the chair of Phonetic Speech Processing. From 1996 to 2010 he acted as founding director of the Bavarian Archive for Speech Signals (BAS) at the LMU München. In 2005 he founded the spinoff BAS Services Schiel in Munich, Germany. Currently he is CEO of BAS Services and is tenured as a senior researcher at the new Institute of Phonetics and Speech Processing at LMU. In 2009 he initiated the 'Empirical Speech Processing' graduate school at LMU (together with C. Draxler). His present research interests include the analysis and modelling of user-specific states based on large data sets, empirical speech analysis in general, speech corpus production and evaluation, speaker verification and forensic phonetics. He is the author/co-author of 4 monographs, 9 journal articles and book chapters, 23 peer-reviewed conference articles and 7 non-scientific books.

Jarek Krajewski - Professor - Bergische Universität Wuppertal, Wuppertal, Germany (email: krajewsk@uni-wuppertal.de). He received his diploma in 2004 and his doctoral degree, for his study on Acoustic Sleepiness Detection, in 2008, both in psychology and signal processing from Univ. Wuppertal and RWTH Aachen. He has been Assistant Professor in Experimental Industrial Psychology since 2009 and is vice director of the Center of Interdisciplinary Speech Science at the Univ. Wuppertal. Prof. Krajewski is a member of the ISCA, Human Factors and Ergonomics Society, and German Society of Psychology (Section Industrial Psychology, Section Traffic Psychology), and has (co-)authored more than 50 publications in peer-reviewed books, journals, and conference proceedings in the fields of sleepiness detection and signal processing. He serves as reviewer for more than 10 leading journals and several conferences in the field, and as invited speaker, session and chairman and programme committee member of several international workshops and conferences. Project steering board activity and involvement in current and past research projects include national and industry funded projects.


HUMAINE Association (www.emotion-research.net)
Bavarian Archive for Speech Signals (BAS)