NOTE: Media Lab Europe closed in January 2005. The pages on this web site are for historical reference only.



Copyright © Media Lab Europe Limited.
All rights reserved.

Research group overview

Adaptive Speech Interfaces

Principal investigator: Fred Cummins

The Adaptive Speech Interfaces group explores new ways for humans to interact with complex systems. We are exploring the use of multiple, coordinated, parallel modalities, with an emphasis on speech and language used together with more traditional tools such as keyboards and mice. Humans communicate with each other through many simultaneous modes, such as language, gesture, and posture. We believe that communication with machines will be facilitated by a richer palette of interaction channels operating simultaneously. We do this already when we grumble threateningly at the computer, pointing at its error and saying, "Don't do that! I wanted you to put that there!" Someday, the computer will apologize and then do the right thing!


UI on the fly
David Reitter, Erin Panttaja, Fred Cummins
We are developing techniques that allow a computer to generate multimodal user interfaces automatically, in particular for small devices such as cell phones or iPAQs. We enable these devices to engage in natural-language conversation, using the touch screen and voice input and output at the same time. The output is tailored to the usage situation (in a restaurant, in the car, at home) as well as to the device and the preferences of the user.

Nutmeg
Mike Bennett
The Zoomable User Interface (ZUI) offers one intuitive way to deal with the complexity of accessing and interacting with collections of information conveyed via user interfaces. This investigation examines ways to enable the rapid creation of high-fidelity multimodal ZUIs, along with the new ZUI interaction techniques that emerge from it. Nutmeg is a framework and system that lets developers and researchers easily create multimodal ZUIs via a custom-developed markup language called ZIML. Programming and interface logic can be added to the ZUIs via a range of familiar scripting languages. Numerous media types are supported, from MP3s to JPEGs.

Wizard-of-Oz: A platform for corpus collection
Michael Cody, David Reitter, Fred Cummins
In order to see how people would use hypothetical applications that do not yet exist, we have built a Wizard-of-Oz platform that allows an expert to simulate the system's responses. The user remains blissfully unaware that the application is still the stuff of pipe dreams... In a data-collection effort, we are using the platform to gather empirical data from human subjects in three languages.

Analysis of Hyperspeech
Simone Ashby-Hanna, Fred Cummins
Different situations elicit different styles of speech. Hyperspeech describes a variety of accommodation techniques that speakers use to adapt to the speaking situation. We are conducting an experimental study of accommodation in situations such as talking to infants, non-native speakers, and machines, as well as of speech produced in noise.

BumpList
Jonah Brucker-Cohen, Mike Bennett
BumpList is a mailing list that re-examines the culture and rules of online email lists. BumpList allows only a maximum number of subscribers, so that when a new person subscribes, the earliest subscriber is "bumped", or unsubscribed, from the list. Once subscribed, you can only be unsubscribed if someone else subscribes and "bumps" you off. The focus of the project is to determine whether attaching simple rules to a communication medium changes the method and manner of correspondence, as well as behaviours of connection, over time.
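The bump rule described above is essentially a fixed-capacity first-in-first-out queue. A minimal sketch in Python (the capacity and addresses are illustrative assumptions, not BumpList's actual settings):

```python
from collections import deque

class BumpList:
    """Sketch of BumpList's core rule: a fixed-capacity subscriber
    list where each new subscriber bumps the oldest one off.
    The capacity here is an illustrative assumption."""

    def __init__(self, capacity=6):
        self.capacity = capacity
        self.members = deque()  # oldest subscriber sits at the left

    def subscribe(self, address):
        """Add a subscriber; return the bumped address, if any."""
        bumped = None
        if len(self.members) >= self.capacity:
            bumped = self.members.popleft()  # oldest member is bumped
        self.members.append(address)
        return bumped
```

A subscription either succeeds quietly or ejects the longest-standing member, which is what gives the list its social dynamics.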

Media Dive
Mike Bennett
People are steadily moving more of their information into the digital realm, resulting in large personal collections that are often awkward to search, navigate, and interact with. Media Dive explores how Zoomable User Interfaces fused with Sound Spaces can be used to explore large collections of digital media in a natural manner. Media Dive is built upon, and runs within, Nutmeg.

More than words...
Elena Zvonik, Fred Cummins
Spoken communication contains both sounds and silences. The silences are not incidental, but are intricately woven into the fabric of communication. Though pauses have often been neglected in the study of speech and language, we are now learning that all areas of language research, from phonetics to grammar to psycholinguistics, can benefit from a more formalised and structured understanding of pause behaviour. We are conducting experiments on pause behaviour during reading. Careful manipulation of speaking conditions has allowed us to distinguish among different kinds of pause, and we are trying to understand the function or functions of each.

Multimodal Centering
Eva Maguire, David Reitter
Current dialogue-based user interfaces such as installation wizards or phone-based dialogue systems are full of boring, lengthy texts and prompts. We are investigating the use of referring expressions (this! that! he! it!) to make the interfaces appear more natural, easier and quicker to use. Our theoretical framework, Multimodal Centering, is based on a linguistic theory that can predict or explain the use of such pronouns. We extend it to multimodal human-computer dialogue.

Permanent Design
Erin Panttaja, Fred Cummins
As we move toward truly adaptive multimodal designs, the complexity of the systems we build is increasing exponentially. We need a way to codify design criteria so that even as a system adapts it remains usable, and, in some sense, designed. In Permanent Design, I am creating an architecture for the generation of multimodal systems, taking into account the preferences and needs of a given user or situation.

Completed projects

Adaptive Speech Synthesis
Craig Olinsky, Fred Cummins
Creating a talking machine is laborious and expensive. Good speech synthesis systems only exist for the major languages of the world. We have sought to build machine learning techniques into synthesis systems so that a working system can adapt and learn to speak a related dialect or even language, based on examples from native speakers. In this way, speech technology for the plethora of languages in developing nations may become feasible.

Rhetorical Analysis with Support Vector Machines
David Reitter
How do we argue? How do we construct an essay? Most text displays an internal coherence structure, which can be analyzed as a tree of relations that hold between short segments of text. Our machine-learning approach to such analysis, within the framework of Rhetorical Structure Theory, can determine the rhetorical relations between spans of text.
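The supervised core of this approach can be sketched as a classifier that labels the relation between two adjacent text spans from surface features. A simple perceptron stands in for the SVM here purely to keep the sketch dependency-free; the example pairs, features, and relation labels are invented for illustration and are far cruder than the real system's:

```python
# Toy stand-in for supervised rhetorical-relation labelling:
# a linear classifier over bag-of-words features of a span pair.

def features(span_a, span_b):
    """Bag-of-words over both spans; the second span's words are
    prefixed so the classifier can tell the spans apart."""
    feats = {}
    for w in span_a.lower().split():
        feats["A:" + w] = feats.get("A:" + w, 0) + 1
    for w in span_b.lower().split():
        feats["B:" + w] = feats.get("B:" + w, 0) + 1
    return feats

def train(data, epochs=10):
    """Binary perceptron: +1 = 'cause', -1 = 'result'."""
    w = {}
    for _ in range(epochs):
        for (a, b), label in data:
            y = 1 if label == "cause" else -1
            score = sum(w.get(f, 0.0) * v for f, v in features(a, b).items())
            if y * score <= 0:  # misclassified: nudge weights toward y
                for f, v in features(a, b).items():
                    w[f] = w.get(f, 0.0) + y * v
    return w

def predict(w, a, b):
    score = sum(w.get(f, 0.0) * v for f, v in features(a, b).items())
    return "cause" if score > 0 else "result"

# Invented training pairs: adjacent spans plus the relation between them.
data = [
    (("we left early", "because the talk was dull"), "cause"),
    (("it rained", "so the match was cancelled"), "result"),
    (("the plan failed", "because funding was cut"), "cause"),
    (("he trained hard", "so he won the race"), "result"),
]
w = train(data)
```

The real work parses a whole text into a tree of such relations; this sketch only shows the span-pair labelling step that the learner solves.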

Symbolify
Mike Bennett
Communication is the act of exchanging symbols, which often represent highly abstract concepts. Usually there is no direct mapping between symbols and concepts. What does this imply for how we interact with devices across existing and potential modalities? How can we translate from one symbol system to another? Symbolify is a project that plays with these questions. A user can enter a sentence into Symbolify, where it is processed and reconstituted as a collection of semantically relevant images found in real time on the internet: a pictographic sentence.
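A hypothetical sketch of the word-to-image step: strip function words from the sentence, then turn each remaining content word into an image-search query. The stopword list and the query URL are illustrative assumptions, not Symbolify's actual implementation:

```python
from urllib.parse import quote_plus

# Illustrative stopword list; the real system's filtering would differ.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "in", "on", "and", "it"}

def to_pictographic(sentence):
    """Map each content word of a sentence to a (hypothetical)
    image-search query URL, yielding a pictographic sentence."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    content = [w for w in words if w and w not in STOPWORDS]
    return ["https://example.org/imagesearch?q=" + quote_plus(w) for w in content]
```

Each URL would then be resolved to an image, so the sentence is re-rendered as a sequence of pictures rather than words.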