Research group overview
Adaptive Speech Interfaces
Principal investigator: Fred Cummins
The Adaptive Speech Interfaces group explores new ways for humans to
interact with complex systems. We are investigating the use of
multiple, coordinated, parallel modalities, with an emphasis on
speech and language used together with more traditional tools such as
keyboards and mice. Humans communicate with each other using many
simultaneous modes, such as language, gesture, and posture. We believe
that communication with machines will be facilitated by allowing a
richer palette of interaction channels operating simultaneously. We
do this already when we grumble threateningly at the computer,
pointing at its error and saying "Don't do that! I wanted you to put
that there!". Someday, the computer will apologize and then do the
right thing.
UI on the Fly
David Reitter, Erin Panttaja, Fred Cummins
We are developing techniques that allow a computer to automatically generate multimodal user interfaces, in particular for small computers such as cell phones or iPAQs. We enable these devices to engage in natural language conversation, using the touch-screen and voice for both input and output at the same time. The output is tailored to the particular usage situation (in a restaurant, in the car, at home) as well as to the device and the preferences of the user.
The Zoomable User Interface (ZUI) offers an intuitive way to manage the complexity of accessing and interacting with large collections of information conveyed via user interfaces. This investigation examines ways to enable the rapid creation of high-fidelity multimodal ZUIs, along with the new ZUI interaction techniques that emerge from it. Nutmeg is a powerful new framework and system that enables developers and researchers to easily create multimodal ZUIs via a custom-developed markup language called ZIML. Programming and interface logic can simply be added to the ZUIs via a range of familiar scripting languages. Numerous media types are supported, from MP3s to JPEGs.
Wizard-of-Oz: A platform for corpus collection
Michael Cody, David Reitter, Fred Cummins
In order to see how people would use hypothetical applications which do not yet exist, we have built a Wizard-of-Oz application which lets an expert simulate the system's responses. The user remains blissfully unaware that the application is still the stuff of pipe dreams... In a data collection effort, we are using the platform to gather empirical data from human subjects in three languages.
Analysis of Hyperspeech
Simone Ashby-Hanna, Fred Cummins
Different situations elicit different styles of speech. Hyperspeech describes a variety of accommodation techniques speakers use to adapt to the speaking situation. We are conducting an experimental study of accommodation in situations such as talking to infants, non-native speakers, and machines, as well as of speech produced in noise.
Jonah Brucker-Cohen, Mike Bennett
BumpList is a mailing list aiming to re-examine the culture and rules of online email lists. BumpList allows only a maximum number of subscribers, so that when a new person subscribes, the earliest remaining subscriber is "bumped", or unsubscribed from the list. Once subscribed, you can only be unsubscribed if someone else subscribes and "bumps" you off. The focus of the project is to determine whether attaching simple rules to a communication medium changes, over time, the method and manner of correspondence as well as the behaviours of connection.
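The bumping rule described above amounts to a fixed-capacity, first-in-first-out structure. The following is a minimal sketch of that rule, not the actual BumpList implementation; the class and method names are invented for illustration:

```python
from collections import deque

class BumpList:
    """A mailing list capped at `capacity` subscribers.

    When someone subscribes to a full list, the earliest remaining
    subscriber is "bumped" (unsubscribed) to make room. Illustrative
    sketch only -- not the real BumpList code.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.members = deque()  # oldest subscriber at the left end

    def subscribe(self, address):
        """Add `address`; return the bumped address, or None if no one was bumped."""
        bumped = None
        if len(self.members) >= self.capacity:
            bumped = self.members.popleft()  # earliest subscriber is bumped
        self.members.append(address)
        return bumped
```

With a capacity of two, a third subscription bumps the first subscriber off the list.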
People are steadily moving more information into the digital realm, resulting in large personal collections of information that are often very awkward to search, navigate and interact with. Media Dive is a project that explores how Zooming User Interfaces fused with Sound Spaces can be utilised for exploring large collections of digital media in a natural manner. Media Dive is built upon and runs within Nutmeg.
More than words...
Elena Zvonik, Fred Cummins
Spoken communication contains both sounds and silences. The
silences are not incidental, but are intricately woven into the
fabric of communication. Pausing has often been neglected in the
study of speech and language, but we are now learning that all areas
of language research, from phonetics to grammar to psycholinguistics,
can benefit from a more formalised and structured understanding of
pause behaviour. We are conducting experiments into pause behaviour while reading.
Careful manipulation of speaking conditions has allowed us to
distinguish among different kinds of pause, and we are trying to
understand the function or functions of each.
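Distinguishing pauses in recorded speech typically starts with locating stretches of silence. The sketch below shows one simple, assumed approach (frame-level energy thresholding); the function name, frame length, and threshold are illustrative, not the group's actual analysis pipeline:

```python
def find_pauses(samples, frame_len, threshold, min_frames):
    """Locate pauses in a sequence of audio samples.

    Each frame of `frame_len` samples is labelled silent when its mean
    absolute amplitude falls below `threshold`; runs of at least
    `min_frames` consecutive silent frames are reported as pauses,
    as (start_frame, end_frame) pairs. Illustrative sketch only.
    """
    n_frames = len(samples) // frame_len
    silent = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(abs(s) for s in frame) / frame_len
        silent.append(energy < threshold)

    pauses, start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i                      # pause candidate begins
        elif not is_silent and start is not None:
            if i - start >= min_frames:    # long enough to count
                pauses.append((start, i))
            start = None
    if start is not None and n_frames - start >= min_frames:
        pauses.append((start, n_frames))   # pause runs to end of signal
    return pauses
```

Real analyses would add smoothing and distinguish breath pauses from hesitation pauses, but the thresholding step above is the usual starting point.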
Eva Maguire, David Reitter
Current dialogue-based user interfaces such as installation wizards or phone-based dialogue systems are full of boring, lengthy texts and prompts. We are investigating the use of referring expressions (this! that! he! it!) to make the interfaces appear more natural, easier and quicker to use. Our theoretical framework, Multimodal Centering, is based on a linguistic theory that can predict or explain the use of such pronouns. We extend it to multimodal human-computer dialogue.
Erin Panttaja, Fred Cummins
As we move toward truly adaptive multimodal designs, the complexity of the systems we build is increasing exponentially. We need a way to codify design criteria so that even as a system adapts it remains usable, and, in some sense, designed. In Permanent Design, I am creating an architecture for the generation of multimodal systems, taking into account the preferences and needs of a given user or situation.
Adaptive Speech Synthesis
Craig Olinsky, Fred Cummins
Creating a talking machine is laborious and expensive. Good speech synthesis systems only exist for the major languages of the world. We have sought to build machine learning techniques into synthesis systems so that a working system can adapt and learn to speak a related dialect or even language, based on examples from native speakers. In this way, speech technology for the plethora of languages in developing nations may become feasible.
Rhetorical Analysis with Support Vector Machines
How do we argue? How do we construct an essay? Most text displays an internal coherence structure, which can be analyzed as a tree of relations that hold between short segments of text. Our machine-learning-based approach to such analysis, in the framework of Rhetorical Structure Theory, determines the rhetorical relations between spans of text.
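At its core, this kind of classifier maps a pair of adjacent text spans to a relation label. The toy sketch below illustrates the idea with a linear support vector machine over bag-of-words features; the span pairs, separator token, and two-relation label set are invented stand-ins, not the group's actual data or features:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy pairs of adjacent spans joined by a separator token, each labelled
# with the rhetorical relation holding between them (illustrative only;
# real RST uses a much larger relation inventory).
pairs = [
    "prices rose [SEP] because demand increased",
    "we left early [SEP] because the talk was dull",
    "it was raining [SEP] but we went out anyway",
    "the plan failed [SEP] but morale stayed high",
]
labels = ["cause", "cause", "contrast", "contrast"]

# Bag-of-words features over both spans feed a linear-kernel SVM.
model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(pairs, labels)
```

On this tiny separable set the model keys on discourse cues like "because" and "but"; a serious system would add richer features (syntax, span position, lexical chains).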
Communication is the act of exchanging
symbols, which often represent highly abstract concepts. Usually there is
no direct mapping between symbols and concepts. What does this imply for
how we interact with devices across existing and potential modalities? How
can we translate from one symbol system to another symbol system? Symbolify is a project that plays with these questions. A user can enter a sentence
into Symbolify where it is processed and reconstituted as a collection of
semantically relevant images found in real time on the internet,
i.e. a pictographic sentence.
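The pipeline just described can be sketched in a few lines: reduce a sentence to its content words, then look each one up in an image source. Everything here is an assumption for illustration; in particular, `image_search` is a stand-in for the real-time web image search the actual system performs:

```python
# Minimal sketch of a Symbolify-style pipeline (illustrative only).

STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in"}

def image_search(word):
    """Hypothetical stand-in for a real-time web image search."""
    return f"http://images.example/{word}.jpg"

def symbolify(sentence):
    """Return one image URL per content word: a 'pictographic sentence'."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    content = [w for w in words if w and w not in STOPWORDS]
    return [image_search(w) for w in content]
```

The interesting research questions live in the gap this sketch glosses over: choosing an image that matches the intended, often abstract, concept rather than just the surface word.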