The MINT - MultimodalINTeraction Framework

MINT 2012 Framework

The MINT 2012 architecture and components reflect the state of the implementation of the MINT framework in the year 2012. All articles published in that year refer to this version, which has been published as open source.

Ecosystem

Click on one of the component boxes to learn more about the current state of its implementation.

Streaming Example

The streaming example is the minimal hello world application that helps developers get started. It shows how the application models (AUI, CUI), the interaction resources (a mouse) and the mappings are created at code level. It also uses a button to demonstrate how the functional core can be added to an interactive application.

The streaming example is part of the MINT-platform distribution.

Turn the Sheets with Your Head

During a performance, music sheets are used as a guide to perform a musical piece. However, since songs may span several sheets, extra coordination is necessary to turn pages without disrupting the performance. Musicians are therefore required to learn the music by heart.

This prototype lets a musician turn music sheets just by moving the head. It has been implemented using a model-based interface design.

The source code is available as part of the MINT-platform distribution.

Gesture processing with the Kinect

We are currently working on improved gesture recognition based on the Microsoft Kinect controller.

As soon as we have published first results, you will find the source code on GitHub.

Stay tuned…

We are continuously evolving the framework and intend to add further examples and prototypes that demonstrate the capabilities of the MINT platform soon.

Redis Data Structure Server

Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.

We currently use the publish/subscribe mechanism of Redis as the communication channel to synchronize the different modes and media of a multimodal application. The MINT platform requires version 2.2.4.

For further information about Redis, please check the official Redis homepage.
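The synchronization pattern can be illustrated without a running Redis server. The following is a minimal in-memory sketch of a publish/subscribe channel, not the MINT implementation: the class `ChannelBus`, the channel name `interactors.button` and the message format are all hypothetical and chosen only for illustration.

```ruby
# In-memory stand-in for a publish/subscribe channel (hypothetical; MINT itself
# uses Redis for this). Subscribers register a block per channel; publishing
# delivers the message to every block registered for that channel.
class ChannelBus
  def initialize
    @subscribers = Hash.new { |hash, channel| hash[channel] = [] }
  end

  # Register a callback for a channel.
  def subscribe(channel, &callback)
    @subscribers[channel] << callback
    self
  end

  # Deliver a message to all subscribers; returns the number of receivers.
  def publish(channel, message)
    receivers = @subscribers.key?(channel) ? @subscribers[channel] : []
    receivers.each { |callback| callback.call(message) }
    receivers.size
  end
end

bus = ChannelBus.new
received = []

# Two hypothetical modes/media listening to the same interactor channel.
bus.subscribe("interactors.button") { |msg| received << [:web_view, msg] }
bus.subscribe("interactors.button") { |msg| received << [:speech_output, msg] }

count = bus.publish("interactors.button", "pressed")
puts "delivered to #{count} subscribers: #{received.inspect}"
```

Because every mode and medium subscribes to the same channel, a single state change reaches all of them without the publisher knowing who is listening.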

Node.js version 0.8

Node.js is an event-driven I/O server-side JavaScript environment based on V8. It is intended for writing scalable network programs such as web servers. It was created by Ryan Dahl in 2009.

Please see the official website for further information: http://nodejs.org

MINT-platform

The MINT platform implements a Node.js-based web server application that executes multimodal web applications written in MINT.

Please see the installation instructions on the GitHub site, which describe in detail how to install and set up the server.

The source code for the MINT-platform is available on GitHub.

MINT-core

The MINT core gem contains all basic AUI and CUI models as well as the basic infrastructure to create interactors and mappings. Please note that you need the MINT-platform to be able to actually run a system.

There is still no documentation for the framework, but many articles about the concepts and theories of our approach have already been published and can be accessed from our project site.

Detailed installation instructions and source code are available on GitHub.

MINT-Monitor

The Multimodal Interaction Framework monitor is now part of the MINT-platform and has been rewritten as a web application.

The source code is available on GitHub.

SocketStream 0.3

A fast, modular Node.js web framework dedicated to building single-page realtime apps.

SocketStream is a new breed of web framework that uses websockets to push data to the browser in ‘realtime’. It makes it easy to create blazing-fast, rich interfaces which behave more like desktop apps than traditional web apps of the past.

See the official SocketStream website for details.

The MINT platform is based on the SocketStream web framework to ease the implementation of web applications that can be synchronized in real time with the control modes.

LuminAR

The LuminAR desk lamp reimplements a traditional desk lamp. It combines a pico projector with a Kinect mounted on a robot arm to support moveable and highly interactive projections. The hardware is based on an initial idea published by the Fluid Interfaces Group at MIT.

We are currently modelling several applications for the LuminAR based on the MINT platform.

Cassowary Ruby Interface and Ubuntu 10.04 LTS Debian packages

Cassowary is an incremental constraint solving toolkit that efficiently solves systems of linear equalities and inequalities. Constraints may be either requirements or preferences. Client code specifies the constraints to be maintained and the solver updates the constrained variables to have values that satisfy the constraints. The Cassowary constraint solver is copyrighted by Greg J. Badros and Alan Borning and has been released under the GPL.

See the official Cassowary website for details.

We implemented a Ruby gem that makes the Cassowary C interface usable in Ruby. In addition, we packaged the native C library for Ubuntu 10.04 and 12.04 LTS.

Installation instructions can be found here: http://github.com/sfeu/cassowary

Datamapper Redis Adapter

DataMapper is an Object Relational Mapper written in Ruby, originally developed by Dan Kubb.

We use the Redis DataMapper adapter, which makes it possible to store data in a Redis database, and extended it to meet our requirements.

The official site is http://datamapper.org

Our extension to the official Redis adapter has been published at http://github.com/sfeu/dm-redis-adapter

Ruby Language version 1.9.1

Ruby is a dynamic, open source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write. Several popular web frameworks such as Ruby on Rails have been implemented in Ruby. Most of our software is based on Ruby 1.9. We do not support older versions of Ruby. Ruby version 1.9 is bundled with Ubuntu and can be installed out of the box.

See the official Ruby homepage for further information.

Ubuntu 12.04 LTS

Ubuntu is a fast, secure and easy-to-use operating system used by millions of people around the world. Most parts of our software are based on Linux, and the binary versions that we offer are prepared to run on the long-term support (LTS) release Ubuntu Linux 12.04.x for 64-bit and 32-bit systems.

Please note that we do not support newer versions of Ubuntu, since version 12.04 has proven to run very stably compared to more recent versions. The LTS version will be supported by Ubuntu until at least 2015.

You can download the Ubuntu 12.04.x installation CD from the official Ubuntu release download page.

Further information about Ubuntu can be found on the official Ubuntu homepage.

Abstract Interactor Model (AIM)

The Abstract Interactor Model (AIM) describes the media- and mode-independent part of the user interface interactors and therefore the overall interaction controls and views. In a concrete, mode- or media-specific interactor model, the AIM interactors are refined to include specific features. These features reflect the individual characteristics of a certain mode or medium and enable enhanced accessibility and/or comfort in the control or representation of the interface. The features of the concrete interactor model are not relevant for the functional backend, whereas the AIM interactors are limited to the data that is relevant for the functional backend.

See the current AIM specification for details.

Concrete Interactor Model (CIM)

For each medium or mode, a Concrete Interactor Model (CIM) is used to describe its specific interaction capabilities beyond the basic ones that are already defined in the AIM. One or more concrete interactors can be associated with each AIM interactor for a specific mode or medium. We are currently focusing on describing the concrete graphical HTML interface model.
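The relationship between the two models can be sketched as plain Ruby objects. This is an illustration only and not the API of the MINT core gem: the class names `AbstractInteractor` and `ConcreteInteractor`, the `:html` and `:speech` media, and the feature keys are hypothetical.

```ruby
# Hypothetical sketch: an abstract interactor carries only what matters to the
# functional backend; concrete interactors add mode- or media-specific
# features and are attached to an abstract interactor.
class AbstractInteractor
  attr_reader :name, :concrete_interactors

  def initialize(name)
    @name = name
    @concrete_interactors = []
  end

  # Associate a concrete interactor for a given mode or medium.
  def attach(concrete)
    @concrete_interactors << concrete
    self
  end

  # All concrete interactors for one mode or medium, e.g. :html.
  def concrete_for(medium)
    @concrete_interactors.select { |c| c.medium == medium }
  end
end

class ConcreteInteractor
  attr_reader :medium, :features

  def initialize(medium, features = {})
    @medium = medium
    @features = features
  end
end

button = AbstractInteractor.new("reset")
button.attach(ConcreteInteractor.new(:html, :width => 120, :height => 40))
button.attach(ConcreteInteractor.new(:speech, :prompt => "Say reset"))

puts button.concrete_for(:html).first.features.inspect
```

Keeping presentation details such as width or a speech prompt out of the abstract interactor means the backend sees the same "reset" control regardless of which media are attached.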

The graphical HTML concrete interactor model will be published soon.

Interaction Resources (IR)

Interaction Resources capture the capabilities of concrete devices that implement a certain mode for the user interface.

See the current Interaction Resource Model (IRM) specification for details.

Multimodal Interactor Mapping Model (MIM)

Multimodal mappings are the glue that combines the AUI, CUI and interaction resource interactors. They are used to describe an interaction (step) that can involve several modes and media in a multimodal manner. Mappings can also be specified on an abstract level, in which case overall interaction paradigms such as drag-and-drop can be specified.

See the MIM specification for details.

State Chart XML (SCXML) Editor – scxmlgui

Interactors can be designed and manipulated with any SCXML editor. We are using the freely available scxmlgui editor.

See the project page of the scxmlgui editor for details.
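Since interactors are designed as state charts, their behaviour can be pictured as a table of states, events and target states. The sketch below is a simplified illustration and does not parse SCXML or use any MINT API; the class `InteractorStateChart` and the state and event names are hypothetical.

```ruby
# Hypothetical interactor behaviour as a flat state chart: each state maps
# events to a target state. Real SCXML adds nesting, parallel states and
# executable content, which this sketch leaves out.
class InteractorStateChart
  attr_reader :state

  def initialize(initial, transitions)
    @state = initial
    @transitions = transitions
  end

  # Apply an event; events that are not defined for the current state leave
  # it unchanged. Returns true if a transition fired.
  def fire(event)
    target = @transitions.fetch(@state, {})[event]
    return false if target.nil?
    @state = target
    true
  end
end

transitions = {
  :displayed => { :focus => :focused },
  :focused   => { :press => :pressed, :defocus => :displayed },
  :pressed   => { :release => :focused }
}

button = InteractorStateChart.new(:displayed, transitions)
[:focus, :press, :release].each { |event| button.fire(event) }
puts button.state
```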

Google Chrome Browser

So far, all multimodal interfaces built with the MINT framework are offered as web applications. Since we rely on some of the latest web standards such as HTML5, CSS3 and WebSockets, we recommend using the latest Google Chrome browser.

Developer-centric approach

Designing and implementing multimodal interfaces for the web is still a complex task. So far, the target audience of our approach is developers who are familiar with current web technologies such as HTML5, CSS3 and JavaScript, with the Ruby programming language, and with state chart modelling. Multimodal interfaces are still a subject of basic research and are currently only used in specific niches, such as in-car navigation systems or airplane cockpits.

Even though we hope that in the future multimodal interfaces could be created by people without comprehensive programming skills, such as interaction designers, we are still focusing our approach on discovering the opportunities for multimodal web interfaces and follow a very technical approach.

The User

We are publishing all our example applications and multimodal control modes that we implement as part of our research projects as open source for others to experiment with our results. You are invited to download our example applications, such as the glove-based gesture recognition or the interactive music sheet that can be controlled by head movements to change and extend them as you like.

We are very interested in your feedback! But please remember that we are not focused on creating products. Instead, we are doing basic research on multimodal interfaces and only implement prototypes that demonstrate specific features. Since we are only a few people, we cannot afford the further work that is required to use these applications on a daily basis. But if you are interested in extending or using our work, you are very welcome to do so. We will try to give you all the support you need.

Gloves-based gesture and posture recognition

Based on an earlier project on recognizing the finger spelling of sign language, we implemented a system that recognizes gestures made with coloured gloves by segmenting the image in the HSV colour space. We use images of 25×25 pixels and trained a Multi-Layer Perceptron (MLP), an artificial neural network, with an architecture of 625×100×4 neurons (input, hidden and output layer). This network is able to classify five different postures, which we used in some of our research papers to control web applications.
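The processing chain can be sketched in a few lines of Ruby. This is not the recognizer itself, which is a Windows application: the hue range, the saturation and brightness thresholds, the logistic activation and the random, untrained weights are assumptions made only to show how a 25×25 image becomes 625 inputs that pass through a 100-neuron hidden layer to 4 outputs.

```ruby
# Convert an RGB pixel (0..255 per channel) to HSV:
# hue in degrees [0, 360), saturation and value in [0, 1].
def rgb_to_hsv(r, g, b)
  r, g, b = r / 255.0, g / 255.0, b / 255.0
  max, min = [r, g, b].max, [r, g, b].min
  delta = max - min
  hue = if delta.zero? then 0.0
        elsif max == r then 60.0 * (((g - b) / delta) % 6)
        elsif max == g then 60.0 * (((b - r) / delta) + 2)
        else 60.0 * (((r - g) / delta) + 4)
        end
  saturation = max.zero? ? 0.0 : delta / max
  [hue, saturation, max]
end

# Mark pixels whose hue lies in the (assumed) glove colour range and that are
# saturated and bright enough; returns a flat array of 0.0 / 1.0 values.
def segment(pixels, hue_range, min_saturation = 0.4, min_value = 0.2)
  pixels.map do |r, g, b|
    h, s, v = rgb_to_hsv(r, g, b)
    (hue_range.cover?(h) && s >= min_saturation && v >= min_value) ? 1.0 : 0.0
  end
end

# One fully connected layer with a logistic activation (an assumption).
def layer(inputs, weights, biases)
  weights.each_with_index.map do |row, j|
    sum = biases[j]
    row.each_with_index { |w, i| sum += w * inputs[i] }
    1.0 / (1.0 + Math.exp(-sum))
  end
end

# Untrained network with random weights, only to show the 625-100-4 shape.
rng = Random.new(42)
random_matrix = lambda { |rows, cols| Array.new(rows) { Array.new(cols) { rng.rand - 0.5 } } }
hidden_w, hidden_b = random_matrix.call(100, 625), Array.new(100, 0.0)
output_w, output_b = random_matrix.call(4, 100), Array.new(4, 0.0)

# A synthetic 25x25 image: a green square on a black background.
image = Array.new(25 * 25) do |index|
  x, y = index % 25, index / 25
  (8..16).cover?(x) && (8..16).cover?(y) ? [0, 200, 0] : [0, 0, 0]
end

mask = segment(image, (90.0..150.0))
outputs = layer(layer(mask, hidden_w, hidden_b), output_w, output_b)
puts "foreground pixels: #{mask.count(1.0)}, output neurons: #{outputs.size}"
```

With trained weights, the four output activations would be interpreted as the posture classification; how the source network encodes its postures in those outputs is not stated, so the sketch stops at the raw activations.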

The glove-based gesture recognition is available on request for Windows. Since we do not have the time to clean up the source and had some problems running the software on newer Windows versions, we do not plan to publish it on GitHub. If you are interested in using or improving it, please send us an email and we will send you the recognizer, which can be used, for example, together with the MINT-MoBe2011 project.

Head-movement recognition

We are currently working on a face recognition and head tracking component. We use the Viola-Jones object detection framework for recognition and the Camshift object tracking algorithm for tracking the head, since we required tracking that runs in constant time and can be applied to the heads of arbitrary persons without the need for training.

A binary version of the head tracker is available for download.

Accord.NET

The Accord.NET Framework is a C# framework extending the excellent AForge.NET Framework with new tools and libraries. Accord.NET provides many algorithms for many topics in mathematics, statistics, machine learning, artificial intelligence and computer vision. It includes several methods for statistical analysis, such as Principal Component Analysis, Linear Discriminant Analysis, Partial Least Squares, Kernel Principal Component Analysis, Kernel Discriminant Analysis, Logistic and Linear Regressions and Receiver Operating Characteristic (ROC) curves. It also includes machine learning topics such as (Kernel) Support Vector Machines, Bayesian regularization for Neural Network training, RANSAC, K-Means, Gaussian Mixture Models and Discrete and Continuous Hidden Markov Models. The imaging and computer vision libraries include projective image blending, homography estimation, the Camshift object tracker and the Viola-Jones object detector.

The Accord.NET project homepage can be found here: http://accord-net.origo.ethz.ch/

Microsoft Windows

Unfortunately the head tracker requires Microsoft Windows.

Gesture and posture recognition component

We are currently working on a new version of our gesture and posture recognition software that will run under Linux and support both recognition with coloured gloves and recognition without gloves, driven by the Kinect controller.

Ubuntu 12.04 LTS

You can download the Ubuntu 12.04.x installation CD from the official Ubuntu release download page.

Further information about Ubuntu can be found on the official Ubuntu homepage.

Mapping Editor

With our mapping editor, multimodal mappings are specified in a custom flow-chart-like notation that offers three basic elements: observations, actions and operators. Boxes with rounded edges describe “observations” of state changes. Boxes with sharp edges define actions, which are backend function calls or the triggering of events. A set of observations is connected by an operator that describes a relation between different modes.

We have implemented a first prototype in Java and are currently re-implementing it as a web application. As soon as the new version is finished, it will be published as part of the MINT-platform.
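The three elements of the notation can be pictured as a small data model. The sketch below is an assumption-laden illustration and not the editor's or the platform's format: the names `Observation` and `Mapping`, the operators `:all` and `:any`, and the interactor and state names are hypothetical.

```ruby
# Hypothetical data model for a multimodal mapping: observations watch for
# interactor states, an operator relates them, and actions run when the
# operator is satisfied.
Observation = Struct.new(:interactor, :state) do
  # True if the named interactor is currently in the expected state.
  def holds?(current_states)
    current_states[interactor] == state
  end
end

class Mapping
  OPERATORS = {
    :all => lambda { |results| results.all? },  # every observation must hold
    :any => lambda { |results| results.any? }   # one observation suffices
  }

  def initialize(operator, observations, actions)
    @operator = OPERATORS.fetch(operator)
    @observations = observations
    @actions = actions
  end

  # Evaluate the observations against a snapshot of interactor states and
  # run the actions if the operator is satisfied. Returns true if they ran.
  def evaluate(current_states)
    results = @observations.map { |o| o.holds?(current_states) }
    return false unless @operator.call(results)
    @actions.each(&:call)
    true
  end
end

log = []
mapping = Mapping.new(
  :all,
  [Observation.new("head", :tilted_right), Observation.new("sheet", :presenting)],
  [lambda { log << "next_page" }]
)

mapping.evaluate("head" => :centered, "sheet" => :presenting)      # no action
mapping.evaluate("head" => :tilted_right, "sheet" => :presenting)  # runs action
puts log.inspect
```

Here the action stands in for a backend function call; triggering an event on another interactor would be a second kind of action in the same list.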
