Multimodal Dyadic Behavior Dataset


We introduce the Multimodal Dyadic Behavior (MMDB) dataset, a unique collection of multimodal (video, audio, and physiological) recordings of the social and communicative behavior of toddlers. The MMDB contains 160 sessions of 3-5 minute semi-structured play interaction between a trained adult examiner and a child between the ages of 15 and 30 months. Our play protocol is designed to elicit social attention, back-and-forth interaction, and non-verbal communication from the child. These behaviors reflect key socio-communicative milestones that are implicated in autism spectrum disorders. The MMDB dataset supports a novel problem domain for activity recognition: decoding dyadic social interactions between adults and children in a developmental context.


Our overall goal is to facilitate the development of novel computational methods for measuring and analyzing the behavior of children and adults during face-to-face social interactions. We have explored the automatic analysis of three aspects of the dataset:

  • Parsing into stages and substages
  • Detection of discrete behaviors (gaze shifts, smiling, and play gestures)
  • Prediction of engagement ratings at the stage and session level


We have collected 160 sessions of 5-minute interaction from 121 children. All multimodal signals are synchronized. The recording setup includes:

  • 2 frontal view Basler cameras (1920x1080 at 60 FPS)
  • An overhead view Kinect (RGB-D) camera
  • 8 side view & 3 overhead view AXIS cameras (640x480 at 30 FPS)
  • An omnidirectional and a cardioid microphone, ceiling mounted
  • 2 wireless lapel microphones, worn by both the child and the adult
  • 4 Affectiva Q-sensors for electrodermal activity and accelerometry, worn by both the adult and the child
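Because the cameras listed above run at different frame rates (60 FPS and 30 FPS), working with synchronized streams usually means converting frame indices between them. The following is a minimal sketch of that conversion, not the dataset's actual synchronization mechanism, which is not described here. It assumes that both streams start at the same instant and run at constant frame rates; the function names `map_frame` and `frame_start_seconds` are hypothetical.

```python
def frame_start_seconds(index: int, fps: int) -> float:
    """Start time in seconds of frame `index`, assuming frame 0 starts at t = 0."""
    if index < 0 or fps <= 0:
        raise ValueError("index must be >= 0 and fps must be > 0")
    return index / fps


def map_frame(index: int, src_fps: int, dst_fps: int) -> int:
    """Map a frame index from one stream to another.

    Returns the index of the destination frame that is on screen at the
    moment source frame `index` begins. Assumes (hypothetically) that both
    streams share a common start time and have constant frame rates.
    Integer arithmetic avoids floating-point rounding at frame boundaries.
    """
    if index < 0 or src_fps <= 0 or dst_fps <= 0:
        raise ValueError("index must be >= 0 and frame rates must be > 0")
    return index * dst_fps // src_fps


# Frame 121 of a 60 FPS stream begins inside frame 60 of a 30 FPS stream.
side_view_index = map_frame(121, src_fps=60, dst_fps=30)
```

Going from the slower to the faster stream, `map_frame(60, 30, 60)` picks the first of the two 60 FPS frames that overlap the 30 FPS frame.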


The MMDB dataset contains fine-grained annotations of behaviors, including:

  • Ratings of engagement and responsiveness at substage level
  • Frame-level, continuous annotation of relevant child behaviors (attention shifts, facial expressions, gestures, and vocalizations)
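Frame-level annotations are often easier to work with once they are collapsed into labeled time intervals. The sketch below shows one way to do that by run-length grouping of per-frame labels. The annotation file format of the MMDB is not specified here, so the input representation (one label per frame, with `None` for frames without a behavior), the label strings, and the function name `labels_to_intervals` are all assumptions made for illustration.

```python
from itertools import groupby


def labels_to_intervals(labels, fps, background=None):
    """Collapse per-frame labels into (label, start_sec, end_sec) intervals.

    `labels` holds one entry per frame (a hypothetical representation);
    consecutive frames with the same label form one interval. Frames equal
    to `background` are skipped. The end time is exclusive.
    """
    if fps <= 0:
        raise ValueError("fps must be > 0")
    intervals = []
    start = 0  # index of the first frame in the current run
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if label != background:
            intervals.append((label, start / fps, (start + length) / fps))
        start += length
    return intervals


# Hypothetical per-frame labels for a short clip.
frames = [None, "smile", "smile", "smile", None, "gaze_shift", "gaze_shift"]
events = labels_to_intervals(frames, fps=30)
```

Here `events` contains two intervals, one for the three `"smile"` frames and one for the two `"gaze_shift"` frames, with times expressed in seconds from the start of the clip.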


James M. Rehg, Gregory D. Abowd, Agata Rozga, Mario Romero, Mark A. Clements, Stan Sclaroff, Irfan Essa, Opal Y. Ousley, Yin Li, Chanho Kim, Hrishikesh Rao, Jonathan C. Kim, Liliana Lo Presti, Jianming Zhang, Denis Lantsman, Jonathan Bidwell, Zhefan Ye. Decoding Children's Social Behavior (2013). In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3414-3421, Portland, Oregon, June 2013.

DOI: 10.1109/CVPR.2013.438



Multimodal Dyadic Behavior Dataset (MMDB)


To be released


Portions of this work were supported by NSF Expedition Award number 1029679. Author Ousley acknowledges the support of Emtech Biotechnology Development, Inc.


The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without explicit permission of the copyright holder.