Research directions

Differentiable Fluids with Solid Coupling for Learning and Control

Abstract We introduce an efficient differentiable fluid simulator that can be integrated with deep neural networks as layers for learning dynamics and solving control problems. It can handle one-way coupling of fluids with rigid objects using a variational principle that naturally enforces the necessary boundary conditions at the fluid-solid interface with sub-grid detail. The simulator uses the adjoint method to efficiently compute gradients through multiple time steps of fluid simulation with user-defined objective functions.
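The adjoint idea can be illustrated on a toy scalar system (a hypothetical stand-in, not the paper's solver): one forward simulation stores the trajectory, and one backward sweep accumulates the gradient of the objective with respect to the initial state, at a cost independent of how many time steps were unrolled.

```python
# Toy illustration of the adjoint method: forward-simulate a scalar
# state x_{t+1} = x_t - dt * x_t**3, then sweep backwards with the
# adjoint recursion lam_t = lam_{t+1} * d(step)/dx to get dL/dx0.

def step(x, dt):
    return x - dt * x ** 3

def loss(x_final, target):
    return 0.5 * (x_final - target) ** 2

def simulate_with_adjoint(x0, dt, n_steps, target):
    # Forward pass: store the trajectory (real solvers would checkpoint).
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step(xs[-1], dt))
    # Backward pass: start from dL/dx_T and chain derivatives backwards.
    lam = xs[-1] - target
    for t in range(n_steps - 1, -1, -1):
        lam *= 1.0 - 3.0 * dt * xs[t] ** 2
    return loss(xs[-1], target), lam  # lam == dL/dx0

L, grad = simulate_with_adjoint(x0=1.0, dt=0.01, n_steps=100, target=0.0)

# Finite-difference check of the adjoint gradient.
eps = 1e-6
Lp, _ = simulate_with_adjoint(1.0 + eps, 0.01, 100, 0.0)
Lm, _ = simulate_with_adjoint(1.0 - eps, 0.01, 100, 0.0)
print(round(grad, 6))
```

The backward sweep is exactly what reverse-mode differentiation computes; the variational coupling terms in the actual simulator would enter through the per-step Jacobian.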

Dynamic Graph Modeling of Simultaneous EEG and Eye-tracking Data For Reading Task Identification

Abstract We present a new approach, called AdaGTCN, for identifying human reader intent from electroencephalogram (EEG) and eye-movement (EM) data in order to differentiate between normal reading and task-oriented reading. Understanding the physiological aspects of the reading process (the cognitive load and the reading intent) can help improve the quality of crowd-sourced annotated data. Our method, the Adaptive Graph Temporal Convolution Network (AdaGTCN), uses an Adaptive Graph Learning Layer and a Deep Neighborhood Graph Convolution Layer to identify reading activities from time-locked EEG sequences recorded during word-level eye-movement fixations.
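The adaptive-graph idea can be sketched generically (a toy, not AdaGTCN's actual layer): instead of fixing the adjacency between EEG channels by hand, infer a soft adjacency from learnable node embeddings and use it to drive a graph convolution. All embeddings and features below are invented for illustration.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def adaptive_adjacency(embeddings):
    # A[i][j] = softmax_j(<e_i, e_j>): the graph is learned, not hand-specified.
    sims = [[sum(a * b for a, b in zip(ei, ej)) for ej in embeddings]
            for ei in embeddings]
    return [softmax(row) for row in sims]

def graph_conv(adj, features):
    # H' = A X (learnable weights omitted): each node aggregates its neighbors.
    return [[sum(adj[i][j] * features[j][k] for j in range(len(features)))
             for k in range(len(features[0]))]
            for i in range(len(adj))]

# Three "electrode" nodes with 2-d embeddings and 2-d features (made up).
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
feat = [[1.0, 2.0], [1.0, 2.0], [5.0, 6.0]]
A = adaptive_adjacency(emb)
H = graph_conv(A, feat)
print([round(v, 3) for v in A[0]])  # node 0 attends mostly to similar nodes 0, 1
```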

ENI: Quantifying Environment Compatibility for Natural Walking in Virtual Reality

Abstract We present a novel metric to analyze the similarity between the physical environment and the virtual environment for natural walking in virtual reality. Our approach is general and can be applied to any pair of physical and virtual environments. We use geometric techniques based on conforming constrained Delaunay triangulations and visibility polygons to compute the Environment Navigation Incompatibility (ENI) metric that can be used to measure the complexity of performing simultaneous navigation.
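As a loose illustration of the visibility flavor of this computation (not the ENI formula itself, which relies on conforming constrained Delaunay triangulations and exact visibility polygons), one can compare how much of a sampled point set is mutually visible in the physical versus the virtual environment:

```python
import itertools

def segments_intersect(p, q, a, b):
    # Proper (interior) intersection test via orientation signs.
    def orient(o, s, t):
        return (s[0] - o[0]) * (t[1] - o[1]) - (s[1] - o[1]) * (t[0] - o[0])
    d1, d2 = orient(p, q, a), orient(p, q, b)
    d3, d4 = orient(a, b, p), orient(a, b, q)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def visibility_fraction(points, obstacle_segments):
    # Fraction of sampled point pairs whose connecting segment is unobstructed.
    pairs = list(itertools.combinations(points, 2))
    visible = sum(
        1 for p, q in pairs
        if not any(segments_intersect(p, q, a, b) for a, b in obstacle_segments)
    )
    return visible / len(pairs)

# Toy environments: an empty physical room vs. a virtual room with one wall.
samples = [(0.5, 0.5), (0.5, 2.5), (2.5, 0.5), (2.5, 2.5)]
physical_walls = []
virtual_walls = [((1.5, 0.0), (1.5, 2.0))]

vis_phys = visibility_fraction(samples, physical_walls)
vis_virt = visibility_fraction(samples, virtual_walls)
incompatibility = abs(vis_phys - vis_virt)  # toy proxy, not the ENI metric
print(vis_phys, vis_virt, incompatibility)
```

A large gap in mutual visibility between the two environments suggests that walking paths available in one are blocked in the other, which is the kind of mismatch the ENI metric quantifies rigorously.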

ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera

Abstract We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera. Our approach generates a semantic occupancy map from a single RGB observation while simultaneously providing uncertainty estimates for semantic predictions. By designing a triplane-based deformable attention mechanism, our approach improves geometric understanding of the scene compared with other SOTA approaches and reduces noise in semantic predictions. Additionally, through the use of a Conditional Variational AutoEncoder (CVAE), we estimate the uncertainties of these predictions.
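The CVAE-based uncertainty estimation can be sketched generically (toy decoder and numbers, not ET-Former's network): sample several latent codes from the prior, decode each under the same condition, and report the spread across decodes as the uncertainty.

```python
import random
import statistics

random.seed(0)

def decode(z, condition):
    # Made-up toy decoder: class scores depend on the conditioning input
    # plus latent noise whose influence grows with the class index.
    return [condition[k] + 0.1 * (k + 1) * z for k in range(len(condition))]

def predict_with_uncertainty(condition, n_samples=64):
    # Monte-Carlo over the latent prior z ~ N(0, 1).
    draws = [decode(random.gauss(0.0, 1.0), condition) for _ in range(n_samples)]
    mean = [statistics.fmean(col) for col in zip(*draws)]
    var = [statistics.pvariance(col) for col in zip(*draws)]
    return mean, var

mean_scores, uncertainty = predict_with_uncertainty([2.0, 0.5, -1.0])
print([round(m, 2) for m in mean_scores], [round(v, 3) for v in uncertainty])
```

Classes whose decoded score varies more across latent samples receive a higher uncertainty, which is the behavior a CVAE-based estimate exposes per prediction.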

EVA: Generating Emotional Behavior of Virtual Agents using Expressive Features of Gait and Gaze

Abstract We present a novel, real-time algorithm, EVA, for generating virtual agents with various perceived emotions. Our approach is based on using Expressive Features of gaze and gait to convey emotions corresponding to happy, sad, angry, or neutral. We precompute a data-driven mapping between gaits and their perceived emotions. EVA uses this gait emotion association at runtime to generate appropriate walking styles in terms of gaits and gaze. Using the EVA algorithm, we can simulate gaits and gazing behaviors of hundreds of virtual agents in real-time with known emotional characteristics.
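The precomputed gait-emotion association amounts to a lookup that can be used in both directions at runtime; a minimal sketch with invented feature values (walk speed, arm swing, head tilt), not EVA's actual data:

```python
import math

# Offline: a data-driven mapping from emotion label to expressive gait
# features (all numbers here are hypothetical placeholders).
GAIT_EMOTION_MAP = {
    "happy":   (1.4, 0.9, 0.2),
    "sad":     (0.8, 0.3, -0.4),
    "angry":   (1.6, 1.0, -0.1),
    "neutral": (1.2, 0.6, 0.0),
}

def gait_for_emotion(emotion):
    # Runtime: pick gait parameters for a target perceived emotion.
    return GAIT_EMOTION_MAP[emotion]

def perceived_emotion(features):
    # Inverse lookup: nearest precomputed gait in feature space.
    return min(GAIT_EMOTION_MAP,
               key=lambda e: math.dist(features, GAIT_EMOTION_MAP[e]))

print(gait_for_emotion("sad"))
print(perceived_emotion((1.5, 0.95, 0.0)))
```

Because the association is precomputed, the per-agent runtime cost is a table lookup, which is what makes driving hundreds of agents in real time plausible.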

Embodied AI

Overview Embodied AI broadly concerns the physical “embodiment” of artificial intelligence. More often than not, this involves the agent interacting with its surroundings to gather knowledge to perform a particular task. These tasks can involve navigation, where a physical robot agent is expected to find targets in the environment in the form of images (ImageNav), objects (ObjectNav), portable objects (Portable ObjectNav), or points (PointNav), or to follow language instructions (Vision-and-Language Navigation, or VLN).

EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle

Abstract We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images. Motivated by Frege’s Context Principle from psychology, our approach combines three interpretations of context for emotion recognition. Our first interpretation is based on using multiple modalities (e.g. faces and gaits) for emotion recognition. For the second interpretation, we gather semantic context from the input image and use a self-attention-based CNN to encode this information.

Emotion Recognition

Overview This area focuses primarily on developing techniques for emotion recognition from multiple modalities such as face, speech, and body expressions. This leads to a variety of applications, including fake media detection, understanding cognitive engagement, and building socially aware robots for navigation and interaction with humans.

Publications (project, conference/journal, year):
- INTENT-O-METER: Determining Perceived Human Intent in Multimodal Social Media Posts using Theory of Reasoned Action (under review, 2023)
- Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis (WACV-W, 2023)
- Show Me What I Like: Detecting User-Specific Video Highlights Using Content-Based Multi-Head Attention (ACMMM, 2022)
- 3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos (CVPR, 2022)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models (Odyssey, 2022)
- Learning Unseen Emotions from Gestures via Semantically-Conditioned Zero-Shot Perception with Adversarial Autoencoders (AAAI, 2022)
- DeepTMH: Multimodal Semi-Supervised Framework Leveraging Affective and Cognitive Engagement for Telemental Health (arXiv, 2021)
- HighlightMe: Detecting Highlights from Human-Centric Videos (ICCV, 2021)
- Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality (CVPR, 2021)
- Dynamic Graph Modeling of Simultaneous EEG and Eye-tracking Data For Reading Task Identification (ICASSP, 2021)
- Emotions Don’t Lie: A Deepfake Detection Method using Audio-Visual Affective Cues (ACM Multimedia, 2020)
- Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping (ECCV, 2020)
- EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege’s Principle (CVPR, 2020)
- M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues (AAAI, 2020)
- STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits (AAAI, 2020)
- EVA: Generating Emotional Behavior of Virtual Agents using Expressive Features of Gait and Gaze (ACM SAP, 2019)
- Identifying Emotions from Walking using Affective and Deep Features (arXiv, 2019)
- The Emotionally Intelligent Robot: Improving Social Navigation in Crowded Environments (IROS, 2019)
- Data-Driven Modeling of Group Entitativity in Virtual Environments (VRST, 2018)
- Classifying Group Emotions for Socially-Aware Autonomous Vehicle Navigation (CVPR Workshop, 2018)
- Aggressive, Tense or Shy?

Emotions Don't Lie: A Deepfake Detection Method using Audio-Visual Affective Cues

Abstract We present a learning-based multimodal method for detecting real and deepfake videos. To maximize information for learning, we extract and analyze the similarity between the audio and visual modalities from within the same video. Additionally, we extract and compare affective cues corresponding to emotion from the two modalities within a video to infer whether the input video is “real” or “fake”. We propose a deep learning network inspired by the Siamese network architecture and the triplet loss.
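The triplet loss the network draws on follows the standard formulation; a minimal sketch with invented feature vectors (not the paper's learned embeddings):

```python
import math

def euclidean(u, v):
    return math.dist(u, v)

def triplet_loss(anchor, positive, negative, margin=1.0):
    # L = max(0, d(a, p) - d(a, n) + margin): pull the positive toward the
    # anchor, push the negative at least `margin` farther away.
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)

# Anchor: audio features of a real video; positive: its own visual features;
# negative: visual features of a fake version (all values invented).
audio_real = [0.1, 0.9, 0.2]
visual_real = [0.2, 0.8, 0.25]
visual_fake = [0.5, 0.5, 0.4]

print(round(triplet_loss(audio_real, visual_real, visual_fake), 3))
```

At inference, a large audio-visual distance within a single video (relative to the learned margin) is the signal that the affective cues of the two modalities disagree, suggesting manipulation.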

Enhanced Transfer Learning for Autonomous Driving with Systematic Accident Simulation

Paper: IROS 2020
Authors: Shivam Akhauri, Laura Zheng, Ming Lin Abstract: Simulation data can be used to extend real-world driving data to cover edge cases such as vehicle accidents. The importance of handling edge cases is reflected in the high societal costs of car accidents, as well as the potential dangers they pose to human drivers.