Description We provide a dataset of dense and heterogeneous traffic videos. The dataset covers the following road-agent categories: car, bus, truck, rickshaw, pedestrian, scooter, motorcycle, and other road agents such as carts and animals. On average, each frame contains approximately 13 motorized vehicles, 5 pedestrians, and 2 bicycles. Annotations were performed following a strict protocol, and each annotated video file contains spatial coordinates in pixels, an agent ID, and an agent type.
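To make the per-agent annotation fields concrete, here is a minimal sketch of a loader. The exact file layout is not specified above, so the column order (frame, agent ID, agent type, x, y) and the CSV encoding are illustrative assumptions, not the official format.

```python
import csv
import io

# Road-agent categories listed in the dataset description.
AGENT_TYPES = {"car", "bus", "truck", "rickshaw", "pedestrian",
               "scooter", "motorcycle", "other"}

def parse_annotations(text):
    """Parse hypothetical CSV rows (frame, agent_id, agent_type, x, y)
    into per-agent records with pixel coordinates."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        frame, agent_id, agent_type, x, y = row
        assert agent_type in AGENT_TYPES, f"unknown agent type: {agent_type}"
        records.append({
            "frame": int(frame),
            "agent_id": int(agent_id),
            "agent_type": agent_type,
            "x": float(x),  # spatial coordinates in pixels
            "y": float(y),
        })
    return records

# Example: two agents annotated in frame 0.
sample = "0,17,car,412.5,233.0\n0,18,pedestrian,98.0,301.2\n"
records = parse_annotations(sample)
```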
– Paper and Code: Under Review
Overview
This project uses a VR vehicle simulator built in the Unity game engine to study human responses to accident scenarios in simulated traffic, yielding insights into driving dynamics and safety. It addresses the scarcity of adverse-event data for training self-driving cars, aiming to improve their preparedness for unexpected events, and its low-cost immersive driving simulator hardware makes large-scale user studies in autonomous driving research practical.
Abstract We present a method for improving the quality of synthetic room impulse responses (RIRs) for far-field speech recognition. We bridge the gap between the fidelity of synthetic RIRs and real RIRs using our novel TS-RIRGAN architecture. Given a synthetic RIR in the form of raw audio, we use TS-RIRGAN to translate it into a real RIR. We also perform real-world sub-band room equalization on the translated synthetic RIR.
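Sub-band room equalization can be sketched as rescaling each frequency band of the RIR so its energy matches a target measured from real RIRs. The version below is an illustrative simplification, not the paper's exact procedure: bands are equal-width slices of the FFT spectrum, and the target energies are assumed given.

```python
import numpy as np

def subband_equalize(synthetic_rir, target_band_energies, n_bands=4):
    """Scale each frequency sub-band of a synthetic RIR so its energy
    matches a target energy (e.g. estimated from real RIRs).

    Illustrative sketch: bands are equal-width slices of the rFFT
    spectrum; real sub-band equalizers typically use perceptual bands.
    """
    spectrum = np.fft.rfft(synthetic_rir)
    edges = np.linspace(0, len(spectrum), n_bands + 1).astype(int)
    for b in range(n_bands):
        band = spectrum[edges[b]:edges[b + 1]]        # view into spectrum
        energy = np.sum(np.abs(band) ** 2)
        if energy > 0:
            band *= np.sqrt(target_band_energies[b] / energy)
    return np.fft.irfft(spectrum, n=len(synthetic_rir))
```

Because `band` is a view, the in-place scaling modifies the full spectrum, and the inverse FFT returns the equalized time-domain RIR at the original length.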
Abstract We present an autoencoder-based semi-supervised approach to classify perceived human emotions from walking styles obtained from videos or from motion-captured data and represented as sequences of 3D poses. Given the motion on each joint in the pose at each time step extracted from 3D pose sequences, we hierarchically pool these joint motions in a bottom-up manner in the encoder, following the kinematic chains in the human body. We also constrain the latent embeddings of the encoder to contain the space of psychologically-motivated affective features underlying the gaits.
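The bottom-up pooling along kinematic chains can be illustrated with a toy skeleton. The skeleton below and the averaging rule are hypothetical stand-ins for the learned hierarchical encoder: the point is only that leaf-joint motion features are merged into their parents, leaf to root, following the body hierarchy.

```python
import numpy as np

# Hypothetical reduced skeleton: joint -> parent along the kinematic chain.
# The actual encoder follows the full human body hierarchy.
PARENT = {"hand": "elbow", "elbow": "shoulder", "shoulder": "spine",
          "foot": "knee", "knee": "hip", "hip": "spine"}

def pool_bottom_up(joint_feats):
    """Merge each joint's motion feature into its parent, deepest joints
    first, mimicking hierarchical pooling along kinematic chains."""
    feats = {j: np.asarray(f, dtype=float) for j, f in joint_feats.items()}

    def depth(j):
        d = 0
        while j in PARENT:
            j = PARENT[j]
            d += 1
        return d

    # Leaves (deepest joints) are pooled first so information flows root-ward.
    for j in sorted(feats, key=depth, reverse=True):
        p = PARENT.get(j)
        if p is not None:
            feats[p] = (feats[p] + feats[j]) / 2.0  # toy pooling: average
    return feats["spine"]  # root embedding
```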
Abstract Environments for autonomous driving can vary from place to place, leading to challenges in designing a learning model for a new scene. Transfer learning can leverage knowledge from a learned domain to a new domain with limited data. In this work, we focus on end-to-end autonomous driving as the target task, consisting of both perception and control. We first utilize information bottleneck analysis to build a causal graph that defines our framework and the loss function; then we propose a novel domain-agnostic learning method for autonomous steering based on our analysis of training data, network architecture, and training paradigm.
Abstract We present TerraPN, a novel method to learn the surface characteristics (texture, bumpiness, deformability, etc.) of complex outdoor terrains for autonomous robot navigation. Our method predicts navigability cost maps for different surfaces using patches of RGB images, odometry, and IMU data. Our method dynamically varies the resolution of the output cost map based on the scene to improve its computational efficiency. We present a novel extension to the Dynamic-Window Approach (DWA-O) to account for a surface’s navigability cost while computing robot trajectories.
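The DWA-O idea of folding surface navigability into trajectory selection can be sketched as a scoring function: each candidate rollout is scored by progress toward the goal minus the average surface cost it traverses on the cost map. The weights and the grid lookup below are illustrative assumptions, not the paper's tuned formulation.

```python
import math

def dwa_o_score(traj, goal, costmap, cell, w_goal=1.0, w_surf=2.0):
    """Score a candidate trajectory, DWA-style, with an added
    surface-navigability penalty looked up from a cost map.

    `traj` is a list of (x, y) points, `costmap` a 2D grid of per-cell
    surface costs, `cell` the grid resolution in meters.
    """
    gx, gy = goal
    x, y = traj[-1]
    goal_term = -math.hypot(gx - x, gy - y)   # closer endpoint is better
    surf_term = 0.0
    for px, py in traj:
        i, j = int(py / cell), int(px / cell)
        surf_term += costmap[i][j]            # accumulate surface cost
    surf_term /= len(traj)
    return w_goal * goal_term - w_surf * surf_term

def pick_trajectory(candidates, goal, costmap, cell):
    """Choose the best-scoring candidate trajectory."""
    return max(candidates, key=lambda t: dwa_o_score(t, goal, costmap, cell))
```

With two rollouts reaching equally far, the one crossing a high-cost patch (e.g. deformable terrain) scores lower and is rejected.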
Abstract We present Text2Gestures, a transformer-based network that interactively generates emotive gestures for virtual agents corresponding to natural language text inputs. Our approach is designed to generate emotionally expressive gestures by utilizing the relevant biomechanical features for body expressions, also known as affective features. We also consider the intended task corresponding to the text and the target virtual agents’ intended gender and handedness in our generation pipeline. We train and evaluate our network on the MPI Emotional Body Expressions Database and observe that our network produces state-of-the-art performance in generating gestures for virtual agents aligned with the text for narration or conversation.
Abstract We present a real-time algorithm for emotion-aware navigation of a robot among pedestrians. Our approach estimates time-varying emotional behaviors of pedestrians from their faces and trajectories using a combination of Bayesian-inference, CNN-based learning, and the PAD (Pleasure-Arousal-Dominance) model from psychology. These PAD characteristics are used for long-term path prediction and generating proxemic constraints for each pedestrian. We use a multi-channel model to classify pedestrian characteristics into four emotion categories (happy, sad, angry, neutral).
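The step from continuous PAD characteristics to the four discrete emotion categories can be sketched as a nearest-centroid rule in PAD space. The centroid coordinates below are hypothetical placeholders; the actual multi-channel model learns this mapping from data.

```python
import math

# Illustrative PAD (Pleasure-Arousal-Dominance) centroids for the four
# emotion categories; placeholder values, not learned parameters.
CENTROIDS = {
    "happy":   ( 0.8,  0.5,  0.4),
    "sad":     (-0.6, -0.4, -0.3),
    "angry":   (-0.5,  0.7,  0.6),
    "neutral": ( 0.0,  0.0,  0.0),
}

def classify_pad(p, a, d):
    """Assign a PAD state to the nearest emotion-category centroid,
    yielding one of the four labels used for proxemic constraints."""
    def dist(label):
        cp, ca, cd = CENTROIDS[label]
        return math.sqrt((p - cp) ** 2 + (a - ca) ** 2 + (d - cd) ** 2)
    return min(CENTROIDS, key=dist)
```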
– Paper and Code: Under Review; GitHub (TBD)
Authors: Laura Zheng, Julio Poveda, James Mullen, Shreelekha Revankar, Ming Lin
Abstract: Autonomous driving research currently faces a scarcity of data representing risky scenarios. Such data is difficult to obtain ethically in the real world and unreliable to generate via simulation. Recent advances in virtual reality (VR) driving simulators lower the barriers to tackling this problem in simulation.