Research directions

SocioSense: Robot navigation amongst pedestrians with social and psychological constraints

Abstract We present a real-time algorithm, SocioSense, for socially-aware navigation of a robot amongst pedestrians. Our approach computes time-varying behaviors of each pedestrian using Bayesian learning and Personality Trait theory. These psychological characteristics are used for long-term path prediction and generating proxemic characteristics for each pedestrian. We combine these psychological constraints with social constraints to perform human-aware robot navigation in low- to medium-density crowds. The estimation of time-varying behaviors and pedestrian personalities can improve the performance of long-term path prediction by 21%, as compared to prior interactive path prediction algorithms.
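The Bayesian-learning step described above can be illustrated with a toy sketch. Everything here (the behavior classes, the Gaussian speed likelihoods, all parameter values) is hypothetical; SocioSense's actual model estimates time-varying behaviors from Personality Trait theory and richer crowd features. This only shows the shape of a Bayesian posterior update over discrete behavior classes:

```python
import math

# Hypothetical (mean, std) of walking speed in m/s per behavior class.
BEHAVIORS = {
    "shy":        (0.8, 0.2),
    "tense":      (1.2, 0.2),
    "aggressive": (1.7, 0.3),
}

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def update_posterior(prior, observed_speed):
    """One Bayesian step: posterior = likelihood * prior, then normalize."""
    post = {b: gaussian(observed_speed, mu, sd) * prior[b]
            for b, (mu, sd) in BEHAVIORS.items()}
    z = sum(post.values())
    return {b: p / z for b, p in post.items()}

# Start from a uniform prior and fold in two speed observations.
posterior = {b: 1 / len(BEHAVIORS) for b in BEHAVIORS}
for speed in (1.6, 1.8):
    posterior = update_posterior(posterior, speed)
most_likely = max(posterior, key=posterior.get)
```

A time-varying estimate falls out naturally: repeating the update at every frame lets the posterior track a pedestrian whose behavior changes mid-trajectory.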

Sound Synthesis and Propagation

Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning

Abstract We present a generative adversarial network to synthesize 3D pose sequences of co-speech upper-body gestures with appropriate affective expressions. Our network consists of a generator, which synthesizes gestures from a joint embedding space of features encoded from the input speech and the seed poses, and a discriminator, which distinguishes between the synthesized pose sequences and real 3D pose sequences. We leverage the mel-frequency cepstral coefficients and the text transcript computed from the input speech in separate encoders in our generator to learn the desired sentiments and the associated affective cues.
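The generator's joint embedding can be sketched in a few lines of numpy. All dimensions, weights, and the nearest-frame encoding below are hypothetical stand-ins; the actual network uses learned deep encoders for the MFCCs, the transcript, and the seed poses, trained adversarially against the discriminator. The sketch only shows how separately encoded speech and text features combine with a seed pose to produce a 3D pose sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_mfcc, d_text, n_joints = 8, 13, 16, 10   # hypothetical sizes

# Hypothetical random "encoder" weights standing in for learned networks.
W_speech = rng.standard_normal((n_mfcc, 32))
W_text   = rng.standard_normal((d_text, 32))
W_pose   = rng.standard_normal((32 + 32 + n_joints * 3, n_joints * 3))

def generate(mfcc, text_emb, seed_pose):
    """Map (T, n_mfcc) speech features, a (d_text,) text embedding, and a
    (n_joints*3,) seed pose to a (T, n_joints, 3) pose sequence."""
    poses = []
    for t in range(T):
        z = np.concatenate([np.tanh(mfcc[t] @ W_speech),   # speech encoder
                            np.tanh(text_emb @ W_text),    # text encoder
                            seed_pose])                    # seed-pose features
        poses.append(np.tanh(z @ W_pose))
    return np.stack(poses).reshape(T, n_joints, 3)

seq = generate(rng.standard_normal((T, n_mfcc)),
               rng.standard_normal(d_text),
               rng.standard_normal(n_joints * 3))
```

In the full system the discriminator scores such sequences against real motion-capture pose sequences, and its gradient shapes the generator toward plausible, affect-consistent gestures.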

Synthetic Data for Data Efficient Speech, Audio and Natural Language Processing

TAME-RD: Text Assisted Replication of Image Multi-Adjustments for Reverse Designing

Pooja Guhan 1, Uttaran Bhattacharya 2, Somdeb Sarkhel 2, Vahid Azizi 2, Xiang Chen 2, Saayan Mitra 2, Aniket Bera 3, Dinesh Manocha 1
1 University of Maryland, College Park; 2 Adobe Research, San Jose; 3 Purdue University, West Lafayette

Abstract Given a source image and its edited version, produced according to human instructions in natural language, how do we extract the underlying edit operations in order to automatically replicate similar edits on other images? This is the problem of reverse designing, and we present TAME-RD, a model to solve this problem.
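The reverse-designing goal can be illustrated with a deliberately tiny example. This is not the TAME-RD model: TAME-RD predicts edit operations with a learned, text-assisted network. The sketch below only makes the problem statement concrete, recovering a single global gain/bias (brightness/contrast-style) edit from a (source, edited) pixel pair by least squares and replaying it on new pixels:

```python
def fit_linear_edit(source, edited):
    """Fit edited ~ gain * source + bias over flat pixel lists."""
    n = len(source)
    mean_s = sum(source) / n
    mean_e = sum(edited) / n
    cov = sum((s - mean_s) * (e - mean_e) for s, e in zip(source, edited))
    var = sum((s - mean_s) ** 2 for s in source)
    gain = cov / var
    bias = mean_e - gain * mean_s
    return gain, bias

src = [10, 50, 100, 200]
edt = [1.5 * p + 20 for p in src]          # hidden edit: gain 1.5, bias 20
gain, bias = fit_linear_edit(src, edt)
replayed = [gain * p + bias for p in [30, 60]]   # apply the same edit elsewhere
```

Real photo edits compose many non-linear, sometimes local operations, which is why the problem calls for a learned model rather than closed-form fitting.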

TERP: Reliable Planning in Uneven Outdoor Environments using Deep Reinforcement Learning

Abstract We present a novel method for reliable robot navigation in uneven outdoor terrains. Our approach employs a novel fully-trained Deep Reinforcement Learning (DRL) network that uses elevation maps of the environment, robot pose, and goal as inputs to compute an attention mask of the environment. The attention mask is used to identify reduced stability regions in the elevation map and is computed using channel and spatial attention modules and a novel reward function.
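A hand-crafted proxy can show what the attention mask consumes and produces. TERP learns its mask with channel and spatial attention modules inside a DRL network; the sketch below, with a hypothetical gradient threshold, merely flags grid cells whose local elevation change is steep as reduced-stability regions:

```python
def stability_mask(elev, thresh=0.5):
    """elev: 2D list of heights -> 2D 0/1 mask (1 = reduced stability)."""
    rows, cols = len(elev), len(elev[0])
    mask = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Forward differences, clamped at the grid border.
            dx = abs(elev[i][min(j + 1, cols - 1)] - elev[i][j])
            dy = abs(elev[min(i + 1, rows - 1)][j] - elev[i][j])
            mask[i][j] = 1 if max(dx, dy) > thresh else 0
    return mask

elev = [[0.0, 0.1, 0.2],
        [0.0, 0.9, 0.2],   # sharp bump in the middle
        [0.0, 0.1, 0.2]]
mask = stability_mask(elev)
```

In the learned system the mask additionally depends on robot pose and goal, so the same terrain can be weighted differently depending on the direction of travel.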

TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes

Abstract In this paper, we present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception. Our formulation is designed for dynamic scenes consisting of small moving objects or human actions. We propose an extension of the K-Planes Neural Radiance Field (NeRF) in which our algorithm stores a set of tiered feature vectors. The tiered feature vectors effectively model conceptual information about a scene, while an image decoder transforms the output feature maps into RGB images.
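The tiered feature-vector idea can be sketched as a multi-resolution plane lookup. The plane sizes, feature dimension, and nearest-cell sampling here are hypothetical; the real method additionally factorizes space-time planes and feeds the concatenated features through a learned image decoder to produce RGB:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two tiers of 2D feature planes: coarse scene-level and fine detail-level.
tiers = [rng.standard_normal((4, 4, 8)),     # coarse plane: 4x4 cells, 8-dim
         rng.standard_normal((16, 16, 8))]   # fine plane: 16x16 cells, 8-dim

def lookup(planes, x, y):
    """Nearest-cell lookup of (x, y) in [0,1)^2 across all tiers,
    returning the concatenated per-tier feature vectors."""
    feats = []
    for plane in planes:
        h, w, _ = plane.shape
        feats.append(plane[int(y * h), int(x * w)])
    return np.concatenate(feats)

f = lookup(tiers, 0.3, 0.7)   # one 16-dim tiered feature for this location
```

The coarse tier changes slowly across the scene and can carry conceptual information, while the fine tier resolves the small moving objects that dominate UAV imagery.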

TNS: Terrain Traversability Mapping and Navigation System for Autonomous Excavators

TOPGN: Real-time Transparent Obstacle Detection using Lidar Point Cloud Intensity for Autonomous Robot Navigation

Abstract We present TOPGN, a novel method for real-time transparent obstacle detection for robot navigation in unknown environments. We use a multi-layer 2D grid map representation obtained by summing the intensities of lidar point clouds that lie in multiple non-overlapping height intervals. We isolate a neighborhood of points reflected from transparent obstacles by comparing the intensities in the different 2D grid map layers. Using the neighborhood, we linearly extrapolate the transparent obstacle by computing a tangential line segment and use it to perform safe, real-time collision avoidance.
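The multi-layer grid construction and layer comparison can be sketched as follows. The grid size, height intervals, and intensity-difference threshold are hypothetical; TOPGN's actual pipeline works on full lidar scans and follows the comparison with tangential line extrapolation for collision avoidance. The sketch sums point intensities into per-height-interval 2D grids and flags cells whose intensity differs strongly across layers:

```python
def build_layers(points, cell=1.0, size=4, z_bins=((0.0, 1.0), (1.0, 2.0))):
    """Sum intensities of (x, y, z, intensity) points into one 2D grid
    per non-overlapping height interval."""
    layers = [[[0.0] * size for _ in range(size)] for _ in z_bins]
    for x, y, z, inten in points:
        for k, (lo, hi) in enumerate(z_bins):
            if lo <= z < hi:
                layers[k][int(y / cell)][int(x / cell)] += inten
    return layers

def transparent_cells(layers, diff_thresh=5.0):
    """Cells whose summed intensity differs sharply between the two layers
    are candidate transparent-obstacle returns."""
    size = len(layers[0])
    return {(i, j)
            for i in range(size) for j in range(size)
            if abs(layers[0][i][j] - layers[1][i][j]) > diff_thresh}

pts = [(1.5, 1.5, 0.5, 9.0),                         # strong low-layer return only
       (2.5, 2.5, 0.5, 3.0), (2.5, 2.5, 1.5, 3.0)]  # consistent across layers
cands = transparent_cells(build_layers(pts))
```

Opaque obstacles tend to return consistent intensities at every height they occupy, which is why a large cross-layer difference is a useful transparency cue.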