Research Directions

Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a Single Image using Diffusion Models

Abstract We present Aerial Diffusion, a novel method for generating aerial views from a single ground-view image using text guidance. Aerial Diffusion leverages a pretrained text-to-image diffusion model for prior knowledge. We address two main challenges: the domain gap between the ground view and the aerial view, and the fact that the two views lie far apart in the text-image embedding manifold. Our approach applies a homography inspired by inverse perspective mapping before fine-tuning the pretrained diffusion model.
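Inverse perspective mapping warps ground-view pixels toward a top-down viewpoint by mapping them through a 3x3 homography. A minimal sketch of that mapping is below; the matrix `H` is a made-up toy example, not the homography used by Aerial Diffusion.

```python
# Sketch: mapping a pixel through a 3x3 homography, the basic operation
# behind inverse perspective mapping (IPM). H below is illustrative only.

def apply_homography(H, x, y):
    """Map pixel (x, y) through homography H (3x3 nested lists, row-major)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Divide by the homogeneous coordinate to return to image coordinates.
    return xh / w, yh / w

# A toy homography that stretches and shifts the ground plane.
H = [[1.0, 0.0, 0.0],
     [0.0, 2.0, -50.0],
     [0.0, 0.01, 1.0]]

print(apply_homography(H, 100.0, 200.0))
```

In practice one would warp the whole image (e.g. with a perspective-warp routine from an image library) rather than individual pixels, but the per-pixel math is the same.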

Aerial Recognition

Overview Image and video analysis of aerial scenes is crucial in a myriad of real-life applications such as surveillance, search and rescue, mapping, and satellite imagery. The GAMMA group is working toward artificial-intelligence-based solutions for problems in aerial scene analysis. Our research areas include aerial video activity recognition, memory-efficient neural networks, synthetic data augmentation and transfer learning, synthetic data generation, and geo-localization from aerial point clouds.

Aerial Swarm Collision Avoidance

Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality

Abstract We present Affect2MM, a learning method for time-series emotion prediction for multimedia content. Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors. We use ideas from emotion causation theories to computationally model and determine the emotional state evoked in clips of movies. Affect2MM explicitly models the temporal causality using attention-based methods and Granger causality. We use a variety of components, such as facial features of the actors involved, scene understanding, visual aesthetics, action/situation description, and the movie script, to obtain an affect-rich representation to understand and perceive the scene.
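The Granger-causality idea mentioned above can be illustrated simply: a series y "Granger-causes" x if adding lagged values of y improves the prediction of x over using x's own history alone. The sketch below is a bare-bones lag-1, no-intercept version of that comparison, not the formulation used in Affect2MM.

```python
# Sketch: lag-1 Granger-style test. Compare the residual sum of squares of
# predicting x[t] from x[t-1] alone vs. from x[t-1] and y[t-1].
import math

def lag1_granger_gain(x, y):
    """Return (rss_restricted, rss_full) for the two lag-1 models."""
    xt = x[1:]        # targets
    xl = x[:-1]       # lagged x
    yl = y[:-1]       # lagged y
    Sxx = sum(a * a for a in xl)
    Syy = sum(b * b for b in yl)
    Sxy = sum(a * b for a, b in zip(xl, yl))
    Sx1 = sum(a * t for a, t in zip(xl, xt))
    Sy1 = sum(b * t for b, t in zip(yl, xt))
    # Restricted model: x[t] = a * x[t-1]
    a_r = Sx1 / Sxx
    rss_r = sum((t - a_r * a) ** 2 for a, t in zip(xl, xt))
    # Full model: x[t] = a * x[t-1] + b * y[t-1], 2x2 normal equations
    det = Sxx * Syy - Sxy * Sxy
    a_f = (Sx1 * Syy - Sy1 * Sxy) / det
    b_f = (Sxx * Sy1 - Sxy * Sx1) / det
    rss_f = sum((t - a_f * a - b_f * b) ** 2
                for a, b, t in zip(xl, yl, xt))
    return rss_r, rss_f

# Toy data: y drives x with a one-step delay, so lagged y should help.
y = [math.sin(0.3 * t) for t in range(50)]
x = [0.0] + [0.8 * v for v in y[:-1]]
rss_r, rss_f = lag1_granger_gain(x, y)
print(rss_f < rss_r)  # a large drop in RSS suggests y "Granger-causes" x
```

A full Granger test would use multiple lags, intercepts, and an F-statistic (e.g. the `grangercausalitytests` routine in statsmodels); this sketch only shows the core model comparison.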

Affective Agents in AR/VR

Affective Computing

Overview We present novel algorithms for identifying emotion, dominance, and friendliness characteristics of pedestrians, as well as detecting deceptive traits in walking, based on their motion behaviors. We also propose models for conveying emotions, friendliness, and dominance traits in virtual agents. We present applications of our algorithms to simulate interpersonal relationships between virtual characters, facilitate socially-aware robot navigation, identify perceived emotions from videos of walking individuals, and increase the sense of presence in scenarios involving multiple virtual agents.

Aggressive, Tense or Shy? Identifying Personality Traits from Crowd Videos

Abstract We present a real-time algorithm to automatically classify the dynamic behavior or personality of a pedestrian based on his or her movements in a crowd video. We present a statistical scheme that dynamically learns the behavior of every pedestrian in a scene and computes that pedestrian’s motion model. This model is combined with global crowd characteristics to compute the movement patterns and motion dynamics, which can also be used to predict the crowd movement and behavior.
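One ingredient of such a scheme is maintaining per-pedestrian motion statistics incrementally as new frames arrive. The sketch below uses Welford's online mean/variance update on a stream of speed samples; this is a generic illustration of online per-agent statistics, not the paper's specific statistical model.

```python
# Sketch: Welford's online algorithm for streaming mean/variance.
# One instance could be kept per tracked pedestrian and updated each frame.

class OnlineStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, sample):
        self.n += 1
        delta = sample - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (sample - self.mean)

    def variance(self):
        # Sample variance; undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# Toy stream of per-frame speeds (m/s) for one pedestrian.
stats = OnlineStats()
for s in [1.2, 1.4, 1.3, 2.0, 1.1]:
    stats.update(s)
print(stats.mean, stats.variance())
```

The advantage over recomputing statistics from scratch is constant memory and constant work per frame, which matters for a real-time system tracking every pedestrian in a crowd.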

Applied Perception for Computer Graphics

Audio and Speech

AutoRVO: Reciprocal Collision Avoidance between Heterogeneous Agents and Applications to Autonomous Driving

Abstract We present a novel algorithm for reciprocal collision avoidance between heterogeneous agents of different shapes and sizes. We introduce CTMAT, a representation based on the medial axis transform that computes a tight-fitting bounding shape for each agent; it is less conservative and results in fewer false collisions. The overall runtime performance is comparable to prior multi-agent collision avoidance algorithms that use circular or elliptical agents. Building on the CTMAT representation, we present AutoRVO, an algorithm for computing collision-free navigation for heterogeneous road agents such as cars, tricycles, bicycles, and pedestrians in dense traffic.
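Reciprocal collision avoidance methods in the RVO family repeatedly ask when two agents, moving at their current velocities, would first collide. For the circular agents that CTMAT improves upon, this reduces to a quadratic in time. The sketch below shows that circular-agent time-to-collision test; it is a simplified building block, not the CTMAT-based formulation of AutoRVO.

```python
# Sketch: earliest time at which two circular agents touch, assuming
# constant velocities. Returns 0.0 if already overlapping, math.inf if
# they never collide.
import math

def time_to_collision(p1, v1, r1, p2, v2, r2):
    px, py = p2[0] - p1[0], p2[1] - p1[1]   # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]   # relative velocity
    r = r1 + r2
    c = px * px + py * py - r * r
    if c <= 0:
        return 0.0  # already in contact
    a = vx * vx + vy * vy
    b = 2.0 * (px * vx + py * vy)
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return math.inf  # paths never bring them within r of each other
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else math.inf

# Two agents approaching head-on at a combined 2 m/s, centers 10 m apart,
# radius 0.5 m each: the 9 m gap closes in 4.5 s.
print(time_to_collision((0, 0), (1, 0), 0.5, (10, 0), (-1, 0), 0.5))
```

An RVO-style planner would use this kind of test (or its velocity-space counterpart, the velocity obstacle) to rule out candidate velocities that lead to collision within a time horizon, with each agent taking half the responsibility for avoidance.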