Abstract We present MIM (Multi-Layer Intensity Map), a novel 3D object representation for robot perception and autonomous navigation. MIMs consist of multiple stacked layers of 2D grid maps, each derived from reflected point-cloud intensities within a given height interval. The different layers of an MIM can be used to simultaneously estimate obstacles' height, solidity/density, and opacity. We demonstrate that MIMs can help accurately differentiate obstacles that are safe to navigate through (e.
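The abstract does not describe how the layered map is constructed; a minimal sketch of the layering idea, assuming a LiDAR point cloud with per-return intensities and hand-picked height intervals (all parameter values and the function name below are illustrative, not the paper's implementation), could look like this:

```python
import numpy as np

def build_mim(points, intensities, cell_size=0.1, z_bins=(0.0, 0.5, 1.0, 2.0),
              x_range=(-10.0, 10.0), y_range=(-10.0, 10.0)):
    """Accumulate mean reflected intensity per 2D cell, one layer per height interval."""
    nx = int((x_range[1] - x_range[0]) / cell_size)
    ny = int((y_range[1] - y_range[0]) / cell_size)
    n_layers = len(z_bins) - 1
    acc = np.zeros((n_layers, nx, ny))   # summed intensity per cell
    cnt = np.zeros((n_layers, nx, ny))   # number of returns per cell

    ix = ((points[:, 0] - x_range[0]) / cell_size).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell_size).astype(int)
    iz = np.digitize(points[:, 2], z_bins) - 1   # which height interval each return falls in

    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny) & (iz >= 0) & (iz < n_layers)
    np.add.at(acc, (iz[valid], ix[valid], iy[valid]), intensities[valid])
    np.add.at(cnt, (iz[valid], ix[valid], iy[valid]), 1)

    # mean intensity per cell; empty cells stay zero
    return np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
```

Comparing a cell's intensity across layers is what would let downstream logic reason about an obstacle's height, density, and opacity, as the abstract describes.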
MITFAS: We present a novel approach for action recognition in UAV videos. Our formulation is designed to handle occlusion and viewpoint changes caused by the movement of a UAV. We use the concept of mutual information to compute and align the regions corresponding to human action or motion in the temporal domain. This enables our recognition model to learn from the key features associated with the motion. We also propose a novel frame sampling method that uses joint mutual information to acquire the most informative frame sequence in UAV videos.
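The abstract names joint mutual information as the frame-sampling criterion but gives no algorithm; the histogram-based MI estimate and the greedy least-redundancy selection below are generic stand-ins for illustration, not the paper's sampler:

```python
import numpy as np

def mutual_information(frame_a, frame_b, bins=32):
    """Estimate MI between two grayscale frames from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(frame_a.ravel(), frame_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def sample_informative_frames(frames, k):
    """Greedily pick k frames that are mutually least redundant (low pairwise MI)."""
    selected = [0]
    while len(selected) < k:
        # choose the frame whose worst-case MI with already-selected frames is smallest
        scores = [max(mutual_information(frames[i], frames[j]) for j in selected)
                  if i not in selected else np.inf
                  for i in range(len(frames))]
        selected.append(int(np.argmin(scores)))
    return sorted(selected)
```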
Abstract We present MOSU, a novel autonomous long-range navigation system that enhances global navigation for mobile robots through multimodal perception and on-road scene understanding. MOSU addresses the outdoor robot navigation challenge by integrating geometric, semantic, and contextual information to ensure comprehensive scene understanding. The system combines GPS and QGIS map-based routing for high-level global path planning and multi-modal trajectory generation for local navigation refinement. For trajectory generation, MOSU leverages multiple modalities: LiDAR-based geometric data for precise obstacle avoidance, image-based semantic segmentation for traversability assessment, and Vision-Language Models (VLMs) to capture social context and enable the robot to adhere to social norms in complex environments.
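How the three modalities are combined during trajectory generation is not specified in the abstract; one plausible reading is a weighted per-cell fusion of the three cost maps over candidate local trajectories. The cost maps, weights, and function names below are assumptions made for illustration:

```python
import numpy as np

def score_trajectory(traj_cells, obstacle_cost, traversability, social_cost,
                     w_geo=1.0, w_sem=1.0, w_soc=0.5):
    """Combine per-cell costs from the three modalities into one trajectory score.

    obstacle_cost  -- LiDAR-derived geometric cost (e.g. proximity to obstacles)
    traversability -- cost from image semantic segmentation (e.g. grass vs. sidewalk)
    social_cost    -- penalty derived from VLM-based social context
    """
    idx = tuple(traj_cells.T)  # (rows, cols) indices into the 2D cost maps
    return (w_geo * obstacle_cost[idx].sum()
            + w_sem * traversability[idx].sum()
            + w_soc * social_cost[idx].sum())

def pick_local_trajectory(candidates, obstacle_cost, traversability, social_cost):
    """Return the candidate trajectory with the lowest fused cost."""
    scores = [score_trajectory(c, obstacle_cost, traversability, social_cost)
              for c in candidates]
    return candidates[int(np.argmin(scores))]
```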
Abstract We present a novel learning-based trajectory generation algorithm for outdoor robot navigation. Our goal is to compute collision-free paths that also satisfy the environment-specific traversability constraints. Our approach is designed for global planning using limited onboard robot perception in mapless environments, while ensuring comprehensive coverage of all traversable directions. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model enhanced with traversability constraints, together with an optimization formulation that ensures coverage of the traversable directions.
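A minimal PyTorch sketch of the CVAE backbone is shown below; it omits the traversability constraints and the coverage optimization the abstract mentions, and the network sizes, layer choices, and class name are assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class TrajectoryCVAE(nn.Module):
    """Minimal conditional VAE: encodes a trajectory given a perception-context vector,
    and decodes latent samples back into waypoint sequences."""
    def __init__(self, traj_dim, cond_dim, latent_dim=16, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(traj_dim + cond_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, traj_dim))

    def forward(self, traj, cond):
        h = self.encoder(torch.cat([traj, cond], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(torch.cat([z, cond], dim=-1)), mu, logvar

    @torch.no_grad()
    def sample(self, cond, n=32):
        """Draw n candidate trajectories for a single 1D perception-context vector."""
        z = torch.randn(n, self.mu.out_features)
        return self.decoder(torch.cat([z, cond.expand(n, -1)], dim=-1))
```

At test time, the sampled candidates would then be filtered or re-weighted by the traversability and coverage terms described in the abstract.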
Abstract Music is a universal language that can communicate emotions and feelings. It forms an essential part of the whole spectrum of creative media, ranging from movies to social media posts. Machine learning models that synthesize music are predominantly conditioned on textual descriptions of the desired music. Inspired by how musicians compose music not just from a movie script, but also through visualizations, we propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Abstract We present a data-driven algorithm for generating gaits of virtual characters with varying dominance traits. Our formulation utilizes a user study to establish a data-driven dominance mapping between gaits and dominance labels. We use our dominance mapping to generate walking gaits for virtual characters that exhibit a variety of dominance traits while interacting with the user. Furthermore, we extract gait features based on known criteria in visual perception and psychology literature that can be used to identify the dominance levels of any walking gait.
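The abstract does not list the exact gait features or the form of the dominance mapping; the features, joint indices, and linear mapping below are hypothetical illustrations of the feature-extraction step, with the linear map standing in for the learned, user-study-derived mapping:

```python
import numpy as np

def gait_features(joints):
    """Extract simple per-gait features from a (T, J, 3) joint-position sequence.

    The specific features (stride length, arm swing, head tilt) and joint indices
    are illustrative only; the paper derives its feature set from the visual
    perception and psychology literature, which may differ.
    """
    # hypothetical joint indices: 0 = head, 3 = neck, 6/9 = wrists, 12/15 = ankles
    stride = np.linalg.norm(joints[:, 12] - joints[:, 15], axis=-1).max()
    arm_swing = np.ptp(joints[:, 6, 0]) + np.ptp(joints[:, 9, 0])
    head_tilt = (joints[:, 0, 2] - joints[:, 3, 2]).mean()
    return np.array([stride, arm_swing, head_tilt])

def dominance_score(features, w, b):
    """Map gait features to a scalar dominance level; w and b would be fit to
    the dominance labels collected in the user study."""
    return float(features @ w + b)
```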