Research directions

CoMet: Modeling Group Cohesion for Socially Compliant Robot Navigation in Crowded Scenes

Abstract We present CoMet, a novel approach for computing a group’s cohesion and using it to improve a robot’s navigation in crowded scenes. Our approach uses a novel cohesion metric that builds on prior work in social psychology. We compute this metric from visual features of pedestrians captured by an RGB-D camera on board a robot. Specifically, we detect characteristics corresponding to proximity between people, their relative walking speeds, the group size, and interactions between group members.

CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments

Abstract We present CoNVOI, a novel method for autonomous robot navigation in real-world indoor and outdoor environments using Vision Language Models (VLMs). We employ VLMs in two ways: first, we leverage their zero-shot image classification capability to identify the context or scenario (e.g., indoor corridor, outdoor terrain, crosswalk, etc.) of the robot’s surroundings, and formulate context-based navigation behaviors as simple text prompts (e.g., “stay on the pavement”). Second, we utilize their state-of-the-art semantic understanding and logical reasoning capabilities to compute a suitable trajectory given the identified context.

Contextualized Styling of Images for Web Interfaces using Reinforcement Learning

Pooja Guhan1, Saayan Mitra2, Somdeb Sarkhel2, Stefano Petrangeli2, Ritwik Sinha2, Viswanathan Swaminathan2, Aniket Bera3, Dinesh Manocha1

1University of Maryland College Park, 2Adobe Research San Jose, 3Purdue University West Lafayette

Abstract Content personalization is one of the foundations of today’s digital marketing. Often the same image needs to be adapted to different design schemes for content that is created for different occasions, geographic locations, or other aspects of the target population.

CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition

We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting. Cross-source point cloud data corresponds to point sets captured by depth sensors with different accuracies or from different distances and perspectives. We address the challenges of developing 3D place recognition methods that account for the representation gap between points captured by different sources. Our method handles cross-source data by utilizing multi-grained features and selecting convolution kernel sizes that correspond to the most prominent features.

Crowd and Multi-Agent Environments

Overview Crowd and multi-agent simulation is the process of simulating large numbers of people, creatures, or other characters, each interacting in one environment. These actors are expected to move to their goals, interact with their environment, and respond to each other. Crowd simulations have many uses, including improving architectural planning, enhancing training environments and virtual realities, and driving artificially intelligent (AI) characters in games and movies.

DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments

Most existing traffic video datasets, e.g., the Waymo Open Motion Dataset, are collected in Western countries and consist of simple, structured, and predictable traffic. Most Asian scenarios, however, are far denser, more unstructured, and more heterogeneous, with many road-agents routinely disobeying common traffic rules. Consequently, state-of-the-art computer vision and autonomous driving perception algorithms trained on existing datasets do not transfer to traffic in Asian countries. Addressing this gap, we present a new dataset, DAVE, designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs: e.

DIFFAR: Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition

We present a differentiable frequency-based method for aerial video recognition. Our differentiable static-dynamic frequency mask provides a prior for disentangled regions relevant to action recognition. This mask is used to guide the learning of disentangled features within the layers of the neural network using an identity function. Further, we propose a frame sampling strategy that chooses the best frame within each uniform video segment, at test time, using the static-dynamic frequency mask and temporal difference.
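
The per-segment frame selection described above can be illustrated with a minimal sketch. It is not the DIFFAR implementation: it scores frames by temporal difference only and leaves out the static-dynamic frequency mask. The function names `temporal_difference_scores` and `sample_best_frames`, and the representation of frames as flat lists of pixel intensities, are assumptions made for illustration.

```python
from typing import List, Sequence


def temporal_difference_scores(frames: Sequence[Sequence[float]]) -> List[float]:
    """Score each frame by its mean absolute difference from the previous frame.

    Frames are flat lists of pixel intensities (an assumption of this sketch).
    The first frame has no predecessor, so it scores 0.0.
    """
    if not frames:
        return []
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        diff = sum(abs(c - p) for c, p in zip(cur, prev)) / len(cur)
        scores.append(diff)
    return scores


def sample_best_frames(scores: Sequence[float], num_segments: int) -> List[int]:
    """Split the video into uniform segments and keep the top-scoring frame of each."""
    n = len(scores)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and the number of frames")
    picks = []
    for s in range(num_segments):
        # Integer arithmetic keeps segment boundaries uniform and non-overlapping.
        start = s * n // num_segments
        end = (s + 1) * n // num_segments
        picks.append(max(range(start, end), key=lambda i: scores[i]))
    return picks


# Hypothetical six-frame clip with two pixels per frame.
frames = [[0, 0], [1, 1], [1, 1], [4, 4], [4, 4], [4, 5]]
scores = temporal_difference_scores(frames)
selected = sample_best_frames(scores, num_segments=3)
```

One frame index is returned per segment, so the number of sampled frames stays fixed regardless of video length.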

DISC: Dataset for Analyzing Driving Styles In Simulated Crashes for Mixed Autonomy

Overview DISC (Driving Styles In Simulated Crashes) is a pioneering dataset capturing diverse human driving behaviors in pre-crash scenarios within mixed autonomy settings. Collected via the TRAVERSE VR-based simulator, DISC includes data from hundreds of drivers facing rare-event traffic scenarios in a virtual city.

DS-MPEPC: Safe and Deadlock-Avoiding Robot Navigation in Cluttered Dynamic Scenes

Abstract We present an algorithm for safe robot navigation in complex dynamic environments using a variant of model predictive equilibrium point control. We use an optimization formulation to navigate robots gracefully in dynamic environments by optimizing over a trajectory cost function at each timestep. We present a novel trajectory cost formulation that significantly reduces conservative and deadlocking behaviors and generates smooth trajectories. In particular, we propose a new collision probability function that effectively captures the risk associated with a given configuration and the time to avoid collisions based on the velocity direction.

DWA-RL: Dynamically Feasible Deep Reinforcement Learning Policy for Robot Navigation among Mobile Obstacles

Abstract We present a novel Deep Reinforcement Learning (DRL) based policy to compute dynamically feasible and spatially aware velocities for a robot navigating among mobile obstacles. Our approach combines the benefits of the Dynamic Window Approach (DWA) in terms of satisfying the robot’s dynamics constraints with state-of-the-art DRL-based navigation methods that can handle moving obstacles and pedestrians well. Our formulation achieves these goals by embedding the environmental obstacles’ motions in a novel low-dimensional observation space.
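
The dynamics constraint that the Dynamic Window Approach contributes can be sketched as follows. This shows only the classical dynamic window (the velocities reachable within one control step under acceleration limits), not the DWA-RL observation space or policy. The `RobotLimits` fields, the function names, and the numeric values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RobotLimits:
    """Hypothetical velocity and acceleration limits of a differential-drive robot."""
    v_min: float      # minimum linear velocity (m/s)
    v_max: float      # maximum linear velocity (m/s)
    w_min: float      # minimum angular velocity (rad/s)
    w_max: float      # maximum angular velocity (rad/s)
    a_max: float      # maximum linear acceleration (m/s^2)
    alpha_max: float  # maximum angular acceleration (rad/s^2)


def dynamic_window(v: float, w: float, limits: RobotLimits,
                   dt: float) -> Tuple[float, float, float, float]:
    """Return (v_lo, v_hi, w_lo, w_hi): velocities reachable within one step dt."""
    v_lo = max(limits.v_min, v - limits.a_max * dt)
    v_hi = min(limits.v_max, v + limits.a_max * dt)
    w_lo = max(limits.w_min, w - limits.alpha_max * dt)
    w_hi = min(limits.w_max, w + limits.alpha_max * dt)
    return v_lo, v_hi, w_lo, w_hi


def linspace(lo: float, hi: float, n: int) -> List[float]:
    """Return n evenly spaced values from lo to hi inclusive."""
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def sample_velocities(window: Tuple[float, float, float, float],
                      n_v: int, n_w: int) -> List[Tuple[float, float]]:
    """Discretize the window into a grid of candidate (v, w) commands."""
    v_lo, v_hi, w_lo, w_hi = window
    return [(v, w) for v in linspace(v_lo, v_hi, n_v)
            for w in linspace(w_lo, w_hi, n_w)]


limits = RobotLimits(v_min=0.0, v_max=1.0, w_min=-1.0, w_max=1.0,
                     a_max=0.5, alpha_max=1.0)
window = dynamic_window(v=0.5, w=0.75, limits=limits, dt=0.5)
candidates = sample_velocities(window, n_v=3, n_w=2)
```

Every candidate in the grid respects the robot's acceleration limits by construction, which is the sense in which a policy restricted to this set is dynamically feasible.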