Abstract We present CoMet, a novel approach for computing a group’s cohesion and using that to improve a robot’s navigation in crowded scenes. Our approach uses a novel cohesion metric that builds on prior work in social psychology. We compute this metric from visual features of pedestrians captured by an RGB-D camera onboard a robot. Specifically, we detect characteristics corresponding to proximity between people, their relative walking speeds, the group size, and interactions between group members.
Abstract We present CoNVOI, a novel method for autonomous robot navigation in real-world indoor and outdoor environments using Vision Language Models (VLMs). We employ VLMs in two ways: first, we leverage their zero-shot image classification capability to identify the context or scenario (e.g., indoor corridor, outdoor terrain, crosswalk, etc.) of the robot’s surroundings, and formulate context-based navigation behaviors as simple text prompts (e.g., “stay on the pavement”). Second, we utilize their state-of-the-art semantic understanding and logical reasoning capabilities to compute a suitable trajectory given the identified context.
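The first use of the VLM described above, classifying the scene and then looking up a behavior prompt, can be sketched as follows. This is a minimal illustration, not the CoNVOI implementation: the `classify` callable stands in for any zero-shot image classifier, the table `CONTEXT_PROMPTS` and the function `behavior_prompt` are hypothetical names, and only the prompt “stay on the pavement” comes from the abstract (its pairing with “outdoor terrain” is an assumption).

```python
from typing import Callable, Dict, List

# Hypothetical context -> behavior-prompt table. Only "stay on the pavement"
# is quoted from the abstract; the other prompts are invented for illustration.
CONTEXT_PROMPTS: Dict[str, str] = {
    "outdoor terrain": "stay on the pavement",
    "indoor corridor": "keep to the middle of the corridor",
    "crosswalk": "cross only within the marked crosswalk",
}


def behavior_prompt(image: object,
                    classify: Callable[[object, List[str]], str]) -> str:
    """Return the text prompt for the context that `classify` picks.

    `classify(image, labels)` is an assumed interface: it must return one of
    the candidate labels it was given, as a zero-shot classifier would.
    """
    labels = list(CONTEXT_PROMPTS)
    context = classify(image, labels)
    if context not in CONTEXT_PROMPTS:
        raise ValueError(f"classifier returned unknown context: {context!r}")
    return CONTEXT_PROMPTS[context]
```

Passing the classifier in as a parameter keeps the sketch independent of any particular VLM library; a stub such as `lambda img, labels: labels[0]` is enough to exercise it.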
Pooja Guhan1, Saayan Mitra2, Somdeb Sarkhel2, Stefano Petrangeli2, Ritwik Sinha2, Viswanathan Swaminathan2, Aniket Bera3, Dinesh Manocha1
1University of Maryland College Park, 2Adobe Research San Jose, 3Purdue University West Lafayette
Abstract Content personalization is one of the foundations of today’s digital marketing. Often the same image needs to be adapted for different design schemes for content that is created for different occasions, geographic locations or other aspects of the target population.
CrossLoc3D: We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting. Cross-source point cloud data corresponds to point sets captured by depth sensors with different accuracies or from different distances and perspectives. We address the challenges of developing 3D place recognition methods that account for the representation gap between points captured by different sources. Our method handles cross-source data by utilizing multi-grained features and selecting convolution kernel sizes that correspond to the most prominent features.
Overview Crowd and multi-agent simulation is the process of simulating large numbers of people, creatures, or other characters, each interacting in one environment. These actors are expected to move to their goals, interact with their environment, and respond to each other. Crowd simulations have many uses, including improving architectural planning, enhancing training environments and virtual realities, and driving artificially intelligent (AI) characters in games and movies.
Most existing traffic video datasets, e.g., the Waymo Open Motion Dataset, are collected in Western countries and consist of simple, structured, and predictable traffic. Most Asian scenarios, however, are far denser, more unstructured, and more heterogeneous, with many road-agents routinely disobeying common traffic rules. Consequently, state-of-the-art computer vision and autonomous driving perception algorithms trained on existing datasets do not transfer to traffic in Asian countries. Addressing this gap, we present a new dataset, DAVE, designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs: e.
We present a differentiable frequency-based method for aerial video recognition. Our differentiable static-dynamic frequency mask provides a prior for disentangled regions relevant to action recognition. This mask is used to guide the learning of disentangled features within the layers of the neural network using an identity function. Further, we propose a frame sampling strategy that chooses the best frame within each uniform video segment, at test time, using the static-dynamic frequency mask and temporal difference.
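The frame sampling idea above, picking one frame per uniform segment, can be illustrated with a simplified sketch. It scores frames by temporal difference only; the static-dynamic frequency mask is omitted, so this is not the paper's sampler. Frames are assumed to be flat lists of pixel intensities, and the function names are hypothetical.

```python
from typing import List, Sequence


def temporal_difference_scores(frames: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute pixel difference between each frame and its predecessor.

    The first frame has no predecessor and gets a score of 0.0.
    """
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        scores.append(diff)
    return scores


def sample_frames(frames: Sequence[Sequence[float]], num_segments: int) -> List[int]:
    """Split the video into uniform segments and return, for each segment,
    the index of the frame with the highest temporal-difference score."""
    scores = temporal_difference_scores(frames)
    n = len(frames)
    picked = []
    for s in range(num_segments):
        start = s * n // num_segments
        end = (s + 1) * n // num_segments
        if start == end:  # more segments than frames: skip empty segments
            continue
        picked.append(max(range(start, end), key=lambda i: scores[i]))
    return picked


# Six tiny two-pixel "frames", split into two segments of three frames each.
video = [[0, 0], [0, 0], [5, 5], [5, 5], [5, 5], [9, 9]]
print(sample_frames(video, 2))  # [2, 5]
```

In the toy video, frames 2 and 5 are where the content changes most within their segments, so they are the ones selected.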
Abstract We present an algorithm for safe robot navigation in complex dynamic environments using a variant of model predictive equilibrium point control. We use an optimization formulation to navigate robots gracefully in dynamic environments by optimizing over a trajectory cost function at each timestep. We present a novel trajectory cost formulation that significantly reduces conservative and deadlocking behaviors and generates smooth trajectories. In particular, we propose a new collision probability function that effectively captures the risk associated with a given configuration and the time to avoid collisions based on the velocity direction.
Abstract We present a novel Deep Reinforcement Learning (DRL) based policy to compute dynamically feasible and spatially aware velocities for a robot navigating among mobile obstacles. Our approach combines the benefits of the Dynamic Window Approach (DWA) in terms of satisfying the robot’s dynamics constraints with state-of-the-art DRL-based navigation methods that can handle moving obstacles and pedestrians well. Our formulation achieves these goals by embedding the environmental obstacles’ motions in a novel low-dimensional observation space.
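For readers unfamiliar with the Dynamic Window Approach, the sketch below shows the classic idea it contributes: restricting candidate velocities to those reachable within one control step under acceleration limits. It illustrates textbook DWA only, not the DRL policy or observation space of this work; all function and parameter names are assumptions.

```python
from typing import List, Tuple


def dynamic_window(v: float, w: float,
                   max_v: float, max_w: float,
                   acc_v: float, acc_w: float,
                   dt: float) -> Tuple[float, float, float, float]:
    """Bounds (v_lo, v_hi, w_lo, w_hi) on linear and angular velocity that are
    reachable from the current (v, w) within dt, clipped to the robot limits.
    Reversing is disallowed here (v_lo >= 0), which is an assumption."""
    v_lo = max(0.0, v - acc_v * dt)
    v_hi = min(max_v, v + acc_v * dt)
    w_lo = max(-max_w, w - acc_w * dt)
    w_hi = min(max_w, w + acc_w * dt)
    return v_lo, v_hi, w_lo, w_hi


def sample_velocities(window: Tuple[float, float, float, float],
                      n_v: int = 3, n_w: int = 3) -> List[Tuple[float, float]]:
    """Evenly spaced (v, w) candidates inside the window (n_v, n_w >= 2)."""
    v_lo, v_hi, w_lo, w_hi = window
    vs = [v_lo + (v_hi - v_lo) * i / (n_v - 1) for i in range(n_v)]
    ws = [w_lo + (w_hi - w_lo) * j / (n_w - 1) for j in range(n_w)]
    return [(v, w) for v in vs for w in ws]
```

A planner would then score each candidate, for example by goal heading, obstacle clearance, and speed, and execute the best one; every candidate satisfies the dynamics constraints by construction.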