AZTR: We present a new general learning approach, Prompt Learning for Action Recognition (PLAR), which leverages the strengths of prompt learning to guide the learning process. Our approach is designed to predict the action label by helping the models focus on the descriptions or instructions associated with actions in the input videos. Our formulation uses various prompts, including learnable prompts, auxiliary visual information, and large vision models to improve the recognition performance.
PMISampler: We present a new algorithm for selection of informative frames in video action recognition. Our approach is designed for aerial videos captured using a moving camera where human actors occupy a small spatial resolution of video frames. Our algorithm utilizes the motion bias within aerial videos, which enables the selection of motion-salient frames. We introduce the concept of patch mutual information (PMI) score to quantify the motion bias between adjacent frames, by measuring the similarity of patches.
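To make the patch mutual information idea concrete, the sketch below scores each pair of adjacent frames by the mean histogram-based mutual information of co-located patches, then picks the frames whose transitions show the least similarity. This is an illustrative reading of the abstract, not the PMISampler implementation: the function names, the grayscale input, the patch size, the bin count, and the "lowest score means most motion" selection rule are all assumptions.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram-based mutual information (in nats) between two equally sized patches."""
    joint, _, _ = np.histogram2d(
        a.ravel(), b.ravel(), bins=bins, range=[[0, 256], [0, 256]]
    )
    pxy = joint / joint.sum()              # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)    # marginal of patch a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of patch b, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def pmi_scores(frames, patch=16, bins=16):
    """frames: (T, H, W) uint8 grayscale video. Returns T-1 scores, one per
    adjacent frame pair: the mean mutual information over co-located patches."""
    T, H, W = frames.shape
    scores = []
    for t in range(T - 1):
        mis = []
        for y in range(0, H - patch + 1, patch):
            for x in range(0, W - patch + 1, patch):
                mis.append(mutual_information(
                    frames[t, y:y + patch, x:x + patch],
                    frames[t + 1, y:y + patch, x:x + patch],
                    bins,
                ))
        scores.append(np.mean(mis))
    return np.array(scores)

def select_frames(frames, k):
    """Hypothetical selection rule: keep the k frames whose transition from the
    previous frame has the lowest patch similarity (assumed to mean most motion)."""
    scores = pmi_scores(frames)
    idx = np.argsort(scores)[:k] + 1       # +1: score t describes frame t+1
    return np.sort(idx)
```

A histogram estimate of mutual information is biased for small patches, so a real sampler would likely need larger patches or a different similarity estimator.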
Abstract This project investigates a planning system for autonomous driving among many pedestrians. A key ingredient of our approach is a motion prediction model for pedestrians and vehicles. It accounts for both a pedestrian’s global navigation intention and local interactions with the vehicle and other pedestrians. Unfortunately, the autonomous vehicle does not know the pedestrians’ intentions a priori and requires a planning algorithm that hedges against the uncertainty in pedestrian intentions.
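One common way to represent uncertainty over intentions is a probability distribution over candidate goals that is updated as the pedestrian moves. The sketch below shows a single Bayesian update of that kind. It is only an illustration under stated assumptions: the discrete goal set, the heading-based likelihood, the `kappa` sharpness parameter, and the function name are hypothetical and are not taken from the abstract.

```python
import numpy as np

def update_goal_belief(belief, pos, vel, goals, kappa=4.0):
    """One Bayes step over a discrete set of candidate goals (assumed model).

    belief: (G,) prior probabilities over goals
    pos, vel: (2,) observed pedestrian position and velocity
    goals: (G, 2) candidate goal positions
    kappa: hypothetical sharpness of the heading likelihood
    """
    belief = np.asarray(belief, dtype=float)
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    goals = np.asarray(goals, dtype=float)

    # Unit vectors from the pedestrian to each goal, and the unit heading.
    to_goal = goals - pos
    to_goal /= np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-9
    heading = vel / (np.linalg.norm(vel) + 1e-9)

    # Goals aligned with the observed heading get a higher likelihood.
    likelihood = np.exp(kappa * (to_goal @ heading))
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Usage: a pedestrian at the origin walking along +x, with two candidate goals.
belief = update_goal_belief([0.5, 0.5], [0.0, 0.0], [1.0, 0.0],
                            [[10.0, 0.0], [0.0, 10.0]])
```

A planner that hedges against intention uncertainty would evaluate candidate actions against this whole distribution rather than committing to the single most likely goal.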
Abstract We present a novel interactive approach, PedVR, to generate plausible behaviors for a large number of virtual humans, and to enable natural interaction between the real user and virtual agents. Our formulation is based on a coupled approach that combines a 2D multi-agent navigation algorithm with 3D human motion synthesis. The coupling can result in plausible movement of virtual agents and can generate gazing behaviors, which can considerably increase the believability.
Abstract We present a Pedestrian Dominance Model (PDM) to identify the dominance characteristics of pedestrians for robot navigation. Through a perception study on a simulated dataset of pedestrians, PDM models the perceived dominance levels of pedestrians with varying motion behaviors corresponding to trajectory, speed, and personal space. At runtime, we use PDM to identify the dominance levels of pedestrians to facilitate socially-aware navigation for the robots. PDM can predict dominance levels from trajectories with ~85% accuracy.
Abstract We provide the first perceptual quantification of users’ sensitivity to radial optic flow artifacts and demonstrate a promising approach for masking this optic flow artifact via blink suppression. Near-eye HMDs allow users to feel immersed in virtual environments by providing visual cues, like motion parallax and stereoscopy, that mimic how we view the physical world. However, these systems exhibit a variety of perceptual artifacts that can limit their usability and the user’s sense of presence in VR.
Abstract We present a novel method for placing a 3D human animation into a 3D scene while maintaining any human-scene interactions in the animation. We use the notion of computing the most important meshes in the animation for the interaction with the scene, which we call “keyframes.” These keyframes allow us to better optimize the placement of the animation into the scene such that interactions in the animations (standing, laying, sitting, etc.