Media Coverage

GAMMA Lab & Apple Develop AMUSE to Advance Agentic Multimodal Reasoning

GAMMA Lab researchers collaborated with Apple Machine Learning Research to develop AMUSE (Audio-Visual Benchmark and Alignment framework for Agentic Multi-Speaker Understanding), a new benchmark designed to evaluate and improve multimodal AI systems operating in complex, real-world conversational settings. AMUSE focuses on agentic multi-speaker reasoning, requiring models to track who is speaking over time, ground dialogue in visual context, and generate coherent multimodal summaries. The benchmark reveals significant limitations in existing multimodal large language models when reasoning across audio, vision, and language simultaneously.

GAMMA Collaborates with NVIDIA on Music Flamingo, Adopted by Universal Music Group

GAMMA's Joint Work on "Sensible Agents" with Google Research

New Research Helps Robots Grasp Situational Context

Sanjoy Chowdhury’s Vision for Smarter, Multimodal AI

Joint Work with NVIDIA on Audio Flamingo 3

Why 'Thinking More' Isn't Always Making Generative AI Smarter

Sreyan Ghosh Receives NVIDIA Fellowship

Kasun Weerakoon Receives UMD ECE Dissertation Award

Mohamed Elnoor Recognized for Advancing Robotics and Building Community at the UMD ECE Department