
Sound in Virtual Worlds: 3D Spatial Audio Breakthrough

The continuous development of intelligent systems that mimic and understand human behavior has driven notable advances in Computer Vision and Artificial Intelligence (AI). Yet despite the considerable focus on 3D human body modeling in computer vision, modeling the acoustic side of that behavior, that is, producing 3D spatial audio from speech and body motion, remains an open problem.

Researchers from Shanghai AI Laboratory and Meta Reality Labs Research have unveiled a technique that addresses this challenge of modeling the acoustic side of human behavior. They have introduced a model that renders accurate 3D spatial audio for the full human body.

Unlike prior work that focuses on visual fidelity, this method uses audio from head-mounted microphones together with body pose data to synthesize precise 3D spatial sound. This makes it well suited to telepresence scenarios in augmented and virtual reality.
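To make the setup concrete, the sketch below shows roughly what the interface of such a model could look like in PyTorch: audio from a few head-mounted microphones plus per-frame body pose goes in, and a multi-channel spatial sound field comes out. All names, layer choices, and tensor shapes (n_mics, n_joints, the conv stack) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class SpatialAudioRenderer(nn.Module):
    """Minimal sketch of a pose+audio spatial renderer (illustrative only)."""

    def __init__(self, n_mics=7, n_joints=25, n_out_channels=345, hidden=256):
        super().__init__()
        # Encode raw head-mounted microphone waveforms with a strided 1-D conv stack.
        self.audio_enc = nn.Sequential(
            nn.Conv1d(n_mics, hidden, kernel_size=15, stride=4, padding=7),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=15, stride=4, padding=7),
            nn.ReLU(),
        )
        # Encode flattened 3-D joint positions per frame with a small MLP.
        self.pose_enc = nn.Sequential(
            nn.Linear(n_joints * 3, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        # Fuse both modalities and decode one waveform per target channel.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(2 * hidden, hidden, kernel_size=16, stride=4, padding=6),
            nn.ReLU(),
            nn.ConvTranspose1d(hidden, n_out_channels, kernel_size=16, stride=4, padding=6),
        )

    def forward(self, mic_audio, pose):
        # mic_audio: (batch, n_mics, samples); pose: (batch, frames, n_joints * 3)
        a = self.audio_enc(mic_audio)                       # (batch, hidden, T)
        p = self.pose_enc(pose).transpose(1, 2)             # (batch, hidden, frames)
        p = nn.functional.interpolate(p, size=a.shape[-1])  # align pose rate to audio rate
        return self.decoder(torch.cat([a, p], dim=1))       # (batch, n_out_channels, samples)

A call such as renderer(mic_audio, pose) would then return one predicted waveform per output channel of the target sound field.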

The team's approach trains a multi-modal network to distinguish between different sound sources and to generate accurately spatialized signals, overcoming limitations of existing sound spatialization methods.
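One way to read "distinguishing between sound sources" is that the network predicts a separate spatialized signal per source, for example speech versus other body sounds, and sums them into the final field. The snippet below sketches such an output head under that assumption; the two-source split and all names are hypothetical, not confirmed by the article.

import torch
import torch.nn as nn


class SourceSplitHead(nn.Module):
    """Illustrative output head: one spatialized waveform per assumed sound source."""

    def __init__(self, hidden=256, n_sources=2, n_channels=345):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Conv1d(hidden, n_channels, kernel_size=1) for _ in range(n_sources)]
        )

    def forward(self, fused_features):
        # fused_features: (batch, hidden, time) from an audio/pose encoder
        per_source = [head(fused_features) for head in self.heads]
        # Return the combined sound field plus the per-source components.
        return torch.stack(per_source).sum(dim=0), per_source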

The empirical evaluation shows that, when trained with an appropriate loss function, the model reliably captures the sound fields produced by body motion. The researchers have released their code and dataset for public use.
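The article does not specify the loss function. As a stand-in, a common recipe for this kind of audio synthesis combines a time-domain term with multi-scale STFT magnitude terms; the function below is a minimal sketch of that recipe, not necessarily the authors' exact formulation.

import torch


def spatial_audio_loss(pred, target, fft_sizes=(256, 1024, 4096)):
    """Combined waveform L1 and multi-scale spectral magnitude loss (illustrative)."""
    loss = torch.mean(torch.abs(pred - target))  # time-domain term
    for n_fft in fft_sizes:
        window = torch.hann_window(n_fft, device=pred.device)
        # torch.stft expects a 1-D or 2-D input; fold channels into the batch dim.
        p = torch.stft(pred.reshape(-1, pred.shape[-1]), n_fft,
                       hop_length=n_fft // 4, window=window, return_complex=True)
        t = torch.stft(target.reshape(-1, target.shape[-1]), n_fft,
                       hop_length=n_fft // 4, window=window, return_complex=True)
        # Spectral magnitude term at this resolution.
        loss = loss + torch.mean(torch.abs(p.abs() - t.abs()))
    return loss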

Key contributions include a novel technique for rendering realistic 3D sound fields, an empirical evaluation highlighting the importance of body pose and a carefully designed loss function, and the release of a new dataset that pairs multi-view human body data with spatial audio recordings from a 345-microphone array.