ByteDance Launches OmniHuman-1: AI Tool Creates Lifelike Videos from Photos
TikTok’s parent company ByteDance has introduced OmniHuman-1, a cutting-edge AI tool that can generate highly realistic videos from a single photo. The system animates individuals, allowing them to appear as though they are talking, gesturing, singing, or playing musical instruments. According to a research paper published on the open-access archive arXiv, the technology delivers high-quality results regardless of the input image’s aspect ratio.
“OmniHuman significantly outperforms existing methods, generating extremely realistic human videos based on weak signal inputs, especially audio,” the research paper states. Researchers have shared sample videos on the project page, showcasing animated historical figures and celebrities, including a demonstration of Albert Einstein delivering a speech with realistic gestures and facial expressions.
Technological Potential and Creative Applications
After viewing OmniHuman’s demo videos, Freddy Tran Nager, a communications professor at USC’s Annenberg School, was impressed. “If you were thinking of reviving Humphrey Bogart and casting him in a film, I’m not sure how that would look. But on a small screen, especially on a phone, these are impressive,” Nager commented. He believes OmniHuman could have numerous educational and creative uses. For instance, teachers and students might use it to bring historical figures to life in lessons. He humorously noted, “I would like Marilyn Monroe to teach me statistics.”
Nager also predicted that TikTok creators might use OmniHuman-generated avatars to stand in for themselves and avoid exhaustion. He added that ByteDance could post content directly by generating avatars for the site in place of real people, which would eliminate the need for human talent.
Ethical Risks and Concerns
Samantha G. Wolfe, an adjunct professor at New York University and the founder of PitchFWD, viewed OmniHuman positively but was cautious about how such a tool might be used. “Creating something from just a picture and making it look like it’s talking and moving is fascinating from a technological standpoint, but it could have a lot of potential negative consequences, too,” she said. According to Wolfe, the pitfalls include the production of fake content, such as videos impersonating political or business leaders, that could mislead the public.
As AI tools become more advanced, Wolfe emphasized that the risk of misinformation increases. “When it starts to look more and more like reality, more and more like humans doing it, the likelihood of people believing it becomes so much greater,” she noted.
Extensive Training and Privacy Implications
The ByteDance team trained OmniHuman on more than 18,700 hours of human video, combining multiple input types: audio, text, and physical poses. However, ByteDance has not disclosed any further details about its training data. Nager noted that TikTok users may have unknowingly contributed to the dataset. “If you created a TikTok video, there’s a good chance you’re now in a database that’s going to be used to create virtual humans,” he remarked.
OmniHuman places ByteDance among the companies racing to advance realistic AI-generated video. It is also sparking both excitement and apprehension about the future of realistic-looking video in digital spaces.