Friday, December 26, 2025
Earth-News

Google’s VLOGGER AI model can generate video avatars from images

March 24, 2024
in Technology

VLOGGER can take a single photograph of someone and create clips in high-fidelity and varying lengths, with accurate facial expressions and body movements, down to a blink, exceeding previous kinds of “talking head” software.

Google

The artificial intelligence (AI) community has gotten so good at producing fake moving pictures (take a look at OpenAI’s Sora, introduced last month, with its slick imaginary fly-throughs) that one has to ask an intellectual and practical question: what should we do with all these videos?

This week, Google scholar Enric Corona and his colleagues answered: control them using our VLOGGER tool. VLOGGER can generate a high-resolution video of people talking based on a single photograph. More importantly, VLOGGER can animate the video according to a speech sample, meaning the technology can animate the videos as a controlled likeness of a person — an “avatar” of high fidelity.

This tool could enable all kinds of creations. On the simplest level, Corona’s team suggests VLOGGER could have a big impact on helpdesk avatars because more realistic-looking synthetic talking humans can “develop empathy.” They suggest the technology could “enable entirely new use cases, such as enhanced online communication, education, or personalized virtual assistants.”

VLOGGER could also conceivably lead to a new frontier in deepfakes, real-seeming likenesses that say and do things the actual person never did. Corona’s team says it intends to address the societal implications of VLOGGER in supplementary materials. However, that material is not available on the project’s GitHub page. ZDNET reached out to Corona to ask about the supporting materials but had not received a reply at publishing time.

As described in the formal paper, “VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis”, Corona’s team aims to move past the inaccuracies of the state of the art in avatars. “The creation of realistic videos of humans is still complex and ripe with artifacts,” Corona’s team wrote.

The team noted that existing video avatars often crop out the body and hands, showing just the face. VLOGGER can show whole torsos along with hand movements. Other tools usually have limited variations across facial expressions or poses, offering just rudimentary lip-syncing. VLOGGER can generate “high-resolution video of head and upper-body motion […] featuring considerably diverse facial expressions and gestures” and is “the first approach to generate talking and moving humans given speech inputs.”

As the research team explained, “it is precisely automation and behavioral realism that [are] what we aim for in this work: VLOGGER is a multi-modal interface to an embodied conversational agent, equipped with an audio and animated visual representation, featuring complex facial expressions and increasing level of body motion, designed to support natural conversations with a human user.”

Based on a single photograph, left, the VLOGGER software predicts the frames of video, right, that should accompany each moment of a sound file of someone speaking, using a process known as “diffusion”, and then generates those frames of video in high-definition quality. 

Google

VLOGGER brings together a few recent trends in deep learning.

Multi-modality brings together the many modes AI tools can absorb and synthesize, including text, audio, images, and video.

Large language models such as OpenAI’s GPT-4 make it possible to use natural language as the input to drive actions of various kinds, be it creating paragraphs of text, a song, or a picture.

Researchers have also found numerous ways to create lifelike images and videos in recent years by refining “diffusion.” The term comes from molecular physics and refers to how, as the temperature rises, particles of matter go from being highly concentrated in an area to being more spread out. By analogy, bits of digital information can be seen as “diffuse” the more incoherent they become with digital noise.

AI diffusion introduces noise into an image and reconstructs the original image to train a neural network to find the rules by which it was constructed. Diffusion is the root of the impressive image-generation process in Stability AI’s Stable Diffusion and OpenAI’s DALL-E. It’s also how OpenAI creates slick videos in Sora.
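
The noising half of that process can be sketched in a few lines. What follows is a minimal illustration in the style of standard denoising-diffusion training, not code from VLOGGER, Stable Diffusion, DALL-E, or Sora; the linear schedule, the 1,000 steps, the function names, and the five-pixel "image" are all assumptions chosen for the example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that rise linearly (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """alpha_bar[t]: running product of (1 - beta), the share of signal variance left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Jump to step t: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    keep = math.sqrt(alpha_bars[t])
    spread = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [keep * p + spread * e for p, e in zip(pixels, noise)]
    # A denoising network would be trained to recover `noise` from `noisy` and t.
    return noisy, noise


rng = random.Random(0)
alpha_bars = cumulative_signal(linear_beta_schedule(1000))
image = [0.0, 0.25, 0.5, 0.75, 1.0]  # toy stand-in for real pixel values
slightly_noisy, _ = add_noise(image, 10, alpha_bars, rng)
mostly_noise, _ = add_noise(image, 999, alpha_bars, rng)
```

At an early step almost all of the original signal survives; by the last step the sample is close to pure Gaussian noise. Learning to undo that corruption one step at a time is what lets a trained network generate new images from noise.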

For VLOGGER, Corona’s team trained a neural network to associate a speaker’s audio with individual frames of video of that speaker. The team combined a diffusion process, which reconstructs the video frame from the audio, with yet another recent innovation, the Transformer.

The Transformer uses the attention method to predict video frames based on frames that have happened in the past, in conjunction with the audio. By predicting actions, the neural network learns to render accurate hand and body movements and facial expressions, frame by frame, in sync with the audio.
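
A rough sense of how attention can be restricted to the past comes from the toy sketch below. It implements generic scaled dot-product attention with a causal mask, which is one common form of masking; the article does not spell out VLOGGER's exact scheme, so treat this as an assumption. Each frame vector here is a made-up stand-in for whatever combined audio and pose features a real model would use.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Scaled dot-product attention in which frame i may only look at frames 0..i."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for i, query in enumerate(queries):
        # The mask: score only the frames at or before position i.
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Weighted average of the visible frames' value vectors.
        output = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(output)
        # Pad with zeros so every row shows the full (masked) attention pattern.
        all_weights.append(weights + [0.0] * (len(queries) - i - 1))
    return outputs, all_weights


# Three hypothetical per-frame feature vectors.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(frames, frames, frames)
```

The first frame can attend only to itself, so its weights are all on position 0; later frames spread their attention over everything that came before them, and never over frames that have not happened yet.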

The final step uses the predictions from that first neural network to drive a second neural network, which also employs diffusion, to generate high-resolution frames of video. That second step is also a high-water mark in data.
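
The hand-off between the two networks can be pictured as a simple data flow. The sketch below is purely illustrative: both stages are hand-written stand-ins rather than trained networks, and every name in it (`MotionControls`, `predict_motion`, `render_frames`, the `mouth_open` and `blink` fields) is hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MotionControls:
    """Hypothetical per-frame controls handed from stage 1 to stage 2."""
    mouth_open: float  # 0.0 (closed) to 1.0 (wide open)
    blink: float       # 1.0 when the eyes are shut in this frame


def predict_motion(audio_loudness: List[float]) -> List[MotionControls]:
    """Stage 1 stand-in: a real system would use a trained audio-to-motion network."""
    controls = []
    for i, loudness in enumerate(audio_loudness):
        mouth = max(0.0, min(1.0, loudness))  # clamp loudness into [0, 1]
        blink = 1.0 if i % 4 == 3 else 0.0    # toy rule: blink on every fourth frame
        controls.append(MotionControls(mouth, blink))
    return controls


def render_frames(reference_image: str, controls: List[MotionControls]) -> List[str]:
    """Stage 2 stand-in: a real system would run a diffusion model per frame."""
    return [
        f"{reference_image} | mouth={c.mouth_open:.2f} blink={c.blink:.0f}"
        for c in controls
    ]


def generate_video(reference_image: str, audio_loudness: List[float]) -> List[str]:
    """Chain the two stages: audio to controls, then controls plus photo to frames."""
    return render_frames(reference_image, predict_motion(audio_loudness))


frames = generate_video("portrait.png", [0.2, 1.5, -0.3, 0.8])
```

Because the controls are an explicit intermediate result, they can in principle be inspected or altered before any frame is rendered.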

To make the high-resolution images, Corona’s team compiled MENTOR, a dataset featuring 800,000 “identities” of videos of people speaking. MENTOR consists of 2,200 hours of video, which the team claims makes it “the largest dataset used to date in terms of identities and length” and 10 times larger than prior comparable datasets.
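
Taken at face value, those two figures imply that each identity contributes only a short clip on average. The calculation below is back-of-the-envelope arithmetic on the numbers quoted above, not a statistic reported by the team, and it assumes the footage is spread across identities with no overlap.

```python
hours_of_video = 2_200
identities = 800_000

# Convert hours to seconds, then divide evenly across identities.
seconds_per_identity = hours_of_video * 3600 / identities
print(f"{seconds_per_identity:.1f} seconds of video per identity on average")
```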

The authors find they can enhance that process with a follow-on step called “fine-tuning.” By submitting a full-length video to VLOGGER, after it’s already been “pre-trained” on MENTOR, they can more realistically capture the idiosyncrasies of a person’s head movement, such as blinking: “By fine-tuning our diffusion model with more data, on a monocular video of a subject, VLOGGER can learn to capture the identity better, e.g. when the reference image displays the eyes as closed,” a process the team refers to as “personalization.”

VLOGGER’s neural net is a combination of two different neural nets. The first one uses “masked attention” via a Transformer to predict what poses should happen in a frame of video based on the sound coming from the recorded audio signal of the speaker. The second neural net uses diffusion to generate a consistent sequence of video frames using the clues of body motion and expression from the first neural net.

Google

The larger point of this approach of linking one neural network’s predictions to high-res imagery, and what makes VLOGGER provocative, is that the program is not merely generating a video, the way Sora does. VLOGGER links that video to actions and expressions that can be controlled. Its lifelike videos can be manipulated as they unfold, like puppets.

“Our objective is to bridge the gap between recent video synthesis efforts,” Corona’s team wrote, “which can generate dynamic videos with no control over identity or pose, and controllable image generation methods.”

Not only can VLOGGER be a voice-driven avatar, but it can also lead to editing functions, such as altering the mouth or eyes of a speaking subject. For example, a virtual person who blinks a lot in a video could be changed to blinking a little or not at all. A wide-mouthed manner of speaking could be narrowed to a more discreet motion of the lips.

Having achieved a way to control high-resolution video via voice cues, VLOGGER opens the way to manipulations, such as changing the lip movements of the speaker at each stretch of the video to be different from the original source video.

VLOGGER

Although Corona’s team has achieved a new state of the art in simulating people, it does not address what the world should expect from any misuse of the technology. It’s easy to imagine likenesses of a political figure saying something absolutely catastrophic about, say, imminent nuclear war.

Presumably, the next stage in this avatar game will be neural networks that, like the ‘Voight-Kampff test’ in the movie Blade Runner, can help society detect which speakers are real and which are just deepfakes with remarkably lifelike manners. 

Copyright for syndicated content belongs to the linked Source : ZDNet – https://www.zdnet.com/article/googles-vlogger-ai-model-can-generate-video-avatars-from-images-what-could-go-wrong/#ftag=RSSbaffb68

Tags: Google’s, technology, Vlogger
© 2023 earth-news.info
