* . *
  • About
  • Advertise
  • Privacy & Policy
  • Contact
Sunday, May 11, 2025
Earth-News
  • Home
  • Business
  • Entertainment
    Free Flowin’ Fest brings entertainment to Pascagoula’s Beach Park – WLOX

    Experience the Excitement: Free Flowin’ Fest Lights Up Pascagoula’s Beach Park!

    ‘Experimental entertainment venue’ sets sights on Austin area – MySA

    ‘Experimental entertainment venue’ sets sights on Austin area – MySA

    Taylor Swift’s team calls subpoena in Blake Lively-Justin Baldoni case ‘tabloid clickbait’ – Yahoo

    Taylor Swift’s Team Slams Subpoena in Blake Lively-Justin Baldoni Case as ‘Tabloid Clickbait

    The Weeknd made the apocalypse sexy at his 2025 tour launch in Arizona – Yahoo

    The Weeknd Turns Up the Heat at His 2025 Tour Launch in Arizona!

    Flutter Entertainment eyes U.S. prediction markets amid growing interest – Sports Business Journal

    Flutter Entertainment Sets Its Sights on U.S. Prediction Markets as Interest Soars

    SXSW Rom-Com ‘I Really Love My Husband’ Acquired for U.S. Release – Variety

    Heartfelt Romance: ‘I Really Love My Husband’ Set to Captivate U.S. Audiences!

  • General
  • Health
  • News

    Cracking the Code: Why China’s Economic Challenges Aren’t Shaking Markets, Unlike America’s” – Bloomberg

    Trump’s Narrow Window to Spread the Truth About Harris

    Trump’s Narrow Window to Spread the Truth About Harris

    Israel-Gaza war live updates: Hamas leader Ismail Haniyeh assassinated in Iran, group says

    Israel-Gaza war live updates: Hamas leader Ismail Haniyeh assassinated in Iran, group says

    PAP Boss to Niger Delta Youths, Stay Away from the Protest

    PAP Boss to Niger Delta Youths, Stay Away from the Protest

    Court Restricts Protests In Lagos To Freedom, Peace Park

    Court Restricts Protests In Lagos To Freedom, Peace Park

    Fans React to Jazz Jennings’ Inspiring Weight Loss Journey

    Fans React to Jazz Jennings’ Inspiring Weight Loss Journey

    Trending Tags

    • Trump Inauguration
    • United Stated
    • White House
    • Market Stories
    • Election Results
  • Science
  • Sports
  • Technology
    Officials announce massive project that could reshape electric vehicle technology: ‘This is exactly the type of investment that will help us grow the economy’ – Yahoo Finance

    Game-Changer Ahead: Major Investment Set to Transform Electric Vehicle Technology and Boost the Economy!

    Federal agents raid Dymeng Technology Solutions in St. Augustine – Action News Jax

    Federal Agents Storm Dymeng Technology Solutions in St. Augustine: What You Need to Know

    SoundHound’s Amelia 7.0 Platform Delivers Agentic AI With Category Leading Voice Technology – Business Wire

    Unleashing the Future: SoundHound’s Amelia 7.0 Revolutionizes Voice Technology with Agentic AI

    Comings and goings: MPT hires VP of technology, NPR announces changes to Business Desk – Current – For people in public media

    Exciting Leadership Changes: MPT Welcomes New VP of Technology and NPR Revamps Business Desk!

    Harnessing emerging technologies to power a small business – The Oaklandside

    Unlocking Success: How Emerging Technologies Can Transform Your Small Business

    Artificial intelligence (AI) – The Guardian

    Unlocking the Future: How Artificial Intelligence is Transforming Our World

    Trending Tags

    • Nintendo Switch
    • CES 2017
    • Playstation 4 Pro
    • Mark Zuckerberg
No Result
View All Result
  • Home
  • Business
  • Entertainment
    Free Flowin’ Fest brings entertainment to Pascagoula’s Beach Park – WLOX

    Experience the Excitement: Free Flowin’ Fest Lights Up Pascagoula’s Beach Park!

    ‘Experimental entertainment venue’ sets sights on Austin area – MySA

    ‘Experimental entertainment venue’ sets sights on Austin area – MySA

    Taylor Swift’s team calls subpoena in Blake Lively-Justin Baldoni case ‘tabloid clickbait’ – Yahoo

    Taylor Swift’s Team Slams Subpoena in Blake Lively-Justin Baldoni Case as ‘Tabloid Clickbait

    The Weeknd made the apocalypse sexy at his 2025 tour launch in Arizona – Yahoo

    The Weeknd Turns Up the Heat at His 2025 Tour Launch in Arizona!

    Flutter Entertainment eyes U.S. prediction markets amid growing interest – Sports Business Journal

    Flutter Entertainment Sets Its Sights on U.S. Prediction Markets as Interest Soars

    SXSW Rom-Com ‘I Really Love My Husband’ Acquired for U.S. Release – Variety

    Heartfelt Romance: ‘I Really Love My Husband’ Set to Captivate U.S. Audiences!

  • General
  • Health
  • News

    Cracking the Code: Why China’s Economic Challenges Aren’t Shaking Markets, Unlike America’s” – Bloomberg

    Trump’s Narrow Window to Spread the Truth About Harris

    Trump’s Narrow Window to Spread the Truth About Harris

    Israel-Gaza war live updates: Hamas leader Ismail Haniyeh assassinated in Iran, group says

    Israel-Gaza war live updates: Hamas leader Ismail Haniyeh assassinated in Iran, group says

    PAP Boss to Niger Delta Youths, Stay Away from the Protest

    PAP Boss to Niger Delta Youths, Stay Away from the Protest

    Court Restricts Protests In Lagos To Freedom, Peace Park

    Court Restricts Protests In Lagos To Freedom, Peace Park

    Fans React to Jazz Jennings’ Inspiring Weight Loss Journey

    Fans React to Jazz Jennings’ Inspiring Weight Loss Journey

    Trending Tags

    • Trump Inauguration
    • United Stated
    • White House
    • Market Stories
    • Election Results
  • Science
  • Sports
  • Technology
    Officials announce massive project that could reshape electric vehicle technology: ‘This is exactly the type of investment that will help us grow the economy’ – Yahoo Finance

    Game-Changer Ahead: Major Investment Set to Transform Electric Vehicle Technology and Boost the Economy!

    Federal agents raid Dymeng Technology Solutions in St. Augustine – Action News Jax

    Federal Agents Storm Dymeng Technology Solutions in St. Augustine: What You Need to Know

    SoundHound’s Amelia 7.0 Platform Delivers Agentic AI With Category Leading Voice Technology – Business Wire

    Unleashing the Future: SoundHound’s Amelia 7.0 Revolutionizes Voice Technology with Agentic AI

    Comings and goings: MPT hires VP of technology, NPR announces changes to Business Desk – Current – For people in public media

    Exciting Leadership Changes: MPT Welcomes New VP of Technology and NPR Revamps Business Desk!

    Harnessing emerging technologies to power a small business – The Oaklandside

    Unlocking Success: How Emerging Technologies Can Transform Your Small Business

    Artificial intelligence (AI) – The Guardian

    Unlocking the Future: How Artificial Intelligence is Transforming Our World

    Trending Tags

    • Nintendo Switch
    • CES 2017
    • Playstation 4 Pro
    • Mark Zuckerberg
No Result
View All Result
Earth-News
No Result
View All Result
Home Technology

Gemini’s data-analyzing abilities aren’t as good as Google claims

June 30, 2024
in Technology
Gemini’s data-analyzing abilities aren’t as good as Google claims
Share on FacebookShare on Twitter

One of the selling points of Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that the models can accomplish previously impossible tasks thanks to their “long context,” like summarizing multiple hundred-page documents or searching across scenes in film footage.

But new research suggests that the models aren’t, in fact, very good at those things.

Two separate studies investigated how well Google’s Gemini models and others make sense out of an enormous amount of data — think “War and Peace”-length works. Both find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large datasets correctly; in one series of document-based tests, the models gave the right answer only 40% 50% of the time.

“While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” Marzena Karpinska, a postdoc at UMass Amherst and a co-author on one of the studies, told TechCrunch.

Gemini’s context window is lacking

A model’s context, or context window, refers to input data (e.g., text) that the model considers before generating output (e.g., additional text). A simple question — “Who won the 2020 U.S. presidential election?” — can serve as context, as can a movie script, show or audio clip. And as context windows grow, so does the size of the documents being fit into them.

The newest versions of Gemini can take in upward of 2 million tokens as context. (“Tokens” are subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic.”) That’s equivalent to roughly 1.4 million words, two hours of video or 22 hours of audio — the largest context of any commercially available model.

In a briefing earlier this year, Google showed several pre-recorded demos meant to illustrate the potential of Gemini’s long-context capabilities. One had Gemini 1.5 Pro search the transcript of the Apollo 11 moon landing telecast — around 402 pages — for quotes containing jokes, and then find a scene in the telecast that looked similar to a pencil sketch.

VP of research at Google DeepMind Oriol Vinyals, who led the briefing, described the model as “magical.”

“[1.5 Pro] performs these sorts of reasoning tasks across every single page, every single word,” he said.

That might have been an exaggeration.

In one of the aforementioned studies benchmarking these capabilities, Karpinska, along with researchers from the Allen Institute for AI and Princeton, asked the models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn’t “cheat” by relying on foreknowledge, and they peppered the statements with references to specific details and plot points that’d be impossible to comprehend without reading the books in their entirety.

Given a statement like “By using her skills as an Apoth, Nusis is able to reverse engineer the type of portal opened by the reagents key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash — having ingested the relevant book — had to say whether the statement was true or false and explain their reasoning.

Image Credits: UMass Amherst

Tested on one book around 260,000 words (~520 pages) in length, the researchers found that 1.5 Pro answered the true/false statements correctly 46.7% of the time while Flash answered correctly only 20% of the time. That means a coin is significantly better at answering questions about the book than Google’s latest machine learning model. Averaging all the benchmark results, neither model managed to achieve higher than random chance in terms of question-answering accuracy.

“We’ve noticed that the models have more difficulty verifying claims that require considering larger portions of the book, or even the entire book, compared to claims that can be solved by retrieving sentence-level evidence,” Karpinska said. “Qualitatively, we also observed that the models struggle with verifying claims about implicit information that is clear to a human reader but not explicitly stated in the text.”

The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to “reason over” videos — that is, search through and answer questions about the content in them.

The co-authors created a dataset of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in the images (e.g., “What cartoon character is on this cake?”). To evaluate the models, they picked one of the images at random and inserted “distractor” images before and after it to create slideshow-like footage.

Flash didn’t perform all that well. In a test that had the model transcribe six handwritten digits from a “slideshow” of 25 images, Flash got around 50% of the transcriptions right. The accuracy dropped to around 30% with eight digits.

“On real question-answering tasks over images, it appears to be particularly hard for all the models we tested,” Michael Saxon, a PhD student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That small amount of reasoning — recognizing that a number is in a frame and reading it — might be what is breaking the model.”

Google is overpromising with Gemini

Neither of the studies have been peer-reviewed, nor do they probe the releases of Gemini 1.5 Pro and 1.5 Flash with 2-million-token contexts. (Both tested the 1-million-token context releases.) And Flash isn’t meant to be as capable as Pro in terms of performance; Google advertises it as a low-cost alternative.

Nevertheless, both add fuel to the fire that Google’s been overpromising — and under-delivering — with Gemini from the beginning. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google’s the only model provider that’s given context window top billing in its advertisements.

“There’s nothing wrong with the simple claim, ‘Our model can take X number of tokens’ based on the objective technical details,” Saxon said. “But the question is, what useful thing can you do with it?”

Generative AI broadly speaking is coming under increased scrutiny as businesses (and investors) grow frustrated with the technology’s limitations.

In a pair of recent surveys from Boston Consulting Group, about half of the respondents — all C-suite executives — said that they don’t expect generative AI to bring about substantial productivity gains and that they’re worried about the potential for mistakes and data compromises arising from generative AI-powered tools. PitchBook recently reported that, for two consecutive quarters, generative AI dealmaking at the earliest stages has declined, plummeting 76% from its Q3 2023 peak.

Faced with meeting-summarizing chatbots that conjure up fictional details about people and AI search platforms that basically amount to plagiarism generators, customers are on the hunt for promising differentiators. Google — which has raced, at times clumsily, to catch up to its generative AI rivals — was desperate to make Gemini’s context one of those differentiators.

But the bet was premature, it seems.

“We haven’t settled on a way to really show that ‘reasoning’ or ‘understanding’ over long documents is taking place, and basically every group releasing these models is cobbling together their own ad hoc evals to make these claims,” Karpinska said. “Without the knowledge of how long context processing is implemented — and companies do not share these details — it is hard to say how realistic these claims are.”

Google didn’t respond to a request for comment.

Both Saxon and Karpinska believe the antidotes to hyped-up claims around generative AI are better benchmarks and, along the same vein, greater emphasis on third-party critique. Saxon notes that one of the more common tests for long context (liberally cited by Google in its marketing materials), “needle in the haystack,” only measures a model’s ability to retrieve particular info, like names and numbers, from datasets — not answer complex questions about that info.

“All scientists and most engineers using these models are essentially in agreement that our existing benchmark culture is broken,” Saxon said, “so it’s important that the public understands to take these giant reports containing numbers like ‘general intelligence across benchmarks’ with a massive grain of salt.”

>>> Read full article>>>
Copyright for syndicated content belongs to the linked Source : TechCrunch – https://techcrunch.com/2024/06/29/geminis-data-analyzing-abilities-arent-as-good-as-google-claims/

Tags: data-analyzingGemini’stechnology
Previous Post

The biggest data breaches in 2024: 1B stolen records and rising

Next Post

Former IT employee accessed data of over 1 million US patients

Two decades of bacterial ecology and evolution in a freshwater lake – Nature

Two decades of bacterial ecology and evolution in a freshwater lake – Nature

May 11, 2025
NIH guts its first and largest study centered on women – Science | AAAS

Groundbreaking Women’s Health Study Faces Major Cuts: What It Means for the Future

May 11, 2025
Eggs are less likely to crack when dropped on their side, according to science – NBC News

Science Reveals: Dropping Eggs on Their Side Reduces Cracking Risk!

May 11, 2025
A letter to Mom: I am more like you than you think – Lifestyle.INQ

A letter to Mom: I am more like you than you think – Lifestyle.INQ

May 11, 2025
Zara: Inside the secretive world of the fashion brand – BBC

Unveiling Zara: A Deep Dive into the Enigmatic Fashion Empire

May 11, 2025
Trump’s team is finally meeting with China. The future of the global economy is riding on its success – CNN

Trump’s Team Engages with China: A Pivotal Moment for the Global Economy

May 11, 2025
Free Flowin’ Fest brings entertainment to Pascagoula’s Beach Park – WLOX

Experience the Excitement: Free Flowin’ Fest Lights Up Pascagoula’s Beach Park!

May 11, 2025
Local health system eliminates pay differential for nurses during National Nurses Week – NBC 5 Chicago

Local Health System Celebrates National Nurses Week by Equalizing Pay for Nurses!

May 11, 2025
Lehigh County pension fund halts buying Tesla stock because of performance, politics – LehighValleyNews.com

Lehigh County Pension Fund Stops Investing in Tesla: A Shift Driven by Performance and Politics

May 11, 2025
Officials announce massive project that could reshape electric vehicle technology: ‘This is exactly the type of investment that will help us grow the economy’ – Yahoo Finance

Game-Changer Ahead: Major Investment Set to Transform Electric Vehicle Technology and Boost the Economy!

May 11, 2025

Categories

Archives

May 2025
MTWTFSS
 1234
567891011
12131415161718
19202122232425
262728293031 
« Apr    
Earth-News.info

The Earth News is an independent English-language daily published Website from all around the World News

Browse by Category

  • Business (20,132)
  • Ecology (600)
  • Economy (612)
  • Entertainment (21,525)
  • General (15,211)
  • Health (9,654)
  • Lifestyle (617)
  • News (22,149)
  • People (615)
  • Politics (619)
  • Science (15,834)
  • Sports (21,122)
  • Technology (15,602)
  • World (602)

Recent News

Two decades of bacterial ecology and evolution in a freshwater lake – Nature

Two decades of bacterial ecology and evolution in a freshwater lake – Nature

May 11, 2025
NIH guts its first and largest study centered on women – Science | AAAS

Groundbreaking Women’s Health Study Faces Major Cuts: What It Means for the Future

May 11, 2025
  • About
  • Advertise
  • Privacy & Policy
  • Contact

© 2023 earth-news.info

No Result
View All Result

© 2023 earth-news.info

No Result
View All Result

© 2023 earth-news.info

Go to mobile version