Thursday, November 6, 2025
Earth-News
Boffins find AI stumbles when quizzed on the tough stuff

October 30, 2023
in Technology

AI models can manage well enough when prompted with text or images, and may even solve complex problems when not making terrible errors.

OpenAI, for example, has said that its GPT-4 model managed to score 700 out of 800 on the SAT math exam. Not all such claims have been borne out, however: a paper released in June that said GPT-4 could get a computer science degree at MIT was subsequently withdrawn.

So to better assess how large language models – which interpret text input – and large multimodal models – which interpret text, images and perhaps other forms of input – actually handle problem solving, a group of ten researchers from the University of California, Los Angeles, the University of Washington, and Microsoft Research have devised a testing benchmark called MathVista that focuses on visually-oriented challenges.

“The ability of these foundation models to perform mathematical reasoning in visual contexts has not been systematically examined,” say the authors (Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, and Jianfeng Gao) in a preprint paper.

It is thus essential, they say, to develop a new benchmark to help the development of mathematical reasoning with a visual component and to evaluate how various models compare at reasoning tasks.

Being able to show that one’s AI model can correctly solve visual problems may prove helpful in determining whether it’s wise to, say, trust software to drive a car without stopping atop an accident victim.

MathVista incorporates 6,141 examples that were developed from 28 multimodal datasets and from three new datasets called IQTest, FunctionQA, and PaperQA. It covers various forms of reasoning (algebraic, arithmetic, geometric, logical, numeric, scientific, and statistical), with a focus on figure question answering, geometry problem solving, math word problems, textbook questions, and visual questions.

[Figure: Screenshot of MathVista challenge question]

The researchers tested a dozen foundation models: three LLMs (ChatGPT, GPT-4, and Claude-2), two proprietary LMMs (GPT-4V and Bard), and seven open-source LMMs. They also considered human answers, provided by Amazon Mechanical Turk workers with at least a high school degree, and random responses.

The good news for AI practitioners is that the LLMs and LMMs all did better than random chance, which isn’t all that surprising considering that many of the questions were multiple choice rather than yes or no.
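To put "random chance" in concrete terms, here is a minimal sketch of the expected accuracy of uniform random guessing, assuming each question has a known number of answer options. The function name and the question mix are hypothetical and do not reflect MathVista's actual composition or the paper's own baseline method.

```python
from fractions import Fraction


def random_guess_accuracy(option_counts):
    """Expected accuracy of uniform random guessing.

    option_counts holds the number of answer options for each question;
    a question with k options is guessed correctly with probability 1/k.
    """
    if not option_counts:
        raise ValueError("need at least one question")
    return sum(Fraction(1, k) for k in option_counts) / len(option_counts)


# Hypothetical mix (an assumption, not MathVista data):
# two yes/no questions and two four-option questions.
mix = [2, 2, 4, 4]
print(float(random_guess_accuracy(mix)))  # 0.375
```

The more options a question offers, the lower the bar that random guessing sets.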

In fact, the top performer, OpenAI’s GPT-4V, managed to surpass human performance in specific areas – questions involving algebraic reasoning and complex visual challenges involving tables and function plots.

We note that Microsoft, whose researchers contributed to this project, has a substantial stake in OpenAI.

The less good news is that even GPT-4V only managed to get 49.9 percent of the questions correct. That's adequate if the goal is to best multimodal Bard, which managed an accuracy of 34.8 percent.

But it’s still shy of the Amazon Mechanical Turk workers who were put to the test and managed a score of 60.3 percent. As the researchers observe in their paper, “a 10.4 percent gap in overall accuracy remains when compared to the human baseline, leaving plenty of room for model improvement.” ®
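The gap the researchers quote can be checked directly from the three overall accuracy figures reported above. The sketch below is only that arithmetic; the dictionary keys and the helper name are illustrative labels, not names from the paper.

```python
from decimal import Decimal

# Overall accuracy figures (percent) as reported in the article.
scores = {
    "Human (Mechanical Turk)": Decimal("60.3"),
    "GPT-4V": Decimal("49.9"),
    "Bard": Decimal("34.8"),
}


def gap_to_human(model):
    """Percentage-point gap between the human baseline and a model."""
    return scores["Human (Mechanical Turk)"] - scores[model]


print(gap_to_human("GPT-4V"))  # 10.4
print(gap_to_human("Bard"))    # 25.5
```

Decimal is used here rather than float so the subtraction yields exactly 10.4 instead of a binary floating-point approximation.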

Copyright for syndicated content belongs to the linked Source : The Register – https://go.theregister.com/feed/www.theregister.com/2023/10/29/ai_math_quiz/

© 2023 earth-news.info
