
Tiny Language Models Come of Age

October 6, 2023

To better understand how neural networks learn to simulate writing, researchers trained simpler versions on synthetic children’s stories.

Adam Nickel for Quanta Magazine

Learning English is no easy task, as countless students well know. But when the student is a computer, one approach works surprisingly well: Simply feed mountains of text from the internet to a giant mathematical model called a neural network. That’s the operating principle behind generative language models like OpenAI’s ChatGPT, whose ability to converse coherently (if not always truthfully) on a wide range of topics has surprised researchers and the public over the past year.

But the approach has its drawbacks. For one thing, the “training” procedure required to transmute vast text archives into state-of-the-art language models is costly and time-intensive. For another, even the people who train large language models find it hard to understand their inner workings; that, in turn, makes it hard to predict the many ways they can fail.

Faced with these difficulties, some researchers have opted to train smaller models on smaller data sets and then study their behavior. “It’s like sequencing the Drosophila genome versus sequencing the human genome,” said Ellie Pavlick, a language model researcher at Brown University.

Now, in a paper recently posted to the scientific preprint server arxiv.org, a pair of Microsoft researchers have introduced a new method for training tiny language models: Raise them on a strict diet of children’s stories.

The two researchers showed that language models thousands of times smaller than today’s state-of-the-art systems rapidly learned to tell consistent and grammatical stories when trained in this way. Their results hint at new research directions that might be helpful for training larger models and understanding their behavior.

“I found this paper very informative,” said Chandra Bhagavatula, a language model researcher at the Allen Institute for Artificial Intelligence in Seattle. “The concept itself is super interesting.”

Once Upon a Time

The neural networks at the heart of language models are mathematical structures loosely inspired by the human brain. Each one contains many artificial neurons arranged in layers, with connections between neurons in adjacent layers. The neural network’s behavior is governed by the strength of these connections, called parameters. In a language model, the parameters control which words the model might spit out next, given an initial prompt and the words it has generated already.

A model only truly comes to life during training, when it repeatedly compares its own output to the text in its training data set and adjusts its parameters to increase the resemblance. An untrained network with random parameters is trivially easy to assemble from a few lines of code, but it will just produce gibberish. After training, it can often plausibly continue unfamiliar text. Larger models often undergo further fine-tuning that teaches them to answer questions and follow instructions, but the bulk of the training is mastering word prediction.
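
That word-prediction objective is easy to state concretely. The following sketch is a toy written for illustration rather than anything from the paper: it trains a miniature PyTorch network to predict each next token of a single sentence, with a small recurrent layer standing in for the transformer architecture that real language models use.

```python
# Minimal next-word-prediction sketch (illustrative assumptions throughout:
# a toy vocabulary, one training sentence, and a tiny GRU instead of a
# transformer). The loop mirrors the description above: compare the model's
# output to the training text and adjust parameters to increase resemblance.
import torch
import torch.nn as nn

vocab = ["<pad>", "the", "cat", "sat", "on", "mat"]
stoi = {w: i for i, w in enumerate(vocab)}
tokens = torch.tensor([[stoi[w] for w in ["the", "cat", "sat", "on", "the", "mat"]]])

class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        hidden, _ = self.rnn(self.embed(ids))
        return self.head(hidden)  # one next-token distribution per position

model = TinyLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # each word predicts the next
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, len(vocab)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Scaled up by many orders of magnitude in parameters and data, essentially this same loop is what the initial training of a large language model amounts to.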

Success at word prediction requires a language model to master many different skills. For example, the rules of English grammar suggest that the next word after the word “going” is likely to be “to,” regardless of the subject of the text. In addition, a system needs factual knowledge to complete “the capital of France is,” and completing a passage containing the word “not” requires a rudimentary grasp of logic.

“Raw language is very complicated,” said Timothy Nguyen, a machine learning researcher at DeepMind. “In order for interesting linguistic capabilities to arise, people have resorted to ‘more data is better.’”

Machine learning researchers have embraced this lesson. GPT-3.5, the large language model that powers the ChatGPT interface, has nearly 200 billion parameters, and it was trained on a data set comprising hundreds of billions of words. (OpenAI hasn’t released the corresponding figures for its successor, GPT-4.) Training such large models typically requires at least 1,000 specialized processors called GPUs running in parallel for weeks at a time. Only a few companies can muster the requisite resources, let alone train and compare different models.

Ronen Eldan realized he could use children’s stories generated by large language models to rapidly train smaller ones.

Weizmann Institute of Science

Ronen Eldan, a mathematician who joined Microsoft Research in 2022 to study generative language models, wanted to develop a cheaper and faster way to explore their abilities. The natural way to do that was by using a small data set, and that in turn meant he’d have to train models to specialize in a specific task, so they wouldn’t spread themselves too thin. Initially, he wanted to train models to solve a certain class of math problems, but one afternoon, after spending time with his 5-year-old daughter, he realized that children’s stories were a perfect fit.

“It literally came to me after I read her a story,” he said.

To generate coherent children’s stories, a language model would need to learn facts about the world, keep track of characters and events, and observe the rules of grammar — simpler versions of the challenges facing large models. But large models trained on massive data sets learn countless irrelevant details along with the rules that really matter. Eldan hoped the brevity and limited vocabulary of children’s stories might make learning more manageable for small models — making them both easier to train and easier to understand.

In the world of language models, though, “small” is relative: A data set a thousand times smaller than the one used to train GPT-3.5 would still need to contain millions of stories. “I don’t know how much money you want to spend, but I’m guessing you’re not going to hire professionals to write [a couple million] short stories,” Nguyen said.

It would take an extraordinarily prolific author to satisfy such voracious readers, but Eldan had a few candidates in mind. Who better to write for an audience of small language models than large ones?

Toy Stories

Eldan immediately set out to create a library of synthetic children’s stories generated by large language models. But he soon discovered that even state-of-the-art models aren’t naturally very creative. If you just tell GPT-4 to write stories appropriate for 4-year-olds, Eldan said, “about one-fifth of the stories will be about children going to the park being scared of the slides.” That’s apparently the quintessential preschool story, as far as the internet is concerned.

The solution was to add a bit of randomness into the prompt. First, Eldan used GPT-4 to generate a list of 1,500 nouns, verbs and adjectives that a 4-year-old might know — short enough that he could easily check it himself. Then he wrote a simple computer program that would repeatedly prompt GPT-3.5 or GPT-4 to generate an age-appropriate story that included three random words from the list, along with an additional randomly chosen detail like a happy ending or plot twist. The resulting stories, mercifully, were less focused on scary slides.
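
Spelled out as code, the recipe is short. In the sketch below, the word lists, the list of extra story features and the call_gpt() helper are hypothetical placeholders; the real TinyStories vocabulary has about 1,500 entries, and the real prompt wording is Eldan's, not this.

```python
# Illustrative sketch of the randomized prompting recipe described above.
# WORDS, FEATURES and call_gpt() are assumptions, not the paper's actual
# lists or prompt text.
import random

WORDS = {
    "noun": ["dog", "ball", "tree", "cake"],        # the real list holds ~1,500 words
    "verb": ["jump", "find", "share", "lose"],
    "adjective": ["happy", "tiny", "brave", "wet"],
}
FEATURES = ["a happy ending", "a plot twist", "a moral lesson", "some dialogue"]

def make_prompt() -> str:
    noun, verb, adj = (random.choice(WORDS[k]) for k in ("noun", "verb", "adjective"))
    feature = random.choice(FEATURES)
    return (
        "Write a short story that a 4-year-old would understand. "
        f"Use the words '{noun}', '{verb}' and '{adj}', and include {feature}."
    )

def call_gpt(prompt: str) -> str:
    # Hypothetical stand-in for a GPT-3.5 or GPT-4 chat-completion request.
    return "Once upon a time..."

stories = [call_gpt(make_prompt()) for _ in range(10)]  # scale this up to millions
```

Because every prompt draws a fresh combination of words and features, the resulting corpus ranges far more widely than repeated requests for a generic preschool story would.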

Eldan now had a procedure for churning out training data on demand, but he had no idea how many stories he’d need to train a functional model, or how big that model would need to be. That’s when he teamed up with Yuanzhi Li, a machine learning researcher at Microsoft and Carnegie Mellon University, to try different possibilities, taking advantage of the fact that small models could be trained very quickly. Step 1 was deciding how to evaluate their models.

In language model research — as in every classroom — grading is a fraught topic. There’s no perfect rubric that encapsulates everything researchers want to know, and models that excel at some tasks often fail spectacularly at others. Over time, researchers have developed various standard benchmarks based on questions with unambiguous answers, which is a good approach if you’re trying to evaluate specific skills. But Eldan and Li were interested in something more nebulous: How big do language models really need to be if you simplify language as much as possible?

“In order to directly test if the model speaks English, I think the only thing you can do is let the model generate English in an open-ended way,” Eldan said.

There are only two ways to measure a model’s performance on such qualitative questions: Rely on human graders, or turn once again to GPT-4. The two researchers chose the latter route, effectively letting the big models both write the textbooks and grade the essays.

Bhagavatula said he would have liked to see how GPT-4’s evaluations compared to those of human reviewers — GPT-4 may be biased toward models that it helped train, and the opaqueness of language models makes it hard to quantify such biases. But he doesn’t think such subtleties would affect comparisons between different models trained on similar sets of synthetic stories — the main focus of Eldan and Li’s work.

Eldan and Li used a two-step procedure for evaluating each of their small models after training. First, they prompted the small model with the first half of a story distinct from those in the training data set so that it generated a new ending, repeating this process with 50 different test stories. Second, they instructed GPT-4 to grade each of the small model’s endings based on three categories — creativity, grammar and consistency with the beginning of the story. They then averaged the scores in each category, ending up with three final grades per model.
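
A minimal rendering of that grading loop, assuming hypothetical complete_story() and grade_with_gpt4() helpers (the actual prompts and scoring instructions are the ones in Eldan and Li's paper), looks like this:

```python
# Sketch of the two-step evaluation: generate endings for held-out stories,
# then have GPT-4 grade each ending. Both helpers below are hypothetical.
from statistics import mean

CATEGORIES = ("creativity", "grammar", "consistency")

def complete_story(model, first_half: str) -> str:
    """Hypothetical: the small model continues the story from its first half."""
    raise NotImplementedError

def grade_with_gpt4(first_half: str, ending: str) -> dict:
    """Hypothetical: ask GPT-4 for a numeric score in each category."""
    raise NotImplementedError

def evaluate(model, test_stories: list) -> dict:
    scores = {c: [] for c in CATEGORIES}
    for story in test_stories:                    # 50 held-out stories in the paper
        first_half = story[: len(story) // 2]     # crude split, for illustration
        ending = complete_story(model, first_half)
        grades = grade_with_gpt4(first_half, ending)
        for c in CATEGORIES:
            scores[c].append(grades[c])
    return {c: mean(v) for c, v in scores.items()}  # three final grades per model
```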

With this procedure in hand, Eldan and Li were finally ready to compare different models and find out which were the star students.

Test Results

After some preliminary exploration, the two researchers settled on a training data set containing roughly 2 million stories. They then used this data set, dubbed TinyStories, to train models ranging in size from 1 million to 30 million parameters, with varying numbers of layers. It was quick work: Using only four GPUs, the largest of these models took no more than a day to train.
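
For a sense of what such a sweep involves, here is a back-of-the-envelope sketch; the parameter budgets, layer counts and the rough 12 x width^2 cost per transformer block are illustrative assumptions, not the configurations reported in the paper.

```python
# Illustrative size sweep: for each parameter budget, try several depths and
# pick a width so the model lands near that budget. All numbers are assumptions.
from itertools import product

PARAM_BUDGETS = [1_000_000, 3_000_000, 10_000_000, 30_000_000]
LAYER_COUNTS = [1, 2, 4, 8]

def width_for(params: int, layers: int) -> int:
    # A transformer block holds roughly 12 * width**2 weights
    # (about 4 * width**2 for attention, 8 * width**2 for the feed-forward part).
    return int((params / (12 * layers)) ** 0.5)

for params, layers in product(PARAM_BUDGETS, LAYER_COUNTS):
    width = width_for(params, layers)
    print(f"train: ~{params:>10,} params, {layers} layers, width {width}")
    # train_on_tinystories(layers=layers, width=width)   # hypothetical call
```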

The smallest models struggled. For example, one test story begins with a mean-looking man telling a girl he will take her cat. A million-parameter model got stuck in a loop with the girl repeatedly telling the man she wanted to be friends. But the larger ones — still thousands of times smaller than GPT-3.5 — performed surprisingly well. The 28-million-parameter version told a coherent story, though the ending was grim: “Katie started to cry, but the man didn’t care. He took the cat away and Katie never saw her cat again. The end.”

In addition to testing their own models, Eldan and Li presented the same challenge to OpenAI’s GPT-2, a 1.5-billion-parameter model released in 2019. It fared far worse — before the story’s abrupt ending, the man threatens to take the girl to court, jail, the hospital, the morgue and finally the crematorium.

Merrill Sherman/Quanta Magazine

Nguyen said it’s exciting that such tiny models were so fluent, but perhaps not surprising that GPT-2 struggled with the task: It’s a larger model but far from the state of the art, and it was trained on a very different data set. “A toddler training only on toddler tasks, like playing with some toys, might do better than you or I,” he noted. “We didn’t specialize in this simple thing.”

Comparisons between different TinyStories models don’t suffer from the same confounding factors. Eldan and Li observed hints that networks with fewer layers but more neurons per layer were better at answering questions that required factual knowledge; conversely, networks with more layers and fewer neurons per layer were better at keeping track of characters and plot points from earlier in the story. Bhagavatula found this result especially intriguing. If it can be replicated in larger models, he said, “that would be a really cool result that could stem out of this work.”

Eldan and Li also studied how their small models’ abilities depended on the duration of the training period. In every case, models mastered grammar first and consistency later. To Eldan, this pattern illustrates how differences in reward structures lead to differences in language acquisition patterns between neural networks and children. For language models, which learn by predicting words, “the incentive on the words ‘I want to have’ is as big as it is on the words ‘ice cream,’” he said. Children, on the other hand, “don’t care about whether they say ‘I would like to have some ice cream’ or just ‘ice cream, ice cream, ice cream.’”

Quality Versus Quantity

Eldan and Li hope that the research will motivate other researchers to train different models on the TinyStories data set and compare their capabilities. But it’s often hard to predict which characteristics of small models will also appear in larger ones.

“Maybe mouse models of vision are really good proxies of human vision, but are mouse models of depression good models of human depression?” Pavlick said. “For every case it’s a little bit different.”

The success of the TinyStories models also suggests a broader lesson. The standard approach to compiling training data sets involves vacuuming up text from across the internet and then filtering out the garbage. Synthetic text generated by large models could offer an alternative way to assemble high-quality data sets that wouldn’t have to be so large.

“We have more and more evidence that this is very effective, not only in TinyStories-sized models but also in larger models,” Eldan said. That evidence comes from a pair of follow-up papers about billion-parameter models by Eldan, Li and other Microsoft researchers. In the first paper, they trained a model to learn the programming language Python using snippets of code generated by GPT-3.5 along with carefully curated code from the internet. In the second, they augmented the training data set with synthetic “textbooks,” covering a wide range of topics, to train a general-purpose language model. In their tests, both models compared favorably to larger models trained on larger data sets. But evaluating language models is always tricky, and the synthetic training data approach is still in its infancy — more independent tests are necessary.

As state-of-the-art language models grow ever larger, surprising findings from their tiny cousins are reminders that there’s still much we don’t understand about even the simplest models. Nguyen expects to see many more papers exploring the approach pioneered by TinyStories.

“The question is: Where and why does size matter?” he said. “There should be a science of that, and this paper is hopefully the beginning of a rich story.”

Copyright for syndicated content belongs to the linked source: Quanta Magazine – https://www.quantamagazine.org/tiny-language-models-thrive-with-gpt-4-as-a-teacher-20231005/

Tags: language models, science