DeepMind’s PEER scales language models with millions of tiny experts


July 12, 2024 8:53 PM

[Image: mixture of millions of experts. Credit: VentureBeat with DALL-E 3]


Mixture-of-Experts (MoE) has become a popular technique for scaling large language models (LLMs) without exploding computational costs. Instead of using the entire model capacity for every input, MoE architectures route the data to small but specialized “expert” modules. MoE enables LLMs to increase their parameter count while keeping inference costs low. MoE is used in several popular LLMs, including Mixtral, DBRX, Grok, and reportedly GPT-4.

However, current MoE techniques have limitations that restrict them to a relatively small number of experts. In a new paper, Google DeepMind introduces Parameter Efficient Expert Retrieval (PEER), a novel architecture that can scale MoE models to millions of experts, further improving the performance-compute tradeoff of large language models.

The challenge of scaling LLMs

The past few years have shown that scaling language models by increasing their parameter count leads to improved performance and new capabilities. However, there is a limit to how much you can scale a model before running into computational and memory bottlenecks.

Every transformer block used in LLMs has attention layers and feedforward (FFW) layers. The attention layer computes the relations between the sequence of tokens fed to the transformer block. The feedforward network is responsible for storing the model’s knowledge. FFW layers account for two-thirds of the model’s parameters and are one of the bottlenecks of scaling transformers. In the classic transformer architecture, all the parameters of the FFW are used in inference, which makes their computational footprint directly proportional to their size.
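
To make that proportionality concrete, here is a minimal PyTorch sketch of a dense FFW block. The sizes are illustrative, not taken from any particular model; the point is that every parameter it holds is touched for every token.

```python
import torch
import torch.nn as nn

class DenseFFW(nn.Module):
    """Classic transformer feedforward block: all parameters
    participate in every forward pass."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(torch.relu(self.up(x)))

ffw = DenseFFW(d_model=512, d_ff=2048)
# Every one of these ~2.1M parameters is used for every token processed.
print(sum(p.numel() for p in ffw.parameters()))
```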

MoE addresses this challenge by replacing the single dense FFW layer with sparsely activated expert modules. Each expert contains a fraction of the parameters of the full dense layer and specializes in certain areas. The MoE has a router that assigns each input to the few experts that are most likely to provide the most accurate answer.

By increasing the number of experts, MoE can increase the capacity of the LLM without increasing the computational cost of running it. 
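
As a rough illustration of this tradeoff, the sketch below implements a generic top-k MoE layer in PyTorch. The gating scheme and sizes are illustrative assumptions, not the routing of any specific model: adding experts grows capacity, but each token still only pays for k of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_expert(d_model: int, d_ff: int) -> nn.Module:
    # Each expert is a small FFW holding a fraction of the dense layer's parameters.
    return nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(make_expert(d_model, d_ff) for _ in range(n_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.shape[0]):      # only k experts run per token,
            for j in range(self.k):      # however many experts exist in total
                out[t] += weights[t, j] * self.experts[idx[t, j]](x[t])
        return out

layer = TopKMoE(d_model=64, d_ff=256, n_experts=8, k=2)
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

Note that the router in this sketch still scores every expert for every token, which is one reason naive routing does not stretch to millions of experts.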

Finding the right level of MoE granularity

According to recent studies, the optimal number of experts for an MoE model is related to several factors, including the number of training tokens and the compute budget. When these variables are balanced, MoEs have consistently outperformed dense models for the same amount of compute resources.

Furthermore, researchers have found that increasing the “granularity” of an MoE model, which refers to the number of experts, can lead to performance gains, especially when accompanied by an increase in model size and training data.

High-granularity MoE can also enable models to learn new knowledge more efficiently. Some studies suggest that by adding new experts and regularizing them properly, MoE models can adapt to continuous data streams, which can help language models deal with continuously changing data in their deployment environments.

Current approaches to MoE, however, are hard to scale: they usually have fixed routers designed for a specific number of experts, which must be readjusted whenever new experts are added.

Parameter Efficient Expert Retrieval 

DeepMind’s Parameter Efficient Expert Retrieval (PEER) architecture addresses the challenges of scaling MoE to millions of experts. PEER replaces the fixed router with a learned index to efficiently route input data to a vast pool of experts. For each given input, PEER first uses a fast initial computation to create a shortlist of potential candidates before choosing and activating the top experts. This mechanism enables the MoE to handle a very large number of experts without slowing down.
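
The paper implements this shortlist step with the product key technique, which builds on earlier product-key memory work: picking from N = n × n experts only requires scoring 2n sub-keys. Below is a minimal single-query sketch of that idea; the names and sizes are illustrative, not the paper's exact configuration.

```python
import torch

def product_key_topk(query: torch.Tensor,
                     subkeys_a: torch.Tensor,  # (n, d/2): first-half sub-keys
                     subkeys_b: torch.Tensor,  # (n, d/2): second-half sub-keys
                     k: int):
    """Pick the top-k of n*n experts while scoring only 2n sub-keys."""
    d = query.shape[-1] // 2
    sa, ia = (query[:d] @ subkeys_a.T).topk(k)  # shortlist on the first half
    sb, ib = (query[d:] @ subkeys_b.T).topk(k)  # shortlist on the second half
    # Only the k*k surviving combinations are scored exactly.
    cand = (sa[:, None] + sb[None, :]).flatten()
    scores, flat = cand.topk(k)
    expert_ids = ia[flat // k] * subkeys_b.shape[0] + ib[flat % k]
    return scores, expert_ids

q = torch.randn(32)
scores, ids = product_key_topk(q, torch.randn(100, 16), torch.randn(100, 16), k=8)
print(ids)  # indices into a pool of 100 * 100 = 10,000 experts
```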

Unlike previous MoE architectures, where experts were often as large as the FFW layers they replaced, PEER uses tiny experts with a single neuron in the hidden layer. This design enables the model to share hidden neurons among experts, improving knowledge transfer and parameter efficiency. To compensate for the small size of the experts, PEER uses a multi-head retrieval approach, similar to the multi-head attention mechanism used in transformer models.
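
Under that design, the entire expert pool can live in two embedding tables: expert i is just one input row and one output row. The sketch below is a hypothetical instantiation (the ReLU activation and softmax gating are my assumptions, not necessarily the paper's exact choices); a multi-head version would run several retrievals and sum their outputs.

```python
import torch
import torch.nn as nn

class TinyExpertPool(nn.Module):
    """Single-neuron experts: expert i is a down-projection row u_i and an
    up-projection row v_i, so applying k retrieved experts is two embedding
    lookups and a weighted sum."""
    def __init__(self, n_experts: int, d_model: int):
        super().__init__()
        self.u = nn.Embedding(n_experts, d_model)  # hidden-neuron input weights
        self.v = nn.Embedding(n_experts, d_model)  # hidden-neuron output weights

    def forward(self, x, expert_ids, scores):
        # x: (d_model,); expert_ids, scores: (k,) from the retrieval step
        h = torch.relu(self.u(expert_ids) @ x)  # one hidden activation per expert
        g = torch.softmax(scores, dim=-1)       # weight experts by router score
        return (g * h) @ self.v(expert_ids)     # (d_model,)

pool = TinyExpertPool(n_experts=10_000, d_model=32)
out = pool(torch.randn(32), torch.tensor([3, 17, 4096]), torch.randn(3))
print(out.shape)  # torch.Size([32])
```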

PEER layer architecture (source: arXiv)

A PEER layer can be added to an existing transformer model or used to replace an FFW layer. PEER is also related to parameter-efficient fine-tuning (PEFT) techniques. In PEFT, parameter efficiency refers to the number of parameters that are modified to fine-tune a model for a new task. In PEER, parameter efficiency instead refers to reducing the number of active parameters in the MoE layer, which directly affects computation and activation memory consumption during pre-training and inference.
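
A back-of-the-envelope comparison, using purely hypothetical sizes, shows what this buys per token:

```python
# Hypothetical sizes, for illustration only.
d_model, d_ff = 1024, 4096
dense_active = 2 * d_model * d_ff      # dense FFW: every weight is active
heads, k = 8, 16                       # retrieval heads x experts per head
peer_active = heads * k * 2 * d_model  # one u_i row and one v_i row per expert
print(f"dense: {dense_active:,}  PEER: {peer_active:,}")
# dense: 8,388,608 vs. PEER: 262,144 active parameters per token, even if
# the total pool holds millions of experts.
```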

According to the paper, PEER could potentially be adapted to select PEFT adapters at runtime, making it possible to dynamically add new knowledge and features to LLMs.

PEER might be used in DeepMind’s Gemini 1.5 models, which, according to the Google blog, use “a new Mixture-of-Experts (MoE) architecture.”

PEER in action

The researchers evaluated the performance of PEER on different benchmarks, comparing it against transformer models with dense feedforward layers and other MoE architectures. Their experiments show that PEER models achieve a better performance-compute tradeoff, reaching lower perplexity scores with the same computational budget as their counterparts. 

The researchers also found that increasing the number of experts in a PEER model leads to further perplexity reduction. 

“This design demonstrates a superior compute-performance trade-off in our experiments, positioning it as a competitive alternative to dense FFW layers for scaling foundation models,” the researchers write.

The findings are interesting because they challenge the long-held belief that MoE models reach peak efficiency with a limited number of experts. PEER shows that by applying the right retrieval and routing mechanisms, it is possible to scale MoE to millions of experts. This approach can help further reduce the cost and complexity of training and serving very large language models.

Copyright for syndicated content belongs to the linked source: VentureBeat – https://venturebeat.com/ai/deepminds-peer-scales-language-models-with-millions-of-tiny-experts/
