Earth-News
FlashAttention-3 unleashes the power of H100 GPUs for LLMs

July 16, 2024

July 15, 2024 2:40 PM

Image: lightning-fast GPU (Credit: VentureBeat with DALL-E 3)


Attention is a core component of the transformer architecture used in large language models (LLMs). But as LLMs grow larger and handle longer input sequences, the computational cost of attention becomes a bottleneck. 

To address this challenge, researchers from Colfax Research, Meta, Nvidia, Georgia Tech, Princeton University, and Together AI have introduced FlashAttention-3, a new technique that significantly speeds up attention computation on Nvidia Hopper GPUs (H100 and H800).

FlashAttention-3 builds upon previous work on FlashAttention and FlashAttention-2 and further optimizes the use of resources on Nvidia Hopper GPUs to maximize performance and efficiency for LLM training and inference. 

The challenge of attention computation in LLMs

One of the key innovations of transformers is the attention mechanism, which enables the model to compute the relationship between different tokens in an input sequence.

While the attention mechanism is very effective, it is also computationally expensive. The cost of attention computation grows quadratically with the length of the input sequence. As LLMs are scaled to handle longer and longer input sequences, the attention mechanism becomes a major bottleneck. 
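To make the quadratic growth concrete, here is a minimal NumPy sketch of standard single-head attention. It is an illustration only, not code from the FlashAttention project; the function names and the toy shapes are assumptions chosen for the example. The intermediate score matrix has one entry per pair of tokens, so doubling the sequence length quadruples its size.

```python
import numpy as np


def naive_attention(q, k, v):
    """Standard softmax attention for one head; q, k, v have shape (n, d)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # shape (n, n): one score per token pair
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row now sums to 1
    return weights @ v                             # shape (n, d)


def score_matrix_entries(seq_len):
    """Number of entries in the (seq_len x seq_len) score matrix."""
    return seq_len * seq_len


# Toy inputs (hypothetical sizes, chosen only for illustration).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
out = naive_attention(q, k, v)
```

Going from 1,024 to 2,048 tokens multiplies `score_matrix_entries` by four, which is the quadratic cost described above.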

Furthermore, modern hardware accelerators such as GPUs are optimized for matrix multiplication (matmul) operations, which are the building blocks of deep learning models. These accelerators also have computational units for other types of operations such as exponentiation, but those units are hundreds of times slower than the matmul components.

Attention computations use a combination of matrix multiplications and other special functions that are not as optimized for GPUs.

For example, the softmax function, which normalizes the attention weights, relies on exponentiation, which runs on these much slower units. As a result, even though matrix multiplications account for most of the operations in attention, the overall computation can get bogged down by a small number of special functions.
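A back-of-the-envelope count shows how a small number of special-function calls can still matter. The sketch below counts operations in one attention forward pass and estimates how long the exponentials take relative to the matmuls. The slowdown factor of 256 is an assumed, illustrative value consistent with "hundreds of times slower"; it is not a figure from this article, and the function names are hypothetical.

```python
def attention_forward_counts(seq_len, head_dim):
    """Rough operation counts for one attention head (forward pass)."""
    # Two matmuls (Q @ K^T and P @ V), each about 2 * n * n * d floating-point ops.
    matmul_flops = 4 * seq_len**2 * head_dim
    # Softmax needs one exponential per entry of the n x n score matrix.
    exp_ops = seq_len**2
    return matmul_flops, exp_ops


def exp_time_relative_to_matmul(seq_len, head_dim, slowdown):
    """Time spent on exponentials divided by time spent on matmuls,
    if each exponential is `slowdown` times slower than a matmul FLOP."""
    matmul_flops, exp_ops = attention_forward_counts(seq_len, head_dim)
    return exp_ops * slowdown / matmul_flops


# ASSUMPTION: special-function units are 256x slower than matmul units.
ASSUMED_SLOWDOWN = 256
ratio = exp_time_relative_to_matmul(seq_len=1024, head_dim=128, slowdown=ASSUMED_SLOWDOWN)
```

Under these assumptions the matmuls perform 512 times more operations than the exponentials, yet the exponentials still take about half as long as the matmuls, so they cannot be treated as free.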

An important part of optimizing attention computation is scheduling workloads so that operations do not block one another and the different types of memory are used efficiently. 

Making better use of hardware resources

FlashAttention, introduced in 2022, addressed the challenges of computing attention by reducing the number of memory reads and writes between GPU high bandwidth memory (HBM) and GPU on-chip static random access memory (SRAM) when doing attention computation. Instead of computing the attention weights for the entire sequence at once, FlashAttention breaks down the computation into smaller chunks, called “tiles,” that can be processed more efficiently on GPUs.
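The tiling idea can be sketched in NumPy. The version below is a simplified illustration, not the FlashAttention GPU kernel: it tiles only the keys and values, runs on the CPU, and uses hypothetical function names. It keeps a running row maximum and row sum (often called an online softmax) so that the full score matrix is never stored, yet the result matches standard attention.

```python
import numpy as np


def reference_attention(q, k, v):
    """Standard attention that materializes the full (n, n) score matrix."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v


def tiled_attention(q, k, v, tile=4):
    """Process keys/values one tile at a time with a running softmax."""
    n, d = q.shape
    out = np.zeros((n, v.shape[-1]))      # unnormalized output accumulator
    row_max = np.full((n, 1), -np.inf)    # running maximum score per query row
    row_sum = np.zeros((n, 1))            # running softmax denominator per row
    for start in range(0, k.shape[0], tile):
        k_t = k[start:start + tile]
        v_t = v[start:start + tile]
        s = q @ k_t.T / np.sqrt(d)        # scores for this tile only: (n, tile)
        new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
        correction = np.exp(row_max - new_max)  # rescale earlier partial results
        p = np.exp(s - new_max)
        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ v_t
        row_max = new_max
    return out / row_sum


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((10, 8)) for _ in range(3))
matches = np.allclose(tiled_attention(q, k, v, tile=4), reference_attention(q, k, v))
```

On a GPU, the benefit comes from keeping each tile in fast on-chip memory; this sketch only demonstrates that the tile-by-tile arithmetic gives the same answer.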

FlashAttention has been widely adopted and has contributed to increasing the context window of LLMs from a few thousand tokens to hundreds of thousands or even millions of tokens. 

As hardware has improved, so have the opportunities for optimizing LLM computations. FlashAttention-2, introduced in 2023, further optimized the use of GPU resources, achieving up to 70% of the declared maximum performance on Nvidia A100 GPUs. However, the same optimizations did not transfer to the newer H100 GPUs, where FlashAttention-2 used only 35% of the maximum capacity.

FlashAttention-3

FlashAttention-3 takes advantage of new features in Nvidia Hopper GPUs to maximize performance. These features enable higher throughput on matrix multiplication operations, faster data transfer across different memory segments, and better efficiency on low-precision operations.

FlashAttention-3 introduces several innovations to improve the performance of attention computation on H100 GPUs.

FlashAttention-3 schedules operations in a way that maximizes the overlap between computation and the movement of data between different memory segments of the GPU. This reduces the time the GPU spends idle waiting for data to be transferred. It also interleaves the matrix multiplication and softmax operations to reduce the possibility of bottlenecks in computing attention values.
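A toy timing model illustrates why overlapping helps. The sketch below compares running the matmul and softmax steps of each tile back to back with a simple two-stage pipeline in which the softmax of one tile runs while the matmul of the next tile is in progress. The function names and the per-tile timings are hypothetical, and the model ignores everything a real GPU scheduler has to deal with; it is not a description of FlashAttention-3's actual scheduling.

```python
def serialized_time(num_tiles, matmul_t, softmax_t):
    """Total time when each tile's matmul and softmax run one after another."""
    return num_tiles * (matmul_t + softmax_t)


def overlapped_time(num_tiles, matmul_t, softmax_t):
    """Total time for an idealized two-stage pipeline.

    The first tile pays for both stages; every later tile adds only the
    time of the slower stage, because the faster stage is hidden behind it.
    """
    if num_tiles == 0:
        return 0.0
    return matmul_t + softmax_t + (num_tiles - 1) * max(matmul_t, softmax_t)


# Hypothetical per-tile costs in arbitrary time units.
serial = serialized_time(num_tiles=8, matmul_t=2.0, softmax_t=1.0)
overlapped = overlapped_time(num_tiles=8, matmul_t=2.0, softmax_t=1.0)
```

With these made-up numbers the serialized schedule takes 24 units and the overlapped one takes 17, because most of the softmax time is hidden behind matmul work.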

FlashAttention-3 also uses a special arrangement of operations for faster and more accurate computations of attention in quantized models. Quantization is a popular technique that reduces the size of models by using low-bit numbers to store their weights. The tradeoff of quantization is the possible loss of accuracy. FlashAttention-3 addresses this problem by carefully arranging the computations to minimize the impact of quantization on accuracy.
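The accuracy tradeoff can be illustrated with a small experiment. The article does not spell out the arrangement FlashAttention-3 uses, so the sketch below shows only one generic idea, per-block scaling, as an assumption for illustration. It simulates low precision with 8-bit integer rounding rather than the floating-point formats used on real hardware, and all names are hypothetical. When a tensor contains an outlier, a single scale for the whole tensor wastes precision on every other value, while a separate scale per block confines the damage to the block containing the outlier.

```python
import numpy as np


def quantize_dequantize(x, scale):
    """Round x to signed 8-bit integer levels with the given scale, then map back."""
    levels = np.clip(np.round(x / scale), -127, 127)
    return levels * scale


def per_tensor_error(x):
    """Mean absolute error when one scale covers the whole tensor."""
    scale = np.abs(x).max() / 127
    return np.abs(quantize_dequantize(x, scale) - x).mean()


def per_block_error(x, block_rows):
    """Mean absolute error when each block of rows gets its own scale."""
    total = 0.0
    for start in range(0, x.shape[0], block_rows):
        block = x[start:start + block_rows]
        scale = np.abs(block).max() / 127
        total += np.abs(quantize_dequantize(block, scale) - block).sum()
    return total / x.size


# Toy data: mostly small values plus one large outlier (hypothetical).
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))
x[0, 0] = 100.0

tensor_err = per_tensor_error(x)
block_err = per_block_error(x, block_rows=8)
```

On this toy input the per-block error comes out lower than the per-tensor error, since only the first block has to stretch its scale to cover the outlier.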

According to the researchers, FlashAttention-3 achieves up to 75% usage of the H100 GPU’s maximum capabilities. This translates to a 1.5–2x speedup over FlashAttention-2 for both training and running LLMs.

The benefits of FlashAttention-3

The faster attention computation offered by FlashAttention-3 has several implications for LLM development and applications.

Training LLMs is computationally expensive and can take weeks or even months. Faster attention can significantly reduce training time, which lets researchers and developers experiment with larger models and datasets.

FlashAttention-3 can also help extend the context window of LLMs by enabling them to process longer sequences more efficiently. This can unlock new applications for LLMs in areas such as long-form document understanding and many-shot in-context learning.

And by using a higher percentage of GPU capacity, FlashAttention-3 can reduce the number of accelerators required to run LLMs and slash the cost of running models in production.

The researchers have open-sourced FlashAttention-3 under a permissive license and plan to integrate it into popular deep learning libraries such as PyTorch and Hugging Face Transformers. This will make it easier for researchers and developers to take advantage of the performance benefits of FlashAttention-3.

“We have seen that designing algorithms that take advantage of the hardware they run on can bring significant efficiency gains and unlock new model capabilities such as long context,” the researchers wrote in a blog post published by Together AI. “We look forward to future work on optimization for LLM inference, as well as generalizing our techniques to other hardware architectures.”

Copyright for syndicated content belongs to the linked Source : VentureBeat – https://venturebeat.com/ai/flashattention-3-unleashes-the-power-of-h100-gpus-for-llms/
