Tuesday, October 28, 2025
Earth-News

A visual guide to Vision Transformer – A scroll story

April 16, 2024
in Technology

This is a visual guide to Vision Transformers (ViTs), a class of deep learning models that have achieved state-of-the-art performance on image classification tasks. Vision Transformers apply the transformer architecture, originally designed for natural language processing (NLP), to image data. This guide will walk you through the key components of Vision Transformers in a scroll story format, using visualizations and simple explanations to help you understand how these models work and what the flow of data through the model looks like.

Like normal convolutional neural networks, vision transformers are trained in a supervised manner. This means that the model is trained on a dataset of images and their corresponding labels.

1) Focus on one data point

To get a better understanding of what happens inside a vision transformer, let's focus on a single data point (batch size of 1). And let's ask the question: How is this data point prepared in order to be consumed by a transformer?

2) Forget the label for the moment

The label will become more relevant later. For now the only thing that we are left with is a single image.

3) Create patches of the image

To prepare the image for use inside the transformer, we divide it into equally sized patches of size p x p.

4) Flattening of the image patches

The patches are now flattened into vectors of dimension p' = p² · c, where p is the side length of a patch and c is the number of channels.
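
Steps 3 and 4 can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the guide: the image is stored as an (H, W, c) array, H and W are divisible by p, and the function name `image_to_flat_patches` is made up for this example.

```python
import numpy as np

def image_to_flat_patches(image: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, c) image into non-overlapping p x p patches and flatten each one."""
    h, w, c = image.shape
    assert h % p == 0 and w % p == 0, "image sides must be divisible by p"
    # (H/p, p, W/p, p, c) -> (H/p, W/p, p, p, c): one p x p x c block per grid cell
    grid = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    # Flatten every block into a vector of length p * p * c
    return grid.reshape(-1, p * p * c)

image = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)  # toy 8x8 RGB image
patches = image_to_flat_patches(image, p=4)
print(patches.shape)  # (4, 48): n = 4 patches, each of dimension p² · c = 48
```

With an 8 x 8 image and p = 4 we get n = 4 patches, ordered row by row across the patch grid.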

5) Creating patch embeddings

These image patch vectors are now encoded using a linear transformation. The resulting Patch Embedding Vector has a fixed size d.

6) Embedding all patches

Now that we have embedded our image patches into vectors of fixed size, we are left with an array of size n x d, where n is the number of image patches and d is the size of the patch embedding.
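
As a rough sketch of steps 5 and 6, the linear transformation is a matrix product with a learnable weight matrix plus a bias. The sizes, the random stand-in data, and the names `W_embed` and `b_embed` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, c, d = 4, 4, 3, 8                          # toy sizes, chosen only for illustration
flat_patches = rng.normal(size=(n, p * p * c))   # stand-in for the flattened patches

# Learnable parameters of the linear projection (randomly initialized here)
W_embed = rng.normal(scale=0.02, size=(p * p * c, d))
b_embed = np.zeros(d)

# The same projection is applied to every patch
patch_embeddings = flat_patches @ W_embed + b_embed
print(patch_embeddings.shape)  # (4, 8), i.e. n x d
```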

7) Appending a classification token

To train our model effectively, we extend the array of patch embeddings by an additional vector called the classification token (cls token). This vector is a learnable parameter of the network and is randomly initialized. Note: We only have one cls token, and we append the same vector for all data points.
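
A minimal sketch of this step, with made-up sizes and random stand-in embeddings. It follows the guide in appending the token at the end; many implementations place it at the front instead, which works the same way as long as the position is used consistently.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8
patch_embeddings = rng.normal(size=(n, d))   # stand-in for the embedded patches

# One learnable vector, shared by every image in the dataset
cls_token = rng.normal(scale=0.02, size=(1, d))

# Append the token as an extra row
tokens = np.concatenate([patch_embeddings, cls_token], axis=0)
print(tokens.shape)  # (5, 8), i.e. (n+1) x d
```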

8) Add positional embedding Vectors

Currently, our patch embeddings have no positional information associated with them. We remedy that by adding a learnable, randomly initialized positional embedding vector to each of our patch embeddings. We also add such a positional embedding vector to our classification token.

9) Transformer Input

After the positional embedding vectors have been added, we are left with an array of size (n+1) x d. This will be our input for the transformer, which is explained in greater detail in the next steps.
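
Steps 8 and 9 amount to one element-wise addition. The sketch below assumes one positional vector per token position; the sizes and random values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8
tokens = rng.normal(size=(n + 1, d))          # patch embeddings plus cls token

# One learnable positional vector per token position, randomly initialized
pos_embedding = rng.normal(scale=0.02, size=(n + 1, d))

# Element-wise addition keeps the shape unchanged
transformer_input = tokens + pos_embedding
print(transformer_input.shape)  # (5, 8), i.e. (n+1) x d
```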

10.1) Transformer: QKV Creation

Each vector of our transformer input is linearly mapped to a larger vector. These new vectors are then split into three equally sized parts: the query vector Q, the key vector K, and the value vector V. We end up with (n+1) of each of these vectors.
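
The QKV creation can be sketched as one matrix product followed by a split. The head size `d_head` and the weight name `W_qkv` are assumptions for this example; the guide does not fix them.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d_head = 4, 8, 6
x = rng.normal(size=(n + 1, d))               # transformer input

# A single linear map produces one vector of size 3 * d_head per token
W_qkv = rng.normal(scale=0.02, size=(d, 3 * d_head))
qkv = x @ W_qkv                               # (n+1, 3 * d_head)

# Split each large vector into three equally sized parts
Q, K, V = np.split(qkv, 3, axis=1)
print(Q.shape, K.shape, V.shape)  # (5, 6) (5, 6) (5, 6)
```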

10.2) Transformer: Attention Score Calculation

To calculate our attention scores A we will now multiply all of our query vectors Q with all of our key vectors K.

10.3) Transformer: Attention Score Matrix

Now that we have the attention score matrix A we apply a `softmax` function to every row such that every row sums up to 1.
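
Steps 10.2 and 10.3 can be sketched as follows, using random stand-ins for Q and K. One assumption goes beyond the guide: the scores are divided by the square root of the head size, which is the usual scaling in transformer attention but is not mentioned in the text above.

```python
import numpy as np

def softmax_rows(a: np.ndarray) -> np.ndarray:
    """Numerically stable softmax applied to each row independently."""
    shifted = a - a.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d_head = 4, 6
Q = rng.normal(size=(n + 1, d_head))
K = rng.normal(size=(n + 1, d_head))

# Every query is compared with every key. The division by sqrt(d_head) is an
# assumption of this sketch (standard scaling), not something the guide states.
scores = Q @ K.T / np.sqrt(d_head)            # (n+1, n+1)
A = softmax_rows(scores)
print(A.shape)        # (5, 5)
print(A.sum(axis=1))  # every row sums to 1 (up to floating-point error)
```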

10.4) Transformer: Aggregated Contextual Information Calculation

To calculate the aggregated contextual information for the first patch embedding vector, we focus on the first row of the attention matrix and use its entries as weights for our value vectors V. The result is our aggregated contextual information vector for the first image patch embedding.

10.5) Transformer: Aggregated Contextual Information for every patch

Now we repeat this process for every row of our attention score matrix. The result is n+1 aggregated contextual information vectors: one for every patch plus one for the classification token. This step concludes our first attention head.
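
The weighting in steps 10.4 and 10.5 is a weighted sum of value vectors, and doing it for every row at once is a single matrix product. The attention matrix below is a random stand-in whose rows are normalized to sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_head = 4, 6
A = rng.random(size=(n + 1, n + 1))
A = A / A.sum(axis=1, keepdims=True)          # stand-in attention matrix, rows sum to 1
V = rng.normal(size=(n + 1, d_head))

# Step 10.4: the first row of A weights the value vectors
context_first = sum(A[0, j] * V[j] for j in range(n + 1))

# Step 10.5: the same computation for every row is one matrix product
context = A @ V                               # (n+1, d_head)
print(np.allclose(context[0], context_first))  # True
```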

10.6) Transformer: Multi-Head Attention

Because we are dealing with multi-head attention, we repeat the entire process from step 10.1 to step 10.5 with a different QKV mapping. For our explanatory setup we assume 2 heads, but a ViT typically has many more. In the end, this results in multiple aggregated contextual information vectors per token, one from each head.

10.7) Transformer: Last Attention Layer Step

The outputs of these heads are stacked together and mapped to vectors of size d, the same size as our patch embeddings.
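
Steps 10.6 and 10.7 could look roughly like this with 2 heads. The helper `attention_head`, the weight names, the sizes, and the scaling by the square root of the head size are assumptions of this sketch.

```python
import numpy as np

def attention_head(x, W_qkv):
    """One head: QKV creation, scaled scores, row-wise softmax, weighted values."""
    Q, K, V = np.split(x @ W_qkv, 3, axis=1)
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    A = np.exp(scores)
    A = A / A.sum(axis=1, keepdims=True)
    return A @ V

rng = np.random.default_rng(0)
n, d, d_head, heads = 4, 8, 6, 2
x = rng.normal(size=(n + 1, d))

# Each head has its own QKV mapping
W_qkv_per_head = [rng.normal(scale=0.02, size=(d, 3 * d_head)) for _ in range(heads)]
head_outputs = [attention_head(x, W) for W in W_qkv_per_head]

# Stack the heads side by side, then map back to size d
stacked = np.concatenate(head_outputs, axis=1)        # (n+1, heads * d_head)
W_out = rng.normal(scale=0.02, size=(heads * d_head, d))
attention_output = stacked @ W_out                    # (n+1, d)
print(stacked.shape, attention_output.shape)  # (5, 12) (5, 8)
```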

10.8) Transformer: Attention Layer Result

The previous step concludes the attention layer, and we are left with the same number of embeddings, of exactly the same size, as we used as input.

10.9) Transformer: Residual connections

Transformers make heavy use of residual connections, which simply means adding the input of a layer to the output of that layer. This is what we do now: we add the input of the attention layer to its output.

10.10) Transformer: Residual connection Result

The addition results in vectors of the same size.

10.11) Transformer: Feed Forward Network

Now these outputs are fed through a feed-forward neural network with non-linear activation functions.
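
A feed-forward network of this kind might be sketched as two linear layers with a non-linearity in between, applied to each token separately. The hidden size, the weight names, and the choice of GELU (in its tanh approximation) are assumptions; the guide only says that the activation is non-linear.

```python
import numpy as np

def gelu(z):
    """tanh approximation of the GELU activation."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def feed_forward(x, W1, b1, W2, b2):
    """Two linear layers with a non-linearity in between, applied per token."""
    return gelu(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
n, d, d_hidden = 4, 8, 32
x = rng.normal(size=(n + 1, d))
W1, b1 = rng.normal(scale=0.02, size=(d, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(scale=0.02, size=(d_hidden, d)), np.zeros(d)

out = feed_forward(x, W1, b1, W2, b2)
print(out.shape)  # (5, 8): same shape as the input
```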

10.12) Transformer: Final Result

After the feed-forward network there is another residual connection, which we skip here for brevity. This last step concludes the transformer layer. In the end, the transformer produces outputs of the same size as its input.

11) Repeat Transformers

Repeat the entire transformer calculation (steps 10.1 to 10.12) several times, e.g. 6 times.
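
Putting steps 10.1 to 10.12 together and stacking the result could look like the sketch below. It is a simplified illustration, not a faithful ViT implementation: the parameter layout in `init_layer` is made up, ReLU is used as the non-linearity for brevity, biases are left out, and layer normalization (which real ViT implementations also apply but which this guide does not cover) is omitted.

```python
import numpy as np

def softmax_rows(a):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def init_layer(rng, d, d_head, heads, d_hidden):
    """Random parameters for one layer (hypothetical layout, for illustration only)."""
    s = 0.02
    return {
        "qkv": [rng.normal(scale=s, size=(d, 3 * d_head)) for _ in range(heads)],
        "out": rng.normal(scale=s, size=(heads * d_head, d)),
        "w1": rng.normal(scale=s, size=(d, d_hidden)),
        "w2": rng.normal(scale=s, size=(d_hidden, d)),
    }

def transformer_layer(x, params):
    # Steps 10.1 to 10.8: multi-head attention
    head_outputs = []
    for W_qkv in params["qkv"]:
        Q, K, V = np.split(x @ W_qkv, 3, axis=1)
        head_outputs.append(softmax_rows(Q @ K.T / np.sqrt(Q.shape[1])) @ V)
    attn = np.concatenate(head_outputs, axis=1) @ params["out"]
    x = x + attn                                   # steps 10.9 and 10.10: residual
    # Step 10.11: feed-forward network, followed by the second residual connection
    ffn = np.maximum(x @ params["w1"], 0.0) @ params["w2"]
    return x + ffn

rng = np.random.default_rng(0)
n, d, depth = 4, 8, 6
x = rng.normal(size=(n + 1, d))
layers = [init_layer(rng, d, d_head=6, heads=2, d_hidden=32) for _ in range(depth)]
for params in layers:          # every layer has its own parameters
    x = transformer_layer(x, params)
print(x.shape)  # (5, 8): the shape never changes
```

Because every layer maps an (n+1) x d array to an (n+1) x d array, any number of layers can be chained.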

12) Identify Classification token output

Next, we identify the classification token output. This vector will be used in the final step of our Vision Transformer journey.

13) Final Step: Predicting classification probabilities

In the final step, we feed this classification token output into another fully connected neural network to predict the classification probabilities of our input image.
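
Steps 12 and 13 can be sketched as follows. The sketch assumes the cls token was appended as the last row (as in step 7), uses a single linear layer as the fully connected network, and applies a softmax to obtain probabilities; the sizes and weight names are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, num_classes = 4, 8, 3
transformer_output = rng.normal(size=(n + 1, d))   # stand-in for the last layer's output

# The cls token was appended in step 7, so its output is the last row
cls_output = transformer_output[-1]

# Classification head: a single linear layer here, though it can be deeper
W_head = rng.normal(scale=0.02, size=(d, num_classes))
b_head = np.zeros(num_classes)
logits = cls_output @ W_head + b_head

# Softmax turns the logits into class probabilities
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()
print(probs.shape)  # (3,): one probability per class
```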

We train the Vision Transformer using a standard cross-entropy loss function, which compares the predicted class probabilities with the true class labels. The model is trained using backpropagation and gradient descent, updating the model parameters to minimize the loss function.
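
For a single image, the cross-entropy loss is the negative log-probability that the model assigns to the true class. The probabilities below are made-up values used only to show the calculation.

```python
import numpy as np

def cross_entropy(probs: np.ndarray, label: int) -> float:
    """Negative log-probability that the model assigns to the true class."""
    return float(-np.log(probs[label]))

probs = np.array([0.7, 0.2, 0.1])     # predicted class probabilities (made-up values)
print(cross_entropy(probs, label=0))  # about 0.357: confident and correct, low loss
print(cross_entropy(probs, label=2))  # about 2.303: true class got little mass, high loss
```

This is where the label from step 2 comes back into play: the loss is small when the predicted probability of the true class is close to 1.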

In this visual guide, we have walked through the key components of Vision Transformers, from the data preparation to the training of the model. We hope this guide has helped you understand how Vision Transformers work and how they can be used to classify images.

I prepared this little Colab Notebook to help you understand the Vision Transformer even better. Please look for the ‘Blogpost’ comments. The code was taken from @lucidrains' great ViT PyTorch implementation, so be sure to check out his work.

If you have any questions or feedback, please feel free to reach out to me. Thank you for reading!

Copyright for syndicated content belongs to the linked Source : Hacker News – https://blog.mdturp.ch/posts/2024-04-05-visual_guide_to_vision_transformer.html

© 2023 earth-news.info
