Gallery assistants hold an artwork by Spanish artist Pablo Picasso entitled ‘Femme au beret et a la robe quadrillee’ (Marie-Therese Walter), with an estimated price in the region of 35 million pounds (50 million dollars), during a photocall at Sotheby’s in central London on February 22, 2018. (Photo by Daniel Leal / AFP via Getty Images)
AFP via Getty Images
As AI increasingly dominates the narrative in technology and business, most people’s understanding of it remains limited to tools like ChatGPT. However, one rapidly advancing area is AI image generation. You may be familiar with some tools in this space, but I aim to examine how different image generation models respond to the same prompt.
First, let’s briefly explore how AI image generation works and the mechanical differences between AI text and image generation.
How do image generation models work?
Models like DALL-E are trained on vast datasets of images and, in some cases, accompanying text descriptions. During training, the AI is fed millions of image-text pairs, learning associations between words and visual concepts. When given a text prompt, the model generates a corresponding image by synthesizing pixels in alignment with the patterns and visual relationships in its training data. Essentially, the AI acts like a painter, creating ‘brush strokes’ informed by the image-text pairs it was trained on. This process can introduce bias, which we will explore further in this article.
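To make the word-to-visual association concrete, here is a deliberately toy sketch: it "trains" on a handful of invented (caption, color) pairs and "generates" a single pixel for a new prompt by blending the colors its words evoke. Real models learn millions of parameters over full images, not single average colors; everything below is illustrative only.

```python
# Toy sketch: learn word-to-color associations from (caption, color) pairs,
# then "paint" a new prompt by averaging the colors its known words evoke.
# All data here is invented for illustration.
from collections import defaultdict

# Tiny "training set" of caption fragments paired with an average RGB color.
training_pairs = [
    ("sunny sky", (135, 206, 235)),
    ("sunny field", (240, 230, 140)),
    ("red wine", (114, 47, 55)),
    ("green vineyard", (86, 125, 70)),
]

# "Training": record which colors co-occur with each word.
word_colors = defaultdict(list)
for caption, rgb in training_pairs:
    for word in caption.split():
        word_colors[word].append(rgb)

def generate_pixel(prompt):
    """Blend the colors associated with the prompt's known words."""
    colors = [c for w in prompt.split() for c in word_colors.get(w, [])]
    if not colors:
        return None  # no learned association for any word in the prompt
    n = len(colors)
    return tuple(sum(channel) // n for channel in zip(*colors))

print(generate_pixel("sunny vineyard"))  # blends two "sunny" colors and one "vineyard" color
```

Note how a prompt with no words seen in training produces nothing useful: this mirrors, in miniature, why limited training data leads to divergent or biased outputs.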
How do text generation models work?
In contrast, text-based AI models such as GPT-4 are trained on extensive text data, learning language patterns, grammar, and context. When prompted, they generate text by predicting the most likely next word or phrase based on the input and their training; essentially, they ‘guess’ the best next words given your input.
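The "guess the next word" idea can be sketched with a simple bigram table: count which word follows which in a tiny corpus, then predict the most frequent continuation. Models like GPT-4 use neural networks over tokens rather than raw counts, so this is a simplified stand-in for the underlying principle.

```python
# Toy sketch: next-word prediction from bigram counts, illustrating how a
# language model "guesses" the most likely continuation of the input.
from collections import Counter, defaultdict

corpus = (
    "friends drinking wine in napa on a sunny day "
    "friends drinking coffee in napa on a rainy day "
    "friends drinking wine in sonoma on a sunny day"
).split()

# "Training": count how often each word follows each other word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word observed after `word`, if any."""
    if word not in bigrams:
        return None
    return bigrams[word].most_common(1)[0][0]

print(predict_next("drinking"))  # "wine" follows twice, "coffee" only once
```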
The key difference is that an image model must not only interpret your words but also translate the concept you present into a coherent visual scene, rather than simply continuing a sequence of text.
Testing Image Generation with the Same Prompt
One pitfall of image generation is that limited training data can lead to divergent or biased outputs. As a Bay Area-based contributor, I tested the same prompt across four different image generators: “An image of 4 friends drinking wine in Napa, CA on a sunny day.”
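The test setup can be sketched as a small harness that pairs one shared prompt with per-model settings, so every generator receives identical input and only the first image is kept. The endpoint URLs and parameter names below are invented placeholders, not the real APIs of these services; each vendor's actual request format differs.

```python
# Hypothetical sketch: prepare the same prompt for several image generators.
# Endpoints and field names are placeholders, not real vendor APIs.
PROMPT = "An image of 4 friends drinking wine in Napa, CA on a sunny day"

GENERATORS = {
    "dall-e": {"endpoint": "https://example.com/dalle"},
    "firefly": {"endpoint": "https://example.com/firefly"},
    "midjourney": {"endpoint": "https://example.com/midjourney"},
    "imagen": {"endpoint": "https://example.com/imagen"},
}

def build_requests(prompt, generators):
    """Pair one shared prompt with each generator's settings; requesting a
    single image per model mirrors the 'first image only' rule of the test."""
    return [
        {"model": name, "prompt": prompt, "n_images": 1, **cfg}
        for name, cfg in generators.items()
    ]

requests = build_requests(PROMPT, GENERATORS)
print(len(requests))  # one request per model
```

Keeping the prompt identical across models is what makes any divergence in the outputs attributable to the models themselves rather than to the input.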
For this test, I used:
DALL-E
Firefly
Midjourney
Imagen
I restricted the test to the first image output from each model; as those familiar with these tools know, they generate multiple images per prompt. For DALL-E and Imagen, I accessed the images through Canva, which has separate apps for both. Here were the results:
DALL-E Output
DALL-E output for An image of 4 friends drinking wine in Napa, CA on a sunny day.
DALL-E
Firefly Output
Firefly Output for An image of 4 friends drinking wine in Napa, CA on a sunny day.
Adobe Firefly
Midjourney Output
Midjourney output for An image of 4 friends drinking wine in Napa, CA on a sunny day.
Midjourney
Imagen Output
Imagen output for An image of 4 friends drinking wine in Napa, CA on a sunny day.
Imagen
The outputs tended to converge on similar imagery. Midjourney's result diverged the most from the other three, followed by Firefly's; the outputs from DALL-E and Imagen were, anecdotally, quite similar.
While image generation technology is advancing rapidly, it raises concerns about bias inherited from training data. As training data expands, these models will improve. However, with video generation nearing mainstream adoption through companies like Runway and Pika, extra caution is necessary when relying on text-to-image and text-to-video outputs, to avoid reinforcing societal biases.
Source: Forbes – https://www.forbes.com/sites/sunilrajaraman/2023/12/29/exploring-ai-images-using-the-same-prompt-with-different-models/