Art vs. technology: How does an AI art generator work?

 
Robot painting another robot on a canvas

“Humanoid robot in an artist's atelier painting on a canvas” Image made with Dream AI. Prompt by Tatiana Botero

A few months ago, the world of digital art was turned upside down by the news that Jason M. Allen had won a digital art contest with his work "Théâtre D'opéra Spatial." The image was generated by Midjourney, an artificial intelligence currently ranked as one of the most powerful text-to-image art generators.

Although Allen explained that he had retouched the piece multiple times and had spent many hours thinking, writing, and rewriting the prompt to compose the image, criticism from many digital artists was immediate: "The piece is not Allen's; it's from an image-generating AI." And the big question became: what does this art-generating AI mean for the future of digital art?

Although the question is interesting, and everyone who works with technology should consider the social shifts their creations bring about, that is not the focus of this article. Instead, our objective today is to answer several questions: How do these image-generating AIs work? What are their advantages and disadvantages? How can they affect the human-technology relationship? And how far can these "creative" AIs go?

To test these AIs, we asked several members of our design and marketing teams to use different AI platforms to create the images that illustrate this article, not only to demonstrate the capabilities of these AIs but also to understand the fascination we feel with creating machines that increasingly blur the line between programming and creativity.

Artistic image-generating AIs?

Although to many this seems like a brand-new technology that exploded only a few months ago, in reality it is the result of years of work.

In 2018, the first AIs able to recognize objects in images and convert them into descriptive text, similar to the alt text used in SEO, were created. These AIs could recognize places, objects, and people by scanning images. An example of this was when Facebook started recognizing your friends' faces and tagging them in your photos without you having to do anything.

Once this technology had matured, the question became: can you do the opposite? Shortly after, the creators of DALL-E, one of the first text-to-image AI generators, began to achieve some results. The idea was to create something new by scanning many images, seeking to produce things that did not exist, such as flying elephants or a green school bus. These first images were very small and had a lousy resolution. Still, they achieved the basics: creating a totally new image from patterns learned from image banks, so that the AI already knew what made a bus a bus and could replicate it. This was perfected to such an extent that in 2021 the first version of DALL-E was released.

Shortly after, other similar applications began to appear, many of them open source, which resulted in the refinement and expansion of many similar projects with different levels of training and censorship.

The big question is: how do these image-generating AIs work?

Many people believe that these platforms simply make a montage of existing images; for example, if I ask for a "cup of coffee in a snow globe," the AI will just cut a cup of coffee from one image, a snow globe from another, and paste them onto a third image. It's not that simple. Image creation is mediated by very complex learning and information-processing pipelines, with several steps taking place before an image is generated.

Training begins with processing millions of images that serve as data, but these images are not simply recorded and used as references; that's where it starts to get complex. These AIs usually use deep learning to analyze the images, which implies separating them into different patterns of objects, colors, techniques, dimensions, etc. Some of these models generate more than five hundred classifications for each image. This compressed representation is known as the latent space, and it holds the patterns the AI will use to put things together in your new image. To illustrate: the AI has seen thousands of different coffee cups and found the common patterns, no matter the angle or technique, so now it can create any cup of coffee because it already knows what one looks like. Then comes the generation stage, which starts from noise: the AI works through everything it knows about coffee cups and snow globes to finally deliver an original, unrepeatable image of what the user has written.
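The generation stage described above can be sketched as a toy loop: start from pure random noise and nudge the image toward the pattern the model has learned for the prompt. This is a minimal, illustrative NumPy sketch, not any platform's real implementation; real diffusion models use a trained neural network to predict and remove noise, and `pattern` here is a hypothetical stand-in for the latent representation the model associates with the prompt.

```python
import numpy as np

def toy_generate(pattern, steps=50, seed=0):
    """Toy 'denoising' loop: start from pure random noise and nudge
    the image a little further toward the learned pattern at each
    step. Real diffusion models use a trained neural network to
    predict and subtract noise; this only captures the overall idea."""
    rng = np.random.default_rng(seed)
    image = rng.normal(size=pattern.shape)   # pure noise to start
    for step in range(steps):
        alpha = (step + 1) / steps           # blend in more pattern each step
        image = (1 - alpha) * image + alpha * pattern
    return image

# hypothetical "learned pattern" standing in for the prompt's latent target
pattern = np.full((8, 8), 0.5)
result = toy_generate(pattern)
```

A different seed produces different starting noise, so intermediate images differ even though this toy version converges to the same pattern in the end.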

Deep learning is a subset of machine learning in which a large amount of data is needed to train the platform. The machines absorb the information, divide it, and reorder it, looking for a different and/or more efficient solution to a problem. In the case of image-generating AIs, the data usually consists of image banks of various techniques, sizes, levels of detail, etc. The AI's neural networks divide the images into multiple categories and learn more and more about the common elements the objects share. This means that when we ask our AI for a cup of coffee in a snow globe, it already knows what a cup of coffee looks like, its color, shape, etc., and it also knows a snow globe, so using that knowledge, it can create its own cup in a snow globe and make multiple iterations of it.
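The "learning the common elements" step can be illustrated with a deliberately oversimplified sketch. Here a plain average over many noisy "coffee cup" feature vectors stands in for what a deep network actually learns through layers of trained weights; the feature values are invented for illustration only.

```python
import numpy as np

def learn_prototype(examples):
    """Toy 'training': collapse thousands of example feature vectors
    into one prototype capturing what they have in common. Real deep
    networks learn such representations through many layers of
    trained weights, not a simple mean."""
    return np.mean(examples, axis=0)

rng = np.random.default_rng(42)
# hypothetical "true" coffee-cup features the examples vary around
true_cup = np.array([1.0, 0.0, 0.5, 0.8])
# thousands of slightly different cups (angle, color, style...)
examples = true_cup + rng.normal(scale=0.1, size=(5000, 4))
prototype = learn_prototype(examples)  # lands very close to true_cup
```

The individual examples all differ, yet the prototype recovers what they share, which is the intuition behind the AI "already knowing" what a cup looks like.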

Here we have another interesting property of these AIs: no two images will ever be the same. If you don't get exactly what you want but like what you got, you have to edit it yourself or find an AI art generator, like DALL-E, that allows you to make small changes to the image. Still, keep in mind that you lose the original image if you don't like the change. Even the most detailed prompts will produce variations from one run to the next; you can specify a style, a light, a technique, everything, but you will never get two identical images. The platform also affects the results: Midjourney, DALL-E, and Wonder, among others, are different platforms with different degrees of complexity. They were all trained on different image banks, and no matter how often the same prompt is used across several platforms, you will never get the same results.
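Much of this non-repeatability comes down to the random noise each generation starts from. A minimal sketch, using a deliberately tiny 4x4 "image": the same prompt with the same seed reproduces the same starting noise, while a different seed yields different noise and therefore a different final image. The seeding scheme here is invented for illustration, not any platform's actual mechanism.

```python
import zlib
import numpy as np

def starting_noise(prompt, seed):
    """Derive the random starting noise for a generation from the
    prompt plus a seed. With a fixed seed the noise (and hence the
    result) is reproducible; a fresh seed gives a new image."""
    prompt_code = zlib.crc32(prompt.encode("utf-8"))
    rng = np.random.default_rng([prompt_code, seed])
    return rng.normal(size=(4, 4))

prompt = "cup of coffee inside a snow globe"
a = starting_noise(prompt, seed=1)
b = starting_noise(prompt, seed=2)
c = starting_noise(prompt, seed=1)  # same seed as a, so same noise
```

Platforms that surface a seed parameter let users re-run a favorite result; platforms that don't effectively draw a fresh seed each time, which is why the same prompt never repeats.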

Images of a cup of coffee inside a snow globe made by different AIs

Images generated by different AIs in response to the prompt: "cup of coffee inside a snow globe."

What can we use this technology for?

After Allen's victory in the competition, artists began to worry about the future of their work and to think that digital art was coming to replace them. However, other artists were already working with this type of AI and training their own models. For example, in 2018, the piece "Portrait of Edmond Belamy," one of the first works made by an AI algorithm trained for that purpose, was sold for more than 400,000 dollars. Some artists train their own AIs to help them create art, feeding them specific paintings and specific styles to generate particular kinds of work. Some of these pieces have reached astronomical prices in the art market and auction houses.

On the other hand, this technology makes it possible to democratize access to images and give a hand to content creators who don't have much money, letting them make incredible art to promote their work, such as album covers or posters for amateur shorts or films.

AI art generators can also help people who have never felt creative create something that represents them far beyond their abilities. Nothing is more pleasant than having something you feel you made, or at least participated in. From there to calling them artists is a long way, but these tools can give new abilities to someone who can't create art on their own.

Finally, this technology for generating images, texts, or anything else is not a human being. It lacks the emotional qualities that give our books, songs, and drawings the emotion that makes us empathize with them. A perfect technique or a beautiful design is only the surface of the image; it surprises with its mastery but not with its depth. As has happened with other technologies, human beings will end up adapting it to their needs. Art is likely to change, and so is the way we perceive the beauty of a painting. This technology will transform how we view design and art; still, it will be just one more tool in an arsenal of already existing technology.

Problems, questions, and censorship

As often happens with new inventions and technological advances, not everything is perfect, which is why several questions arise regarding copyright, legality, and censorship.

The most obvious question is copyright: whose image is it? Most companies that own these AIs allow the person who wrote the prompt to keep the rights to the generated work. However, for many people, the real owner is the AI, since it is the one that performs the "creative process" and generates the image. If we ever get to the point of selling these works as illustrations for books and comics, we will have to seriously ask ourselves who owns those rights and how that will affect both the creators of the AIs and their users.

Another important question concerns the training databases. It is a long-observed fact that AIs learn from the information we generate; therefore, it is easy to find cultural and social biases in the images they produce. For example, on some platforms the word "Asian" generates mostly pornographic results, the word "nurse" returns only women, while "doctor" presents us with men. This is worrying because these biases exist in image banks and in other data that could be used to train AIs to write books or movies, thus perpetuating stereotypes that society is desperately trying to change.

A third important question is censorship. Some of these apps block words like death, blood, and sex and prevent such content from being generated; others don't, and can become very explicit. Likewise, some AIs are not allowed to render the faces of real people, while others do so with incredible accuracy. Considering that we already have problems with deepfakes, fake news, and Photoshop, imagine what could happen when AIs can place hyper-realistic people in compromising situations. How would we determine whether something is real? How would we regulate what can or cannot be created? Who would control the reality and veracity of the content we see?

Futuristic geisha with gold jewelry made by AI

“Futuristic, cyberpunk geisha with gold jewelry” made with Wonder AI. Prompt by Tatiana Botero

Opinions

There are hundreds of opinions about AI art on the internet, many of them conflicting. Many digital artists hate these technologies and feel they take work away from them. In contrast, others see them as a new tool that expands their horizons and inspires them.

Tatiana Botero - Copywriter: "As a traditional cartoonist with limited skills, this is my chance to take my ideas to the next level and create an image I couldn't otherwise.

Learning to use the platforms takes time to achieve beautiful details and finishes. It's a team effort between the writer (you) and the machine. It's rare to achieve exactly what you want without iterating and rewriting the prompt many times, which makes the human being an active participant in the creative process.

I think this can help people create their own things and give a personal touch to their projects if they have no artistic abilities while giving digital artists new ideas and a new tool to work with and create." 

Carolina Lancheros - Designer: "From my position as a designer, it's a bit overwhelming to see how quickly these technologies are arriving and how easily they could replace us. Although this technology is capable of co-creating adaptations of different artists' work to reach a final result, I did not obtain good quality in most applications, and in many attempts the results were poor and full of design errors.

I believe that human art is created from a totally different point of view than AI art. Although our knowledge is limited, each person's perspective and knowledge affect their work and make it unique. The art they create includes their personal history, culture, values, and unique way of seeing the world, which, from my perspective, is more valuable than a drawing made without a soul."

Diana Pinto - Marketing Lead: “Despite the fact that machine-learning models can gather huge datasets, analyze them, and generate strokes, shapes, and combinations the way humans do, they'll never be fully autonomous.

I think creativity comes from neurological connections enhanced by external interactions; human society functions like a collective brain. Given its capacity to learn, I believe AI can mimic our mind's ability to come up with new ideas for creating any kind of art.

It takes time for machines to back up humankind's experience and transform it into art, but it's possible! 

Anyway, they'll need new inputs and instructions to mold outcomes. Those inputs have to come from humans. So, I think technology could evolve to help artists save time and effort, but it is up to them to choose how and for what purpose those art pieces will be used.”

Diego Barrera - UX/UI Design Lead: “I discovered these tools when one of the tech YouTubers I follow was streaming live and generating images from prompts the viewers proposed. I was fascinated, because one of my hobbies is creating 3D images, so I am aware of the work involved in getting good results. That with this technology I only need one sentence is still beyond my understanding.

It is also true that now the creativity must go into the words. Creating a prompt that conveys everything you want from the AI is difficult. Most of the time, the AI will generate misshapen beings with infinite limbs and disturbing patterns.

In the future, this technology will help create digital products, like designing user interfaces. UI designers don't make art. Instead, we make decisions based on research and analysis. Every time we build a feature, we first analyze what people use and how they use it. This could be done by artificial intelligence in seconds; in fact, it is something a machine could do much better than making art. I suppose it is a matter of time before this technology reaches other industries.”

Astronaut drinking coffee inside a cinema made by DALL-E

“Astronaut drinking coffee inside a cinema” made with DALL-E. Prompt by Diana Pinto and Carolina Lancheros

The future of creative AIs

At this moment, we are facing a turning point for humanity. We have created AIs that do much of what humans do, but faster. As fascinated as we are by what we are achieving, it is worth asking ourselves what will happen in the future. What will happen when no "creative" work needs a human being?

A few years ago, we thought that robots would replace physical labor. It was not worth carrying bricks if a machine could do it faster and was stronger than us; meanwhile, humans would dedicate themselves to science, music, art, and writing. However, in our desire to find the limits of AIs, we gave them the ability to pursue those as well. Right now, we have AIs writing articles for the world's leading newspapers, AIs making award-winning artwork, and even AIs writing songs, books, and movies. Some of these still do not achieve adequate results, but it is a matter of years or even months before they do.

What else can they create? If we can give AIs the ability to create art and culture, it should be much easier to have them create technology. We have already simplified development processes to such an extent that people without experience or knowledge can build a professional, complex application in a short time. What is going to happen the day these AIs also obtain this knowledge, and we only have to write a couple of lines with our specifications to get our business's website, complete with copy and design? Can you imagine writing just a general explanation and having the infrastructure of an application like Tinder, Facebook, or Instagram in a few minutes? Perhaps that won't be achieved within a couple of weeks, but what is possible, and not very far away, is the generation of code for specific functionalities. For example, "a login page with Facebook, Google, and Twitter in JavaScript" could become a prompt for an AI, and it would give you clean code that you just have to copy and paste into your app for it to work.

This could be the next step for technology. The ability to learn and solve problems from just one line of text could take us to the point where we don't even need developers anymore, because the machines will be able to build software themselves, generating the code, the images, the texts, and the design of digital products that would take humans months, in just hours or minutes.