On the Distinctions between AI Works and Art Works
Note: This essay was originally written for an audience with considerable education and training in mathematics, statistics, and computer science. To accommodate a broader audience with diverse backgrounds and to preserve the flow of the original writing, mathematical terms and concepts are complemented with external sources (through embedded hyperlinks) for explanations. These external sources are intentionally not academic by nature and were not part of the references of the original essay.
The recent groundbreaking advancements in artificial intelligence and machine learning have brought forth works in image, video, and literature that would seem comparable to the works of human artists. It is the purpose of this essay to draw the seemingly blurred lines between artworks and AI works, and in doing so, attempt a better understanding of the inexplicable nature of art and art making.
Art is an expression by the creator that reflects, challenges, records, invokes, and envisions. Every artwork spans time: it builds on and/or challenges the past, records the instantaneous thinking and emotions of the creator in the present (the present when it is created), invokes the thinking and emotions of the audience in their present (the present when it is engaged), and envisions a possibility of the future, a future that includes the existence of the artwork.
It seems apt to follow (an attempt at) a definition with an example, and we shall seek one on the ceiling of the Sistine Chapel. Undoubtedly, Michelangelo drew enormous insights from the past. The contents of the fresco are various events in history (history according to the Catholic church, but history nonetheless). The skills and style of the work were broadly based on the movement later known as the Renaissance, which was rooted in Florence, where he grew up, and which dated back decades, challenging the aesthetics of the Middle Ages that had spanned the centuries before. The artist’s own past, in which he worked mostly as a marble sculptor, was of equal significance, as it brought technical difficulties to large-scale painting but also laid a unique foundation for his depictions of muscles, motions, and momentum. The impact of the work on the future is beyond obvious as well: not only did it strengthen Michelangelo’s own ties with the court, and in particular the papacy, for decades to come, it also marks one of the few pinnacles of arguably the most colorful era in Western art history, with lasting influence on future generations of artists, art lovers, and art history students who have to write tedious essays on it.
But it is the present, or rather the presents, that are of the utmost importance. It was a long “present” for Michelangelo, one that lasted five years. It was in this present that he created, from nothing to everything, from zero to one, from false to true. Nine scenes compose the work on the ceiling, and Michelangelo did not paint every inch himself; he had help and support from a group of compatriots and apprentices. But every inch was his art, and his art lies in every inch. It might seem tempting to ask the question: is it still the same great work of art if some minor change were to take place? Nothing drastic, like putting a beard or a leather jacket on Adam, but perhaps a slightly different hue of grey for the rock in the background, or a minute change of hairstyle for one of the wingless angels in the corner. To generalize: is art still art with an ever so slight perturbation? What is the condition number of art? The answers are no, and infinity. And the proof is a simple one. Every possibility of an alteration corresponds to a decision made, either explicitly or implicitly, by the creator. Any deviation from the set of such decisions is a deviation from the creator’s intention. A deviation of any scale is enough to break the fragile injection from intention to expression.
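For readers who want the term pinned down, the relative condition number of a map f at a point x is commonly defined as in the first line below. The symbols f, x, and δx are illustrative assumptions and are not used elsewhere in this essay: x stands for the set of decisions embodied in the work, and f for a map that returns 1 if the result is still the creator's art and 0 otherwise. This is only a sketch of the metaphor under those assumptions, not a formal result.

```latex
\kappa(f, x) \;=\; \lim_{\varepsilon \to 0^{+}} \; \sup_{0 < \|\delta x\| \le \varepsilon}
\frac{\|f(x + \delta x) - f(x)\| \,/\, \|f(x)\|}{\|\delta x\| \,/\, \|x\|}

% Assumed reading of the claim: f(x) = 1 and f(x + \delta x) = 0 for every \delta x \neq 0, so
\frac{|0 - 1| \,/\, |1|}{\|\delta x\| \,/\, \|x\|} \;=\; \frac{\|x\|}{\|\delta x\|} \;\longrightarrow\; \infty
\quad \text{as } \|\delta x\| \to 0,
\qquad \text{hence } \kappa(f, x) = \infty .
```

Under this reading, "no" answers the first question (the output flips for any nonzero perturbation) and "infinity" answers the second (the ratio of output change to input change is unbounded).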
It is in the creator’s present that we make the first distinction between art works and AI works. Artificial intelligence is a promise in the asymptotes. Taking away true infinity (which is never there to begin with), the promise collapses to an approximation. It is worth emphasizing that infinity is not required in the past or future parts of the work. Indeed, modern AI’s ability to take in the past is beyond human imagination, and its legacy in the future requires little effort in preservation. It is the present that the AI lacks. The wonderful minds behind DALL-E 2 described the method they used as a “generative stack” composed of a prior that produces CLIP image embeddings conditioned on text captions, and a decoder that produces images conditioned on the CLIP image embeddings. The CLIP embedding space allows for what is known as “zero-shot” learning, where the classes seen in training and those encountered in testing are disjoint, with auxiliary information linking the two. While this framework has demonstrated great performance in image generation based on textual prompts, it is based on the “common sense” the machine creates through trial and error, by minimizing some form of a loss function that (hopefully) converges to zero in the asymptotes. There is never a moment at which the loss is exactly zero, and that moment is what we call the creator’s present, where all decisions are made and the creation takes place. It is rumored that Michelangelo, when asked how he created the marble sculpture of David, said he could see the sculpture in the rock and just proceeded to chisel away everything else. If this rumor is true, the “present” of Michelangelo’s David was the moment he “saw” it. In contrast, AI cannot “see” the end product; instead, it starts with the chiseling and “looks as it goes”.
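The asymptotic behaviour of the loss can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the objective L(w) = w², the starting point, the step size of 1/4, and the function name `gradient_descent_losses`. It is not the loss used to train DALL-E 2. Exact rational arithmetic is used so that floating-point underflow does not masquerade as a loss of zero.

```python
from fractions import Fraction


def gradient_descent_losses(w0, learning_rate, steps):
    """Run gradient descent on the toy loss L(w) = w**2.

    Returns the loss after each step. All values are exact Fractions.
    """
    w = w0
    losses = []
    for _ in range(steps):
        grad = 2 * w                  # dL/dw for L(w) = w**2
        w = w - learning_rate * grad  # with learning_rate = 1/4 this halves w
        losses.append(w * w)
    return losses


# Hypothetical settings: start at w = 1 and take 50 steps of size 1/4.
losses = gradient_descent_losses(Fraction(1), Fraction(1, 4), 50)

print(float(losses[0]), float(losses[-1]))
print(all(loss > 0 for loss in losses))
```

With this assumed step size each update halves w, so the loss after t steps is exactly 4^(-t): it shrinks at every step yet stays strictly positive after any finite number of steps.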
In other words, an artist seems to be standing on the integer line, where she can go from zero to one in a single step, but the machine, any machine, is on the real line: the finer the grid, the more computational power required, and the closer, it seems, it can get to one. But the solution does not lie in approaching continuity, as “one” here is not a real number, not even an integer, but rather a boolean. Therefore any effort on the real line is a deception of progress: in the receding of the loss, the present never arrives.
The lack of the creator’s present brings doubt, hesitation, and confusion to the audience’s presents. When engaging with a piece of art, the thinking and emotions invoked in the audience are a composition of the direct experience of the work and an exploration of the creator. The latter can be viewed under a Bayesian framework. The true intention of the creator can be seen as an unknown random variable onto which a prior distribution can be cast, based on knowledge of and information about the creator. A likelihood function can be imposed to link intentions with expressions, informed by existing works from the same creator. Consequently, a posterior can be obtained to speculate on the creator’s intention behind the current work, conditioning on the expression, that is, the work itself. Perhaps the most vital component in this framework is its assumption: that the intention is indeed a random variable following an unknown but existent distribution. This is not the case with an AI creator: it has no intention. And such knowledge is available to the audience. Whether it is the elaborate fresco on the ceiling of the Sistine Chapel or the compression-friendly color blocks by Piet Mondrian, the expressions are certain. The audience takes such certainty as given and works from it to achieve their own analysis and judgment of the posterior. This cannot be done with an AI-created work, where not every brushstroke was intentional; indeed, no brushstroke was intentional.
The audience’s presents are lacking too.
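The Bayesian reading of the audience's inference can be sketched with a small discrete example. The candidate intentions, the possible expressions, and every probability below are hypothetical numbers chosen for illustration, and the function name `posterior` is likewise an assumption. The sketch only shows the mechanics of Bayes' rule: it presupposes that an intention exists to be inferred, which is exactly the assumption the essay argues fails for an AI creator.

```python
from fractions import Fraction


def posterior(prior, likelihood, observed):
    """Bayes' rule over a finite set of candidate intentions.

    prior:      {intention: P(intention)}
    likelihood: {intention: {expression: P(expression | intention)}}
    observed:   the expression actually seen in the work
    """
    unnormalized = {
        intention: prior[intention] * likelihood[intention][observed]
        for intention in prior
    }
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("observed expression is impossible under every intention")
    return {intention: p / evidence for intention, p in unnormalized.items()}


# Hypothetical prior from what the audience knows about the creator.
prior = {"devotion": Fraction(1, 2), "defiance": Fraction(1, 2)}

# Hypothetical likelihood, as if estimated from the creator's earlier works.
likelihood = {
    "devotion": {"calm": Fraction(3, 4), "turbulent": Fraction(1, 4)},
    "defiance": {"calm": Fraction(1, 4), "turbulent": Fraction(3, 4)},
}

post = posterior(prior, likelihood, "turbulent")
print(post)
```

In this toy setting, observing a "turbulent" expression moves the audience's belief in "defiance" from 1/2 to 3/4. The update is only meaningful because the expression is taken as a certain, intended observation.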
A medium is a necessary condition for the existence of art, not unlike a body for life or a receiver for radio. As art in essence is the giving of an expression, its medium is tied to the receiving senses (of the audience): what we see, hear, smell, taste, and touch. Literature, paintings, photography, music, culinary experience, etc. all fall into these categories. It is important to emphasize again that art has a direction: from intention to expression, from the creator to the audience, from the creator’s present to the audience’s presents (the moments they engage with the art, which are, trivially, after the creator’s present). However, the medium and the various skills and methods associated with it are without direction. In direction, we make the second distinction between art works and AI works.
In Shan Shui Lun (Discourse on Landscape Painting), Wang Wei, a brilliant poet and painter of China’s Tang Dynasty, gives a succinct yet comprehensive guide to the methodology of landscape painting. In roughly only five hundred words, Wang Wei lays out a framework that includes perspective, “丈山尺树... 远水无波，高与云齐 (mountains are of ten feet, trees of a foot... distant waters have no waves and rise to blend with the clouds)”; focus, “定宾主之朝揖... 多则乱，少则慢... (determine the hierarchy of ‘guests’ and ‘host’... if too many, they are confused, if too few, they are lackadaisical…)”; and even goes on to explain the “shortcuts” for depicting various scenarios: “有风无雨... 有雨无风... 早景则... 晚景则... 春景则... (when there is wind without rain... when there is rain without wind... if it is a morning scene... if it is an evening scene... if it is a spring scene…)” But ahead of all these techniques, both in order and in importance, are the words: “凡画山水，意在笔先 (When one paints landscapes, concept precedes brushwork)”. “Concept” here is the translation of 意 (yi), which also has the meaning of intention, mindfulness, and meaning. Indeed, intention comes before expression, and that direction generalizes beyond ancient Chinese landscapes. Without this sentence, all the techniques given are measurements of correlation. A scene of the morning is likely correlated with hope and liveliness, and a drawing of “千山欲晓 (a thousand mountains about to be dawn-brightened)” is likely to be correlated with those motifs in one’s mind. However, no information on causation is given: there is no direction. The essence of the morning scene might be best captured on the mountain tops and the “朦胧残月 (pale and dim remaining moon)”, but it shall be the painter’s decision to pick the morning scene.
It was a common thought among ancient Chinese scholars that objects (often landscapes or buildings) are themselves constant and without emotions, but reflect the different emotions and insights of different viewers at different times. The same comparison can be made between the medium and the creator’s intention. The former is a vehicle of the latter. The vehicle has no direction; the driver does. It is interesting to note that many of the techniques Wang Wei detailed are born from our shrewd observation of nature. It is necessary to know where to look in order to know where to paint. It would not be surprising if an AI could develop a similar set of skills. Fed with enough landscape paintings, an AI shall quickly learn for itself the various “shortcuts”. It might even be able to learn the high correlations between portrayals of objects and emotions. Indeed, “Hierarchical Text-Conditional Image Generation with CLIP Latents” in its entirety is a teacher’s note on how to best teach an AI student to write “A Discourse on ____ Paintings”. The discourse of DALL-E 2, or of any forthcoming AI, will be a bit lengthier than 500 words, but it will always start at the second sentence, after “When one paints ___, concept precedes brushwork”.
We have so far discussed the “what” and the “how” of art making, and we shall proceed to discuss the “why”. The benchmark against which any artwork is evaluated, both by its creator and its audience, is the absence of art. This is not to ignore the constraints artists usually face in the forms of commission, employment, or livelihood. For example, it would hardly have been possible for Michelangelo to turn down the offer from Pope Julius II to paint the fresco. In fact, Michelangelo did try to refuse, claiming that “(painting) non era mia arte” (“was not my art”). However, most artists are skilled craftsmen in the mediums of their choice (or, as in Michelangelo’s case here, can learn to become so), and it is their skills and the quality of their craft that are demanded, and correspondingly evaluated, by the commissioners. The artistic value, a totally subjective measure, is not and cannot be demanded or evaluated. With this possible ambiguity out of the way, we can safely assert that any art is born from the freedom of not doing art. It is in this freedom of choosing the alternative that we find the third distinction between artworks and AI works.
Much like many artists living by commissions, an AI works under constraints. In the case of DALL-E 2, the constraints are explicit in the textual prompt given. The prior and decoder trained on the seen classes can be understood as the AI’s process of understanding and analyzing the prompt, and the generated output is its response. If we are to assume, for the sake of contradiction, that there exists a result from such a generative model that is of artistic value, it follows that the artistic value is acquired after a certain decrease in the loss function. At the precise epoch where such artistic value would be “learned”, the AI is not given the alternative choice, the trivially true scenario: that of forgoing the artistic value altogether. The lack of such a choice is evident in the structure of the loss function. The contradiction is thereby reached. A popular analogy for artists working with constraints is that of “to dance with shackles on”; the implicit alternative is “to sit still (do nothing) with shackles on”. If we see the loss function as a form of punishment that an ML model strives to minimize, what AI does is, comparatively, “doing prison chores with shackles on”, where “sitting still” is never an option.
The purpose of this essay is not to bash the marvelous progress made in the field of AI, but rather to attempt a better understanding of what composes art: what exactly takes place in the “creator’s present”. To this end, the advancements in machine learning technologies are of considerable help, in that they isolate the components that are “orthogonal” to art making. Indeed, AI has shown that art is not a limiting approximation, not achieved solely by honing one’s skills on the subject matter, and certainly not done for the sake of minimizing punishment. We shall look at the moment where the boolean value is switched from zero to one with reverence and awe, the same way a dot looks at a line, a circle at a sphere, an index of a for loop at a variable initialized outside. Plato credits humans’ ability to do “zero-shot” learning to our souls’ experience with the “perfect forms in heaven”, which are constant, long before our bodies exist. Plato simply points outside the for loop that is this physical world with all the marvelous AIs in it. It is out there that the seed of art is conceived.
1. Reeve, C. D. C., 1997, Plato, Cratylus: translated with introduction and notes, Indianapolis and Cambridge: Hackett; reprinted in J. M. Cooper (ed.), Plato, Complete Works, Indianapolis and Cambridge: Hackett.
2. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv preprint arXiv:2204.06125. Retrieved from https://arxiv.org/abs/2204.06125
3. Wallace, William E. Michelangelo: The Artist, the Man and His Times, Cambridge University Press, 2011.
4. Wang Wei. Shan Shui Lun (Discourse on Landscape Painting). Translation excerpted from Early Chinese Texts on Painting (Cambridge: Harvard University Press, 1985), 173–76.