How did we get to where we are today in the field of generative AI?
Generative AI will be the most disruptive technological innovation since the advent of the personal computer and the inception of the Internet. It has the potential to create tens of millions of new jobs, permanently alter the way we work, fuel the creator economy, and displace or augment hundreds of millions of workers in roles ranging from computer programmers to computer graphics artists, photographers, video editors, digital marketers and, yes, even journalists. Even with all the hype around generative AI this year, its true power has not yet been seen or felt. In 2023 there will be significant innovations that begin a revolution, one that will leave no industry or job function unaffected.
Generative AI research can trace its history back to the 1960s. However, generative AI began to develop into something similar to its current form in 2006, with the first significant paper in the field, Geoffrey Hinton and his co-authors’ “A Fast Learning Algorithm for Deep Belief Nets”, which re-introduced Restricted Boltzmann Machines (RBMs) in the context of deep learning. (The underlying Boltzmann machine concept dates back to Hinton’s work with Terrence Sejnowski in the 1980s.)
Few further innovations took place in the field until 2014, when Ian Goodfellow and his colleagues introduced Generative Adversarial Networks (GANs). Research progress continued in the following years, most significantly with the introduction of the transformer architecture for natural language processing applications, presented in the 2017 paper “Attention Is All You Need” by Vaswani and colleagues from Google.
Most people, however, will admit that they were not aware of generative AI until 2022. This is when the technology was put into the hands of consumers with the release of several text-to-image models and services like Midjourney, DALL-E 2, Imagen, and the open-source release of Stability AI’s Stable Diffusion. This was quickly followed by OpenAI’s ChatGPT, which mesmerized consumers with a model from the GPT-3.5 series fine-tuned on conversational dialog that seemingly had an answer for everything and delivered responses in a very human-like manner. At the same time, VCs looking for the hot new technology to invest in caught the generative AI bug, and Stability AI and Jasper both became instant unicorns with funding rounds exceeding $100 million. GitHub’s Copilot also saw widespread adoption. The tool is built on OpenAI’s Codex, which was trained on public code repositories on GitHub, and it assists developers by converting natural language into executable software code.
However, there has been significant backlash against generative AI. Many concerns have been raised about possible copyright infringement in generative AI art, text, and code, as well as the impact on creative jobs. A class action lawsuit brought against Microsoft, GitHub, and OpenAI over Copilot is likely to set an important precedent for other lawsuits, as many developers contend that their intellectual property has been stolen. Artists, authors, and developers want their work excluded from the wide-scale scraping performed to create viable datasets for the large language and image models, and artists on ArtStation have revolted, requesting that all AI-generated art be banned from the platform.
Generative AI has been an active area of research since the 1960s, when Joseph Weizenbaum developed ELIZA, one of the first chatbots. It was an early example of Natural Language Processing (NLP) and was designed to simulate a conversation with a human user by generating responses based on the text it received. Although the system was a primitive rules-based implementation, it paved the way for further developments in the field of NLP over the coming decades.
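To make the rules-based idea concrete, here is a minimal sketch of how such a system might turn input text into a reply. The patterns, templates, and the `respond` function are hypothetical and written for illustration only; they are not taken from Weizenbaum's original ELIZA script.

```python
import re

# Hypothetical rules for illustration; NOT Weizenbaum's original script.
# Each rule pairs a pattern with a reply template that reuses the captured text.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]
FALLBACK = "Please tell me more."


def respond(text: str) -> str:
    """Return the reply for the first rule whose pattern matches the input."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Drop trailing punctuation so the echoed fragment reads naturally.
            fragments = [g.rstrip(" .!?") for g in match.groups()]
            return template.format(*fragments)
    # No rule matched: fall back to a generic prompt.
    return FALLBACK


if __name__ == "__main__":
    print(respond("I need a vacation."))
    print(respond("Hello there"))
```

There is no learning involved: the program only reflects fragments of the user's own words back through fixed templates, which is why such systems feel conversational for a few turns and then break down.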
Modern generative AI is built on deep learning, which can trace its beginnings back to the 1950s. Progress was fairly quiet for decades, then saw a resurgence in the 80s and 90s as artificial neural networks (ANNs) trained with backpropagation gained traction. By the 2000s and 2010s the amount of available data and computational capability had improved to the point where deep learning became practical. Around 2012 Geoffrey Hinton and his collaborators helped drive breakthroughs in speech recognition using deep neural networks and in image classification using convolutional neural networks (CNNs), which paved the way for major subsequent innovations in the study of artificial intelligence.
In 2014 Ian Goodfellow and his colleagues released their seminal paper on Generative Adversarial Networks (GANs), which pits two networks against each other in a zero-sum game to create novel images that are similar in appearance to the images the model was trained on, but not the same. This work led to incremental developments in the GAN architecture that yielded increasingly better results in image synthesis over the following years, and the same methods began to be applied to new applications like music composition. Other model architectures were also developed or adapted for generation: convolutional and recurrent neural networks (text generation, video), long short-term memory (LSTM) networks (text generation), transformers (text generation), variational autoencoders (VAEs) (image generation), diffusion models (image generation) and various flow model architectures (audio, image, video). Adjacent work produced neural radiance fields (NeRF), which can construct 3D scenes and assets out of 2D images, and advances in reinforcement learning, which uses simulations to train an agent through reward-based trial and error.
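The zero-sum game behind GANs can be written as a single minimax objective. The notation below follows the common presentation: $G$ is the generator, $D$ is the discriminator, $p_{\text{data}}$ is the distribution of real training images, and $p_z$ is the noise distribution the generator samples from. These symbols are introduced here for illustration and do not appear elsewhere in this article.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize $V$ by scoring real images near 1 and generated images near 0, while the generator tries to minimize it by producing images the discriminator scores as real.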
Significant achievements in the space have been realized in recent years, including the generation of photorealistic images, viable deepfake videos, believable audio synthesis and human-like text produced by large language models like OpenAI’s GPT-3. Even so, it was not until the second half of 2022 that the media and the mainstream took notice. That attention followed the release of a number of diffusion-based image services (Midjourney, DALL-E 2, Stable Diffusion), the release of OpenAI’s ChatGPT, and a steady stream of text-to-video (Make-A-Video, Imagen Video) and text-to-3D (DreamFusion, Magic3D and GET3D) papers.
We are still in the infancy of generative AI. It is currently a novelty for consumers and businesses, but it will soon find its way into products, services, processes and all facets of business and our daily lives as it becomes a technological enabler for creating content and improving productivity. The job market will be heavily impacted, as generative AI can not only augment or automate the creative functions of current jobs but can also entirely replace certain job functions, making those jobs irrelevant.
The impact of generative AI will no doubt be a substantial topic in 2023. The use cases we have seen thus far, generating images, text, code, audio, music, video, and 3D models, are just the tip of the iceberg. Expect more innovations to come in 2023, along with further backlash from communities whose jobs will be affected by the commoditization of generative AI.