Top 10 Generative AI Innovations in 2022
In no particular order (because they are all great)
6 min read · Jan 7, 2023
Here are my picks for the top 10 generative AI innovations of 2022. For each one I list the model type, parameter count and release date, with links to the paper and code where available. The picks include products and services, research papers and even an ethical framework.
DeepMind AlphaCode
- Type: Software
- Area: Code Generation
- Model: Transformer-based large language model
- Parameters: 300M, 1B, 3B, 9B and 41B
- Released: 02/2022
- Source Released: No
- Link: https://alphacode.deepmind.com/
- Summary: AlphaCode is a large language model capable of producing software to solve problems at roughly the skill level of a junior software developer. This technology will evolve in 2023 and is an area to watch: it is likely to improve drastically, to be a source of concern for software developers, to interest business leaders who want to reduce OpEx, and to be the subject of legal battles, since models in this space are trained on copyrighted code. GitHub Copilot, a similar tool built on OpenAI's Codex that assists developers in writing code, is currently the subject of a class action lawsuit.
OpenAI ChatGPT
- Type: Service
- Area: Conversational Chatbot
- Model: Transformer-based Conversational Chatbot with Reinforcement Learning (and human labelers)
- Parameters: 175 billion
- Released: 11/2022
- Source Released: No
- Link: https://openai.com/blog/chatgpt/
- Summary: ChatGPT is a sibling model to InstructGPT and is fine-tuned from a model in the GPT-3.5 series. It is fine-tuned with supervised learning and reinforcement learning (using human labelers who write and rank responses to real prompts) and trained on conversational data to provide a more interactive dialog. ChatGPT is currently available as a research preview and is the source of both much excitement and concern. It is trained on publicly available Internet-sourced text and as such predominantly reflects English-language content. It generates very plausible answers but can also be wildly incorrect. Answers that seem creative are in fact the result of training on existing content, and it has limited knowledge of anything that happened after its training data was collected in 2021.
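To make the role of the human rankers a little more concrete, here is a minimal sketch of a pairwise ranking loss of the kind described for InstructGPT's reward model. OpenAI has not released ChatGPT's training code, so treating this as representative of ChatGPT is an assumption, and the function name and the example scores are hypothetical. The idea is that a reward model is penalized when it scores the reply a human rejected above the reply the human preferred.

```python
import math


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_preferred - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    reply higher than the rejected one, and large when it does the opposite.
    """
    x = reward_rejected - reward_preferred
    # Numerically stable softplus, log(1 + exp(x)), which equals -log(sigmoid(-x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical reward-model scores for two candidate replies to one prompt.
score_reply_a = 2.0  # the reply the human ranker preferred
score_reply_b = 0.5  # the reply the human ranker rejected

loss_when_model_agrees = pairwise_ranking_loss(score_reply_a, score_reply_b)
loss_when_model_disagrees = pairwise_ranking_loss(score_reply_b, score_reply_a)

print(f"model agrees with ranker:    {loss_when_model_agrees:.3f}")
print(f"model disagrees with ranker: {loss_when_model_disagrees:.3f}")
```

In a full pipeline the trained reward model would then supply the reward signal for the reinforcement learning step; that part is omitted here.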
OpenAI Dall-E 2
- Type: Service
- Area: Image Generation
- Model: Text-to-Image Diffusion Model
- Parameters: 3.5 billion
- Released: 04/2022
- Source Released: No
- Link: https://openai.com/dall-e-2/
- Summary: Dall-E 2 is an improved text-to-image model that uses text prompts to generate novel images. It has given rise to the art of prompt engineering, where users craft text descriptions that can yield impressive visual results. The model is trained on image-text pairs scraped from the Internet. The generated images are provided to the user, who subsequently owns the content and is responsible for its use.
OpenAI GPT-3.5 Family
- Type: Service
- Area: Coding/Writing/Chat
- Model: Autoregressive Transformer Large Language Model
- Parameters: 175 billion
- Released: 01/2022
- Source Released: No
- Link: None
- Summary: GPT-3.5 is a family of large language models that are based on GPT-3 and have been fine-tuned for specific tasks. The family includes “code-davinci-002”, a base model for code completion, and “text-davinci-002” and “text-davinci-003”, which are used for text completion. ChatGPT is also a model from this class. Although ChatGPT has attracted the most interest, the GPT-3 and GPT-3.5 fine-tuned models were already a significant advance in generative large language models before ChatGPT garnered widespread attention.
Google Imagen Video
- Type: Paper Only
- Area: Video Generation
- Model: Text-to-Video Diffusion Model
- Parameters: 11.6 billion
- Released: 10/2022
- Source Released: No
- Link: https://imagen.research.google/video/
- Summary: This model, developed by Google Research, was described in a paper but not released as code or as a service, presumably because it is not ready for prime time. The paper claims that it can produce high-resolution video of 128 frames at 1280x768 and 24 FPS, which works out to a clip of about 5.3 seconds. The prepared outputs are similar to those of Meta’s Make-A-Video but in higher resolution with improved temporal consistency.
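The 5.3 second figure follows directly from the frame count and frame rate quoted above. A small sketch of that arithmetic, using only the numbers in the summary (the function and variable names are my own, for illustration):

```python
def clip_duration_seconds(num_frames: int, frames_per_second: float) -> float:
    """Length of a clip in seconds given its frame count and frame rate."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return num_frames / frames_per_second


# Figures quoted in the summary: 128 frames at 1280x768, played at 24 FPS.
num_frames, width, height, fps = 128, 1280, 768, 24

duration = clip_duration_seconds(num_frames, fps)
pixels_per_clip = num_frames * width * height

print(f"clip length: {duration:.2f} s")
print(f"pixels generated per clip: {pixels_per_clip:,}")
```

The pixel count gives a rough sense of why video is so much more demanding than a single image: one clip at this resolution contains as many pixels as 128 separate 1280x768 images.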
DreamFusion
- Type: Paper Only
- Area: 3D Model Generation
- Model: Text-to-3D Diffusion Model
- Parameters: 20 billion (Imagen)
- Released: 10/2022
- Source Released: No
- Link: https://dreamfusion3d.github.io/
- Summary: Several papers released in 2022 addressed creating 3D assets from text prompts, including Nvidia’s Get3D and Magic3D and OpenAI’s Point-E. DreamFusion takes a text prompt and uses Google’s Imagen text-to-image model as a prior to optimize a Neural Radiance Field (NeRF), producing a 3D model complete with normals, textures and lighting that can be exported as a mesh. Work in this area should improve over 2023, yielding viable results in much less time than the 40 minutes or more currently needed to produce an output model.
Stability AI’s Stable Diffusion
- Type: Source Code
- Area: Image Generation
- Model: Text-to-Image Diffusion Model
- Parameters: 890 million
- Released: 08/2022
- Source Released: Yes
- Link: https://github.com/CompVis/stable-diffusion/blob/main/README.md
- Summary: Stability AI shook up the industry by openly releasing the source code and weights of a state-of-the-art generative AI model. Big tech companies have guarded their model source code, and many do not even offer a public service with their generative models. Stable Diffusion has been adopted for a myriad of applications beyond generating novel images, including style-transferred selfies, 3D images and short videos. The accessibility of Stable Diffusion, with its lightweight model that can run on almost any modern GPU system, has put generative AI in the hands of millions of developers and creators. It would be great to see comparable open-source releases for large language models and some of the larger text-to-image models, but it seems unlikely.
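A rough back-of-the-envelope calculation shows why the parameter count matters for accessibility. The sketch below estimates the memory needed just to hold each model's weights, using the parameter counts listed in this article and assuming 16-bit (2-byte) weights. That precision is my assumption, and the estimate ignores activations and other runtime overhead, so real requirements are higher.

```python
BYTES_PER_FP16_PARAMETER = 2  # assumption: weights stored in half precision


def weight_memory_gb(num_parameters: float,
                     bytes_per_parameter: int = BYTES_PER_FP16_PARAMETER) -> float:
    """Approximate memory in gigabytes (10^9 bytes) needed to hold the weights only."""
    return num_parameters * bytes_per_parameter / 1e9


# Parameter counts as listed in this article.
parameter_counts = {
    "Stable Diffusion": 890e6,
    "Dall-E 2": 3.5e9,
    "GPT-3.5": 175e9,
}

for model_name, num_parameters in parameter_counts.items():
    print(f"{model_name:>16}: ~{weight_memory_gb(num_parameters):,.2f} GB of weights")
```

Under these assumptions Stable Diffusion's weights fit comfortably on a single consumer GPU, while a 175 billion parameter model would need hundreds of gigabytes spread across many accelerators.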
Meta AI Make-A-Video
- Type: Paper Only
- Area: Video Generation
- Model: Text-to-Video Diffusion Model
- Parameters: Not published
- Released: 09/2022
- Source Released: No
- Link: https://makeavideo.studio/
- Summary: Researchers from Meta AI were able to produce somewhat temporally consistent videos from text prompts. The videos are short, lasting only around 10 seconds, and are rife with artifacts and visual inconsistencies. However, the work is an important step toward generating photorealistic videos from text and voice prompts. This work is in its infancy, and generative video presents many challenges around temporal consistency that do not exist when generating two-dimensional images. Future work should produce even more promising results in 2023. The model makes use of text-to-image (T2I) generation along with spatial-temporal modules to create plausible video.
OpenAI Whisper
- Type: Source Code
- Area: Speech Recognition
- Model: Encoder-Decoder Transformer Model
- Parameters: 1.6 billion
- Released: 09/2022
- Source Released: Yes
- Link: https://openai.com/blog/whisper/
- Summary: The model can transcribe speech audio in 97 different languages and translate it into English, and it can work with very noisy audio to isolate and accurately capture speech. The model is trained on 680,000 hours of audio data collected from the web and has impressive zero-shot performance (that is, on benchmark datasets it was never specifically fine-tuned on). Although speech-to-text is not a form of generative AI, the work is important for enabling human-like chatbots and language translation, and it can be applied to speech synthesis.
AI Bill of Rights
- Type: Blueprint (not proposed legislation)
- Area: Governance
- Model: Good old-fashioned text
- Parameters: 73 pages
- Released: 10/2022
- Source Released: Yes
- Link: https://www.whitehouse.gov/ostp/ai-bill-of-rights/
- Summary: The AI Bill of Rights is a non-binding framework that outlines “five principles and associated practices to help guide the design, use and development of automated systems that protect the rights of the American public in the age of artificial intelligence”. The framework is a precursor to the legislation that will eventually be required to govern the development and use of artificial intelligence, and generative AI will continue to push forward the need for such a legal framework.