The Challenges with Generative AI (part 1)

Matt White
3 min read · Jan 7, 2023


Image generated by Midjourney (prompt: a philosopher contemplating ethics under an apple tree.)

Generative AI introduces a number of ethical and legal challenges. Below are just a few that we will need to consider as we advance this field of research; I will follow up with a more comprehensive post shortly.

Bias

  • Datasets are generally English-language and grounded in Western values, perceptions, and culture. For instance, LAION-5B, the dataset used to train Stable Diffusion and underpinning the majority of text-to-image applications on the market, including apps like Lensa, reflects Western norms of gender, beauty, racial representation, and so forth. Models trained on this data naturally inherit those biases, so a prompt like “a beautiful woman” most often generates a young white or light-skinned woman matching the Western perception of beauty.
  • Large language models like ChatGPT are trained on biased data, and their safeguards have been shown to be easily circumvented. A UC Berkeley professor was able to surface bias by asking the model to produce code that checks whether someone would be a good scientist based on their race and gender. The code the model output revealed that, by its measure, a good scientist was “white” and “male”.
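To make the incident concrete, the reported output resembled code along these lines. This is a paraphrase for illustration only; the function name and exact structure are assumptions, not the model's verbatim output:

```python
# Illustrative paraphrase of the kind of biased code the model reportedly
# produced. The point is that the bias is encoded literally and explicitly.
def is_good_scientist(race: str, gender: str) -> bool:
    # The model's "measure" of a good scientist reduced to race and gender.
    return race == "white" and gender == "male"

print(is_good_scientist("white", "male"))    # True
print(is_good_scientist("black", "female"))  # False
```

Any such check demonstrates the problem directly: the training data's bias surfaces as an explicit rule in generated code, which a safeguard should have refused to produce.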

Transparency

  • Most generative AI models are not released to the public and remain black boxes, with no details about the datasets they were trained on, the internals of the models, or their associated weights and biases. This lack of transparency is a serious issue, because generative models adopt the human biases reflected in the data they are trained on.
  • Eventually, legislation will need to be developed to govern this area, providing consumers with protections and holding companies accountable for the content their models produce. We are already seeing issues with ChatGPT “prompt hacking,” where users get around the controls put in place to limit bias and harm by posing hypotheticals and extracting biases from the model.

Choice

  • Generative AI models produce a limited set of results, sometimes only one. This is very prescriptive: these models can only sample from their latent space, which itself admits a limited set of possible outputs. For a ChatGPT query you get one answer; ask the question again and you get much the same answer. You can use prompt engineering to steer the results, but then you are leading the model to produce outputs you favor. The same issue applies to images: you generate images from a text prompt and are returned a set of results, say four candidate images, then you can select one and refine it. But you are guided by the model, not by exactly what you are looking for. This lack of choice creates issues of bias and influence controlled by the developers of the model, which is almost always closed-source and not highly tunable.
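The sampling behavior described above can be sketched with a toy next-token distribution. The vocabulary and logit values here are made up for illustration; the point is that the model can only ever emit what its learned distribution contains, and the decoding temperature controls how narrow that choice is:

```python
import math
import random

def sample_with_temperature(logits: dict, temperature: float,
                            rng: random.Random) -> str:
    """Sample one token from a temperature-scaled softmax over the logits."""
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Draw from the resulting categorical distribution.
    r = rng.random()
    cum = 0.0
    for tok, p in probs.items():
        cum += p
        if r < cum:
            return tok
    return tok  # fallback for floating-point rounding

# Toy next-token logits (made-up numbers): the model can only emit tokens
# that exist in this distribution -- its learned space bounds the choices.
logits = {"yes": 3.0, "no": 1.0, "maybe": 0.5}
rng = random.Random(0)

# Low temperature -> near-deterministic: effectively the same answer every time.
low = [sample_with_temperature(logits, 0.1, rng) for _ in range(5)]
# High temperature -> more variety, but still only from the fixed vocabulary.
high = [sample_with_temperature(logits, 2.0, rng) for _ in range(5)]
print(low, high)
```

At low temperature the highest-probability token dominates, which is why repeated queries feel prescriptive; raising the temperature adds variety but never adds options the model's distribution doesn't already contain.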

Truth and Trust

  • Large language models like ChatGPT can respond authoritatively and yet be absolutely wrong. Uninformed users may interpret the results as a source of truth. There is real danger in models delivering results with a high level of confidence while being categorically wrong.

Legal & Ethics

  • Many concerns have been raised about possible copyright infringement by generative AI art, text, and code. The class action lawsuit brought against Microsoft over Copilot will set a valuable precedent in the courts, on which other lawsuits may build.
  • Artists, authors, and developers will need a mechanism to opt out of having their work used as training data for large models; expect the introduction of platforms that validate the authenticity of digital content and others that prohibit the scraping of content.
  • Professors are already seeing students submit papers that were wholly or partially generated by ChatGPT, which does not cite its sources or provide references and instead presents material as though it were its original work. This will continue to be a problem for teachers, and new platforms that can detect generated content will hit the market.
  • The legal system always trails technology, but it is important that legislation be passed to protect original authors’, artists’, and creators’ works from being exploited, and to protect people from misuses of generative AI such as deepfakes. Equally important will be enacting policy to mitigate applications of generative AI that pose a personal, domestic, or national security threat.
