
Fairness


Built-in Prejudices

All AI models carry biases from the data they are trained on; it is important to recognise these biases and decide which ones need to be addressed. Research has shown that popular Gen-AI models exhibit racial and gender biases in medical scenarios (Zack et al., 2024). However, other studies suggest that Gen-AI performs at a level similar to doctors, with little variation in performance across patient race or ethnicity. Together, these findings show that bias is a complex issue that needs careful consideration.

AI systems can reflect and amplify the human biases present in their training data unless they are actively designed to avoid this. Most models represent the dominant culture and language of that data, as well as the viewpoints of their creators.

Where the Data Comes From

AI models require large amounts of training data. Because these datasets are so large, it is difficult to verify that everything in them is properly licensed and ethically sourced. Major AI companies such as OpenAI (ChatGPT) and Google do not fully disclose their training data sources, making it hard to know what is included.

If an AI tool accidentally reproduces unlicensed content, you could face legal problems. Additionally, parts of AI development, such as data labelling and content filtering, may involve unfair labour practices in different countries, raising ethical concerns about how these technologies are made.

Zack, T., Lehman, E., Suzgun, M., Rodriguez, J. A., Celi, L. A., Gichoya, J., Jurafsky, D., Szolovits, P., Bates, D. W., Abdulnour, R.-E. E., Butte, A. J., & Alsentzer, E. (2024). Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: A model evaluation study. The Lancet Digital Health, 6(1), e12–e22. https://doi.org/10.1016/S2589-7500(23)00225-X

Sensitive Data

A critical aspect of using Generative AI (Gen-AI) in research is the responsible management of sensitive data. Australian researchers work within a strict regulatory framework designed to protect personal information and intellectual property.

Defining and Classifying Sensitive Data

Before engaging with Gen-AI tools, researchers must carefully identify and classify their data. Sensitive data can include:

  • Personally Identifiable Information (PII): Any data that can directly or indirectly identify an individual (e.g., names, addresses, dates of birth, biometric data).
  • Health Information: Medical records, diagnoses, and any data related to an individual's physical or mental health.
  • Indigenous Cultural and Intellectual Property (ICIP): Data concerning Indigenous communities, their knowledge, cultural expressions, and resources, which requires particular sensitivity and community consent.
  • Confidential Research Data: Unpublished findings, proprietary information, or data subject to non-disclosure agreements.

    Never Upload Sensitive Data to Gen-AI Tools!

Strategies for Secure Data Handling with Gen-AI

Implementing robust data governance strategies is essential to mitigate the risks associated with Gen-AI.

  • Data Minimisation: Use the minimum amount of data necessary for your research. Avoid uploading unrelated or superfluous data to Gen-AI systems.
  • Anonymisation and De-identification: Where feasible, apply robust de-identification techniques to sensitive data before using it with AI; a minimal redaction sketch follows this list. Consider using synthetic data for model development or testing if it can adequately serve the research purpose without compromising real data.
  • Secure Storage and Access Controls: Implement strong encryption for data at rest and in transit (see the encryption sketch after this list). Restrict access to sensitive datasets to authorised personnel only and monitor access logs.
  • Privacy by Design: Integrate privacy considerations into every stage of the research project, from design to deployment. Conduct Privacy Impact Assessments (PIAs) to identify and mitigate privacy risks.
  • Data Retention Policies: Define clear data retention periods and ensure secure disposal of data once it is no longer needed.
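
To make the de-identification step concrete, here is a minimal redaction sketch in Python. It is illustrative only: the patterns, labels, and sample record below are hypothetical, and pattern matching alone cannot catch names or other free-text identifiers, so a vetted de-identification tool and validation against your own data are still required.

    import re

    # Hypothetical patterns for illustration only; regexes miss names and
    # free-text identifiers, so use a vetted de-identification tool in practice.
    PII_PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "PHONE": re.compile(r"\b(?:\+?61\s?|0)4\d{2}\s?\d{3}\s?\d{3}\b"),  # AU mobiles
        "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    }

    def redact(text: str) -> str:
        """Replace each matched identifier with a labelled placeholder."""
        for label, pattern in PII_PATTERNS.items():
            text = pattern.sub(f"[{label}]", text)
        return text

    record = "Call the participant on 0412 345 678 or j.smith@example.edu.au (DOB 03/07/1985)."
    print(redact(record))
    # Call the participant on [PHONE] or [EMAIL] (DOB [DATE]).

Labelled placeholders (rather than deletion) keep the redacted text readable and make it easy to audit what was removed before anything is sent to a Gen-AI tool.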
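For encryption at rest, one widely used approach in Python is symmetric encryption with the cryptography package's Fernet recipe. The sketch below is a minimal illustration; the file name and contents are placeholders, and in practice the key must be stored in a secrets manager or key vault, never alongside the data.

    from cryptography.fernet import Fernet  # pip install cryptography

    # Generate a symmetric key; in practice, load it from a secrets manager.
    key = Fernet.generate_key()
    fernet = Fernet(key)

    # Encrypt the (already de-identified) dataset before writing it to disk.
    plaintext = b"participant_id,age,outcome\nP001,34,improved\n"
    with open("dataset.enc", "wb") as f:
        f.write(fernet.encrypt(plaintext))

    # Only code holding the key can recover the original bytes.
    with open("dataset.enc", "rb") as f:
        assert fernet.decrypt(f.read()) == plaintext

Fernet provides authenticated encryption, so a tampered file fails to decrypt with an error rather than yielding silently corrupted data.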
