# Understanding the Challenges of AI Bias in Data Sets
## Chapter 1: The Fundamental Issues with AI Bias
The discussion around artificial intelligence (AI) bias often raises significant questions about the nature of reality and data. Consider the philosophical scenario: If a tree falls in a deserted forest, does it produce sound? Scientists affirm that sound waves are generated, while sociologists argue that sound requires a listener. Psychologists suggest that human perception defines reality, emphasizing that understanding is contingent upon human presence.
This inquiry bears directly on AI, because artificial intelligence is built on data generated by humans interacting with their environments. Unfortunately, data scientists are finding that the data used to train AI systems can be incomplete or even deceptive, leading to inherent biases in the resulting algorithms.
Yet I contend there is a deeper issue, one that arises from neglecting the socio-economic contexts surrounding these data sets. As a scientist, I recognize the paramount importance of data integrity. Researchers take meticulous care to ensure that their data sets can fairly test their hypotheses, designing experiments in which variables are carefully defined and controlled to eliminate unpredictable factors. Such controlled conditions are essential to robust scientific inquiry.
The trouble with much of the social data gathered and analyzed today is that the systems collecting it often have no mechanism for validation, or their operators simply choose not to validate the information they gather.
Mark Fielden, in his research on data validation, highlights this issue:
“Unfortunately, computer systems cannot ‘make valid’ or ‘ratify’ data on their own. But they can ‘confirm’ decisions made by an operator. The operator is key to ensuring the data are correct…”
This observation points to the root of the problem. Scientists fully appreciate how much context matters when interpreting and validating data, yet despite our progress in artificial intelligence, we have not managed to program machines to grasp contextual nuance.
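Fielden's point is easy to make concrete. Here is a minimal sketch of the pattern he describes, with hypothetical field names and thresholds: the machine can mechanically flag implausible values, but a flagged record only becomes "valid" once a named operator has reviewed it and signed off.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    age: int
    income: float
    flags: list = field(default_factory=list)

def machine_checks(record: Record) -> Record:
    """Automated checks can catch mechanically impossible values."""
    if not (0 <= record.age <= 120):
        record.flags.append("age out of plausible range")
    if record.income < 0:
        record.flags.append("negative income")
    return record

def operator_review(record: Record, approved_by: str | None) -> bool:
    """A flagged record is 'valid' only once a named operator,
    who can weigh the context, has signed off on it."""
    if not record.flags:
        return True                   # passed the mechanical checks
    return approved_by is not None    # human judgment required

record = machine_checks(Record(age=150, income=42_000.0))
print(record.flags)                                     # ['age out of plausible range']
print(operator_review(record, approved_by=None))        # False: the machine alone cannot ratify
print(operator_review(record, approved_by="analyst"))   # True: the operator's decision is confirmed
```

The machine can only confirm what an operator decides; it cannot, on its own, know whether an age of 150 is a typo or a data-entry convention.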
While we can achieve high accuracy in predictions using data, we continually struggle to account for all variables in any given scenario. Humans, with their perceptual and intuitive capabilities, excel in understanding context.
In his extensive research on what he terms “context-aware machine learning,” Yun Zeng posits that neural networks offer a path forward. That journey, though, begins with recognizing our own biases, prejudices, and limited perspectives, which obscure our vision. In essence, our identities shape the technologies we create.
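Zeng's framework is mathematical, but the core intuition can be sketched simply: give the model the collection context as an explicit input rather than leaving it implicit in the data. The sketch below is my illustration of that idea, not Zeng's actual formulation; every dimension and layer size is an arbitrary assumption.

```python
import torch
import torch.nn as nn

class ContextAwareClassifier(nn.Module):
    """Consumes an explicit context vector (e.g. where, when, and how
    the data were collected) alongside the primary features."""

    def __init__(self, n_features: int, n_context: int, n_classes: int):
        super().__init__()
        self.feature_encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.context_encoder = nn.Sequential(nn.Linear(n_context, 8), nn.ReLU())
        # Predictions are conditioned on both encodings jointly.
        self.head = nn.Linear(32 + 8, n_classes)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        h = torch.cat([self.feature_encoder(x), self.context_encoder(context)], dim=-1)
        return self.head(h)

model = ContextAwareClassifier(n_features=16, n_context=4, n_classes=2)
x = torch.randn(8, 16)     # a batch of observations
ctx = torch.randn(8, 4)    # the collection context for each observation
logits = model(x, ctx)
print(logits.shape)        # torch.Size([8, 2])
```

The design choice worth noting is that context is a first-class input: the same observation can legitimately yield different predictions under different collection contexts, which is exactly the nuance an uncontextualized model flattens away.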
I propose that the solution lies in adopting a peer-review system for machine learning, akin to the practices established by scientists over centuries. What distinguishes this new system is the broadened definition of “peer.” Data scientists must acknowledge their duty to share their findings beyond their immediate circles, fostering an environment where context can emerge and reveal what we might otherwise overlook.
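What such a broadened review gate might look like in practice is, admittedly, speculative. The following is a purely hypothetical sketch: a data set is released only after approval from a minimum number of reviewers outside the team that produced it. All names and fields are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Review:
    reviewer: str
    affiliation: str      # which group the reviewer belongs to
    approved: bool
    notes: str = ""

def release_approved(producing_team: str, reviews: list[Review],
                     min_external: int = 2) -> bool:
    """Require sign-off from at least `min_external` reviewers who do
    not belong to the team that produced the data set."""
    external_approvals = [
        r for r in reviews
        if r.approved and r.affiliation != producing_team
    ]
    return len(external_approvals) >= min_external

reviews = [
    Review("ana", "data-team", approved=True),
    Review("bo", "ethics-board", approved=True, notes="sampling skews regional"),
    Review("carla", "domain-experts", approved=True),
]
print(release_approved("data-team", reviews))  # True: two approvals from outside the team
```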
This approach is the key to addressing the challenges identified by Mark Fielden in his data validation study.
Ultimately, AI’s future relies heavily on human involvement.
References:
- Zeng, Yun. “Context Aware Machine Learning.” arXiv preprint arXiv:1901.03415 (2019).
- Fielden, Mark. “Data Validation.” Industrial Management & Data Systems 90.4 (1990): 3–5.
## Chapter 2: Expanding the Role of Context in AI
The integration of a robust peer-review system within AI development is crucial for mitigating bias and enhancing the reliability of machine learning outputs.