When Machines Mistake Coincidence for Cause: The Hidden Flaws in Most of Today’s AI | Turtles AI
On a rainy afternoon, an artificial intelligence system confidently declares that buying umbrellas causes rainfall. While it may sound absurd, this scenario reflects a significant issue in modern AI: the confusion between correlation and causation. This misunderstanding isn’t just theoretical—it has tangible consequences that can impact healthcare decisions, criminal justice, and daily life.
AI has seamlessly integrated into many aspects of our lives, from recommending movies to assisting in medical diagnoses. These systems often rely on identifying patterns and correlations within vast datasets to make predictions and decisions. However, without understanding the underlying causal relationships, AI can draw misleading or even harmful conclusions. The same goes for LLMs like ChatGPT, which rely heavily on statistical co-occurrence to "create" content. In this essay, we’ll explore the confusion between correlation and causation in AI and give you the tools to avoid being deceived.
The Pitfall of Correlation Without Causation – Why Current LLMs Struggle with True Causality
Consider an AI system analyzing hospital data that observes a strong correlation between the number of hospital beds and the mortality rate. It might deduce that reducing the number of beds will lower deaths, not recognizing that hospitals with more beds typically treat more severe cases, which inherently carry higher risks. The AI misses the causal link: the severity of illnesses leads to both more hospital beds and higher mortality rates.
In another instance, an AI examining educational outcomes notices that students who wear glasses tend to have better grades. It might conclude that wearing glasses improves academic performance and recommend eyewear for all students. What it overlooks is that wearing glasses is associated with visual impairments, and those who have their vision corrected can engage better in learning activities. Moreover, access to healthcare that provides glasses may correlate with socioeconomic factors that also contribute to better educational resources.
Similarly, an AI might find that regions with higher internet usage have better overall literacy rates. It could infer that increasing internet access will directly improve literacy. While internet availability can be a valuable educational tool, the AI ignores that regions with higher literacy rates are more likely to adopt and utilize internet technology effectively. Socioeconomic development, educational infrastructure, and cultural factors play significant roles in both literacy and internet usage.
These examples highlight a fundamental flaw: AI systems that rely solely on statistical correlations can misinterpret the data when they lack contextual understanding. Without the ability to discern cause and effect, AI may produce recommendations that are not only incorrect but potentially detrimental.
One of the key reasons behind this limitation lies in the foundational architecture of most AI systems today, particularly LLMs like GPT-4, Claude, LLaMA, and many others. These models are built upon statistical methods and deep learning architectures known as transformers (please refer to our guide to learn more about this). Transformers excel at processing and generating sequences of data by predicting the next word in a sentence based on probability distributions learned from vast amounts of text. This process is inherently statistical and focuses on capturing patterns and correlations within the data.
Transformers utilize attention mechanisms to weigh the relevance of different words in a context, enabling the model to generate coherent and contextually appropriate responses. However, this mechanism does not equip the model with an understanding of causality. The model learns that certain words or phrases are likely to follow others, but it doesn’t comprehend why they do. It’s similar to a child learning that clouds often precede rain but not understanding the atmospheric processes that cause precipitation.
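To make this concrete, here is a deliberately tiny sketch in Python (a simple bigram counter, nothing like a real transformer, and purely illustrative). It learns which word tends to follow which in a toy corpus, and nothing else: the resulting table says that "rain" often follows "and", but contains no notion of whether clouds, umbrellas, or neither actually cause rain.

```python
# A minimal, purely statistical "language model": co-occurrence counts only,
# with no representation of why one word follows another.
from collections import Counter, defaultdict

corpus = (
    "dark clouds gathered and rain fell . "
    "dark clouds gathered and rain fell . "
    "people bought umbrellas and rain fell ."
).split()

# Count which word follows which (a bigram table).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the most frequent continuation seen in the training text."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

# The table encodes that "rain" tends to follow "and", and that "gathered"
# tends to follow "clouds"; it says nothing about causes of rainfall.
print(predict_next("and"))     # -> "rain"
print(predict_next("clouds"))  # -> "gathered"
```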
LLMs are trained on extensive datasets that encompass a wide range of human knowledge, language usage, and writing styles. While this training enables them to produce text that mirrors human language, it also means they inherit the biases, inaccuracies, and limitations present in the data. Since they lack real-world experience and cannot perform experiments or interventions, they are unable to distinguish between coincidental correlations and genuine cause-and-effect relationships.
For example, if an LLM observes that articles about economic recessions often mention gold prices increasing, it might infer a direct causal relationship between the two. Without an understanding of the underlying economic factors—such as investors turning to gold as a safe-haven asset during economic uncertainty—the AI cannot accurately discern causation. It simply notes the frequency of words appearing together and assumes a connection based on statistical co-occurrence.
This statistical nature of LLMs also means they can generate plausible-sounding explanations that are, in fact, baseless. They might provide reasons or justifications that align with patterns in the data but do not reflect any true causal mechanisms. This limitation is critical when AI systems are used in domains requiring precise reasoning, such as medical diagnosis or legal analysis, where understanding the cause is essential for effective decision-making.
Moreover, the lack of causal understanding in LLMs can perpetuate and amplify existing biases. If the training data contains stereotypes or biased associations, the model may reproduce these in its outputs. For instance, associating certain professions with a particular gender or ethnicity based on historical data reflects correlations in the dataset, not causal truths about capabilities or preferences.
In essence, current AI models function as sophisticated pattern recognition systems. They excel at identifying and replicating patterns within their training data but do not possess an intrinsic understanding of the world. They cannot infer that flipping a light switch causes a lamp to turn on; they can only note that the words "flip the switch" often appear near "the light turned on" in text.
Addressing this challenge requires a fundamental shift in how AI models are designed and trained. Researchers are exploring ways to integrate causal reasoning into AI, combining statistical learning with causal inference frameworks. This involves teaching models to consider not just whether variables are associated but whether changes in one variable lead to changes in another. By incorporating methods from causal inference, such as causal graphs and do-calculus, AI systems can begin to model interventions and predict outcomes based on hypothetical scenarios.
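As a sketch of what that means in practice, the following toy simulation (with made-up numbers, loosely mirroring the hospital-beds example above) compares what the data say observationally with what happens under an intervention, which is the question the do-operator asks. A hidden severity variable drives both the number of occupied beds and mortality, so the two are strongly correlated, yet setting the number of beds by intervention changes nothing.

```python
# Toy structural causal model (illustrative numbers only): severity Z drives
# both beds X and mortality Y, while X has no causal effect on Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def simulate(do_x=None):
    z = rng.normal(size=n)  # hidden common cause (case severity)
    x = (2 * z + rng.normal(size=n)) if do_x is None else np.full(n, do_x)
    y = 3 * z + rng.normal(size=n)  # mortality depends only on severity
    return x, y

# Observational data: X and Y are strongly correlated through Z...
x_obs, y_obs = simulate()
print("observational corr(X, Y):", round(np.corrcoef(x_obs, y_obs)[0, 1], 2))

# ...but intervening on X (the do-operator) leaves Y unchanged.
_, y_do_low = simulate(do_x=-2.0)
_, y_do_high = simulate(do_x=+2.0)
print("E[Y | do(X=-2)] =", round(y_do_low.mean(), 2))
print("E[Y | do(X=+2)] =", round(y_do_high.mean(), 2))
```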
The development of causal AI represents a significant step toward creating systems that can understand and reason about the world more like humans do. It involves endowing AI with the ability to ask "what if" questions, consider counterfactuals, and reason about the potential effects of actions. This advancement is crucial for applications where understanding causality is essential, enabling AI to move beyond pattern recognition to become truly intelligent systems capable of making informed and responsible decisions.
Real-World Consequences: Some Examples and the Role of Context and Domain Knowledge
In the healthcare sector, such misinterpretations can have serious repercussions. An AI analyzing patient data might find a correlation between the use of a particular medication and recovery from a disease. It might suggest prescribing this medication universally, without realizing that it was only effective in a specific subgroup of patients with certain genetic markers. Administering it broadly could lead to adverse effects in patients for whom the medication is unsuitable.
Similarly, in the criminal justice system, predictive algorithms are increasingly used to assess the likelihood of reoffending. An AI might notice that individuals from certain neighborhoods have higher rates of recidivism and recommend harsher sentencing for people from those areas. This ignores underlying socioeconomic factors such as poverty and limited access to education and employment opportunities, which contribute to higher crime rates. By failing to address the root causes, the AI perpetuates a cycle of disadvantage and discrimination.
In marketing, an AI could observe that customers who purchase running shoes also frequently buy health supplements. It might conclude that buying running shoes leads people to purchase supplements and recommend targeting supplement advertisements to shoe buyers. However, both purchases are likely influenced by a third factor: an interest in personal health and fitness. Without recognizing this shared motivation, marketing efforts may miss the mark.
Understanding the context in which data exists is crucial for accurate interpretation. AI systems often lack the domain knowledge that humans use to make sense of information. For example, an AI might find that regions with a high number of storks have higher birth rates and conclude that storks bring babies, reviving an old myth. In reality, rural areas might have more storks due to environmental factors and higher birth rates due to cultural or economic reasons.
An AI examining economic data might notice that as the number of new housing developments increases, so does the incidence of respiratory illnesses. It could infer that new housing causes health problems. However, the underlying cause might be that both are occurring in areas with high levels of air pollution due to industrial activity. The AI’s failure to identify the true cause could misinform public health initiatives.
In sports, an AI might observe that athletes who wear a particular brand of apparel perform better in competitions. Concluding that the apparel enhances performance, it might recommend all athletes switch to that brand. The AI overlooks that top-performing athletes are more likely to be sponsored by high-end brands, and their success is due to their training and skill, not their clothing.
Causal AI and Its Applications: A Step Towards Understanding
To overcome these challenges, researchers are developing causal AI systems that aim to understand and model cause-and-effect relationships rather than just identifying correlations. Unlike traditional AI, which might observe that ice cream sales and instances of sunburn increase simultaneously and assume a direct relationship, causal AI seeks to identify that both are effects of a common cause: warmer weather.
Causal AI employs advanced statistical methods and algorithms to model the relationships between variables. By constructing causal graphs and incorporating domain expertise, these systems can differentiate between mere associations and genuine causal links. This allows for more accurate predictions and informed decision-making.
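To illustrate with the ice cream and sunburn example above, here is a small simulation under an assumed toy model. Temperature is the common cause of both variables; the raw correlation between ice cream sales and sunburn cases is strong, but once we compare only days with nearly the same temperature, a crude version of the adjustment a causal graph would prescribe, the association essentially disappears.

```python
# Illustrative only: warm weather drives both ice cream sales and sunburn
# cases. Conditioning on the common cause removes the spurious association.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

temperature = rng.normal(25, 5, n)                      # common cause
ice_cream = 10 * temperature + rng.normal(0, 20, n)     # effect 1
sunburn = 2 * temperature + rng.normal(0, 5, n)         # effect 2

print("overall corr:", round(np.corrcoef(ice_cream, sunburn)[0, 1], 2))

# Look only at days with nearly identical temperature (crude stratification).
stratum = np.abs(temperature - 25) < 0.25
print("corr within a temperature stratum:",
      round(np.corrcoef(ice_cream[stratum], sunburn[stratum])[0, 1], 2))
```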
In finance, causal AI can help identify the true drivers of market trends. An AI might notice that stock prices rise when certain social media keywords become popular. Rather than assuming the keywords cause the market changes, causal AI can analyze whether both are influenced by underlying economic indicators, preventing misguided investment strategies.
In public health, causal AI can play a fundamental role. During a disease outbreak, understanding the factors that contribute to the spread is essential. An AI might notice that regions with higher hand sanitizer sales have lower infection rates. While this correlation is promising, causal AI would help determine whether increased hand hygiene directly reduces infections or if other factors, such as public awareness campaigns and access to healthcare facilities, also contribute significantly.
In education, causal AI can help identify effective teaching methods. Suppose an AI observes that students who participate in online forums tend to have higher test scores. It might recommend increasing online interactions. However, without understanding whether forum participation leads to better understanding or if more engaged students are simply more likely to use forums, the recommendation might not yield the intended results. Causal AI can help disentangle these relationships, guiding more effective educational interventions.
In agriculture, an AI might find that farms using a specific pesticide have higher crop yields. It could suggest widespread adoption of this pesticide. Causal AI would analyze whether the pesticide is the direct cause of increased yields or if those farms also employ other practices, such as advanced irrigation techniques or soil management, that contribute to their success.
Challenges in Implementing Causal AI
Despite its potential, implementing causal AI is not without difficulties. One significant challenge is the need for high-quality data that captures the necessary variables and their relationships. Incomplete or biased data can lead to incorrect causal inferences. Additionally, modeling complex systems with many interacting factors requires sophisticated algorithms and computational resources.
There is also the issue of confounding variables—factors that influence both the independent and dependent variables. Identifying and adjusting for these confounders is critical for accurate causal analysis. In medicine, for example, lifestyle factors like diet and exercise can confound the relationship between a treatment and health outcomes.
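A minimal sketch of such an adjustment, with invented numbers: a lifestyle variable makes patients both more likely to receive a treatment and more likely to recover, so the naive comparison of treated versus untreated patients overstates the benefit. A simple linear regression that includes the confounder recovers something close to the true effect.

```python
# Illustrative numbers only: "lifestyle" confounds treatment and outcome.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

lifestyle = rng.normal(size=n)                                  # confounder
treated = (lifestyle + rng.normal(size=n) > 0).astype(float)    # assignment
outcome = 1.0 * treated + 2.0 * lifestyle + rng.normal(size=n)  # true effect = 1.0

# Naive difference in means is biased upward by the confounder.
naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print("naive estimate:", round(naive, 2))

# Adjust for the confounder: regress outcome on treatment and lifestyle.
X = np.column_stack([np.ones(n), treated, lifestyle])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print("adjusted estimate:", round(coef[1], 2))  # close to the true effect of 1.0
```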
Another challenge lies in the validation of causal models. Unlike predictive models that can be tested against observed outcomes, causal models often require experimental or quasi-experimental data to confirm their accuracy. This can be particularly challenging in fields like social sciences, where controlled experiments are difficult to conduct.
The Human Element
While AI continues to advance, human expertise remains indispensable (humans at the forefront, as we like to say at Turtles AI). Experts provide the contextual understanding and ethical considerations that AI systems lack. They help identify which variables are relevant and ensure that the AI’s conclusions make sense within the broader domain knowledge.
In environmental science, an AI might correlate deforestation rates with increased instances of certain diseases and suggest that tree loss causes illness. Environmental experts understand that deforestation leads to habitat disruption, bringing humans into closer contact with disease vectors like mosquitoes or bats. This deeper understanding guides more effective interventions, such as habitat conservation and vector control.
In the culinary world, an AI might find that recipes using a particular spice are rated higher by consumers. It might recommend adding this spice to all dishes. Chefs know that balance and harmony of flavors are essential and that indiscriminately adding an ingredient can ruin a dish. Human expertise ensures that AI recommendations are applied appropriately.
Also, as AI systems become more integrated into decision-making processes, ethical considerations become increasingly important. AI recommendations based on flawed causal assumptions can exacerbate existing inequalities and biases. For example, if an AI suggests that certain demographic groups are less likely to succeed in specific careers based on historical data, it may reinforce stereotypes and limit opportunities for those groups.
In hiring practices, an AI might observe that candidates from particular universities tend to perform better and recommend focusing recruitment efforts there. This overlooks the potential of individuals from diverse backgrounds and perpetuates a lack of diversity in the workplace. Causal AI can help identify the actual factors that contribute to employee success, such as specific skills or experiences, promoting fairer hiring practices.
Ensuring that AI systems are developed and used responsibly requires a multidisciplinary approach that includes ethicists, sociologists, and legal experts. They can help establish guidelines and regulations that promote fairness, transparency, and accountability in AI applications.
Moving Forward
While it is possible to embed causal knowledge maps in specific, "narrow" fields, it is much harder to do so in general-purpose systems such as LLMs, which aim to converse with humans "on equal terms" and can therefore slip on the banana peels of causality described above.
OpenAI’s Project Strawberry, now available as o1-preview, is one of the most promising efforts to address this issue. It represents a shift from traditional pattern recognition models towards something that more closely resembles true reasoning. Rather than simply recognizing correlations between words or concepts, Strawberry aims to integrate a “thinking phase” into the model’s process. This phase allows the model to deliberate before generating a response, giving it the capacity to handle more complex, multi-step reasoning tasks.
While not strictly "causal", this model demonstrates a noticeable improvement in tasks that require advanced problem-solving, such as tackling complex mathematics or analyzing intricate scientific queries. One key feature of o1-preview is its ability to reason through tasks in a way that mirrors human deliberation: taking time to consider the problem, break it down, and generate a solution that is based not just on statistical predictions but on logical steps.
For example, earlier models like GPT-4 might struggle with tasks that require multi-hop reasoning—problems where the solution involves answering one question to inform the next. By contrast, the o1-preview model can now manage such tasks with greater accuracy, thanks to its improved reasoning framework. This enhancement is critical for applications where understanding the causal links between events is necessary, such as in scientific research or complex decision-making.
Another standout feature of Strawberry’s development is its ability to manage long-term planning tasks. In fields like finance or scientific research, AI needs to handle problems that unfold over extended periods, requiring strategic foresight and causal reasoning. The o1-preview model shows early signs of being able to engage in such long-horizon tasks, evaluating multiple variables over time and making decisions that reflect a deeper understanding of how those variables interact.
This model also promises to reduce hallucinations—a well-documented problem in LLMs, where the system generates plausible but incorrect or nonsensical information. And since "false causality" situations, like those described earlier, are, in their own way, hallucinations, they too could be reduced, if not eliminated, with this deeper approach. By improving the reasoning mechanisms behind its decision-making, Strawberry seeks to minimize such errors, making its outputs more reliable, especially in fields where accurate information is crucial.