Correlation vs Causation

The Cognitive Whiteboard is back! To kick off this new series, Luke talks about Correlation vs Causation for physical properties, and whether the concept means Gangnam Style is the song of the century…



Hello, and welcome back to the Cognitive Whiteboard. My name is Luke, and today we're filming the first episode of Series 3. We're going to talk about correlation and causation and address some of the differences between them. I’m going to highlight the point with a correlation between the number of songs that appear in Rolling Stone's Greatest from the 20th century and the production of oil from the lower 48 states of America.

What we see if we put those two data sets together is an apparent relationship, and you could argue that the number of songs that came through in the '60s predicted the oil boom that came a little bit later. Take another step forward, and we could say that the shale boom that's on right now tells us that the beginning of the 2000s is going to appear prominently in Rolling Stone's Top 500 of the 21st century. Now, what I think is quite honestly an obnoxious song, "Gangnam Style", would be right at the beginning of that. It was the top song in its year, so is it going to be in that list?

Obviously, this is not predictive at all. It's complete rubbish, but it's amazing how often we make correlations and assume causal relationships. A great example of that in the geosciences is actually porosity to permeability. Porosity is dominated by the pore volumes; permeability is dominated by the pore throats. You can look through this proof and do it for yourself, and you'll find that porosity and permeability have a correlation but not a causal link between the two. In a depositional system like a shoreface system, the porosity is going to be really heavily affected by sorting, so in the upper regions of the system, you're going to have essentially consistent porosity. If you logged it through here, you wouldn't see anything different on neutron-density. However, the grain size is going to radically change permeability, so within this system of apparently static-looking porosity, you should see quite a significant variation in permeability.
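
One way to put numbers on the grain-size point is the Kozeny-Carman relation for a pack of uniform spheres, k = φ³d² / (180(1 − φ)²). This relation is brought in here as an illustration and is not necessarily the proof on the whiteboard. The porosity and grain sizes below are hypothetical values, and real shoreface sands are not uniform spheres, so treat this as a minimal sketch rather than a reservoir model.

```python
# Kozeny-Carman sketch: same porosity, different grain size.
# All input values are hypothetical and chosen only for illustration.

M2_PER_DARCY = 9.869233e-13  # one darcy expressed in square metres


def kozeny_carman(porosity, grain_diameter_m):
    """Permeability in m^2 for a pack of uniform spheres (Kozeny-Carman)."""
    return porosity**3 * grain_diameter_m**2 / (180.0 * (1.0 - porosity) ** 2)


porosity = 0.25                    # held constant across the interval (assumed)
grain_sizes_um = [100, 200, 400]   # fine to medium sand, in micrometres (assumed)

perms_darcy = [
    kozeny_carman(porosity, d * 1e-6) / M2_PER_DARCY for d in grain_sizes_um
]

for d, k in zip(grain_sizes_um, perms_darcy):
    print(f"{d:>4} um grains, porosity {porosity:.2f}: {k:6.2f} D")
```

Because permeability scales with the square of grain diameter in this relation, each doubling of grain size quadruples permeability while a porosity log would read the same value throughout.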

Those kinds of differences in the correlation within those regions can really affect how you predict fluids are going to flow, so it's important we retest it. How could we do that? Well, we could think about the relationship between these two, accept that there is a correlation, not a causal link, between them, and try to find the causality that's creating that association. In this case, porosity and permeability are both quite strongly linked to their depositional position and their burial history, and both of them have similar kinds of relationships in terms of where and in what direction they start to degrade.
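
The common-cause idea can be sketched with synthetic data. In the example below, depth stands in for burial history and drives both porosity and log-permeability, each with its own independent noise, so neither property influences the other. The gradients, noise levels and depth range are made-up assumptions, not measurements. The sketch compares the raw correlation with the partial correlation once the depth trend is removed from both properties.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of y after a least-squares straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic "samples": depth is the shared driver (all coefficients assumed).
depth = [rng.uniform(1000, 4000) for _ in range(2000)]           # metres
porosity = [0.35 - 5e-5 * z + rng.gauss(0, 0.01) for z in depth]
log_perm = [4.0 - 1e-3 * z + rng.gauss(0, 0.3) for z in depth]   # log10(mD)

raw_r = pearson(porosity, log_perm)
partial_r = pearson(residuals(porosity, depth), residuals(log_perm, depth))

print(f"raw correlation, porosity vs log-permeability: {raw_r:.2f}")
print(f"correlation after removing the depth trend:    {partial_r:.2f}")
```

In this construction the raw correlation is strong, yet it largely disappears once depth is accounted for, which is the signature of an association created by a shared cause rather than a direct link.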

The thing is, though, and we use the illite transition zone here to highlight it, that the variation doesn't remain the same for both of the properties. If we take a reservoir that has a little bit of kaolinite in it, a little deposition, and start burying it, once we get beyond the smectite zone, that smectite is going to turn to illite. And illite, as we all know, is terrible for our permeability because it blocks up our pore throats. So we're going to suddenly see a rapid divergence in the relationship between porosity and permeability, and it's going to happen because of the relationship with depth. It might not be important in your reservoir, I'm not saying it is. It depends entirely upon the shape of your structure, but what we want to make sure we're always doing as geoscientists is throwing a bit of scepticism on any of these correlations that we can't associate with a direct causal relationship, and retesting them as we go.

It's one of the things that I think will remain, for a very long time, a core requirement for a professional to help make these interpretations. I think it's good news because I don't think a machine is replacing us yet. When you look at the way that machine learning is going to work, it's essentially these kinds of correlations on steroids. We're talking about many dimensions of analysis that we can start to do, but they can find correlations that aren't necessarily going to be predictive because they could develop a chart very much like this one. Now, if you have enough data, the theory would be that you eventually get beyond that, but geoscience isn't necessarily in that space. So, this is one of the points that allows me to sleep comfortably at night and feel like there's still a need for a geologist going forward.
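
The "many dimensions, not much data" risk can be illustrated with pure noise. The sketch below screens a large number of random candidate attributes against a random target for a small number of wells, then checks how an equally unrelated attribute fares on a much larger fresh sample. The counts of wells and attributes are arbitrary assumptions, and nothing here represents a real machine learning workflow.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)  # fixed seed so the sketch is repeatable

n_wells = 20        # small data set (assumed)
n_features = 1000   # many candidate attributes (assumed)

# Target and attributes are all independent noise: no real relationship exists.
target = [rng.gauss(0, 1) for _ in range(n_wells)]
features = [[rng.gauss(0, 1) for _ in range(n_wells)] for _ in range(n_features)]

# Screen every attribute and keep the strongest absolute correlation.
best_r = max(abs(pearson(f, target)) for f in features)

# With far more samples, an unrelated attribute shows its true (near-zero) link.
n_fresh = 2000
fresh_target = [rng.gauss(0, 1) for _ in range(n_fresh)]
fresh_feature = [rng.gauss(0, 1) for _ in range(n_fresh)]
fresh_r = abs(pearson(fresh_feature, fresh_target))

print(f"best |r| among {n_features} noise attributes, {n_wells} wells: {best_r:.2f}")
print(f"|r| for a noise attribute on {n_fresh} fresh samples: {fresh_r:.2f}")
```

Screening enough attributes against a handful of wells will turn up an impressive-looking correlation by chance alone, which is why a candidate predictor still needs a causal explanation or a retest on new data.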

Interested to hear what your thoughts are. This one's going to probably raise a little bit of a question, but happy to have that conversation as well. Let me know your comments below, and until next time, I'll see you back at the Cognitive Whiteboard.