Machine Learning, AI and Climate
As the impacts of climate change on the planet become clearer, scientists and professionals in climate science are looking at the latest tools and technologies in AI and machine learning to help understand and mitigate the effects. At the same time, career opportunities in the field are growing and we’re seeing increasing numbers of students and early career professionals interested in developing and using their skills in ways that can help the planet.
So, when and where can machine learning and AI be used in climate science? And what are the pitfalls?
If you’re working in environmental and earth sciences, you probably already have a pretty big toolbox that has been developed over several decades! It consists of standard statistical techniques including spatial and temporal statistics, a range of physics-based or process-based models, and several data collection and data integration technologies at different scales.
What can machine learning add to this? Does it replace all the other technologies? Does it augment the others? What circumstances can it be used in and what are its limitations?
Before we talk about where machine learning is useful, let’s talk about the process of understanding when and why it can be deployed. After all, we wouldn’t use a hammer to drive in a screw!
First, for machine learning to be effective at any problem, data are needed. And not just any type of data - unbiased, representative data that can be used to predict the future based on past history. What climate change is doing is throwing out our assumptions that the future will look like the past.
What does that mean in practice? It means that for many problems in climate or other natural systems (air, water, forestry, agriculture, etc.), we cannot simply bolt off-the-shelf machine learning models from Python libraries onto our datasets. In fact, studies have shown that when we do that, the model predictions we get stay close to the historical training data values. In other words, our machine learning models are telling us that our weather, crop yields and water availability are not going to change - something that is certainly false.
This happens because most of our data reflect historical patterns, and the impacts of climate change on many Earth systems have only started becoming clearly evident in the last 5 years or so, i.e., the dataset is biased with respect to what we’re trying to predict. When a standard machine learning model is used with such a dataset, the historical patterns drown out the small signal from climate change and we get results that are clearly incorrect.
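To make this concrete, here is a minimal sketch of the effect using a simple nearest-neighbour model written in plain Python. Everything in it is assumed for illustration: the temperature values, the linear relationship and the function name knn_predict are all made up, and real models and datasets are far more complex. The point is only that a model that averages historical values cannot predict anything outside the range it was trained on.

```python
def knn_predict(train_x, train_y, x, k=3):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

# "Historical" record: temperatures from 10.0 to 19.9 degrees (made-up values).
hist_temp = [10.0 + 0.1 * i for i in range(100)]
# Assumed true relationship, for illustration only.
hist_y = [2.0 * t for t in hist_temp]

# Inside the historical range the model does well.
inside = knn_predict(hist_temp, hist_y, 15.0)

# Outside it (a warmer future) the prediction is an average of the
# warmest historical points, so it can never exceed the historical maximum.
future_temp = 25.0
outside = knn_predict(hist_temp, hist_y, future_temp)
true_future = 2.0 * future_temp

print(f"inside range:  predicted {inside:.1f}, true {2.0 * 15.0:.1f}")
print(f"outside range: predicted {outside:.1f}, true {true_future:.1f}")
```

Tree-based models behave in a similar way, because they also predict by averaging training targets. Other model families extrapolate differently, but not necessarily more correctly.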
In order to make machine learning models work in such cases, we have to either broaden our dataset or transform the dataset to make it climate-invariant. If we’re broadening our datasets, we need to incorporate climate model simulation results, which come with their own set of uncertainties, biases and errors. If we’re transforming our data to make them climate-invariant, we’re going to have to understand the processes governing these systems in order to develop the transformation algorithm.
The second challenge is that off-the-shelf machine learning models do not inherently conserve mass and energy - something that is necessary in climate change models. Since machine learning models are purely based on data, basic scientific laws such as conservation of energy and mass are not usually built in. Sometimes, if the data and the created algorithm work together well, the final predictions do respect these basic laws - but that’s not always the case. If we want to be able to understand what’s happening on the planet, predict changes and develop adaptations, the models have to be scientifically accurate. So, as scientists and engineers working in the field, one of the adaptations that we need to build into any machine learning model is a check that the basic scientific laws are being obeyed in the results that the model generates. This can be done either at the end of a model run, where the results from the ML model are evaluated, or during the model run through a constraint, where each result is checked for mass and energy conservation and discarded if it fails that test.
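As a rough sketch of the end-of-run check described above, the example below tests hypothetical model outputs against a simple water balance (precipitation = evapotranspiration + runoff + change in storage). The function name conserves_mass, the tolerance and all the numbers are assumptions made for illustration, not part of any particular library or model.

```python
def conserves_mass(precip, evap, runoff, storage_change, tol_mm=0.5):
    """Water balance: what comes in must leave or be stored (all in mm)."""
    residual = precip - (evap + runoff + storage_change)
    return abs(residual) <= tol_mm

# Hypothetical ML outputs for one catchment and one time step (mm).
# Precipitation is treated as a known input.
precip = 100.0
candidates = [
    {"evap": 40.0, "runoff": 50.0, "storage_change": 10.0},  # balances
    {"evap": 45.0, "runoff": 70.0, "storage_change": 5.0},   # creates 20 mm of water
]

# Keep only the predictions that respect the balance; discard the rest.
accepted = [c for c in candidates if conserves_mass(precip, **c)]
rejected = [c for c in candidates if not conserves_mass(precip, **c)]

print(f"accepted {len(accepted)}, rejected {len(rejected)}")
```

The tolerance matters here: set it too tight and measurement error will cause valid predictions to be rejected, set it too loose and the check stops meaning anything.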
The third challenge is incorporating all the new, high-resolution data being generated (by satellites, for example) alongside existing sparse, heterogeneous data. It’s a data assimilation problem and one that’s fairly common in many fields beyond computer science where machine learning models are being used. Again, this is where machine learning models can’t just be used as is - the key is to be able to create the equations to transform and integrate different types of data, work out how errors propagate across data of different scales, define new parameters as needed for the models, and so on. It’s a combination of being able to understand the science behind different earth systems (climate, water, air, plants, soil) and build statistically valid algorithms that can incorporate all the different factors involved.
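One of the simplest building blocks for combining two data sources is inverse-variance weighting, sketched below. It assumes that both measurements describe the same quantity at the same location and scale, and that their errors are independent and unbiased. Those assumptions rarely hold exactly for satellite and ground data, which is part of why the real problem is hard. The function name fuse and the soil moisture values are made up for illustration.

```python
def fuse(value_a, sigma_a, value_b, sigma_b):
    """Combine two estimates, weighting each by the inverse of its variance."""
    w_a = 1.0 / sigma_a ** 2
    w_b = 1.0 / sigma_b ** 2
    fused = (w_a * value_a + w_b * value_b) / (w_a + w_b)
    fused_sigma = (1.0 / (w_a + w_b)) ** 0.5
    return fused, fused_sigma

# Hypothetical soil moisture estimates: (value, standard deviation of error).
satellite = (0.30, 0.04)  # wide coverage, noisier
station = (0.26, 0.02)    # single point, more precise

fused, fused_sigma = fuse(*satellite, *station)
print(f"fused estimate {fused:.3f} with sigma {fused_sigma:.3f}")
```

The fused estimate sits closer to the more precise measurement, and its uncertainty is smaller than that of either input.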
The fourth and final challenge is closely related to the applications of climate science and earth science in society. In most cases, climate science, earth science and environmental science are used in making decisions that affect how people live on the planet - the food we eat, the water we drink, the air we breathe, the houses and cities we live in. That means that there are usually several stakeholders with different levels of technical and scientific skills who need to make decisions - government officials, scientists, community members and so on. When making these decisions, the model results need to be understandable and explainable, something that is harder to do with machine learning models. Machine learning models are typically black boxes - and discussing the results without knowing what is happening inside that black box usually results in extremely poor outcomes! While scientists are working on developing methods to explain and understand what’s happening in these black boxes, it does mean that we can’t use ML models as is - we need to change and adapt them before using them in climate science or earth science work.
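As one small example of the kind of explanation method being developed, the sketch below uses permutation importance: shuffle one input at a time and see how much worse the predictions get. The black_box function is a hypothetical stand-in for a trained model, and the feature names and value ranges are invented. The targets are generated from the same stand-in function, so the baseline error is zero and the example stays easy to follow.

```python
import random

def black_box(row):
    """Stand-in for a trained model; in practice its internals are unknown."""
    temperature, rainfall, noise = row
    return 3.0 * temperature + 0.5 * rainfall  # ignores the third feature

def mean_squared_error(rows, targets):
    return sum((black_box(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, column, rng):
    """Increase in error after shuffling one column across the rows."""
    baseline = mean_squared_error(rows, targets)
    shuffled_col = [r[column] for r in rows]
    rng.shuffle(shuffled_col)
    shuffled_rows = [
        r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled_col)
    ]
    return mean_squared_error(shuffled_rows, targets) - baseline

rng = random.Random(0)  # fixed seed so the run is repeatable
rows = [
    (rng.uniform(0, 30), rng.uniform(0, 200), rng.uniform(0, 1))
    for _ in range(200)
]
targets = [black_box(r) for r in rows]

names = ["temperature", "rainfall", "noise"]
importance = {
    name: permutation_importance(rows, targets, i, rng)
    for i, name in enumerate(names)
}
for name, score in importance.items():
    print(f"{name}: {score:.2f}")
```

A score of zero for an input tells stakeholders the model is not using it at all. Scores like these say which inputs matter, but not why, so they are a starting point for a conversation rather than a full explanation.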
While there are other challenges in using machine learning in climate and environmental science, these are the four that show up in almost every problem that we’re trying to solve.
Next time, we’ll talk about how useful machine learning and AI can be in climate science - once we’ve figured out how to deal with the challenges they pose!