Tackling Climate Change with Machine Learning
In the first of our two-part conversation on machine learning in climate science, we talked about the main challenges in using machine learning in earth and environmental science. Today, let’s talk about why we go through the effort of using these tools in clean technology when they require a significant investment in understanding and modifying them.
Why use machine learning?
There are three main reasons: 1) do it better, 2) do it faster, and 3) find unexplained trends or patterns.
1) Do it better: Last time we talked about the challenges of using machine learning to solve problems in clean technology. Those challenges still apply to unmodified, off-the-shelf models. However, there’s a huge opportunity for scientists and engineers who are interested in understanding and adapting these models to make them work effectively with all the other tools in the toolbox!
Let’s look at one such adaptation, where machine learning algorithms are used in concert with physics-based models to generate more accurate results. As an example, consider urban systems in the aftermath of a hurricane or flood, where we’re trying to predict the areas that are most vulnerable under climate change scenarios. If we examine data from the last 10 years, there have been several years with unprecedented storms - say, two 100-year storms in the last 5 years. That means that, for a small area over a short time, there’s sufficient data from satellite images and real-time, high-resolution sensors to use in machine learning models that detect patterns and find areas that are repeatedly flooded. At the same time, we can combine these with larger physics-based models that incorporate changes in water and air flow under climate change scenarios, and simulate the impacts on larger areas (not just one city but a whole watershed with several urban centers and rural communities) over the next 20 years.
Combining models in this way would help us improve how accurate the predictions are at smaller scales and create pathways to simulate a larger, longer-term future. 
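One common way to combine the two kinds of model is to let the physics-based model make a baseline prediction and train a small data-driven model on what it gets wrong. The sketch below is a toy illustration of that idea, not a method described in this post: the function names, the runoff coefficient, the "paved fraction" feature and every number in the records are hypothetical.

```python
# Toy sketch: physics baseline plus a data-driven residual correction.
# All names, coefficients, and sensor values here are hypothetical.

def physics_flood_depth(rain_mm, runoff_coeff=0.5):
    """Stand-in for a physics-based model: flood depth (cm) from rainfall (mm)."""
    return runoff_coeff * rain_mm / 10.0

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical local sensor records:
# (rainfall in mm, paved fraction of the area, observed flood depth in cm)
records = [
    (100.0, 0.2, 6.0),
    (100.0, 0.8, 9.0),
    (200.0, 0.4, 12.0),
    (200.0, 0.6, 13.0),
]

# Learn what the physics baseline misses, as a function of the local feature.
paved = [r[1] for r in records]
residuals = [r[2] - physics_flood_depth(r[0]) for r in records]
slope, intercept = fit_line(paved, residuals)

def hybrid_flood_depth(rain_mm, paved_fraction):
    """Physics baseline corrected by the learned residual model."""
    return physics_flood_depth(rain_mm) + slope * paved_fraction + intercept

for rain_mm, paved_fraction, observed in records:
    print(rain_mm, paved_fraction, observed,
          physics_flood_depth(rain_mm),
          hybrid_flood_depth(rain_mm, paved_fraction))
```

In this toy setup the physics model carries the large-scale behavior, while the fitted correction captures local detail that only the dense sensor data can reveal.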
2) Do it faster: A fascinating area that climate scientists have been working on is using machine learning models as replacements for parts of physics-based models, in order to get results faster and to run more simulations over longer time periods.
In several large-scale water models, forestry and agriculture models, climate models or combined earth system models, there are sub-models where the parameters are well understood and the results from one sub-model can be fed into the next. As an example, think about rainfall patterns being fed into a plant model to predict crop yield. The physics of rainfall patterns is well understood and can be hard-coded as a constraint into a machine learning model. So, we can replace the physics-based rainfall model with the adapted machine learning model (e.g., a regression model with a constraint) that needs a smaller rainfall gauge dataset to generate the results.
The time required for the machine learning model to run is usually far lower than the time that a physics-based model needs to crunch through the simulations. And, if the results from the machine learning model are shown to be as accurate as the results from the physics-based sub-model, we can replace the numerical physics-based model with the machine learning model.
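To make the idea of a "regression model with a constraint" concrete, here is a minimal sketch under stated assumptions. The physics function, the moisture input, the 5% tolerance and all numbers are hypothetical. For simplicity the regression is trained on outputs of the stand-in physics function; in practice the training data could be rainfall gauge observations.

```python
# Toy sketch: a constrained regression emulator for a physics-based sub-model.
# The "physics" function, the tolerance, and all numbers are hypothetical.

def physics_rainfall(moisture):
    """Stand-in for a slow physics-based rainfall sub-model (mm/day)."""
    return 2.0 * moisture + 0.05 * moisture ** 2

def fit_constrained_slope(xs, ys):
    """Least squares for y = k * x, with the physical constraint k >= 0.

    Forcing the line through the origin encodes "no moisture, no rain";
    clamping k keeps predictions non-negative for non-negative inputs.
    """
    k = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return max(k, 0.0)

# Small training set (here generated by the stand-in physics model).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [physics_rainfall(x) for x in train_x]
k = fit_constrained_slope(train_x, train_y)

def emulated_rainfall(moisture):
    """Cheap replacement for physics_rainfall."""
    return k * moisture

def max_relative_error(xs):
    """Worst-case relative gap between the emulator and the physics model."""
    return max(
        abs(emulated_rainfall(x) - physics_rainfall(x)) / physics_rainfall(x)
        for x in xs
    )

# Only swap the sub-model out if the emulator is accurate enough on held-out inputs.
test_x = [1.5, 2.5, 3.5]
TOLERANCE = 0.05  # hypothetical acceptance threshold (5% relative error)
can_replace = max_relative_error(test_x) <= TOLERANCE
print(k, max_relative_error(test_x), can_replace)
```

The accuracy check at the end mirrors the condition above: the emulator only replaces the physics-based sub-model once it has been shown to match it within an agreed tolerance.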
3) Find unexplained trends or patterns: Given the speed and magnitude of the changes that we are seeing all over the planet today, it can get overwhelming to track down the trends and compare them to historical data. Moreover, we’ve seen several interesting relationships between parameters and features in the past decade that we weren’t expecting so early or had predicted with a lower degree of certainty.
One of the exciting aspects of using machine learning models, especially the large deep learning models, is the ability to find unexpected or unexplained patterns in the data. An example of this is using deep learning to explore the implications of the loss of Arctic sea ice for atmospheric currents, the buildup of high-pressure ridges and increases in air pollution in Asia.
Scientists often build a machine learning model (e.g., a neural network) and run it on a large dataset - for example, air pollution levels in multiple cities around the world, ocean temperatures, atmospheric temperatures, and satellite images of sea ice in the Arctic over the past 30 years. The results and correlations that are observed are then explored to determine 1) whether they are physically feasible - i.e., an actual correlation and not just an artifact of a machine learning algorithm that was not modified to incorporate the correct physical constraints, 2) whether they are valid over multiple datasets and not just an artifact of the data used to train and test the model, and 3) whether they have been observed before and are simply more visible now that we have additional data, or are something completely new.
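The second check, whether a result holds over multiple datasets, can be sketched very simply. The example below is a toy illustration that uses a plain Pearson correlation in place of a neural network. The region names, the 0.5 threshold and all values are hypothetical, not real measurements.

```python
# Toy sketch: does a correlation hold up across independent datasets?
# All values and the strength threshold are hypothetical.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def holds_across(datasets, min_abs_r=0.5):
    """True if every dataset shows a correlation of the same sign and enough strength."""
    rs = [pearson(xs, ys) for xs, ys in datasets]
    same_sign = all(r > 0 for r in rs) or all(r < 0 for r in rs)
    strong = all(abs(r) >= min_abs_r for r in rs)
    return same_sign and strong, rs

# Hypothetical yearly series: (sea ice extent, air pollution index) per region.
region_a = ([10.0, 9.0, 8.0, 7.0], [20.0, 23.0, 25.0, 29.0])
region_b = ([10.0, 9.0, 8.0, 7.0], [30.0, 31.0, 34.0, 36.0])
region_c = ([10.0, 9.0, 8.0, 7.0], [25.0, 22.0, 27.0, 24.0])

consistent_ab, rs_ab = holds_across([region_a, region_b])
consistent_abc, rs_abc = holds_across([region_a, region_b, region_c])
print(consistent_ab, rs_ab)
print(consistent_abc, rs_abc)
```

A pattern that appears in only some of the datasets is a candidate artifact of the data. Passing this check still says nothing about physical feasibility, which has to be examined separately.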
Answering any of these questions leads us down a fascinating path of understanding our earth system better as well as how best to adapt machine learning to make it effective in this field.
 
