Wildfires and the limitations of data science

Let's get back to talking about wildfires! This is the second post in our two-part series on wildfires, data science and what we can do about them. In the first post, we talked about satellite data and how it is used to track wildfires - especially large ones like those we've seen in the Amazon this year.

The wildfires burning in the Amazon have slipped off the front page of most newspapers, but they're still burning. And let's not forget the opposite end of the globe, where wildfires in Indonesia are also burning out of control! The interesting thing about wildfires - and natural disasters in general - is that most of the attention and resources go toward dealing with them as they happen, and toward figuring out what resources and changes are needed after it's all over.

So we see a lot of effort focused on satellite imagery, on understanding wildfire extents after fires have started, and on building apps and websites that give people access to resources and tools during and after the event. But we hear a lot less about the science, including machine learning, that goes into predicting wildfires and their impacts on humans and natural ecosystems. That's partly because it's so much more newsworthy to highlight a fire as it's happening - but also because following the prediction models requires an understanding of basic statistics, environmental science and, more recently, machine learning.

Today, let's take a look at some of the models used to predict the extent and severity of wildfires around the world, how machine learning is helping improve those predictions, and the limitations of using data science.

How are wildfires modeled? The most common types are dynamic, physically based models and statistical/machine learning models. Physical models look at what causes a fire to start (ignition conditions), the weather at the time or the predicted weather (wind, rainfall, moisture in the air), the type of vegetation (scrub, brush, trees), the condition of the soil and vegetation (dry, moist), topography (slope, rate of slope) and human activity (fire setting, electrical equipment, roads), among other factors. Statistical models are typically used once a fire has started, to predict its extent and severity as it grows.

While physical models are often more accurate, they require knowledge of parameters that may not always be up to date, and they are often slower to run. Statistical models are faster, but they aren't always able to answer questions about fire behavior and severity before a fire has started and been identified.

That's where machine learning comes in. Scientists and researchers are using machine learning to figure out which parameters are most relevant and training models to predict the probability of different outcomes. For example, researchers in Portugal used random forests to estimate the area that would be burned, given that a fire had started and the weather conditions at the time. Other researchers used logistic regression to predict whether a fire in the United States would exceed a specified size threshold, given the ignition conditions and data from ground weather stations. In Brazil, researchers used classification algorithms to predict the risk of fires starting in different areas - but were not able to predict which ones were likely to become large. Most recently, researchers from UC Irvine built a decision tree framework to estimate which fires would become large, based on their conditions at ignition.
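To make that concrete, here is a minimal sketch of the general pattern - a random forest trained on weather conditions at ignition to estimate burned area - using scikit-learn. The file name and column names below are hypothetical placeholders, not the actual data or features used in any of the studies mentioned above.

```python
# Minimal sketch: estimate burned area from weather conditions with a random forest.
# "fires.csv" and the column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("fires.csv")  # hypothetical: one row per fire ignition
features = ["temperature", "humidity", "wind_speed", "rainfall"]
X, y = df[features], df["burned_area_ha"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("Mean absolute error (ha):", mean_absolute_error(y_test, preds))
```

The classification studies follow the same pattern: swap in a classifier (for example, LogisticRegression) and a binary label such as "did the fire exceed the size threshold?".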

All this research is extremely relevant: fire departments, cities and governments around the world need to plan their resources, estimate which areas will require equipment, and work out whether it's possible to get equipment and manpower to a fire before it becomes unmanageable.

The challenge with all these models is that conditions are not always predictable - an arsonist starting a fire, for example - and model results, while better than nothing, are still not as accurate as agencies and other groups would like. Just as an example, the research from UC Irvine is groundbreaking because there's very little work on identifying which fires are likely to become large and require significant resources - something fire departments would dearly like to have. However, the model's accuracy is still only a little better than 50%, not very different from flipping a coin.
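That "better than a coin flip" framing is easy to check directly by comparing a model against a dummy classifier that guesses at random. The sketch below does this for a hypothetical "will this ignition become a large fire?" label; the file and column names are placeholders, and this is not the UC Irvine model itself.

```python
# Sketch: compare a "will this fire become large?" classifier against a
# random-guess baseline. All file and column names are hypothetical.
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("ignitions.csv")  # hypothetical: one row per ignition
X = df[["temperature", "humidity", "wind_speed", "fuel_moisture"]]
y = df["became_large"]  # hypothetical binary label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
baseline = DummyClassifier(strategy="uniform", random_state=0).fit(X_tr, y_tr)

print("model accuracy:   ", accuracy_score(y_te, clf.predict(X_te)))
print("baseline accuracy:", accuracy_score(y_te, baseline.predict(X_te)))
```

If the two numbers are close, the model isn't telling fire agencies much more than chance would.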

What's really promising about all these approaches is that they can identify which parameters the outcomes are most sensitive to - or, in layman's terms, what makes fires grow large and pose a threat to humans and ecosystems. That means model accuracy can improve significantly as new approaches are tested. And in the meantime, systems can be developed to try to suppress fires early, so that we are primarily dealing with small and medium-scale fires rather than the large ones we see today.
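To make the idea of "sensitive parameters" concrete, here is a tiny continuation of the random forest sketch earlier in this post: scikit-learn exposes an impurity-based importance score for each input feature, which is one rough way such rankings are produced. The `model` and `features` names are the hypothetical ones from that sketch.

```python
# Continuation of the earlier random forest sketch: rank the hypothetical
# input features by how much the fitted forest relied on them.
import pandas as pd

importances = pd.Series(model.feature_importances_, index=features)
print(importances.sort_values(ascending=False))
```

Permutation importance (sklearn.inspection.permutation_importance) is a more robust alternative when input features are strongly correlated, as weather variables often are.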

What our community is reading

Moonshots, Models, IoT and Machine Learning in Agriculture

Our online community space is now live!

How much water should an email consume? Data centers and water use