Posts

Building Models With Missing Data

  Have you ever worked on a real-world problem where you had all the data you needed, in a form you could easily use to build models?   In most problems, we find that data are missing, or there are errors in how the data are measured, or we’re faced with different types of data that need to be integrated. That’s been especially true in many clean technology fields - water, energy, climate, sustainability, ecosystem restoration and agriculture among them.   So, how do we deal with data that come with so many challenges?     One way is to see if there are alternative ways of measuring the data. One possibility is to identify surrogate datasets that can be calibrated and used as alternatives for the primary measurement. A second possibility is to combine cheaper, more widely distributed sensor data, such as Purple Air sensors for air quality monitoring, with the primary data sources so that models can be developed. A third possibility is to use modeling techniques like

Coming this Sunday, September 20th: Bayesian networks in clean technology live, virtual workshop

  How do you find out why new technologies are being adopted? How do you find the early adopters and figure out why they are using these new technologies?   As startups and individuals build new tools and applications in agriculture, water, energy, sustainability, forestry and climate, some of the biggest questions they face are who is likely to adopt these technologies, which parameters govern these decisions and how those parameters interact with each other.     So, how can this be measured and modeled quantitatively? Welcome to the wonderful world of Bayesian networks!     Bayesian networks are powerful machine learning algorithms that allow us to model how different aspects of a problem are interacting with each other, estimate how likely it is that someone will choose to do something like buy a new technology, account for the uncertainty inherent in problems in clean technology where we don’t know all the parameters and values associated with them - and solve a whole suite

Agriculture, Farms and Data

  This month, let’s talk about agriculture, crops and all things related to food!   If there’s one thing that a global pandemic has shown us - it’s how interconnected our supply chains are, especially in the food sector. For most people these days, getting groceries means going to a well-stocked market or food cart and getting fruits, vegetables and other standard supplies from there. We seldom go to the field or orchards or farms to get our food directly from the suppliers. And in general, the supply chains are so well oiled that we rarely run into issues about food not being available - as long as you’re able to pay for it! The pandemic revealed several aspects of our food system - where our favorite foods come from, how crops are grown, how animals are raised, who harvests and processes our food - and how these systems are so closely connected to each other that impacts on any part of the chain have an effect on the availability of food many miles away.     Pre-pandemic, there was a

Startups and the emerging market for data science in forestry

  Today, we’ll wrap up our look at how data science, machine learning and AI are transforming the forestry sector by exploring the market and startups in the field.   Forest products like timber, pulp, herbs and others contribute at least half a trillion dollars to the global economy each year. Now, while the word “forest” typically conjures up an image of a place that’s remote, hard to access and undisturbed - the truth is that a lot of forest products come from agro-forests. These are forests that are planted, harvested and maintained similar to crop fields - and thus, have similar issues to those seen in the agricultural sector. However, while there’s been a lot of interest in the agricultural sector on using data science, machine learning and artificial intelligence to solve problems, the forestry sector has been slower to catch on. But that’s been changing in the last couple of years - with Scandinavian countries and Canada leading the way. And the major developments have been in

Changing forests, Changing climate and Changing economies

  One of the fascinating aspects of working with data in clean technology is how variable the data are over space and time. So, as scientists trying to understand how different systems interact with each other, it usually means that we’re building several models that work together so that both the spatial and temporal aspects are accounted for.     And that’s especially true in the forestry sector. Forests are incredibly important ecosystems - untouched forests in the Amazon, Indonesia, the Congo Basin and other areas sequester carbon, provide habitat for species that cannot be found elsewhere and have been found to be important controllers of weather patterns locally and regionally. Additionally, second growth forests and agro-forests supply timber, medicines and other products that contribute close to $583 billion every year to the global economy.   Further, as countries around the globe work on combating climate change, REDD+ payments or payments to developing countries for

Communicating As A Data Scientist

  Wow, this has been a crazy week here in the San Francisco Bay Area! If a pandemic wasn’t enough, we now have over 300 fires burning in the area as a result of an unusual summer thunderstorm accompanied by lightning strikes.     It’s one of the aspects of climate change - that weather becomes more extreme. So, the western US and Australia, as well as other areas, see warmer temperatures and less precipitation, or precipitation that is unusual in amount and timing. Thus, drier, warmer conditions that are ideal for these kinds of extreme events become more prevalent - and hence, more disasters.     As professionals working in clean technology, we often get tasked with building the models for these systems, understanding what’s happening on the ground and developing new technologies to help solve these problems.     The one thing that many of us don’t really explore is the whole aspect of communicating the science and what the data are telling us.   This aspect often gets relegated to science

When AI and Machine Learning come to the forests

  A big thank you to everyone who joined us last weekend for a lively and interesting discussion on data engineering and how to build prototypes that access satellite imagery using Google Earth Engine and Python.   It’s always fun to talk about satellites, imagery and how to get things to work in many different clean technology sectors - agriculture, water, energy, climate and disaster management among them.     Today, let’s talk about one sector that doesn’t get as much attention - forestry.   If you heard the words forests and satellite imagery in one sentence, what comes to your mind? Deforestation? Reforestation? Wildfires? All three?   Managing our forests sustainably is key to protecting the environment in so many different ways - forests have a huge impact on climate, on ecosystem services and on the livelihoods of communities that rely on them. However, the challenge is that most forests are hard to access and data are often difficult to verify on the ground.     But that’s