Posts

Building Models With Missing Data

Have you ever worked on a real-world problem where you had all the data you needed, in a form you could easily use to build models? In most problems, we find that data are missing, that there are errors in how the data were measured, or that we’re faced with different types of data that need to be integrated. That’s been especially true in many clean technology fields - water, energy, climate, sustainability, ecosystem restoration and agriculture among them. So, how do we deal with data that have so many challenges? One way is to see if there are alternative ways of measuring the data. One possibility is to identify surrogate datasets that can be calibrated and used as alternatives for the primary measurement. A second possibility is to use cheaper, more widely distributed sensor data, such as Purple Air sensors for air quality monitoring, in combination with the primary data sources so that models can be developed. A third alternative is to use modeling techniques like
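
The surrogate-calibration idea can be sketched in a few lines. This is a minimal illustration, not the method from the full post: it assumes a simple straight-line relationship between a low-cost sensor and the primary instrument, and the function names and readings are hypothetical. Real calibrations often need more than a straight line.

```python
# Hypothetical sketch: calibrate a cheap surrogate sensor against a primary
# (reference) instrument, then use the calibrated surrogate to fill gaps
# where the primary measurement is missing (None).

def fit_linear_calibration(surrogate, primary):
    """Ordinary least squares fit of primary ~ intercept + slope * surrogate,
    using only time steps where both readings are present."""
    pairs = [(s, p) for s, p in zip(surrogate, primary)
             if s is not None and p is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two overlapping readings to calibrate")
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    sxx = sum((s - mean_s) ** 2 for s, _ in pairs)
    if sxx == 0:
        raise ValueError("surrogate readings have no variance")
    sxp = sum((s - mean_s) * (p - mean_p) for s, p in pairs)
    slope = sxp / sxx
    intercept = mean_p - slope * mean_s
    return intercept, slope


def fill_gaps(surrogate, primary):
    """Return the primary series with missing values replaced by calibrated
    surrogate estimates (left as None if the surrogate is also missing)."""
    intercept, slope = fit_linear_calibration(surrogate, primary)
    filled = []
    for s, p in zip(surrogate, primary):
        if p is not None:
            filled.append(p)                       # keep the primary reading
        elif s is not None:
            filled.append(intercept + slope * s)   # calibrated surrogate estimate
        else:
            filled.append(None)                    # nothing to fill with
    return filled


# Made-up readings, for illustration only.
low_cost = [10.0, 12.0, 14.0, 16.0, 18.0]
reference = [21.0, 25.0, None, 33.0, None]
print(fill_gaps(low_cost, reference))
```

The calibration is fitted only on the overlapping time steps, so the gap-filled values are only as trustworthy as that overlap is representative of the conditions during the gaps.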

Coming this Sunday, September 20th: Bayesian networks in clean technology live, virtual workshop

How do you find out why new technologies are being adopted? How do you find the early adopters and figure out why they are using these new technologies? As startups and individuals build new tools and applications in agriculture, water, energy, sustainability, forestry and climate, some of the biggest questions they face are who is likely to adopt these technologies, which parameters govern these decisions and how those parameters interact with each other. So, how can this be measured and modeled quantitatively? Welcome to the wonderful world of Bayesian networks! Bayesian networks are powerful probabilistic machine learning models that allow us to model how different aspects of a problem interact with each other, estimate how likely it is that someone will choose to do something like buy a new technology, account for the uncertainty inherent in clean technology problems where we don’t know all the parameters and their values - and solve a whole suite
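
To make the idea concrete, here is a toy two-parent network evaluated by exact enumeration. The structure (subsidy availability and peer adoption both influencing adoption) and every probability are hypothetical, chosen only to show how a network factorises the joint distribution and how Bayes' rule lets us reason backwards from an observed adoption to its likely causes. Real work would typically use a dedicated library and learn the probabilities from data.

```python
# Hypothetical toy Bayesian network for technology adoption:
#   Subsidy -> Adopt <- PeerAdoption
# All probabilities below are made up for illustration.
from itertools import product

p_subsidy = {True: 0.3, False: 0.7}
p_peer = {True: 0.4, False: 0.6}
# P(Adopt = True | Subsidy, PeerAdoption)
p_adopt_given = {
    (True, True): 0.8,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.1,
}


def joint(subsidy, peer, adopt):
    """Joint probability, factorised along the network's edges."""
    pa = p_adopt_given[(subsidy, peer)]
    return p_subsidy[subsidy] * p_peer[peer] * (pa if adopt else 1 - pa)


def prob_adopt():
    """Marginal P(Adopt = True), summing out both parent variables."""
    return sum(joint(s, p, True) for s, p in product([True, False], repeat=2))


def prob_subsidy_given_adopt():
    """Posterior P(Subsidy = True | Adopt = True) via Bayes' rule."""
    numerator = sum(joint(True, p, True) for p in [True, False])
    return numerator / prob_adopt()


print(f"P(adopt) = {prob_adopt():.3f}")
print(f"P(subsidy | adopt) = {prob_subsidy_given_adopt():.3f}")
```

With these made-up numbers, observing an adoption raises the probability that a subsidy was available from the 0.3 prior to about 0.55. Enumeration is fine for a handful of variables; larger networks need smarter inference.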

Agriculture, Farms and Data

This month, let’s talk about agriculture, crops and all things related to food! If there’s one thing that a global pandemic has shown us, it’s how interconnected our supply chains are, especially in the food sector. For most people these days, getting groceries means going to a well-stocked market or food cart and buying fruits, vegetables and other staples there. We seldom go to the fields, orchards or farms to get our food directly from the producers. And in general, the supply chains are so well oiled that we rarely run into food shortages - as long as you’re able to pay for it! The pandemic revealed several aspects of our food system - where our favorite foods come from, how crops are grown, how animals are raised, who harvests and processes our food - and how closely these systems are connected, so that disruptions in any part of the chain affect the availability of food many miles away. Pre-pandemic, there was a

Startups and the emerging market for data science in forestry

Today, we’ll wrap up our look at how data science, machine learning and AI are transforming the forestry sector by exploring the market and the startups in the field. Forest products like timber, pulp, herbs and others contribute at least half a trillion dollars to the global economy each year. While the word “forest” typically conjures up an image of a remote, hard-to-access and undisturbed place, the truth is that a lot of forest products come from agro-forests. These are forests that are planted, harvested and maintained much like crop fields, and so they face issues similar to those in the agricultural sector. However, while the agricultural sector has shown a lot of interest in using data science, machine learning and artificial intelligence to solve problems, the forestry sector has been slower to catch on. That’s been changing in the last couple of years, with the Scandinavian countries and Canada leading the way. And the major developments have been in

Changing Forests, Changing Climate and Changing Economies

One of the fascinating aspects of working with data in clean technology is how variable the data are over space and time. So, as scientists trying to understand how different systems interact with each other, we usually end up building several models that work together so that both the spatial and temporal aspects are accounted for. That’s especially true in the forestry sector. Forests are incredibly important ecosystems - untouched forests in the Amazon, Indonesia, the Congo Basin and other areas sequester carbon, provide habitat for species that cannot be found elsewhere and have been found to be important regulators of local and regional weather patterns. Additionally, second-growth forests and agro-forests supply timber, medicines and other products that contribute close to $583 billion every year to the global economy. Further, as countries around the globe work on combating climate change, REDD+ payments or payments to developing countries for

Communicating As A Data Scientist

Wow, this has been a crazy week here in the San Francisco Bay Area! As if a pandemic weren’t enough, we now have over 300 fires burning in the area as a result of an unusual summer thunderstorm accompanied by lightning strikes. This is one aspect of climate change: weather becomes more extreme. The western US, Australia and other areas see less precipitation, or precipitation that is unusual in amount and timing, along with warmer temperatures. As a result, the drier, warmer conditions that are ideal for these kinds of extreme events become more prevalent - and so do disasters. As professionals working in clean technology, we are often tasked with building models of these systems, understanding what’s happening on the ground and developing new technologies to help solve these problems. The one thing that many of us don’t really explore is communicating the science and what the data are telling us. This aspect often gets relegated to science

When AI and Machine Learning come to the forests

A big thank you to everyone who joined us last weekend for a lively and interesting discussion on data engineering and how to build prototypes that access satellite imagery using Google Earth Engine and Python. It’s always fun to talk about satellites, imagery and how to get things to work in many different clean technology sectors - agriculture, water, energy, climate and disaster management among them. Today, let’s talk about one sector that doesn’t get as much attention - forestry. If you heard the words “forests” and “satellite imagery” in the same sentence, what would come to mind? Deforestation? Reforestation? Wildfires? All three? Managing our forests sustainably is key to protecting the environment in so many different ways - forests have a huge impact on climate, on ecosystem services and on the livelihoods of the communities that rely on them. However, the challenge is that most forests are hard to access and data are often difficult to verify on the ground. But that’s

When Satellite Data Improves - What Happens in Clean Technology?

In June this year, we had a lively discussion and online workshop on remote sensing data and how the need to monitor processes occurring on the Earth led to the launch of the Landsat satellite program in the 1970s - a program that’s still running today. But here’s an interesting question that came up in our conversation: since water, agriculture, energy and other clean tech sectors have been using remote sensing data for such a long time, what is so different now? To answer that question, let’s first talk about how satellite data are used in clean technology. The sectors where satellite data, and data science in general, are widely used both commercially and in research and development are agriculture, energy, water, climate and disaster management. So, what are the different uses of satellite data in each of these sectors? Let’s take agriculture first. Researchers and scientists have been using satellite data in the agricultural sector since the 1970s. The first product fr

A Trillion Dollar Market - But Where Are the People?

Last week we talked about how the market for clean technology and data science is already in the multi-billion dollar range and is headed toward the multi-trillion dollar range in the next decade or so. However, one of the challenges that analysts highlighted was the lack of professionals with sufficient expertise in both clean technology and data science. So today, let’s take a look at what’s happening in educating professionals for this exciting new field, as well as the kinds of skills that are needed. Most traditional college and university programs haven’t yet caught up with the demand for professionals at this intersection of specialities - although they are getting there! While many universities and colleges have created data science degrees, these usually focus on the problems faced by the high-tech and internet sectors. The graduates from these programs usually have a pretty solid understanding of coding, algorithms including machine learning, and statistic

Data Science and Clean Technology: Updated Market Analysis

If you were to ask people why they’re interested in applying data science in clean technology, chances are you’d hear one of three answers: 1) they want to make a difference to the planet and help people, 2) they think it’s cool technology and want to be at the forefront of innovation, and 3) they’ve heard it’s a hot, up-and-coming field with lots of jobs and opportunities and want to get in at the ground level. Now, the first two are pretty obvious - but what about the third? Is the intersection of clean technology and data science really such a fast-growing field? To answer that question, let’s take a look at some numbers. About five years ago, when the field was in its infancy, there was a lot of speculation that it could be anything from a multi-billion dollar market to a multi-trillion dollar market. How have those estimates held up, now that we look ahead to what the next five years may bring? As it turns out, the estimates held up pretty well. Clean t