Have you ever worked with a real-world problem where you have all the data that you need in a form that you could easily use to build models?
In the case of most problems, we find that data are missing, or there are errors in how the data are measured, or we’re faced with different types of data that need to be integrated. That’s been especially true in many clean technology fields - water, energy, climate, sustainability, ecosystem restoration and agriculture among them.
So, how do we deal with data with so many challenges?
One way is to see if there are alternative ways of measuring the data. One possibility is to identify surrogate datasets that can be calibrated and used as alternatives for the primary measurement. A second possibility is to combine cheaper, more widely distributed sensor data, such as Purple Air sensors for air quality monitoring, with the primary data sources so that models can be developed. A third alternative is to use modeling techniques like Bayesian networks, which can accommodate missing data points by accounting for how much the missing data contribute to uncertainty in the model predictions.
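To make the idea of calibrating a surrogate dataset concrete, here is a minimal sketch in Python. It assumes the surrogate reading is roughly linearly related to the primary measurement, which will not always hold in practice. The data are synthetic, and the function names (`fit_linear_calibration`, `apply_calibration`) are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_linear_calibration(surrogate, reference):
    """Fit reference ~ slope * surrogate + intercept.

    Only time steps where both values are present are used for the fit.
    """
    surrogate = np.asarray(surrogate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    paired = ~np.isnan(surrogate) & ~np.isnan(reference)
    slope, intercept = np.polyfit(surrogate[paired], reference[paired], deg=1)
    return slope, intercept

def apply_calibration(surrogate, slope, intercept):
    """Convert surrogate readings to the scale of the primary measurement."""
    return slope * np.asarray(surrogate, dtype=float) + intercept

# Synthetic example (assumption): the surrogate sensor reads high and is noisy,
# and the primary sensor is missing about 30% of its readings.
rng = np.random.default_rng(0)
true_values = rng.uniform(5, 50, size=200)
reference = true_values.copy()
reference[rng.random(200) < 0.3] = np.nan
surrogate = 1.5 * true_values + 2 + rng.normal(0, 1.0, size=200)

slope, intercept = fit_linear_calibration(surrogate, reference)

# Keep the primary reading where it exists; fall back to the calibrated surrogate.
filled = np.where(
    np.isnan(reference),
    apply_calibration(surrogate, slope, intercept),
    reference,
)
```

The design choice here is to keep the primary measurement wherever it exists and use the calibrated surrogate only to fill the gaps. A real calibration would also need to check whether the relationship drifts over time or with conditions such as humidity.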
The first method was used by scientists at the University of Illinois, Urbana-Champaign in order to estimate how much corn and soybean were planted in an area in Illinois. Normally, it takes 4-6 months after the crops are harvested for the US Department of Agriculture to provide estimates of the number of acres planted with corn and soybean. This means that decisions about policies on conservation, agricultural aid and so on are made using state estimates that have greater uncertainty in their values. The same is true of pricing and managing agricultural futures in the markets. So any method that can provide quicker, more accurate estimates is extremely valuable from an economic and policy standpoint.
However, the challenge is that it’s difficult to distinguish between corn and soybean using standard remote sensing data. Remote sensing data, or data collected by satellites from space, are collected across a range of wavelengths. The wavelengths that are usually used in estimating crops and crop acreage belong to the visible spectrum - the RGB wavelengths. In addition to the difficulty of telling corn and soybean apart with these data, there are often locations and times when data cannot be collected because of clouds or other issues with the satellite sensors - leading to missing data points.
Working on this problem, the researchers discovered that there’s a secondary wavelength that can be even more effective in distinguishing corn and soybean at very early stages in the crop growth. By measuring the short-wave infrared (SWIR) wavelength, a clear difference between the corn and soybean plants can be found - because the SWIR wavelength measures the water content in plant leaves, which is very different in corn and soybean plants when they start growing. By building a deep learning neural network to analyze 15 years of satellite SWIR data at a 30m resolution, the scientists were able to identify corn and soybean acreage with 95% accuracy by the end of July for each field - just about 2-3 months after planting and well before harvest.
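The study described above used a deep learning neural network on real satellite data, and the sketch below does not reproduce it. It swaps in a much simpler nearest-centroid classifier on synthetic data, just to show the general shape of the workflow: fill the cloud gaps in each field's SWIR time series, then classify each field from the filled series. The reflectance values, the class separation, the two generic class labels and all function names are assumptions made up for illustration.

```python
import numpy as np

def fill_cloud_gaps(series):
    """Linearly interpolate NaN (cloud-masked) values in a 1-D time series."""
    series = np.asarray(series, dtype=float)
    t = np.arange(series.size)
    observed = ~np.isnan(series)
    return np.interp(t, t[observed], series[observed])

def fit_centroids(X, y):
    """Return the mean time series of each class label."""
    return {int(label): X[y == label].mean(axis=0) for label in np.unique(y)}

def predict(X, centroids):
    """Assign each row of X to the class with the closest mean time series."""
    labels = list(centroids)
    dists = np.stack(
        [np.linalg.norm(X - centroids[label], axis=1) for label in labels],
        axis=1,
    )
    return np.array(labels)[dists.argmin(axis=1)]

# Synthetic data (assumption): 200 fields, 12 observation dates, two crop classes.
rng = np.random.default_rng(42)
n_fields, n_dates = 200, 12
y = rng.integers(0, 2, size=n_fields)
base = np.linspace(0.30, 0.15, n_dates)            # made-up seasonal trend
offset = np.where(y[:, None] == 1, 0.05, 0.0)      # made-up class difference
X = base + offset + rng.normal(0, 0.01, size=(n_fields, n_dates))
X[rng.random(X.shape) < 0.2] = np.nan              # about 20% cloud-masked

X_filled = np.vstack([fill_cloud_gaps(row) for row in X])

# Fit on the first 150 fields, evaluate on the remaining 50.
centroids = fit_centroids(X_filled[:150], y[:150])
accuracy = float((predict(X_filled[150:], centroids) == y[150:]).mean())
```

Because the synthetic classes are separated by construction, the accuracy here says nothing about real crops. The point is only that gap filling and classification are separate steps, and that either one can be replaced with something more capable.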
As you can see, this is a significant improvement over traditional methods and will aid policy makers, farmers and traders in making decisions and optimizing allocation of resources - which in turn results in economic benefits.
This kind of combination of data sources, machine learning and economic analyses is what makes data science in agriculture such an exciting field to be in, in terms of technical advancement, economic benefits and job creation!
What do Google, Climate Corporation, early stage startups in farm robotics, and researchers trying to figure out how to feed the world sustainably have in common? They’re all grappling with one of the toughest challenges of working with natural systems - how do you work with data that is sparse, unevenly distributed and with systems that have so many connections and interactions with other systems? Before the advent of cheap sensors that are connected to phones, easily accessible satellite data and drones that can fly over fields quickly and inexpensively - scientists in companies and academia worked on developing plant and crop models that incorporated as many aspects of the farm and as much data as was available so that they could understand and predict what was likely to happen on the field. Understandably, the forecasts took some time to produce and as the models grew more complex, issues about how to estimate model parameters and the uncertainty associated with the resul
A mid-sized data center consumes around 300,000 gallons of water a day, or about as much as 1,000 U.S. households; about 20% of data centers in the United States already rely on watersheds that are under moderate to high stress from drought and other factors; operating a data center often requires a tradeoff between water use and energy use; and in a survey of 122 data centers in the United States, only 16%, or 20 utilities, reported plans for managing water-related risks. As professionals working in the field, what can we do to solve this issue? One aspect is developing and using water models that can identify water risks at different scales - so that we can predict the risk to water supplies under a changing climate. A second is using machine learning to identify and optimize water use between all the stakeholders in the watershed - data centers, farmers, cities, other industries - so that biases and needs are brought out into the open and the key issues identified. A third, of cours
Our online community space is now open to anyone who has signed up for a free or paid course on our website! In addition to everyone who signed up for our cohort-based courses, we're now expanding it to all the members of our community. If you've already signed up for any of our courses, check your email for the invitation to the space. It's where we'll get together to talk about all things data science and clean technology related, discuss the latest research, network and make connections with other professionals in the sector. It's an invitation-only space, with no bots and no trolls allowed - so come on over! Here's where you can check out our courses and join our community!