Building Models With Missing Data
Have you ever worked with a real-world problem where you have all the data that you need in a form that you could easily use to build models? In the case of most problems, we find that data are missing, or there are errors in how the data are measured, or we’re faced with different types of data that need to be integrated. That’s been especially true in many clean technology fields - water, energy, climate, sustainability, ecosystem restoration and agriculture among them. So, how do we deal with data with so many challenges? One way is to see if there are alternative ways of measuring the data. One possibility is to identify surrogate datasets that can be calibrated and used as alternatives for the primary measurement. A second possibility is using cheaper, more widely distributed sensor data such as Purple Air sensors for air quality monitoring in combination with the primary data sources so that models can be developed. A third alternative is to use modeling techniques like