Helping Clean Technology Professionals and Data Scientists Work Together in a Remote World
As the Covid-19 pandemic intensifies, many of us are now sitting in our homes because of quarantines and the social distancing and isolation orders enforced by local governments. If you’re like me, you’re probably talking to colleagues through Zoom, trying to make meetings work online, missing in-person interactions and doing your best to get work done under really challenging circumstances.
While it’s hard enough managing teams and people with different skills and backgrounds in general, doing so remotely makes it even more challenging! So today, I thought I’d talk about working with teams and professionals from two very different fields - clean technology and data science - and discuss what makes the working relationships between them effective and smooth.
The joy and the challenge of working at the intersection of very different fields is that professionals from each field often clash - worldviews seem so different that translators are needed to bridge the gap! That makes it difficult both for the people managing these teams and trying to get the best out of everyone, and for the professionals themselves, who are doing their best to solve tough problems.
As software engineering and data science penetrate deeper into a wide range of sectors, conversations and arguments erupt around goals, methods and how problems in the sector are actually perceived and solved.
And that’s especially true in various clean tech sectors - water, energy, agriculture, climate change, forestry and so on. These are sectors where professionals have been working successfully for a long time and have a sense of what people and the market are interested in. They’re interested in the new tools and ideas that data science brings and often have some experience working with statistics, machine learning, remote sensing and other data science tools in their sectors - but may not buy into all the AI/ML/data science hype!
So how can we all work together effectively?
When you have clashes between people and teams working in clean tech and data science, they typically boil down to 3 issues - attitudes, data and models.
Let’s start by talking about attitudes. Attitudes are usually the first sticking point - because it’s all about what people think is important. And what people think is important is fundamental to the field they’ve been trained in. It’s a combination of experience in a sector, a formative first job, new skills you’ve learnt - all the sometimes intangible things that come under the umbrella of “professional judgement”. The tricky part is when professional judgements in different fields collide.
Software engineers, data scientists and others who come from the tech industry are usually quick to focus on speed and execution. “How quickly can it be built or prototyped?”, “Can we release this as a beta test?”, “How can I get more data?”, “Who needs to be involved to get this done?” are the kinds of questions asked most often.
People from the clean tech field (agriculture, water, disaster management, energy, etc.) are often the ones who will focus on implementation effectiveness, risk, public outreach and safety. They will ask: “How reliable are the results?”, “Where have model predictions been optimistic?”, “If there’s an error, what’s the real-life cost?”, “If you have to present it to policy makers, how successful will this be?”, “Can it be understood easily and does your visualization capture the nuances?”, “What are the tradeoffs?”
These worldviews clash when software engineers and data scientists perceive the clean tech experts as being unnecessarily cautious and slow, while the clean tech experts see the gaps in the models and tools and don’t think they’re ready for rollout in their current state.
How can these differences be resolved? In an ideal world, we’d all sit down together and calmly and politely talk through our different view points - but we don’t live in an ideal world, do we?
The easiest way to minimize differences and clashes in the team is to ensure that the goals of the project or product are very clearly spelt out. Just to start with -
1) Who is your product serving and what are their needs? For example, say you’re working in agriculture and your new product idea is a tool that allows farmers growing perennial crops like fruit trees and nuts to predict yields. The first question your clean tech expert will ask is: how much of an improvement would the farmers like to see over what they already have - predictions a month earlier? Three months earlier? And how easy to use is your product - is it just another pretty picture, or does it actually give the farmer some valuable insight that can be used to reduce costs or improve revenue?
2) What are the legal requirements and regulations in this particular sector? Let’s say you’re building a disaster management tool that helps cities and organizations plan their response to disasters like the pandemic facing us today. One of the foremost issues here is understanding what cities and governments need to know and how much error is tolerable in the models that you are building. Exactly which regulations need to be met? If they aren’t, the result could be significant loss of life as well as potential lawsuits. And if you are creating simulations with a lot of uncertainty, how can you communicate that to decision makers so that they understand the different scenarios (there’s a small sketch of this after the list)? And once again, is your product easy enough to use that people with minimal training can still use it effectively?
3) What type of product is needed? An app, a website, a report, a sensor or some combination of all of them? The answer to this question really depends on the problem being solved, what’s currently in the market and what can actually be built. And this is where data scientists, software engineers and clean tech experts need to have clear, forthright conversations as they bring different perspectives to the table.
And the list goes on...
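To make the uncertainty question in item 2 concrete, here’s a minimal sketch of one way to present uncertain simulation results as a handful of named scenarios rather than a single number. The simulation itself is a toy placeholder - run_outbreak_simulation() and all of its parameters are hypothetical, not a real epidemiological or disaster model.

```python
# Minimal sketch: collapsing many uncertain simulation runs into a few named
# scenarios that decision makers can compare. run_outbreak_simulation() is a
# toy placeholder, not a real disaster model.
import numpy as np

def run_outbreak_simulation(rng, n_days=90):
    """Toy stand-in for a disaster simulation: returns daily case counts."""
    growth = rng.normal(loc=0.05, scale=0.02)            # uncertain growth rate
    noise = rng.normal(loc=0.0, scale=5.0, size=n_days)  # day-to-day noise
    days = np.arange(n_days)
    return np.maximum(100 * np.exp(growth * days) + noise, 0)

rng = np.random.default_rng(42)
runs = np.array([run_outbreak_simulation(rng) for _ in range(1000)])

# Summarize 1,000 runs as three scenarios instead of a single point estimate.
peak_cases = runs.max(axis=1)
for label, pct in [("optimistic", 10), ("median", 50), ("pessimistic", 90)]:
    print(f"{label} ({pct}th percentile): ~{np.percentile(peak_cases, pct):,.0f} peak daily cases")
```

Framing the output as optimistic / median / pessimistic scenarios is one simple way to keep the uncertainty visible without asking a policy maker to interpret a full probability distribution.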
The second issue that crops up is data. Data in clean tech differs in many respects from data obtained from websites like Google or Facebook or from apps like Uber and Yelp.
Big data is usually defined as data that is high in volume, variety and velocity. That means lots of data, different types of data (think clicks, location information, text, images) and data that arrives rapidly (think of the speed at which servers need to process data from a Google search, for example).
Data in the clean tech field can be both big data and small data. Let’s look at the type of problems that are clearly big data. Examples include scraping a website, search, recommendation systems, and social graphs. All these are the result of people producing lots of data by clicking on buttons and interacting with systems. Other types of big data are from inanimate systems - sensors in cities and streams producing millions of data points, satellites sending back images, drones and aircraft producing streams of images to be analyzed and water treatment plant systems with sensors to monitor the status of every system.
Next, let’s look at the questions that need to be answered: estimating yields for a crop (which is planted once or twice a year at most), predicting flooding (which occurs seasonally, every few years, or as 100-year events), managing and mitigating wildfires (which typically happen once or twice a year during the fire season), planning cities (which plays out over several years), managing energy and water use (which could range from monthly for homes to daily for offices), and farm management systems (which are seasonal in nature). These data are usually small data - i.e. they are less frequent (annual or monthly) with fewer occurrences.
This is where the second set of challenges arises between clean technology professionals and data scientists. Data scientists are used to working with large volumes of data - but in clean technology fields, we have a mix of big data and small data that can be used to solve problems. And both types of data are essential - it’s very difficult to build a yield model without actually testing its accuracy in the field against annual or biannual yields, for example! Additionally, data vary over space and time, and models built for one location are often not a good fit for another. Think wildfires in California, Australia and the Amazon. Or corn yields in India and the US.
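As a small illustration of that mismatch, here’s a hedged sketch of how high-frequency sensor data might be aggregated down to the same granularity as annual yield observations before any modeling starts. The file names and columns (soil_moisture_hourly.csv, orchard_yields_annual.csv, orchard_id and so on) are hypothetical.

```python
# Minimal sketch: hourly sensor readings ("big" data) aggregated up to one row
# per orchard-year so they can sit alongside annual yields ("small" data).
# The CSV files and column names are hypothetical.
import pandas as pd

sensors = pd.read_csv("soil_moisture_hourly.csv", parse_dates=["timestamp"])
sensors["year"] = sensors["timestamp"].dt.year

# One yield observation per orchard per year: columns orchard_id, year, yield_t_ha.
yields = pd.read_csv("orchard_yields_annual.csv")

# Collapse millions of sensor rows into a seasonal summary per orchard-year...
seasonal = (
    sensors.groupby(["orchard_id", "year"])["soil_moisture"]
    .agg(["mean", "min", "max"])
    .reset_index()
)

# ...so the modeling table has only as many rows as there are yield observations.
model_table = yields.merge(seasonal, on=["orchard_id", "year"], how="left")
print(model_table.head())
```

However the sensor data is summarized, the model can only be validated against those few yield rows - which is exactly why the small data matters as much as the big data.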
And that’s where clean tech experts and data scientists need to sit down together and figure out how to solve the problem. That means asking questions like: What’s the primary data (i.e. directly related to the problem)? What’s the secondary data (i.e. data that helps solve the problem but isn’t a direct parameter)? What are the spatial and temporal distributions of the data? Are there anomalies or missing values in the datasets? Combining data from different fields is usually another question to discuss, because data from one field can become a gold mine when it acts as a surrogate for solving a problem in another. One amazing example is using data from Facebook and Twitter to monitor and improve responses to natural disasters, in addition to using existing sensor networks and crisis management tools.
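A joint review like that often starts with a very simple data audit. Here’s a minimal sketch, with a made-up stream-gauge dataset and a plausible-range threshold that would come from the domain expert, of the kind of summary both sides can look at together. Everything here - the function, columns and thresholds - is illustrative, not a standard API.

```python
# Minimal sketch of a joint data audit: missing values, readings outside the
# range the domain expert considers plausible, and gaps in temporal coverage.
# The dataframe, column names and thresholds are all hypothetical.
import pandas as pd

def audit_readings(df, value_col, valid_range, timestamp_col="timestamp"):
    """Return a small summary a domain expert and a data scientist can review together."""
    low, high = valid_range
    values = df[value_col]
    out_of_range = values.notna() & ~values.between(low, high)
    largest_gap = df[timestamp_col].sort_values().diff().max()
    return {
        "rows": len(df),
        "missing": int(values.isna().sum()),
        "out_of_range": int(out_of_range.sum()),  # anomalies per the expert's plausible range
        "largest_gap": largest_gap,               # longest stretch with no data at all
        "coverage": (df[timestamp_col].min(), df[timestamp_col].max()),
    }

# Example: stream-gauge flows where the expert says 0-500 m3/s is plausible.
readings = pd.DataFrame({
    "timestamp": pd.date_range("2020-01-01", periods=6, freq="D"),
    "flow_m3s": [12.0, 15.5, None, 900.0, 14.2, 13.8],
})
print(audit_readings(readings, "flow_m3s", valid_range=(0, 500)))
```

The point isn’t the code - it’s that the “plausible range” and the meaning of a gap come from the domain expert, while the data scientist turns them into checks that run on every new dataset.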
That brings us to the third issue - how can all the data be combined and effective models built? These are difficult questions that are still being researched by scientists working all over the globe. Machine learning models, statistical techniques and other types of models have to be adapted to work for data and problems in the clean technology sector - simply because of the issues with data that we discussed above.
So once again, you need a combination of clean technology specialists, software engineers, machine learning engineers and statisticians to sit down together and figure out how best to build the model. It’s unlikely to be a pure machine learning algorithm; far more likely, it will be a combination of machine learning, statistics and simulation tools, put together in different proportions and at different scales.
Building scalable, useful models can involve combining different types of inputs (from all kinds of systems at different frequencies - e.g. satellite data and sensor data in smart cities), building simulation models (that represent the actual system being studied - e.g. a water treatment plant), creating machine learning and/or statistical algorithms (deep learning and so on), and building data processing/storage and data visualization systems.
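As a rough illustration of that blended approach, here’s a minimal sketch in which a toy process-based water-balance calculation supplies one feature and a regularized regression learns from a small, seasonal dataset. Every number, name and the simulation itself are illustrative assumptions, not a recipe.

```python
# Minimal sketch of combining simulation and statistics/ML: a toy process model
# supplies a physics-informed feature, and a regularized regression is fit to a
# small, seasonal dataset. All values and names are illustrative.
import numpy as np
from sklearn.linear_model import Ridge

def water_balance_sim(rainfall_mm, et_mm):
    """Toy process model: crude seasonal water surplus used as a feature."""
    return np.maximum(rainfall_mm - et_mm, 0.0)

rng = np.random.default_rng(0)
n_seasons = 20                                # small data: one row per season
rainfall = rng.uniform(200, 800, n_seasons)
et = rng.uniform(300, 500, n_seasons)         # evapotranspiration
heat_days = rng.integers(5, 40, n_seasons)

surplus = water_balance_sim(rainfall, et)     # simulation output becomes a feature
X = np.column_stack([surplus, heat_days])
y = 2.0 + 0.01 * surplus - 0.05 * heat_days + rng.normal(0, 0.3, n_seasons)  # synthetic yields

model = Ridge(alpha=1.0).fit(X, y)            # regularization suits tiny samples
print("Learned weights (surplus, heat days):", model.coef_)
```

The design choice worth noticing is the division of labour: the simulation encodes domain knowledge about the physical system, and the statistical model does only the part the data can actually support.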
So the challenge for everyone and every team working in these fields is to figure out how all these pieces fit together - and how to make the most of the different skillsets team members bring to the table.
There are no unicorns - just good teams!