This is a follow-up to my recent series of posts on ‘How to Become a Data Scientist’. Unofficially I’m calling it Part 2a, because it has become apparent that the second instalment (‘Learning’) did not go into enough detail on how to improve the absolutely essential skill of communication.
It was therefore necessary to return to my network of experts to dig a little deeper. Based on the answers that came back, I thought it best to spin them into a separate blog post, as a way of emphasising just how crucial effective communication is to data science.
So to kick us off, we will head back to Sean McClure, who you should remember from HtBaDS (catchy acronym, right?):
“Learning to communicate is less obvious than learning an algorithm, yet vastly more important. This is because regardless of how good a model is, if it can’t be consumed and understood by the end user, it is worthless”
And he elaborated:
“There are two major pieces to data science: how analytics are created and how analytics are consumed. The former gets almost all the attention, while the latter is vastly more important to doing what matters – building products that solve problems. Things like Kaggle competitions are great for honing skills in building models, but [they don’t] teach people how to connect those models to the granular decisions real people make on a daily basis. A super predictive model is 100% useless if it doesn’t support the user requirements of the business product being built. Many people on Kaggle gravitate to the same decision trees and their ensembles because [they lead] to highly accurate models that predict well. But your job as a data scientist is not to predict the best you can, it is to find ways to bring machine learning into real world products that must answer to an array of constraints, trade-offs and business demands. In other words, a less predictive model is worth much more to a company if that model is better aligned to product requirements and business decisions, than some highly accurate model deemed valid by statistics alone”
Sean went on to outline the following skills, which can be learnt in order to communicate the benefits of data science better and to “win” the competition that matters: building great products.
- Learn to keep machine learning accountable to product
- Learn to think about algorithms and the models they build conceptually
- Learn to connect real business costs to predictions
- Learn to validate models with people, not just statistics
- Learn to put imperfect models into products early, so that the real world can validate their worth
It is worth pointing out that while Sean is referring to building products, I would argue that the themes also apply to Type A data science, where building software is not necessarily required. Sean focuses on products because in his line of work, software is how data science gets communicated, and on this point he added:
“We can discuss things at a high level all we want but when people get to touch and feel the thing you’re talking about that’s when real communication happens. Remember that communication isn’t one way. We need to listen to what others are saying to pivot our data science towards what the end user really needs. Products anchor our conversations with something tangible, and allow those who use our data science to inform us of what is and is not working. Building data products not only teaches us a lot about how to do data science, but also forces us to communicate well, and ensure that people are the most important part in our analysis”
My takeaway is this: communication is not just about describing something so that a lay audience understands it; it is more than that. It is about understanding the whole environment: the business pressures, the costs, the different perspectives, all of it. If you can grasp that, and you are willing to compromise for the sake of the end goal, then effective communication should follow.
I can imagine what some of you are thinking, though: “How do I actually go about developing this ability?” To help answer this question, we will return to the other Sean featured in HtBaDS, Dr Farrell, who explained:
“Developing expertise in presenting and communicating is a bit like building your skills in problem solving: the only real solution is experience, experience, experience. People from academic backgrounds will have many opportunities to do this throughout their studies/work, so for these people, I would advise them to grasp every opportunity to give a talk that comes their way (in particular to different audiences, such as outreach talks to the general public). For people from other backgrounds, I would try and land a talk at a meetup (or something similar) on a data science related project they are working on, for example a Kaggle competition”
Boris Savkovic, whom you may also remember, found that a combination of approaches worked:
- Take any chance you get to present to as many audiences as you can, both internally and externally. Seek feedback and encourage people to give you honest, even if harsh, responses. Reflect and improve. Painful, but you have to do it
- Build bridges with people who are not from data science and communicate ideas to them. How can you get the message across? I found myself trying lots and lots of times until I got the right mix and lingo with different people
- Try and find a good mentor who can show you how to speak when bridging the gap between maths and the real world. In my case, this was a number of mentors from academia and industry and from different fields (energy, medical, research, etc.). Medical doctors are awesome at communicating and I had the luck to work with many of them in my previous role. You observe them and learn from them, and take any hard feedback you can get
- Read "good" books on communication. I found myself reading books on rhetoric and law (with a focus on communication though), not because I like those fields, but because they teach you a lot about communication, both written and oral
This is great advice, although Boris also noted that people learn in different ways, so it is important to explore what works for you.
Conclusion
What I find most telling in these answers is how closely they echo the themes identified in ‘How to Become a Data Scientist’, namely:
There is no magic bullet. To truly improve, you need a heavy dose of tenacity and, most importantly, experience, experience, experience.
It really is the only way. With experience your communication will improve, even if the process is painful at first, and the better you understand the business context, the easier it will become.
Whatever you do, do not bury your head in the sand by ignoring this vital part of data science. Otherwise you risk rendering your other skills irrelevant and, quite simply, what a waste that would be.
Have any other tips that I have missed? Please send me a message, or add a comment below.