What does it mean to be a Senior Data Scientist?

Peadar Coyle
March 29, 2018 Big Data, Cloud & DevOps


On being a Senior Data Scientist


I present some ideas here; they aren’t necessarily in order.

1. Senior Data Scientists understand that Software/ Machine Learning has a lifecycle and so spend a lot of time thinking about that.

Technical debt, maintainability, systems design, design docs, etc. These are all part of that lifecycle, and they lead to questions such as:

“What could I be missing?”

“How will this not work?”

“Will you please shoot as many holes as possible into my thinking on this?”

“Even if it’s technically sound, is it understandable enough for the rest of the organization to operate, troubleshoot, and extend it?”

2. Senior Data Scientists understand that ‘data’ always has flaws. These flaws can come from the data-generating process or from biases in the data.

I once did a technical interview with a Senior Data Scientist as a candidate – and I was a bit flummoxed at the question at the end which was ‘what if the data is wrong’. It’s a valid question and one we should think about.

Often the populations we end up observing aren’t randomly sampled, and we need to think about how to manage this. Anecdotally, I find that Junior Data Scientists often assume their data is a random sample.
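As a rough illustration of why non-random sampling matters, here is a minimal sketch with made-up numbers. The credit scores and the approval threshold are assumptions for the example, not data from any real system. It compares the mean of a full population with the mean of the subset we actually get to observe.

```python
from statistics import mean

# Hypothetical population: credit scores of everyone who applied (made-up numbers).
population = [520, 560, 600, 640, 680, 720, 760, 800]

# Assumed selection rule: we only observe applicants who were approved,
# and approval required a score of at least 680.
observed = [score for score in population if score >= 680]

population_mean = mean(population)
observed_mean = mean(observed)

# The gap between the two means is the bias introduced by the selection rule.
bias = observed_mean - population_mean

print(f"population mean: {population_mean}")
print(f"observed mean:   {observed_mean}")
print(f"selection bias:  {bias}")
```

Any statistic computed only on the observed rows describes the approved applicants, not the applicants as a whole.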

3. Senior Data Scientists understand the ‘soft’ side of technical decision making.

Increasingly I see tool choices being made and wonder about the ‘feeling’ aspect of them. Someone might say that ‘static languages are best’ or ‘we should use pytest not unittest’, and often this comes down to ‘taste’ or ‘feelings’ or ‘philosophy’. Those are perfectly reasonable things. For example, I love the pytest functional syntax, but I know other engineers like other tools, and that’s ok.

The other thing is that sometimes people have had bad experiences with tools from particular vendors or in particular ecosystems. If, for example, you worked at a company that wrote software in Zorg, found it incredibly difficult to deploy, and saw the project fail completely, then you’d have an emotional response when Zorg is brought up in a company meeting. Engineers and Data Scientists are often obsessed with the rational, but our feelings about architectures and software matter. If we ignore them, we’ll never get the buy-in we need. I’ve not finished reading it, but a book that’s been recommended by a few senior Technologists whom I respect is Words That Work: It's Not What You Say, It's What People Hear.

A corollary to this is that we can produce Machine Learning models that don’t get used.

4. Senior Data Scientists focus on impact and value

If a deep learning model doesn’t get into production because of lack of trust – you’ve failed. It’s not about satisfying your intellectual curiosity, or your need for ‘Resume Driven Development’. It’s important to think about buy-in and your time to value. As Erik Bernhardsson tweeted –

Think most of my value of knowing machine learning these days is gained from telling people why ML won’t solve their problem

This is terribly important. Sometimes a simple rules engine will do; sometimes just a SQL query. Using the right tool for the job is very important. This is complicated though: there’s rarely one ‘best’ solution, and all solutions have trade-offs.

Often you can make things simple with data. A question I like asking Data Scientists is ‘when did you decide not to use ML?’ For example, a few years ago I saved thousands of dollars at $OLD_EMPLOYER by building a data pipeline for some analytics. Part of the analysis pipeline involved matching text for inventory management: inventory names would be similar, so it seemed natural to use fuzzy-matching or something similar. It turned out this algorithm was too slow and impractical. It also turned out that, by monetary value, there were 100 inventory items that needed matching, so I simply encoded the most common misspellings and abbreviations in a dictionary. This was tremendously valuable, and a much more robust solution than using Machine Learning. Sometimes automation is what you need, sometimes counting, and sometimes Machine Learning.
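The dictionary approach can be sketched roughly as follows. This is not the original pipeline: the item names, the `ALIASES` table, and the helper functions `normalise` and `match_inventory` are all hypothetical, chosen only to show the shape of the idea.

```python
# Hypothetical alias table: known misspellings/abbreviations -> canonical name.
# In practice this would be filled in by hand for the items that matter most.
ALIASES = {
    "wdgt": "widget",
    "widgit": "widget",
    "sprkt": "sprocket",
    "sproket": "sprocket",
}


def normalise(name: str) -> str:
    """Lowercase, collapse whitespace, then map known variants to a canonical name."""
    key = " ".join(name.lower().split())
    # Unknown names fall through unchanged, so nothing is silently dropped.
    return ALIASES.get(key, key)


def match_inventory(names):
    """Group raw inventory names under their canonical name."""
    groups = {}
    for raw in names:
        groups.setdefault(normalise(raw), []).append(raw)
    return groups


if __name__ == "__main__":
    print(match_inventory(["WDGT", "widget", "Sproket", "Gasket"]))
```

Each lookup is a constant-time dictionary access, and every mapping is explicit, so a wrong match can be traced to a single entry and corrected by hand.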

5. Senior Data Scientists care about ethics

Recently in the Data Science and Tech communities, we’ve seen the need for discussion of ethics. There’s been some interesting literature on this from the Academic communities that is worth reading, and I’ll not wade too far into these debates.

However, as a Senior Data Scientist working in the regulated world of Financial Services, I’ve grown to appreciate that it’s my job to have a working knowledge of GDPR. It’s something we regularly bring up when we discuss the viability of projects, and it’s a ‘risk factor’. It would be immature to just ignore this, and frankly unethical and unprofessional.

At the very least, Senior Data Scientists should read some of the codes of ethics in Data Science and have views on them. Ideally, you should have your own code of ethics and hold yourself to it. Certainly, you should take it into account in your risk planning, in what data you get access to, and in how you integrate security. This can, unfortunately, add to time frames, but doing things ‘right’, both in terms of customer trust and in terms of good compliance, often takes longer. As we’ve seen with the Theranos affair, ‘move fast and break things’ isn’t always the best motto.

Acknowledgments: Thanks to Eoin Hurrell and Bertil Hatt, who helped with fleshing out these ideas. I’m also grateful for conversations with friends such as Eddie Bell, Mick Cooney, Mick Crawford and Ian Ozsvald, and with some of my Zopa colleagues, including Dat Nguyen and Vlasios Vasileiou. I learn from most people I speak to, so sorry if I’ve forgotten anyone. Finally, thanks also to Audrey Somnard, who has constantly reminded me that ‘algorithms do what they want’ isn’t a sufficient ethical explanation and that I should think more about these issues.
