What is Data Science?

Gianluca Malato
November 5, 2019 Big Data, Cloud & DevOps

Myths, dreams, and reality of this beautiful job

Data Science is considered one of the most modern and fascinating jobs of our time. It can be fun and satisfying, but is it really what it’s described to be?
In this article, I’ll show you the reality of a Data Scientist’s life.

What you think it is

At the beginning of their career, Data Scientists think that Data Science is a wonderful, magical world full of algorithms, Python functions that perform every possible spell with a single line of code, and statistical models able to detect the most useful correlations in the data, the kind that could make you an invincible superhero in your company. You start dreaming about your CEO congratulating you and shaking your hand, you begin to see decision trees and clusters everywhere and, of course, the most terrifying neural network architectures your mind can dream up.
But from the very first day of your first Data Science project, you start to realize what reality is.

What it really is

Expectation for results

Managers often think that Data Science is the Holy Grail of information technology. They have huge expectations about it and they want them to be satisfied here and now.
In reality, results are very difficult to achieve and take a lot of time. Sometimes a result can’t be reached at all. Think about clustering, for example. You can spend a lifetime searching for a clustering pattern that simply doesn’t exist in your data. Most managers don’t understand this, and it can be very stressful for you and your team.

Explaining

The only thing better than a good algorithm is an explainable algorithm. Never forget this. No sane manager in the world would trust an algorithm they don’t understand with their company’s money only because its AUROC is greater than 95%. Managers need to understand algorithms and figure out how they reason about data, and making that possible is often a major task for a Data Scientist. Explaining algorithms to somebody with no scientific background can be quite difficult, but it’s very common in large companies and you must face this fact.
Most of the time you’ll find yourself trying to erase that awful question mark on your boss’s face, simplifying as much as possible to make them understand your results. Remember: if you can’t explain your results, managers will start to ask themselves whether you are useful to the company at all.

Business understanding

You’ll spend a lot of time interviewing product owners and ICT professionals to understand the information hidden inside the business data they know or produce. There’s no way you can make it without their help.

Data often comes from complex and heterogeneous systems, which usually means lines and lines of log files that you need to understand. Data isn’t everything; information is everything. Never forget this. Information is buried inside data and you’ll need somebody to tell you where to dig.

The larger the company, the more difficult it is to find the right people to interview. When you finally do, their answers will generate more questions, and these people may not have enough time for you and your “nerdy stuff”.

Data visualization

You’ll find yourself using data visualization more often than you would ever have imagined. Charts, slides and other graphical tools will be your silver bullets. Maybe you have magic formulas, graphs and so on in your mind. Forget about them. Data Science is told through graphical representations, and it’s often difficult to find the visualization technique that suits your audience.

Deadlines

There they are. We are slaves in a world of deadlines and expectations. When you were a software engineer you had milestones in your plan and you weren’t allowed to delay a second. In Data Science, things aren’t easier.

There are deadlines and milestones even in Data Science, and they come with a great difficulty: Data Science is very close to academic research, so it doesn’t fit well into the classical, waterfall ICT project management style. Instead, an Agile framework (e.g. Scrum or Kanban) should work well, thanks to its inherent ability to adapt quickly to change. But Agile is difficult to teach to managers. It can give them the false idea that there’s no clear delivery date, and this is very difficult for companies to accept.

Algorithms and programming

And finally, the fun part. Python, R, Knime, reading scientific papers, optimization algorithms, cross-validation and so on. The technical, nerdy, real fun is a very small part of the work and takes very little time over the whole project lifetime. Maybe you have already lost your enthusiasm in the previous phases, before writing your first line of code, and things no longer seem as fun as you thought at the beginning.

What’s the best way to do Data Science?

In my experience, I can answer with a single word: Agile. There’s no need to complete all of the business understanding before writing your first line of Python code. Start with a simple business understanding of a small piece of data, explore it, visualize it and begin with a simple model. Produce quantifiable results week by week, keeping your customers constantly engaged in the process. Deliver small results at a constant rate and, please, don’t fall into the waterfall trap.

Simplicity is the key. Never forget it. Start with the simplest things possible and add a small piece of complexity only if needed.

There’s a psychological sense of relief in constant, small results, and this is another weapon you have to use if you want to survive in the jungle of company deadlines and business processes. This way, every colleague who is committed to your project will see the obstacles you face and start to understand how difficult Data Science is.

Remember, companies still think of Data Science as a branch of ICT; they are not completely wrong, but they shouldn’t expect you to follow the waterfall approach. So you have to take on the struggle of guiding your company toward an Agile way of thinking.

Concerning the explanation part of the job, I prefer to start with the simplest machine learning model possible: k-nearest neighbors. It’s very easy to understand. You only need paper, a pencil and a Cartesian plane with some points drawn on it. That’s it. If it produces good results, everybody will finally see you as the great business partner you think you are.
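To make the paper-and-pencil idea concrete, here is a minimal sketch of k-nearest neighbors in plain Python. The function name `knn_predict`, the toy points and the "churn"/"loyal" labels are all hypothetical, chosen only for illustration; a real project would more likely rely on an established library.

```python
from collections import Counter
from math import dist  # Euclidean distance, available since Python 3.8


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank the training points by their distance to the query and keep the k closest.
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: dist(train_points[i], query),
    )[:k]
    # Count the labels of those neighbors and return the most frequent one.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two groups of points drawn on a Cartesian plane.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["churn", "churn", "churn", "loyal", "loyal", "loyal"]

print(knn_predict(points, labels, (2.0, 2.0)))  # prints "churn"
print(knn_predict(points, labels, (6.0, 7.0)))  # prints "loyal"
```

The whole model fits in one sentence a manager can follow: a new point gets the label that most of its closest known points already have.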

If KNN doesn’t work, then you can use regressions and decision trees (along with tree-based models such as random forests and gradient boosted tree classifiers), which are fairly easy to explain, or Bayesian networks, which have a very useful graphical representation.

Finally, visualize. Visualize everything. Ask your boss to buy you a course in data visualization, learn as much as possible about the best visualization techniques and, please, remember to avoid pie charts. They are pretty useless and misleading. If you provide a simple scatter or bar plot, people will grasp all the relevant information.

Simple results are the best ones. Some days ago, my team and I presented the results of a time series analysis using only three slides: high-level KPIs describing the business phenomenon, a confusion matrix and some performance metrics. Our audience was enthusiastic from the first slide, simply because we started with clear numbers explaining the business in a simple way. In many situations, a small building block can really save your life.
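For readers who have never built those last two slides, the sketch below shows how a confusion matrix and a few common metrics can be computed by hand. It assumes the problem is framed as binary classification; the labels are invented toy data, not the results from the presentation, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def summary_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall, guarding against division by zero."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented toy labels: 1 = event happened, 0 = it did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)                    # (3, 1, 1, 5)
print(summary_metrics(*counts))  # accuracy 0.8, precision 0.75, recall 0.75
```

Four counts and three ratios are usually enough for a business audience: how often the model is right overall, how often its alarms are real, and how many real events it catches.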

Conclusions

Data Science is an exciting job, but it can be very difficult to do when you have to speak to a non-technical audience. Data and business are intimately related to each other, and you must remember this when you work with business-oriented people. The only way to survive is to find a middle ground between a data-driven, bottom-up approach and a business-driven, top-down approach.

Finally, as Data Science is hard and time-consuming, delivering small results with a constant delivery rate is the only way you can keep your customers engaged.
