Data Science Governance – Don’t Reinvent The Wheel

Bill Franks
April 6, 2021 AI & Machine Learning

As data science processes continue to become operationalized and embedded within business processes, the importance of governing those processes continues to rise. While governance has been a major focus for many years when it comes to managing data, governance focused on data science processes is still far less mature. That needs to change. This blog will discuss a couple of distinct areas of governance that organizations should consider.

Governance and Ethics Are Inextricably Linked

When defining governance procedures and guidelines, it is necessary to account for ethical considerations up front. The reason is that once governance policies are put in place, they will incentivize and disincentivize various behaviors. Without accounting for the ethics of those behaviors, there is a risk of creating a terrifically managed and tightly governed process that does horribly unethical things.

Imagine that a company creates a process 1) using well-governed data on people’s behavior that has been 2) prepared with a well-defined and consistent set of computations to 3) generate summary metrics to feed into a model. Furthermore, the company monitors the performance, bias, and consistency of the model while also tightly controlling who has access to the output and what it is used for. Sounds like a very well governed process, doesn’t it? Now imagine that the behavioral data comes from social media and that the model uses it to predict who is likely to commit a crime so that law enforcement can intervene, as in the movie Minority Report.

Such a process may be well-governed, but it is horribly unethical. That is why I said that you can’t separate ethics from governance. To be truly effective, governance must be ethically sound as well as technically rigorous.

Auditing A Process Doesn’t Mean Revealing All Its Secrets

It is often necessary to perform audits to prove that a data science process is working appropriately. A common concern is that in order to provide a complete audit, it is necessary to reveal the “secret sauce” behind the process. While this concern is especially common if a 3rd party will be performing the audit, it does not have to be the case.

Consider beverage giant Coca-Cola. Only a couple of people in the entire world know the full recipe for a bottle of Coke, and none of those people have a regulatory oversight role. Yet, people are still comfortable that Coke products are safe to enjoy. Why is that? First, while the exact mix of ingredients in the recipe may not be known, they are all standard food products. So, both the company and oversight agencies can confirm that any given ingredient going into a Coke is safe and approved. Similarly, the final product can be checked for toxins, chemical composition, etc. to ensure that the ingredients were not somehow mixed in a way that caused unforeseen problems. In other words, it is possible to audit that a Coke is safe to drink without having to know the secret formula.

The same is true with machine learning and artificial intelligence. To validate that a process accurately predicts what it is attempting to predict, that its predictions are free from bias, and that those predictions are stable over time, it is not necessary to unveil the exact formulation of the underlying model. By passing a wide range of data to the model, we can demonstrate accuracy, consistency, and bias level while still maintaining the confidentiality of the secret sauce behind the model. It is possible to have algorithms that provide a competitive advantage, while providing strong governance and auditing of the process, without revealing the core IP that has been developed. Therefore, there is no reason to argue against auditing. I’m actually a fan of the idea of having 3rd party auditors involved, much as is done in the accounting space. We may soon see a company rise to prominence by providing such services.
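
As a rough illustration of the black-box idea above, here is a minimal Python sketch of an audit that only ever calls a model's prediction function and never inspects its internals. Everything in it is hypothetical and chosen for illustration: the names `audit_black_box` and `secret_model`, the `income` field, the group labels, and the sample data are all assumptions, and the gap in positive-prediction rates between groups is only one of several possible bias measures.

```python
from typing import Callable, Dict, Sequence

Record = Dict[str, float]


def audit_black_box(
    predict: Callable[[Record], int],
    records: Sequence[Record],
    labels: Sequence[int],
    groups: Sequence[str],
) -> Dict[str, object]:
    """Audit a model using only its predict function (hypothetical helper)."""
    preds = [predict(r) for r in records]

    # Accuracy: share of predictions that match the known outcomes.
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

    # Consistency: the same input should yield the same output when re-scored.
    consistent = all(predict(r) == p for r, p in zip(records, preds))

    # Bias: rate of positive predictions within each group.
    totals: Dict[str, int] = {}
    positives: Dict[str, int] = {}
    for g, p in zip(groups, preds):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + p
    rates = {g: positives[g] / totals[g] for g in totals}
    parity_gap = max(rates.values()) - min(rates.values())

    return {
        "accuracy": accuracy,
        "consistent": consistent,
        "positive_rates": rates,
        "parity_gap": parity_gap,
    }


def secret_model(record: Record) -> int:
    # Stand-in for the confidential model; the auditor never reads this body.
    return 1 if record["income"] > 50 else 0


# Made-up audit data: inputs, known outcomes, and a group label per record.
records = [{"income": v} for v in (60, 40, 70, 55, 30, 80, 20, 45)]
labels = [1, 0, 1, 0, 0, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

report = audit_black_box(secret_model, records, labels, groups)
print(report)
```

The auditor in this sketch needs only the ability to submit inputs and observe outputs. Checking stability over time would additionally mean rerunning the same audit on data from different periods and comparing the reports.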

Borrow from Other Fields Liberally!

One thing those of us in the data science field are often guilty of is trying to build things ourselves, even if there might be something close to what we need already available. Rather than tweaking the existing approach to our needs, we start from scratch. The urge to do this should be resisted!

When it comes to governance as it relates to safety, quality, and audits, there are highly mature approaches in other disciplines that can be borrowed. Traditional product development and engineering teams have strong protocols that have been developed over many decades. While it is certainly true that engineering protocols for safety assurance will not translate directly to data science processes, it is also true that tweaking an engineering approach to fit within a data science context is probably a faster path to progress than developing and testing protocols from scratch.

One terrific example of a set of protocols that data science teams have adapted successfully is in the area of agile software development. While the agile protocols originally developed for software developers do not translate exactly to a data science context, many require little change. Data science teams now follow agile analytics protocols that take full advantage of the principles originally produced to support agile software development. Sure, there are some differences and additions, but the data science community is certainly better off for borrowing from a proven approach in a related discipline than if we tried to start a new grassroots approach on our own.

Don’t Make Governance Harder Than It Needs to Be

Governance is not nearly as interesting and engaging as creating awesome data science processes, but it is necessary. Do not assume that tackling data science governance has to be long and painful, with a lot of totally new protocols needing to be developed. The data science community can borrow and adapt much of what has been done by others in the areas of data governance, quality control, safety, and auditability. By resisting our urges to create bespoke approaches from scratch, we will not only accelerate our efforts, but we will avoid learning the same hard lessons that others learned as they built the governance processes we are borrowing from.

Originally published by the International Institute for Analytics
