Domain Expertise vs Machine Learning Debate

When I’m getting ready to reason with a man, I spend one-third of my time thinking about myself and what I am going to say and two-thirds thinking about him and what he is going to say. Abraham Lincoln (1809-1865)

If human beings had to reason with a machine, or more specifically, had to teach machines to reason, is Lincoln's formula still relevant?

Framing the debate on the superiority of machine learning vs. domain expertise

This debate has gone on for quite some time now. It was even rehearsed by several luminaries of the data science world at a Strata conference in 2012, as you can see in the video here.

With the rapid progress in the field of artificial intelligence, human beings are apprehensive about robots conquering human thinking! If one extends that argument to expert systems of data science, then one can easily see why some people would see machine learning supplanting domain knowledge.

But the real question is not whether data scientists require domain knowledge to build expert systems, but whether the data representation phase can be carried out accurately without involving domain experts. Domain experts are presumed to be far more capable of identifying, articulating, and demonstrating day-to-day process problems in business. Since these experts can readily explain a research problem to their peers, it seems absurd to even consider constructing an expert system without their involvement or guidance. The same should hold true for the superior algorithms required to create such systems.

Let us take the example of creating a machine learning system that uses algorithmic filters to distinguish spam messages from non-spam messages. In this case, the email users, including the data scientists themselves, are domain experts, since they use the email system every day. To build an expert anti-spam application, developers would be expected to work closely with the user community to understand their spam-related problems. Here the developers themselves may be sub-domain experts within the broad category of mail-domain experts (users).

Thus, we realize that the fundamental strategy behind designing an expert system lies in identifying patterns in usage data and then deriving general rules or principles from those patterns. This principle of generalization points to the ability of an expert system to perform new tasks after having experienced a learning data set.

So here we establish the first criterion for performance: performing tasks based on prior experience with data. We may even conclude that machine learning operates from the standpoint of prediction based on known properties discovered from the training data.
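To make the idea of prediction from training data concrete, here is a minimal sketch of a spam filter, assuming a naive Bayes word-count model with add-one smoothing. The article does not name a specific algorithm, so this choice, the toy messages in TRAINING_DATA, and the function names train and classify are all assumptions made purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training set, invented for illustration only.
TRAINING_DATA = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("limited offer win cash now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
    ("please review the project agenda", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        # Start from the log prior: how common this label was in training.
        score = math.log(count / total_messages)
        words_in_label = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # Words never seen in training carry no evidence.
            # Add-one (Laplace) smoothing avoids taking log(0).
            smoothed = word_counts[label][word] + 1
            score += math.log(smoothed / (words_in_label + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_DATA)
print(classify("claim your free cash", model))
print(classify("agenda for the team meeting", model))
```

Neither test message appears in the training set, which is the generalization step described above: the labels come from word patterns learned from earlier messages. Deciding which messages count as spam in the first place, and whether the labels the model produces are acceptable, is where the users' domain knowledge still enters.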

Domain experts usually have an in-depth knowledge of operational processes or tasks and also understand the rules of thumb that control the domain. Domain knowledge is gained from actual, practical experience. In reality, data scientists attempt to convert this practical knowledge into meaningful algorithms to automate processing tasks.

Thus data scientists and domain experts are the two complementary sides of a complete system development project. In developing expert systems, it is not enough for data scientists to ask questions or find patterns; they also need to understand the results.

An example from the field of Molecular Biology

Machine learning may be necessary to research the human cell, but it cannot be the starting point. Data scientists must therefore collaborate with molecular biologists to understand the complexity of cell behavior under different conditions before they can analyze or process the findings of one's research. The biologist is equipped with a priori knowledge, known as domain expertise, that provides accurate insights for monitoring and analyzing the results of an experiment or a series of experiments.

In ideal situations, machine learning will be used not just to predict the behavior of complex processes or organisms but also to help intellectual communities understand the reasons behind that behavior.

The flip side of the argument: Is domain knowledge really necessary?

An example of a competition in a crowd-sourced environment

In a Kaggle competition, a panel of space agencies set up a contest to write algorithms for studying the impact of darkness on images of space. The winner happened to be a student of glaciology, which suggests that domain knowledge was inconsequential in winning this competition.

Results like this routinely raise the question: if data scientists can develop such fine algorithms in any discipline or business field, then who needs domain experts?

If we now compare the two opposing viewpoints presented above, the general consensus is likely to sway in favor of a compromise: hitting the middle ground between machine learning and domain expertise.

Domain experts still play a critical role in helping data scientists understand and articulate the business or process problem, and in helping them understand the results. As a case in point, an economist may create the best algorithm to automatically grade SAT answer papers, but education experts must be involved in designing the grading criteria and the sample questions and answers. These educators are the only people proficient in interpreting the output.

Let's compare the relative strengths and weaknesses of domain experts and data scientists.

Machine Learning Experts/Data Scientists: Pros

Can ask questions without understanding processes or tasks
Can study data to discover repetitive patterns
Can reconstruct process knowledge by studying data
Can use data patterns to predict results

Machine Learning Experts/Data Scientists: Cons

Cannot analyze the existing models of business processes accurately
Have the potential to misuse models
Lack depth of understanding of business functions

Domain Experts: Pros

Can provide practical insights from past experience
Can help refine a question with practical knowledge
Can accurately shape or model tasks for analysis
Can guide analytics in the right direction
Can evaluate the effectiveness of a result

A domain expert's strength lies in close observation of day-to-day process problems, while a data scientist's strength is building generalized solutions in the form of algorithms by studying specific data patterns.

After reviewing the previous arguments, we can take a more balanced view: data scientists and domain experts need to work together to solve business process problems accurately. A final example helps show the need for collaboration.

An example: Learning Management Systems (LMS)

A critical first step in developing a learning system is gathering information about potential learners' competency in specific skills or tasks. The objective behind collecting this information is to estimate the scope of the learning system in terms of topic or task coverage. Quite often, the approach is back-to-front: the learning outcomes and performance tests are defined before the learning content is created.

To create an automated system that claims to help the learner achieve the desired performance objectives, machine learning experts have to collaborate with education experts, subject-area experts, and learning and development experts on its design. This example helps us see the value of cross-domain experts in developing learning systems.