A decade ago, software was eating the world. But right now it looks like artificial intelligence is eating the world. Not a day goes by without a new algorithm, framework or library being released that makes it ever easier to turn complex math into killer products. However, one aspect that is conspicuous by its absence is the product definition of AI-powered software.
The reason is not too difficult to fathom. Defining AI products is hard, and there are no industry-standard methodologies to speak of. In this piece, I will not attempt to offer any silver bullets, but merely share some thoughts and experiences. But before we jump to the specific case of AI-based software, let us first look at two other canonical examples from the software world.
Conventional software development
Most of our intuition and methodologies for product definition come from conventional software development. In an ideal scenario, building software products involves the following steps:

1. Start with a business hypothesis about consumer behavior.
2. Derive a specification from the hypothesis.
3. Build software according to the specification.
4. Test whether the software meets the specification.
5. Deploy the software in production.
6. Determine the validity of the hypothesis.
An example of a business hypothesis for an eCommerce website may be “If we implement a one-click checkout process, then that will result in a higher conversion rate”. A business hypothesis is a codification of business intuition, which may have been informed by a lot of market research. However, once a hypothesis is formulated, the scope of its antecedent (the part between the if and the then) is usually very narrow and straightforward to work out.
A product specification is nothing but a representation of the necessary and sufficient conditions for the fulfilment of the antecedent of the business hypothesis. For the above hypothesis, the corresponding specification can be a wireframe for the one-click checkout process. Notice that we could unambiguously derive the specification from the hypothesis. Building the software according to the specification is a complicated business. However, once built, it is relatively straightforward to check whether it meets the specification or not. The specification provides a sufficient basis for building and testing the software.
The connection between the specification and testing is especially important. For the purposes of testing, a piece of software is treated as a system that produces outputs given inputs. Traditional software testing relies on the existence of a complete and independent test oracle, i.e., a method for enumerating all possible inputs and generating the expected outputs for each input, independent of the software itself. Given this, the goal of testing is to “simply” verify that the software produces the correct output for every input. The test oracle is derived from the specification, which in turn was derived from the antecedent of the hypothesis. Thus, the simplicity and narrowness of the (antecedent of the) hypothesis plays a determining role in the feasibility of developing software as it is traditionally done.
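To make the idea of a test oracle concrete, here is a minimal sketch in Python. The function, field names and cases are illustrative stand-ins I have invented for this piece, not a real checkout implementation:

```python
# A minimal sketch of a spec-derived test oracle. The function and
# the cases are illustrative stand-ins for real software and a real
# specification.

def one_click_checkout(state):
    """Toy implementation standing in for the software under test."""
    if state["has_saved_card"]:
        return "order_confirmed"
    return "redirect_to_payment_form"

# The oracle enumerates (input, expected output) pairs derived from
# the specification, independently of the implementation.
ORACLE = [
    ({"user": "alice", "has_saved_card": True}, "order_confirmed"),
    ({"user": "bob", "has_saved_card": False}, "redirect_to_payment_form"),
]

# Testing "simply" verifies the output for every enumerated input.
for state, expected in ORACLE:
    assert one_click_checkout(state) == expected
```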
In traditional software development, the goal of the development team is to generate value for the business by building software according to business intuition. The business hypothesis and the product specification together form this important bridge between business and software, and a common basis of interaction for everyone. As such, the hypothesis and the specification need to be written in a manner that is understandable to product managers, software developers and testers.
In the old school way of doing things (aka waterfall), the different steps (such as hypothesis generation or specification) are performed sequentially for the whole product. Each step is performed by a different group of specialists, with detailed handovers between steps. It turns out that this is a fairly inefficient way of doing things. So in the more modern Agile/Lean methodologies, the product is built up in small slices over short iterations. Nominally, the whole team participates in performing all the steps for each slice. Typically, the hypothesis and the specification are codified in a user story or a close relative.
Academic scientific computing
At the other end of the spectrum is the high-performance scientific computing common in physics and related fields. Examples include lattice quantum chromodynamics to predict the behavior of subatomic particles, and computational fluid dynamics in many disciplines. In scientific computing, a piece of software is written to find the (often approximate) answer to some question.
Scientific computing rarely allows for the hypothesis/specification split mentioned above. Consider the example of lattice quantum chromodynamics. The underlying hypothesis here is “If we were to calculate physical quantities using the theory of quantum chromodynamics then they will be very close to what we would get from experiments”. A specification for building a piece of software to test this hypothesis might include the equations of quantum chromodynamics and the commandment “build a piece of software to calculate physical quantities according to these equations”. This is actually a reasonable description of lattice quantum chromodynamics, but it is no good as a specification, because it does not help us test the software.
Testing of scientific computing is performed using a combination of different techniques: exactly solvable cases (there may be some inputs for which the outputs can be calculated exactly using alternate methods), limiting conditions (often the results are easy to calculate in certain limits, such as an input going to zero or infinity), and heuristics (whether the outputs make “sense” given prior experience). The test coverage, i.e., the fraction of the (input, output) pairs included in the testing, is usually much lower in scientific computing than in commercial software testing. Also, the test methods require a very deep understanding of the domain. Usually this is not a problem, because the building and testing of the software is done by scientists who also happen to be domain experts.
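A toy numerical integrator is enough to illustrate all three techniques. This is a sketch of the testing style, not real lattice QCD code:

```python
# A sketch of testing scientific code without a complete oracle,
# using a toy trapezoidal integrator in place of a real solver.
import math

def integrate(f, a, b, n=10_000):
    """Trapezoidal rule: approximate the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

# 1. Exactly solvable case: the integral of sin over [0, pi] is exactly 2.
assert abs(integrate(math.sin, 0.0, math.pi) - 2.0) < 1e-6

# 2. Limiting condition: as the interval shrinks to zero, so must the result.
assert abs(integrate(math.sin, 1.0, 1.0 + 1e-9)) < 1e-8

# 3. Heuristic: a positive integrand must give a positive result
#    ("does the output make sense?").
assert integrate(lambda x: x * x, 0.0, 1.0) > 0.0
```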
In general, there is no clear decoupling between the hypothesis and the software in scientific computing. This renders the concept of a specification of very little use. As a consequence, complete and independent test oracles do not exist: the software was written to calculate the outputs (answers) given the inputs (questions). If we had another method for calculating the answers, we would not need to write the software in the first place!
AI-based products
At its core, AI software development is very similar to academic scientific computing, in the sense that the hypothesis is very tightly coupled to the software, resulting in the non-existence of test oracles. However, commercial AI-based products are built in a manner that is very similar to traditional software development: driven by business intuition. Moreover, AI-based products are more often than not developed in teams that look very similar to traditional software development teams, with the addition of one or more data scientists. This impedance mismatch between the core nature of AI software and the methodology used to develop it is what makes product definition so challenging for AI-based products.
To make things concrete, let us consider the canonical example of a recommender system for an eCommerce website. A possible hypothesis based on business intuition might be “If customers are shown relevant products then the conversion rate will increase”. This almost obvious hypothesis does capture the basic motivation behind building recommender systems. In addition, it has the advantage of being easily comprehensible to anyone involved in building eCommerce software. It is also broad enough to support the development of very powerful AI algorithms.
But from the perspective of defining a product, this hypothesis is no good: it is not possible to derive an unambiguous specification from it. The problem lies in the vagueness of the word “relevant”. What is relevant in this case? The hypothesis is simply not very useful when it comes to deriving product specifications, and consequently for building and testing software.
So where does this leave us? Before we go any further, let us try to understand the dimensions involved in constructing a business hypothesis. In my opinion, there are two main dimensions that we need to consider.
The first dimension is the generality of the hypothesis. Conventional software development is predicated on the assumption that it is possible to distil market research and business intuition into a set of narrow but useful hypotheses, which can then be used to build software. On the other hand, the whole power of AI-based algorithms comes from being able to start from very general hypotheses. Consider the previously mentioned case of a recommender system. We might narrow down the hypothesis to “If we show beer to male users trying to buy diapers, then that will increase the basket size.” Clearly, in this case we will be able to derive precise specifications from it and build the software to support it. In that case, congratulations, you have just built your first rules-based system, as the sketch below shows! If we are to leverage the power of AI, then such narrow hypotheses are not the answer.
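Here is that narrow hypothesis written out as code; the field names are illustrative, not taken from any real system:

```python
# The narrow hypothesis, hard-coded. Field names are illustrative.

def recommend(user, basket):
    """A rules-based "recommender": precise, testable, and learning nothing."""
    if user.get("gender") == "male" and "diapers" in basket:
        return ["beer"]
    return []

# The specification doubles as a complete test oracle.
assert recommend({"gender": "male"}, {"diapers"}) == ["beer"]
assert recommend({"gender": "female"}, {"diapers"}) == []
```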
The second dimension is the comprehensibility of the hypothesis. This comprehensibility is with respect to a common vocabulary that is known to all members of a typical product development team. Consider the hypothesis “Given a context (defined by the user’s history, the user’s features, and the product catalog), and two algorithms trained on historical data with similar contexts, with similarity being measured in terms of <insert_your_favorite_context_similarity_measure>, if products are shown based on the recommendations from the algorithm that had higher scores for <your_favorite_evaluation_metric_here>, then this will result in a higher <your_favorite_KPI_here>”. Although the hypothesis above is general enough to support algorithm development, at the current time it will be nearly incomprehensible to anyone in product development who is not a data scientist.
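For what it is worth, the offline comparison that such a hypothesis implies is easy to sketch. Below, precision@k stands in for <your_favorite_evaluation_metric_here>, and all of the data is made up:

```python
# A sketch of the offline comparison implied by the hypothesis above,
# with precision@k standing in for the evaluation metric. All data is
# illustrative.

def precision_at_k(recommended, purchased, k=3):
    """Fraction of the top-k recommendations the user actually bought."""
    return len(set(recommended[:k]) & set(purchased)) / k

purchased = ["beer", "chips"]                 # held-out purchases
algo_a = ["beer", "chips", "salsa", "wine"]   # algorithm A's ranking
algo_b = ["diapers", "milk", "beer", "wine"]  # algorithm B's ranking

score_a = precision_at_k(algo_a, purchased)   # 2/3
score_b = precision_at_k(algo_b, purchased)   # 1/3

# The hypothesis: deploying the higher-scoring algorithm raises the KPI.
winner = "A" if score_a > score_b else "B"
print(f"A: {score_a:.2f}, B: {score_b:.2f} -> deploy {winner}")
```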
There is another important constraint that we need to keep in mind. Real business-facing data science talent is scarce. Therefore, outside of giants such as Google and Facebook, AI-based products will continue to be developed, at least in the near future, in diverse teams of data scientists, business analysts, software developers and software testers who have, at best, a shallow understanding of each other’s fields.
Given the above constraint, we cannot sacrifice comprehensibility completely. On the other hand, if we are to leverage the power of AI, then we cannot completely sacrifice generality either. The important point to note here, though, is that in most contexts it is not necessary to develop the “best” or most powerful algorithms. Rather, what is needed is a solution that is “good enough”. However, when you are building something new, it is usually not possible to calculate the effort (in terms of person-hours) involved in getting to the “good enough” stage, i.e., you cannot timebox your way to a working solution!
In fact, the boundaries of what is good enough can, and should, be set in terms of business realities. What is missing is a common vocabulary. In order to appreciate the importance of a shared vocabulary, let us consider the example of scientific research, and the scientific method in particular. The scientific method has been popularized within the software development community by the Agile and Lean movements. What is largely underemphasized is the importance of having a common vocabulary within a discipline for the scientific method to work. Typically, the scientific method consists of trying to understand a phenomenon by building a hypothesis, constructing a theory from it, and running simulations and experiments to validate it. Each of these steps is usually performed by a different group of scientists who may not understand each other’s work in depth. However, what they do understand in detail are the claims and results made in each work. This is possible because they share a vocabulary. In physics, this vocabulary is usually built on top of mathematics. In chemistry, on top of mathematics and the language of chemical reactions. The same holds true in other scientific disciplines.
Our current vocabulary for software development is largely set by web development. This vocabulary, consisting mainly of wireframes, is completely useless when ideating AI-based products, and should probably be the first thing thrown out of the window.
Like in physics, the common vocabulary of AI is also built on top of mathematics, the more important elements being probability theory and statistics, linear algebra, and optimization principles. The vocabulary of AI product development must consist of a “user-friendly” subset of the above. One is able to create and interpret a wireframe, for example, without ever understanding how an API call is made or how market research is done. What is the equivalent of a “wireframe” for AI? I believe that a good starting point is descriptive statistics, i.e., business hypotheses and product specifications need to be written using at least the vocabulary of descriptive statistics.
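As a rough illustration of what such a “wireframe” might look like, here is a minimal sketch; the data, and the hypothesis in the closing comment, are made up:

```python
# A sketch of a descriptive-statistics "wireframe": summarizing user
# behavior before any algorithm is chosen. Data and names are made up.
import statistics

# Products viewed per session, from hypothetical logs.
views_per_session = [1, 3, 2, 8, 5, 2, 1, 13, 4, 2]

summary = {
    "mean": statistics.mean(views_per_session),
    "median": statistics.median(views_per_session),
    "stdev": statistics.stdev(views_per_session),
    "max": max(views_per_session),
}
print(summary)

# A hypothesis stated in this vocabulary, e.g. "median views per session
# will rise after recommendations launch", is both testable and
# comprehensible to the whole team.
```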
Another piece of received wisdom that needs to be challenged in the age of AI is that market research and software development are completely distinct and decoupled disciplines, and that the goal of software is to provide a means for validating a business hypothesis, not to inform it. This assumption is manifestly false for AI-based software. Because hypotheses and software are so tightly coupled in AI-based products, it is nearly impossible to separate the learning (or explore) and the doing (or exploit) phases. In light of this, it is important to use AI not just as a tool for validating business intuition, but as one for building business intuition as well.
In fact, once enough has been learned, the relevant action is often straightforward to derive, or can be found using out-of-the-box AI algorithms. As such, the main focus of hypotheses for AI products should probably be on learning and not necessarily on doing. And what needs to be learned should itself be informed by AI.
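The classic minimal example of this coupling between learning and doing is a multi-armed bandit. The sketch below uses an epsilon-greedy strategy; the variants and their reward rates are invented for illustration and would, of course, be unknown in practice:

```python
# A sketch of learning while doing: an epsilon-greedy bandit mixes
# exploring (learning which variant works) with exploiting (acting on
# what it has learned). The reward rates are illustrative.
import random

TRUE_RATES = {"layout_a": 0.05, "layout_b": 0.11}
counts = {arm: 0 for arm in TRUE_RATES}
rewards = {arm: 0.0 for arm in TRUE_RATES}

def choose(epsilon=0.1):
    if random.random() < epsilon:  # explore: keep learning
        return random.choice(list(TRUE_RATES))
    # exploit: act on the current belief about each arm
    return max(TRUE_RATES,
               key=lambda a: rewards[a] / counts[a] if counts[a] else 0.0)

for _ in range(10_000):
    arm = choose()
    counts[arm] += 1
    if random.random() < TRUE_RATES[arm]:
        rewards[arm] += 1.0

estimates = {a: rewards[a] / max(counts[a], 1) for a in counts}
print(estimates)  # the learned knowledge that informs the next hypothesis
```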
As with any new paradigm-shifting technology, it takes some time for business processes to catch up, and AI is no different. In the long run, I believe that a basic understanding of AI within the wider product development community is the solution. However, in the short run we cannot expect product managers to become data scientists overnight. We can, however, expect them to become data literate. And on the other side of the coin, we should expect data scientists to look up from their algorithms and understand the basics of value creation in a business.