This post is based on a joint research project with Pierangelo Rothenbuehler, who is also a part of the Experfy community.
Expert or intelligent systems can take many forms. In customer relationship management, their main job is to help us understand the context of a customer: How does she feel? What are her motivations? What is she looking for? Answers to these questions allow us to serve meaningful, context-based content to the customer. While motivations and intentions cannot be observed without the mental involvement of the subject under study, behavior can be observed rather easily.
At least in the digital economy it can. Most off-the-shelf BI stacks will let you collect dense behavioral observations of your customers in digital products. Can we infer motivations and intentions from these? Not directly, but we can approximate them by using predictive analytics to anticipate future behavior (e.g., a customer who is about to stop using your product very likely has low motivation to use it). Intelligent systems that predict churn can create substantial value. For example, this previous Experfy blog post describes a case in the telecom industry, where it is very easy to switch providers. Using an expert system that predicts churn, customer relationship management can target customers who are likely to churn and offer them incentives: vouchers, discounts, gifts, or an offer to move to another product in the company’s portfolio.
Methods for Binary Behavioral Prediction
By binary behavioral prediction we mean predicting whether a behavior will occur or not. This is a classification problem, and a number of techniques are applied to it: from the (most) common logistic regression to support vector machines, decision trees, and neural networks. Whenever you face a prediction problem, time is inherently part of your model: you want to predict a future value of a criterion using data available in the present. But in most cases your model and predictors do not address any sequential ordering. Temporal dynamics are not explicitly modeled; different lags of the predictors are all treated as static data. However, there can be value in incorporating sequential information in your prediction models. A very simple way of doing so is some feature engineering on the time series of the predictors: i.e., taking the slope, the deltas, or the standard deviation of your time series observations and including them as predictors. See Figure 1 for an illustration, and the code sketch that follows it. But of course, this is not where it stops. Other interesting approaches to sequence modeling include the following:
- Time series econometrics as its own discipline, often in the form of panel data econometrics for individual behavioral predictions.
- Survival analysis as a subtype of panel data analysis.
- Markov models and Hidden Markov Models for modeling sequences of states.
- Advanced approaches to incorporating sequential information in other ways (e.g., Prinzie and van den Poel 2006, Eichinger et al. 2006).
Figure 1: An illustration of features addressing sequential information of the predictor time series; 14-day predictor time series; weekly delta on the left, weekly slopes on the right
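To make this concrete, here is a minimal sketch of such time feature engineering in Python, assuming a 14-day daily series of a single predictor (e.g., session counts). The feature names, the weekly split, and the sample data are illustrative assumptions, not the exact features we used:

```python
import numpy as np

def time_features(series):
    """Summarize a 14-day predictor time series into sequential features."""
    series = np.asarray(series, dtype=float)
    week1, week2 = series[:7], series[7:]
    days = np.arange(len(series))
    return {
        # overall linear trend of the 14-day window
        "slope": np.polyfit(days, series, deg=1)[0],
        # change between weekly means (the "weekly delta" of Figure 1)
        "weekly_delta": week2.mean() - week1.mean(),
        # per-week linear trends (the "weekly slopes" of Figure 1)
        "slope_week1": np.polyfit(np.arange(7), week1, deg=1)[0],
        "slope_week2": np.polyfit(np.arange(7), week2, deg=1)[0],
        # variability of the observations
        "std": series.std(ddof=1),
    }

# Example: 14 days of session counts for one user (illustrative data)
sessions = [5, 4, 6, 5, 3, 4, 2, 2, 1, 2, 1, 0, 1, 0]
print(time_features(sessions))
```

Each of these values can then be appended to the static feature vector of the user, so a standard classifier can pick up at least some of the temporal dynamics.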
The Application
In a recent project we looked into time feature engineering and Hidden Markov Models to incorporate temporal dynamics in our predictive models. We work in the mobile app industry with a freemium business model, so there are two key binary behavioral prediction challenges we face: Will a user churn from our app? Will a user spend money on in-app purchases? Both challenges have been addressed before, in the churn case and in the purchase case, but with limited attention to time and sequence modeling. In informal conversations, some peers reported that using slopes alone had produced the best model for them. In our own predictive work up to that point, static lagged behavioral data had always done a good enough job. Hence, we wanted to understand the predictive value that time/sequence modeling actually adds. Let's get right to it.
Time Feature Engineering
We built churn prediction models for an app published on the iOS platform. We see no reason why the findings wouldn't hold for other behavioral predictions and other mobile apps. We ran a PCA on the predictors, cross-validated thoroughly, and compared different algorithms.
Time feature engineering adds predictive value, but only marginally.
We used the area under the ROC curve (AUC) to assess predictive performance. This is appropriate in light of the high class imbalance present in both churn and purchase prediction in freemium mobile apps (if you are interested in the reasons, see Weiss 2004). Table 1 gives an overview of the predictive performance of static features only, time features only, and both combined, for different subsets of the customer base. As you can see, using time features only turns out to be a poor choice in our case, while static features alone do a good job. The combination of both feature sets performs even better, but the improvement is not statistically significant.
Table 1: Mean AUC and its 95% confidence interval for different sets of input features and different subsets of the dataset; logistic regression with 10-fold cross-validation
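To illustrate the kind of evaluation behind Table 1, here is a rough sketch, assuming a feature matrix X with one row per user and binary churn labels y. The synthetic data, the scaling step, and the 95%-variance PCA threshold are illustrative assumptions rather than our exact setup:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

# X: one row per user (static lags and/or engineered time features),
# y: binary churn labels. Both are synthetic placeholders here.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.binomial(1, 0.1, size=1000)  # ~10% churners: heavy class imbalance

pipeline = Pipeline([
    ("scale", StandardScaler()),      # PCA is sensitive to feature scale
    ("pca", PCA(n_components=0.95)),  # keep components explaining 95% of variance
    ("clf", LogisticRegression(max_iter=1000)),
])

# AUC with 10-fold cross-validation, as in Table 1
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
ci = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))  # approximate 95% CI
print(f"mean AUC: {scores.mean():.3f} +/- {ci:.3f}")
```

AUC is a sensible default here because, unlike accuracy, it is not inflated by simply predicting the majority (non-churning) class.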
(Hidden) Markov Models
We skipped Markov models (for research on Markov models for churn prediction, see Burez and van den Poel 2007, Eastwood and Gabrys 2009) and moved right on to the hidden ones, because they can detect more complex states than their simpler siblings. Same app, but the results are a bit more complex. HMMs are very different from regression models. To give them the attention they deserve, we put the results into a short paper. Summarizing them is tough, but let me give it a shot:
We found that HMMs have roughly the same predictive performance as the other algorithms. We also generated some pretty cool insights on transition patterns and user states. Moreover, one key advantage in practice is that HMMs are deployable at lower cost.
But please look into the paper to dive a bit deeper into our results. We will also be presenting it at SAI IntelliSys in London on November 11. If you can't make it, feel free to get in touch with your comments and feedback.
Figure 2: ROC curves for different algorithms; interpolated between operating points for HMM; HMM, neural network and logistic regression are pretty much head-to-head
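For readers who want to experiment, here is a minimal sketch of fitting an HMM to per-user activity sequences with the hmmlearn package. The toy data, the choice of three hidden states, and the Gaussian emissions are illustrative assumptions, not our exact setup:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumes the hmmlearn package is installed

# Daily activity sequences, one per user (illustrative values, e.g.
# session counts); sequences may have different lengths.
user_sequences = [
    np.array([[5], [4], [6], [3], [2], [1], [0]], dtype=float),
    np.array([[1], [2], [2], [4], [5], [6]], dtype=float),
]
X = np.concatenate(user_sequences)        # hmmlearn expects stacked sequences
lengths = [len(seq) for seq in user_sequences]

# Three hidden user states is an assumption for illustration only
model = GaussianHMM(n_components=3, covariance_type="diag",
                    n_iter=100, random_state=0)
model.fit(X, lengths)

print("state means:\n", model.means_)           # behavioral profile per state
print("transition matrix:\n", model.transmat_)  # transition patterns between states

# Decode the most likely state sequence for the first user
states = model.predict(user_sequences[0])
print("state path of user 0:", states)
```

Once fitted, the transition matrix yields exactly the kind of transition patterns mentioned above, and decoding the state path assigns each user a behavioral state per day.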
So What?
While it is fascinating to dive into complex and advanced methods and optimization, we shouldn’t forget that simple, robust, and fast solutions very often win in practice. Especially if your company is still at the beginning of its predictive analytics journey, we suggest you start with a simple model (e.g., logistic regression or a decision tree) and static data. Once you start seeing the value and devote more resources to advanced and predictive analytics, you can move on to time and sequence modeling to reach for the higher-hanging fruit.