Conventional recommender systems (RSs) rely on consumer feedback, such as product ratings, to elicit parameters for personalized recommendations. Such an approach suffers severely from the biases caused by consumers’ self-selection behaviors. RSs fed with biased input may reinforce the biases and produce biased models that are incapable of effectively predicting consumer preferences. Wang Cong, an assistant professor in the Department of Management Science and Information Systems, Guanghua School of Management, Peking University, and her co-authors have proposed a recommender system design to remove this bias. Their conference paper, entitled “Training Personalized Recommender Systems with Biased Data: A Joint Likelihood Approach to Modeling Consumer Self-selection Behaviors”, was presented at the 42nd International Conference on Information Systems (ICIS).
With the proliferation of products online, consumers suffer from severe information overload, which hampers a smooth and satisfactory shopping process. To alleviate this issue, recommender systems (RSs) have been developed to assist consumers’ decision-making, facilitate more transactions, and generate greater revenues. Recommender systems, commonly based on machine learning techniques, are generally trained with user interaction data in the form of product ratings (explicit feedback) or click/purchase records (implicit feedback), both of which may be severely biased due to consumers’ self-selection in product exposure, purchase, and rating disclosure (Hu et al. 2009; Hu et al. 2017; Jonas et al. 2001). Recommender systems fed with biased input may reinforce the biases through machine learning and produce biased models that are incapable of effectively predicting consumer preferences (Baeza-Yates 2018; Kleinberg et al. 2018). Therefore, it is desirable to design an RS that can mitigate the biases in the training data and achieve more accurate predictions of consumers’ preferences.
Drawing upon related theories about consumer decision-making behaviors, three types of self-selection biases can be synthesized, i.e., exposure bias, acquisition bias, and under-report bias. Exposure bias stems from the fact that consumers are generally exposed to only a subset of products, such as those they have interacted with before (Jonas et al. 2001). Acquisition bias, as can be understood with utility theories, indicates that consumers only buy products with positive net utility as evaluated prior to purchase (Fishburn 1970). In other words, observed purchases usually reflect relatively higher utility and, consequently, product ratings are more likely to be revealed by consumers who derive higher utility from the corresponding products, giving rise to a positive bias. Under-report bias is induced by the fact that only some consumers submit ratings after purchase. According to classic works on consumer satisfaction (Anderson 1998; Arndt 1967), consumers tend to reveal their opinions only when their satisfaction is very high or very low, which yields a bimodal distribution. These three types of self-selection behaviors work together to form a widely observed J-shaped rating distribution (Gao et al. 2015; Hu et al. 2009; Hu et al. 2017), which is obviously biased but commonly used as the training data for RSs.
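To make the interplay of the three biases concrete, the following is a minimal simulation sketch, not the authors' model. Every number in it is an assumption chosen for illustration: latent ratings are uniform on 1 to 5, exposure is a fixed 30% chance, a purchase happens when a noisy pre-purchase utility estimate is positive, and the disclosure probabilities are highest at the two extremes. The function name `simulate_observed_ratings` is hypothetical.

```python
import numpy as np

def simulate_observed_ratings(n_pairs=200_000, seed=0):
    """Simulate which latent ratings survive three self-selection stages.

    All distributions and probabilities below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Latent "true" rating of each user-item pair, uniform on 1..5.
    true_rating = rng.integers(1, 6, size=n_pairs)

    # Exposure: the consumer sees only a subset of products.
    exposed = rng.random(n_pairs) < 0.3

    # Acquisition: buy only if the utility estimated before purchase is
    # positive; the noise term stands in for imperfect information.
    perceived_utility = (true_rating - 3) + rng.normal(0.0, 1.5, size=n_pairs)
    purchased = exposed & (perceived_utility > 0)

    # Under-report: very dissatisfied and very satisfied buyers are the
    # most likely to disclose a rating.
    disclose_prob = np.array([0.60, 0.10, 0.08, 0.15, 0.50])[true_rating - 1]
    rated = purchased & (rng.random(n_pairs) < disclose_prob)

    return true_rating, rated

true_rating, rated = simulate_observed_ratings()

# Histogram of ratings 1..5 before and after self-selection.
true_counts = np.bincount(true_rating, minlength=6)[1:]
observed_counts = np.bincount(true_rating[rated], minlength=6)[1:]

print("latent counts:  ", true_counts)
print("observed counts:", observed_counts)
print("latent mean:   ", true_rating.mean())
print("observed mean: ", true_rating[rated].mean())
```

Under these assumptions the observed histogram has few 2-star ratings, somewhat more 1-star ratings, and a dominant 5-star bar, while the latent distribution is flat. A model trained only on the observed ratings would therefore see a far more positive picture than the one that generated the data.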
Given the above biases, using raw data as input to a recommender system is problematic. However, no studies have systematically considered all three types of bias together. In this study, drawing upon related consumer behavior theories, the authors propose a novel debiasing-oriented RS design that systematically models exposure, acquisition, and under-report biases. For evaluation, two unbiased rating datasets collected to date are used as the testbeds for rating debiasing. The heterogeneous effects of users with different rating disclosure patterns are also explored. Additional experiments are carried out on another real-world dataset to validate the performance of the method in purchase prediction. Various related methods are included as baselines for comparison. The experimental results show that the proposed method outperforms all state-of-the-art baseline methods in both rating debiasing and purchase prediction, as measured with error-based and rank-based metrics. Hence, the authors contribute to the literature on recommender systems and design science research by rigorously developing and evaluating a novel IT artifact with a solid theoretical foundation and strong business implications.
Self-selection biases are formally regarded as a missing-not-at-random (MNAR) data problem in RS design. That is, the missingness of data depends on the unobserved data themselves (Little and Rubin 2002). Extant literature seeks to handle MNAR input data in RSs from three perspectives, i.e., the joint likelihood approach, the imputation-based approach, and the inverse propensity score (IPS) approach. The joint likelihood approach constructs a probabilistic framework consisting of two parts, i.e., data generation and data observation. The joint likelihood of data generation and observation is derived, and the parameters are then estimated by likelihood maximization. The imputation-based approach fills the rating matrix by replacing unobserved ratings with a value set as a hyperparameter, and then infers parameters by minimizing the prediction error on the completed rating matrix (Hu et al. 2008; Steck 2010; Steck 2013). The IPS approach seeks to minimize the rating prediction error weighted by the estimated propensities (or probabilities) of rating observation. However, the empirical prediction error depends closely on the estimated propensities (Schnabel et al. 2016). Therefore, inaccurate propensity estimates can result in poor debiasing performance.
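The sensitivity of the IPS approach to its propensities can be seen in a small numerical sketch. The code below is a generic illustration of inverse propensity weighting, not code from the paper or from Schnabel et al. (2016). The toy ratings, the constant predictor, the propensity values, and the helper names `naive_error` and `ips_error` are all assumptions.

```python
import numpy as np

def naive_error(errors, observed):
    """Average prediction error over the observed entries only."""
    return errors[observed].mean()

def ips_error(errors, observed, propensity):
    """IPS estimate: weight each observed error by 1 / propensity and
    divide by the total number of user-item pairs."""
    return (errors[observed] / propensity[observed]).sum() / errors.size

# Toy setup (all values hypothetical): 600 pairs with ratings 1..5.
true_rating = np.tile(np.arange(1, 6), 120)
errors = (true_rating - 3.5) ** 2               # squared error of a constant predictor
propensity = 0.10 + 0.15 * (true_rating - 1)    # higher ratings are observed more often
wrong_propensity = np.full(propensity.shape, propensity.mean())  # ignores the rating

rng = np.random.default_rng(0)
naive_runs, ips_runs, wrong_runs = [], [], []
for _ in range(2000):
    # MNAR observation: whether a rating is seen depends on its value.
    observed = rng.random(true_rating.size) < propensity
    naive_runs.append(naive_error(errors, observed))
    ips_runs.append(ips_error(errors, observed, propensity))
    wrong_runs.append(ips_error(errors, observed, wrong_propensity))

true_error = errors.mean()                      # error over ALL pairs
naive_mean = float(np.mean(naive_runs))
ips_mean = float(np.mean(ips_runs))
wrong_mean = float(np.mean(wrong_runs))

print("true error:               ", true_error)
print("naive (observed only):    ", naive_mean)
print("IPS, correct propensities:", ips_mean)
print("IPS, wrong propensities:  ", wrong_mean)
```

Averaged over many draws, the IPS estimate with the correct propensities recovers the error over all pairs, whereas the naive average and the IPS estimate with misspecified propensities both underestimate it in this setup. This is the weakness noted above: the correction is only as good as the propensity estimates.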
By analyzing consumers’ behaviors in the purchasing and rating stages, the authors propose a unified generative model to unravel the three kinds of self-selection biases systematically. In line with the selective exposure theory (Jonas et al. 2001) and utility theory (Fishburn 1970; Jehle and Reny 2011) in consumer decision making, exposure and acquisition biases with imperfect information effects are incorporated in the purchase stage. In light of consumer satisfaction research (Anderson 1998), under-report bias is modeled by linking consumer ratings with various rating disclosure propensities. For the underlying rating generation model, the widely used matrix factorization approach is adopted (Koren et al. 2009). The joint likelihood of data generation and observation is then derived, and the expectation-maximization (EM) algorithm is used for parameter inference, with some simplifying techniques applied to make the calculation efficient.
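The idea of a joint likelihood over data generation and data observation can be sketched in a few lines. The code below is a heavily simplified illustration and not the authors' model: it covers only a rating-disclosure stage (no exposure or acquisition stage), it only evaluates the likelihood rather than fitting it with EM, and it assumes a discretized Gaussian around the matrix factorization score as the rating distribution. The names `rating_probs`, `joint_log_likelihood`, `P`, `Q`, and `disclose` are hypothetical.

```python
import numpy as np

RATINGS = np.arange(1, 6)  # possible rating values 1..5

def rating_probs(P, Q, sigma=1.0):
    """Data generation: distribution over ratings 1..5 for every user-item
    pair, centred on the matrix factorization score P @ Q.T (assumption:
    a Gaussian kernel normalised over the five rating values)."""
    mean = P @ Q.T                                            # (users, items)
    logits = -0.5 * ((RATINGS[None, None, :] - mean[:, :, None]) / sigma) ** 2
    logits -= logits.max(axis=2, keepdims=True)               # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=2, keepdims=True)           # (users, items, 5)

def joint_log_likelihood(R, observed, P, Q, disclose):
    """Joint log-likelihood of rating generation and rating observation.

    R        : integer ratings, only read where `observed` is True
    observed : boolean mask of disclosed ratings
    disclose : probability of disclosing a rating, one value per rating level
    """
    probs = rating_probs(P, Q)

    # Observed entries: p(rating) * p(disclosed | rating).
    u, i = np.nonzero(observed)
    r_idx = R[u, i] - 1
    ll_observed = np.log(probs[u, i, r_idx]) + np.log(disclose[r_idx])

    # Missing entries: the rating is unknown, so sum it out:
    # sum_r p(r) * p(not disclosed | r).
    p_missing = (probs * (1.0 - disclose)).sum(axis=2)
    ll_missing = np.log(p_missing[~observed])

    return ll_observed.sum() + ll_missing.sum()

# Tiny example: one user, two items, only the first rating is disclosed.
P = np.array([[1.0, 0.5]])                   # user latent factors
Q = np.array([[2.0, 1.0], [0.5, 0.5]])       # item latent factors
R = np.array([[4, 0]])                       # 0 marks a missing rating
observed = np.array([[True, False]])
disclose = np.array([0.60, 0.10, 0.08, 0.15, 0.50])

print(joint_log_likelihood(R, observed, P, Q, disclose))
```

The point of the sketch is the second term: missing entries still contribute to the likelihood, because the fact that a rating was not disclosed carries information about what that rating probably was. Maximizing such a likelihood over `P`, `Q`, and `disclose` is what distinguishes the joint likelihood approach from fitting the observed ratings alone.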
The experiments use real data from three sources: Yahoo!, Coat, and Goodreads. Data free of self-selection bias were obtained by asking Yahoo! users to experience and rate a random selection of items. The study evaluates the performance of the proposed recommendation method through a large number of experiments and answers four experimental questions. The results show that the authors’ method outperforms the benchmarks in rating debiasing, performs equally well in predicting rating disclosure behavior and purchase behavior, and is particularly effective at correcting consistently upward-biased ratings. In short, the method consistently outperformed the baseline methods on a variety of metrics. The debiasing effect with respect to heterogeneous rating disclosure patterns is also examined, reflecting the robustness of the proposed method in handling various rating disclosure patterns.
The work offers several research contributions and implications. Theoretically, it sheds light on the decomposition of self-selection biases in the consumer purchase and rating process. Drawing on the kernel theories of selective exposure (Jonas et al. 2001), utility (Fishburn 1970), and consumer satisfaction (Anderson 1998), the recommendation method comprehensively models the decision process of consumers, which is shown to benefit debiasing. An integrated process of underlying rating generation and rating observation is formulated to depict the decision process of consumers, and the underlying ratings are estimated for personalized recommendation. Hence, the work demonstrates the significance of IT artifact design guided by kernel theories (Gregor and Hevner 2013).
The study contributes to the body of design science literature by proposing an effective debiasing recommendation approach through rigorous design and evaluation to solve critical business problems. It also offers several implications for businesses. First, with the proposed approach, the underlying ratings can be estimated more precisely without much additional computational complexity. With more accurate rating prediction, more precise recommendations can be delivered. Second, the heterogeneous effects of this method indicate that it is more applicable to consumers with a consistently positive bias. As shown by the related literature (Hu et al. 2009; Hu et al. 2017), consumers exhibit positive biases in most cases; hence, the proposed method can accommodate a wide range of scenarios.
The work also opens new directions for future work. One is to take account of other types of biases in recommender system design. Another possible direction is to develop a ready-to-use dataset for evaluating the debiasing effect on implicit and explicit feedback simultaneously, where review text could also serve as an important information source for debiasing.
About the author
Wang Cong is an assistant professor at Guanghua School of Management, Peking University. Dr. Wang received her BA in information systems and economics from Peking University and her Ph.D. in management science and engineering from Tsinghua University. She worked as a postdoctoral fellow at Carnegie Mellon University prior to joining Guanghua. Her research interest lies in the intersection of big data analytics, machine learning, and management information systems, focusing on decision support with uncertainty and temporal dynamics, pattern recognition and knowledge discovery from big data, as well as their applications in areas such as e-commerce, fintech, and healthcare. Her research work has been published in journals including INFORMS Journal on Computing, Decision Support Systems, and Fundamental Research.