Title: Churn prediction based on text mining and CRM data analysis
Authors: Schatzmann, Anders; Heitz, Christoph; Münch, Thomas
Proceedings: Conference proceedings of the 13th international science-to-business marketing conference on cross organizational value creation
Pages: 296–310
Conference details: 13th International Science-to-Business Marketing Conference on Cross Organizational Value Creation, Winterthur, 2–4 June 2014
Publisher / Ed. Institution: Fachhochschule Münster, Münster
Issue Date: 2014
License (according to publishing contract): Licence according to publishing contract
Type of review: Not specified
Language: English
Subjects: Churn; Churn prediction; Text mining; Text data; Random forest; CRM
Subject (DDC): 658.8: Marketing management
Abstract: Within quantitative marketing, churn prediction at the level of the individual customer has become a major issue. An extensive body of literature shows that churn prediction today is mainly based on structured CRM data. In recent years, however, more and more digitized customer text data has become available, originating from emails, surveys or transcripts of phone calls. To date, this data source remains largely untapped for churn prediction, and corresponding methods are rarely described in the literature. To fill this gap, we present a method for estimating churn probabilities directly from text data by adopting classical text mining methods and combining them with state-of-the-art statistical prediction modelling. We transform every customer text document into a vector in a high-dimensional word space after applying text mining pre-processing steps such as stop-word removal, stemming and word selection. The churn probability is then estimated by statistical modelling, using random forest models. We applied these methods to customer text data from a major Swiss telecommunication provider, with data originating from transcripts of phone calls between customers and call-centre agents. In addition to the analysis of the text data, a similar churn prediction was performed for the same customers based on structured CRM data. This second approach serves as a benchmark for the text-based churn prediction and uses a random forest on the structured CRM data, which contains more than 300 variables. Comparing the two, we found that churn prediction based on text data performs as well as classical prediction based on structured CRM data. Furthermore, we found that combining structured and text data can increase the prediction accuracy by up to 10%. These results clearly show that text data contains valuable information and should be considered for churn estimation.
Department: School of Engineering
Organisational Unit: Institute of Data Analysis and Process Design (IDP)
Publication type: Conference Paper
DOI: 10.21256/zhaw-1889
ISBN: 978-3-938137-57-4
Appears in Collections: Publikationen School of Engineering
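
The pre-processing steps named in the abstract (stop-word removal, stemming, word selection, and mapping each document to a vector in word space) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy stop-word list, the crude suffix-stripping `stem` function, the document-frequency threshold used for word selection, and the three example texts are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical toy stop-word list; the paper does not state which list was used.
STOP_WORDS = {"the", "a", "an", "i", "to", "my", "is", "and", "of", "it", "was"}


def stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on non-letters, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_vocabulary(documents, min_doc_freq=2):
    """Word selection: keep terms that occur in at least min_doc_freq documents."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return sorted(term for term, n in doc_freq.items() if n >= min_doc_freq)


def vectorize(text, vocabulary):
    """Map one document to a term-count vector over the selected vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


# Invented example texts, standing in for call transcripts.
docs = [
    "I want to cancel my contract",
    "Canceling the contract because the bill is too high",
    "Question about my bill",
]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

Vectors of this kind, paired with observed churn labels, could then be passed to a random forest classifier to obtain churn probabilities, for example scikit-learn's `RandomForestClassifier` and its `predict_proba` method. That library choice is an assumption; the record does not name the software the authors used.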

Files in This Item:
File: 2014_Schatzmann_Churn prediction_Proceedings.pdf
Size: 338.87 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.