Do you understand the difference between logistic regression and linear regression? If your eyes just glazed over, you’re exactly the person who should be reading this post. If you understand these concepts fully, please read the companion piece from my colleague Dr. David Herberich, which will be posted soon and is the geeky version of this post.

What is a logistic regression? It is a way to predict the probability of something happening. It answers questions like how likely a customer is to cancel their account or to use a coupon. This kind of analysis is very common in academia, but after ten years of doing analyses at hundreds of companies, in dozens of industries, I have never found a case where it made sense for business operations to use that information directly. Let’s take the coupon example to get to the first reason you should never use logistic regression. Do you just want to know whether the customer will use the coupon, or do you actually want to know how much more the customer will spend if they use it? In the churn example, it may be somewhat useful to know a customer might cancel their account, but if you don’t know when they will cancel, you can’t really do much about it. For example, if two customers both have a 60% probability of churn, but one is expected to cancel tomorrow and the other in 30 days, wouldn’t you want to focus your attention on the customer who is about to leave?

So that’s the primary reason why you shouldn’t use logistic regression, and why I urge customers to always predict a number that directly shapes how they will act: not information for the sake of information, but information that leads to ROI. Still not convinced? Here are three additional reasons you should never use logistic regression. Let’s take an example where you are trying to predict whether a customer will cancel their account after a customer support problem. How do you define whether churn happened? Say we are looking at a customer support interaction and analyzing whether the person canceled their account soon after it. In a logistic regression analysis, we would come up with some magical cutoff point, say 30 days. Anyone who canceled within 30 days would be counted as churn related to that complaint, and a cancellation after 30 days would not be counted as churn. A different analyst might say the cutoff should be 180 days, or one week. There is obviously no objectively correct answer to where the cutoff should be. But once you have established it, customers on the two sides of the cutoff are treated as separate classes. So someone who canceled 30 days after the call is identical to someone who canceled 30 minutes after the call, and completely different from someone who canceled 31 days after the call. This may make academic sense, but it certainly does not make sense in the world of business, where the operational team needs more granular resolution to decide where to focus its scarce resources. They would definitely want to prioritize the customer who would cancel in 30 minutes over the one who would cancel in 30 days.
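To make the cutoff problem concrete, here is a minimal Python sketch. The three customers and their days-to-cancel values are made up for illustration, and `churn_label` is a hypothetical helper, not part of any library. It shows how the yes/no label a logistic regression would be trained on changes with the analyst's choice of cutoff, and how the timing information disappears.

```python
# Hypothetical data: days between the support call and the cancellation.
days_to_cancel = {
    "ann": 30 / (24 * 60),  # canceled 30 minutes after the call
    "bob": 30,              # canceled 30 days after the call
    "cara": 31,             # canceled 31 days after the call
}

def churn_label(days, cutoff_days):
    """Binary target: 1 if the customer canceled within the cutoff, else 0."""
    return 1 if days <= cutoff_days else 0

def labels_for(cutoff_days):
    """Label every customer under one analyst's choice of cutoff."""
    return {name: churn_label(d, cutoff_days) for name, d in days_to_cancel.items()}

# Three analysts, three cutoffs, three different training targets.
for cutoff in (7, 30, 180):
    print(cutoff, labels_for(cutoff))
```

With the 30-day cutoff, ann and bob get the same label even though one left in minutes and the other took a month, while bob and cara, one day apart, land in different classes. Predicting `days_to_cancel` directly keeps that ordering intact.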

Wondering why logistic regression makes sense at all, even in academia? When you are looking at binary outcomes (like whether or not a patient was infected) and you want an exact measurement of probability, you may be better off with a logistic regression. Especially when something has a very high or very low probability of happening, linear regression does not do an excellent job. But again, remember how operations happen in the real world. Once you create this model, you still have to turn it into a simple set of rules that will be fed into a workflow system that tells your employees what to do for each type of customer. Typically you would assign each customer to one of a few predefined approaches. For example, you might give the customer the “white glove” treatment, where you provide the best customer service and handle the situation delicately. You might give them the “blue glove” treatment, where you perform surgery on the product that you’re offering. Finally, you might give them the “no glove” treatment, where you just wave bye-bye to the customer. Notice that by the time we got to this operational rule, the precise probability no longer mattered. An event that occurs rarely would fit into the same intervention either way; it really doesn’t matter whether the probability was 0.1% or 0.15%. So getting the probability precisely right doesn’t matter in the world of business. All you need is to be good enough that each case is matched to the right intervention. As David’s exhaustive analysis shows, linear regression does a perfectly good job of making sure the cases fall into the right categories.
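The bucketing step can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the cut points of 0.05 and 0.50, the order in which the three treatments are assigned to score ranges, and the `treatment_for` helper are all hypothetical, and a real workflow would set them from business rules.

```python
from bisect import bisect_right

# Hypothetical cut points and treatment order (assumptions, not from any real workflow).
CUT_POINTS = [0.05, 0.50]
TREATMENTS = ["no glove", "blue glove", "white glove"]

def treatment_for(score):
    """Return the treatment bucket for a predicted churn score."""
    # bisect_right counts how many cut points are <= score,
    # which is the index of the bucket the score falls into.
    return TREATMENTS[bisect_right(CUT_POINTS, score)]

# 0.001 and 0.0015 are the 0.1% and 0.15% cases from the text.
# -0.02 stands in for a linear regression output that strays below zero.
for score in (0.001, 0.0015, -0.02, 0.30, 0.80):
    print(score, treatment_for(score))
```

The first three scores all map to the same treatment, so the small differences between them, and even a slightly out-of-range linear prediction, never reach the employee who acts on the rule.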

These are the four primary reasons I give executives for why they should focus on linear regression rather than logistic regression. But the reason closest to my heart is this: most business users can easily understand the results of a linear regression if the appropriate effort is made to explain it to them, but because they are not used to dealing in probabilities, they often fail to fully grasp what a logistic regression is trying to tell them. If we want to succeed in business, we have to empower the business user and ensure that they can easily overlay their years of experience and domain knowledge on that analysis. Disagree? Please make your argument in the comment section.
