Every business constantly has to answer questions in order to make decisions. Some are easy, like “How much revenue did we generate in each country, divided by business line?” You just have to look in the rear-view mirror by analyzing historical data with your Business Intelligence tools.

But what do you do if your question is “How much revenue growth can we expect in each country, divided by business line?” If you are Tom Cruise with Precogs at your disposal, you are lucky. Anyone else would typically do market research, try to anticipate trends, ask colleagues or just rely on experience and gut feeling.

Nothing wrong with that. But if you would like to make informed decisions based on facts and evidence in order to do things quicker, better and cheaper, you will like Predictive Analytics.

Predictive Analytics is a way to gain insights into the probabilities of certain events, based on historical data, in order to help answer business questions. Typically we want to know more about events in the future; however, they can also lie in the present or even in the past.

The term Predictive Analytics is used mostly in a business context, but the methods and technologies do not differ much from those used in many other Analytics areas. Even self-driving cars use some kind of prediction, e.g. to recognize traffic signs.

Also, Predictive Analytics is not tightly bound to Artificial Intelligence and Machine Learning. It can also be done with classical statistics. However, Machine Learning approaches are often much more powerful if good datasets are available.

For simplicity, we define Predictive Analytics as the business-oriented application of Machine Learning.

The importance of asking the right questions

Predictive Analytics is not a magic system that will autonomously generate meaningful answers from whatever data you put in.

As in everyday life, if you want useful answers, you have to ask the right questions.

Start identifying good questions for your Predictive Analytics system by asking yourself questions like the ones below:

  1. What is our business problem?
  2. What is the desired outcome?
  3. What goal do we want to reach?
  4. Where in our value chain can we do something about it?
  5. Where do we expect the quickest results?

Try to be as specific as possible, and keep in mind that you have to be able to measure the actions you take based on the answers later. Questions like “Will we outpace our competitor X?” will lead nowhere; you have to break them down into something feasible. Also, a Predictive Analytics system cannot invent answers out of the blue. The information must be hidden somewhere in the data, be it your internal data or data you take from outside. Remember the famous quote by Peter Norvig, Google's Director of Research:

“We don’t have better algorithms. We just have more data.”

Let’s go to some basic examples.

Let’s say you are a retailer planning a sale and you want to predict the impact on the bottom line by asking “Taking all costs of the promotion into account, how likely are we to make a profit?”

Or you are a credit card company and want to predict your credit risk by asking “What is the probability that this new customer will be unable to pay their debts?”

How does Predictive Analytics work?

I will try to give non-technical and non-statistical readers an insight into the basic mechanics of Predictive Analytics. If you want a deep understanding, there are plenty of other sources available, e.g. on Coursera.

Let’s say you have an online book store and want to program a mechanism similar to Amazon’s famous “Customers bought also…”, which is commonly known as Collaborative Filtering, one of the more basic use cases of Predictive Analytics.

Given that you have plenty of transaction data about customer purchases, it is fairly easy to create a method that identifies books that have been bought together by a customer. You can do that with Excel.

Now if a new customer buys a book, you can look it up in this data and you get maybe 500 recommendations. Argh! Which one should you take? Based on the data you have, each of these 500 books has the same purchase probability. So you take into account how many times books were bought together. Maybe now you have 100. OK, let’s add gender and country as criteria. Down to 50. But wait! Which of these criteria are the most relevant, and what weight should I assign to them?
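To make the counting step a bit more tangible, here is a minimal sketch in Python. It only illustrates the idea of counting how often books were bought together and ranking by that count. The purchase data and the function names `build_co_purchase_counts` and `recommend` are made up for this example, and a real recommender would be considerably more involved.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase history: one set of book titles per customer.
purchases = {
    "alice": {"Dune", "Neuromancer", "Foundation"},
    "bob": {"Dune", "Foundation"},
    "carol": {"Dune", "Neuromancer"},
    "dave": {"Foundation", "Emma"},
    "erin": {"Dune", "Foundation"},
}


def build_co_purchase_counts(purchases):
    """Count, for every book, how often each other book was bought by the same customer."""
    counts = defaultdict(Counter)
    for books in purchases.values():
        # Every unordered pair of books in one customer's history counts once.
        for a, b in combinations(sorted(books), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, book, top_n=3):
    """Return up to top_n books most often bought together with `book`."""
    return [title for title, _ in counts.get(book, Counter()).most_common(top_n)]


co_counts = build_co_purchase_counts(purchases)
print(recommend(co_counts, "Dune"))
```

With this toy data, "Foundation" ranks ahead of "Neuromancer" for buyers of "Dune", simply because it was bought together with "Dune" more often. Notice what the sketch does not do: it has no idea how to weigh further criteria such as gender or country, which is exactly the question raised above.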

From a business perspective, the challenge is twofold: the customers you expect to buy upon a recommendation should actually buy, and you should not miss customers who would have bought but were not expected to and therefore never got a recommendation.

Doesn’t that sound like the marketing problem stated by John Wanamaker:

“Half the money I spend on advertising is wasted; the trouble is I don’t know which half.”

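In statistics and Machine Learning, these two sides are measured as Precision and Recall. Precision is the share of customers who got a recommendation and actually bought; Recall is the share of all actual buyers who got a recommendation.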
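As a small worked example, the two measures can be computed from two sets of customers: those who received a recommendation and those who actually bought. The customer IDs and the function name `precision_recall` below are invented for illustration only.

```python
def precision_recall(recommended, buyers):
    """Return (precision, recall) for a set of recommended customers
    and the set of customers who actually bought."""
    recommended, buyers = set(recommended), set(buyers)
    # Customers who got a recommendation AND bought.
    true_positives = len(recommended & buyers)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(buyers) if buyers else 0.0
    return precision, recall


# Hypothetical campaign: 4 customers got a recommendation, 5 customers bought.
recommended = {"c1", "c2", "c3", "c4"}
buyers = {"c2", "c3", "c5", "c6", "c7"}

precision, recall = precision_recall(recommended, buyers)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four recommendations hit (precision 0.5), but only two of the five buyers were reached (recall 0.4). Improving one of the two numbers usually costs you something on the other.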

Even if you are a statistics guru, it will take you a long time (not to mention very good business knowledge) to evaluate all criteria, uncover patterns and interdependencies, formulate hypotheses and test your algorithm in order to achieve the desired result.

That’s where Machine Learning comes into play.

Machine Learning finds answers that are hidden in your historical data by uncovering patterns or interdependencies, and it generates a predictive model from them. But it has to be trained correctly, with data that is relevant to your question.

Basically, if your question is “Will a customer buy item X?”, you train your system with two sample datasets: one with customers who bought it, and one with customers who didn’t. Your Machine Learning algorithm (in this case some sort of classification) will find the relevant indicators that point to a purchase by comparing the datasets.

Once you have a new purchase of item Y, you can check the transaction and customer data against this model, and it will give you a prediction of the customer’s purchase of item X with a given probability. All you have to do is show the customer your recommendation, measure the outcome and feed it back in as new input, as you want to get better constantly.
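For readers who like to see this in code, here is a toy sketch of the training and prediction steps. I picked logistic regression as one simple kind of classification; that choice, the two features (`bought_item_y`, `is_returning_customer`) and all the data are assumptions made purely for illustration. In practice you would use an established Machine Learning library and far more data.

```python
import math

# Hypothetical features per customer: (bought_item_y, is_returning_customer).
bought_x = [(1, 1), (1, 0), (1, 1), (0, 1)]        # customers who bought item X
did_not_buy_x = [(0, 0), (0, 1), (0, 0), (1, 0)]   # customers who did not


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Predicted probability that a customer with features x buys item X."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(samples, labels, epochs=1000, learning_rate=0.5):
    """Fit a logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        # Prediction error for every sample, using the current model.
        errors = [predict_proba(weights, bias, x) - y
                  for x, y in zip(samples, labels)]
        for j in range(len(weights)):
            gradient = sum(e * x[j] for e, x in zip(errors, samples)) / n
            weights[j] -= learning_rate * gradient
        bias -= learning_rate * sum(errors) / n
    return weights, bias


# The two sample datasets become one training set with labels 1 and 0.
samples = bought_x + did_not_buy_x
labels = [1] * len(bought_x) + [0] * len(did_not_buy_x)
weights, bias = train(samples, labels)

# A new customer who just bought item Y and is a returning customer,
# compared with one for whom neither is true.
p_buyer_like = predict_proba(weights, bias, (1, 1))
p_other = predict_proba(weights, bias, (0, 0))
print(f"{p_buyer_like:.2f} vs. {p_other:.2f}")
```

The learned weights play the role of the "relevant indicators": the model itself works out how much each criterion should count, instead of you assigning the weights by hand. Measuring the outcome of each recommendation and adding it to the training data closes the loop.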

Of course, this is a very simplified description. There are a lot of things to consider, and there are many different techniques and algorithms that are much more sophisticated. Such a process will also always be iterative, with steady evaluations, tests and optimizations. But the core idea stays the same.