Market Basket Analysis 101: Anticipating Customer Behavior

In the business intelligence world, “market basket analysis” helps retailers better understand, and ultimately serve, their customers by predicting their purchasing behavior. In this blog post, I explain how market basket analysis works and what it takes to deploy a market basket analysis project.

Ever wondered how Amazon knows what items to suggest to you before and after you make a purchase? The trick is market basket analysis (MBA), a form of affinity analysis created specifically to promote sales. In the retail industry, MBA refers to an unsupervised data mining technique that discovers co-occurrence relationships among customers’ purchase activities. The technique is based on the theory that if you buy a certain group of items, you are likely to buy another group of items. For example: while at McDonald’s, if you buy a sandwich and cookies, you are more likely to buy a drink than someone who did not buy a sandwich and cookies.

The application of market basket analysis in retailing is based on the notion that “most customers make purchases either on impulse or because there is natural affinity between the items they buy,” and that when buying one item, there are other items a customer would likely purchase had they considered them. Market basket analysis helps retailers discover those other items. Think of it this way: there is a very high chance of buying an HDMI cable if you just bought a television, especially when offered the suggestion that an HDMI cable complements your new TV. If you have bought a TV at Best Buy before, that should sound familiar. The volume of sales made from user clicks on Amazon’s “Customers who bought this product also bought these products…” call-to-action links is a testament to the effect and importance of market basket analysis.

Market Basket Analysis, Explained

MBA aims to find relationships and establish patterns across purchases. Each relationship is modeled in the form of a conditional rule:

IF {sandwich, cookies} THEN {drink}

In shorthand notation, this is written {sandwich, cookies} → {drink}, which translates to “the items on the right are likely to be ordered with the items on the left.”

A collection of items purchased by a customer is an itemset. The set of items on the left-hand side of the arrow symbol (sandwich, cookies in the example above) is the antecedent of the rule, while the one on the right (drink in the above case) is the consequent. The probability that the antecedent will occur, i.e., that a customer will buy a sandwich and cookies, is the support of the antecedent. Support simply refers to the relative frequency with which an itemset appears in transactions; the support of the rule as a whole is the relative frequency with which the antecedent and consequent appear together. In a retail outlet, the support of an item or item bundle helps in identifying drivers of traffic to the outlet. Hence, if a sandwich and cookies have high support, they can be attractively priced to draw traffic to the store.
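To make support concrete, here is a minimal Python sketch that counts how often an itemset appears across a handful of transactions. The six baskets are hypothetical data made up for illustration, and the `support` helper is an assumed name, not part of any library.

```python
# Hypothetical transactions, one set of items per order (illustrative data only).
transactions = [
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies"},
    {"sandwich"},
    {"salad", "drink"},
    {"salad"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


# Support of the antecedent {sandwich, cookies}: 3 of the 6 baskets.
antecedent_support = support({"sandwich", "cookies"}, transactions)

# Support of the whole rule {sandwich, cookies} -> {drink}: 2 of the 6 baskets.
rule_support = support({"sandwich", "cookies", "drink"}, transactions)

print(antecedent_support, rule_support)
```

With this toy data, the sandwich-and-cookies bundle appears in half of all orders, which is the kind of high-support bundle the paragraph above describes as a traffic driver.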

According to market basket analysis, a customer who orders a sandwich and cookies would be more likely to order a drink.

The probability that a customer will purchase a drink given that they have purchased a sandwich and cookies (in market basket analysis terminology, that the consequent will be ordered with the antecedent) is referred to as the confidence of the rule. Statistically, the confidence of a rule with antecedent A and consequent C is the support of A and C together divided by the support of A. Confidence can inform product placement strategy and increase profitability: placing high-margin items near high-confidence driver items can increase the margin on purchases.

The lift of the rule is the support of the left-hand side of the rule (sandwich, cookies) co-occurring with the right-hand side (drink), divided by the probability that the two sides would co-occur if they were independent, i.e., the product of their individual supports. Equivalently, it is the confidence of the rule divided by the support of C. Let us put that in context: lift greater than 1 suggests that the presence of the antecedent increases the chances that the consequent will occur in a given transaction. Lift below 1 indicates that purchasing the antecedent reduces the chances of purchasing the consequent in the same transaction. When the lift is 1, purchasing the antecedent makes no difference to the chances of purchasing the consequent.
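The confidence and lift definitions above can be checked with a few lines of Python. This is only a sketch on hypothetical data: the six baskets and the helper names `support`, `confidence`, and `lift` are assumptions made for illustration.

```python
# Hypothetical transactions, one set of items per order (illustrative data only).
transactions = [
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies"},
    {"sandwich"},
    {"salad", "drink"},
    {"salad"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket) / len(transactions)


def confidence(antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)


A = frozenset({"sandwich", "cookies"})
C = frozenset({"drink"})

rule_confidence = confidence(A, C)
rule_lift = lift(A, C)
print(round(rule_confidence, 3), round(rule_lift, 3))
```

In this toy data, two of the three sandwich-and-cookies orders include a drink, while only half of all orders do, so the lift comes out above 1.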

Market basket analysts search for rules with lift greater than 1, backed by high confidence and, often, high support.

Retailers use the information from market basket analysis in a variety of ways:

  • Store layout: retailers will place products that co-occur together in the analysis in close proximity on the store floor to improve the shopping experience of the customer
  • Cross-selling or Up-selling: retailers will market extra products to the customer based on prior purchase behavior patterns or what is currently in their cart (this is what Amazon does).
  • Placement of items on a website or products in catalogs

Market Basket Analysis Algorithms

Algorithms for generating association rules include Apriori, FP-Growth, and Eclat, among others. Many tools implement these algorithms: R, Spotfire TERR, Netezza, and MicroStrategy are four such tools used by data scientists. The Apriori algorithm is a common method that systematically identifies itemsets that occur frequently, i.e., with a support greater than a pre-defined value. It then calculates the confidence of all possible rules based on those frequent itemsets, keeping only the rules with confidence greater than a pre-defined value. Apriori is implemented in the arules package, which can be installed and run in R or Spotfire’s TERR environment. Data is fed into the rule engine typically in the following format:

The first column is the order/transaction number and the second is the item name or, more often, the item ID. The next step usually involves aggregating the records of each transaction into a single record holding an array of items, and converting the dataset to an R transactions object. The result of that aggregation is as shown below:
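As a rough illustration of the two data shapes, the Python sketch below starts from long-format rows of (order number, item) and groups them into one itemset per order. The order numbers, the items, and the `to_baskets` helper are all hypothetical; this is plain Python for illustration, not the arules workflow itself.

```python
from collections import defaultdict

# Hypothetical long-format records: (order number, item name), one row per item.
records = [
    (1001, "sandwich"),
    (1001, "cookies"),
    (1001, "drink"),
    (1002, "sandwich"),
    (1002, "drink"),
    (1003, "salad"),
]


def to_baskets(records):
    """Group item rows by order number into one itemset per transaction."""
    baskets = defaultdict(set)
    for order_id, item in records:
        baskets[order_id].add(item)
    return dict(baskets)


baskets = to_baskets(records)
for order_id, items in sorted(baskets.items()):
    print(order_id, sorted(items))
```

Using a set per order means an item that appears twice in the same order is counted once, which matches the co-occurrence view of a basket.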

Finally, the Apriori algorithm is applied to the transactions, producing a “resultset” that appears thus (where LHS and RHS represent the left-hand side and right-hand side items of each rule, respectively):
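To show how such a result set comes about, here is a minimal pure-Python sketch of the Apriori idea: find frequent itemsets level by level, then derive rules with their support, confidence, and lift. It is not the arules implementation; the transactions, the thresholds (0.3 and 0.6), and the function names `frequent_itemsets` and `association_rules` are assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical transactions, one set of items per order (illustrative data only).
transactions = [
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies", "drink"},
    {"sandwich", "cookies"},
    {"sandwich"},
    {"salad", "drink"},
    {"salad"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: build k-item candidates only from frequent (k-1)-itemsets."""
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s, transactions) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c, transactions) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Split each frequent itemset into LHS -> RHS and keep confident rules."""
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), size):
                lhs = frozenset(lhs_items)
                rhs = itemset - lhs
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append({
                        "lhs": sorted(lhs),
                        "rhs": sorted(rhs),
                        "support": supp,
                        "confidence": conf,
                        "lift": conf / frequent[rhs],
                    })
    return rules


frequent = frequent_itemsets(transactions, min_support=0.3)
rules = association_rules(frequent, min_confidence=0.6)
for rule in sorted(rules, key=lambda r: r["lift"], reverse=True):
    print(rule["lhs"], "->", rule["rhs"],
          round(rule["support"], 3), round(rule["confidence"], 3), round(rule["lift"], 3))
```

The sketch recomputes support by scanning every transaction, which is fine for a toy example but far slower than a production implementation. Looking up `frequent[lhs]` and `frequent[rhs]` is safe because every subset of a frequent itemset is itself frequent.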

Deploying a Market Basket Analysis Project

Capturing, warehousing, and utilizing historical data is important for effective market basket analysis implementation. Critical success factors for the deployment of an MBA project in an organization include:

  • Support from top management
  • Group process, teamwork and collaboration – especially between business users and implementation team members
  • Data availability – means and methods for gathering and storing data
  • Data quality – format, accuracy and reliability of input
  • System processing power – timeliness of output
  • People – resources to implement the project and interpret results
  • Tools availability – association rule detection tools like R are important so that the team does not re-invent the wheel by building new algorithms unless necessary

Despite its popularity as a retailer’s computational technique, Market Basket Analysis, in the broader context of Affinity Analysis, is applicable in many other areas with increasing usage:

  • Manufacturing Sector: predictive analysis of equipment failure
  • Pharmaceutical Industry: discovery of co-occurrence relationships between diagnoses and the pharmaceutical active ingredients prescribed to different patient groups
  • Bioinformatics: pre-processing protein interaction networks for predicting protein functions
  • Financial Criminology: fraud detection based on credit card usage data
  • Customer Behavior: associating purchase with demographic and socio-economic data (such as age, gender and preference).

More and more organizations are discovering ways of using market basket analysis to gain useful insights into associations and hidden relationships. As industry leaders continue to explore the technique’s capabilities, a predictive version of market basket analysis is making inroads across many sectors in an effort to identify sequential purchases.
