Market Basket Analysis in R and Microsoft Azure
Market Basket Analysis can help retailers identify consumer trends and predict customer behavior. In this article, we discuss how we applied this technique to the restaurant industry using Microsoft Azure.
Market Basket Analysis (MBA) is widely used in retail and other industries to derive associations between products by analyzing how frequently they are bought together. The results reveal trends and indicate what customers are inclined to buy or like. Businesses can then use these insights to improve their marketing strategy and better target specific customer needs.
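To make "how frequently they are bought together" concrete, here is a small Python sketch that counts how many transactions contain each pair of items. It is a language-neutral illustration, not the R code we used, and the baskets are hypothetical toy data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, for illustration only
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order, so (a, b) and (b, a) match
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(transactions)
print(counts.most_common(3))
```

Pairs that co-occur far more often than their individual popularity would suggest are the raw material for the association rules discussed below.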
In a recent hackathon, we took on a business case: using Market Basket Analysis on Microsoft Azure to predict purchasing patterns for a client in the restaurant industry. The client wants to offer promotions to its regular customers, not only as a reward but also to introduce them to new options. The parameters we considered were:
Identify loyal customers
Find associations using Market Basket Analysis
Determine whether customers are buying the ‘extras’ that the associations predict
If not, offer a special deal or coupon
Check back to see if they converted
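The workflow above can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the purchase history, the "three or more orders means loyal" threshold, and the single {burger} => fries rule stand in for whatever loyalty definition and mined rules the client would actually use.

```python
# Hypothetical purchase history: customer id -> list of past orders (sets of items)
history = {
    "c1": [{"burger", "fries"}, {"burger"}, {"burger", "soda"}],
    "c2": [{"salad"}],
    "c3": [{"burger"}, {"burger", "soda"}, {"burger"}],
}

# Hypothetical rule assumed to come from the association step: {burger} => fries
RULE_LHS, RULE_RHS = frozenset({"burger"}), "fries"
MIN_ORDERS = 3  # hypothetical loyalty threshold

def loyal_customers(hist, min_orders):
    """Identify loyal customers: at least min_orders past orders."""
    return {c for c, orders in hist.items() if len(orders) >= min_orders}

def coupon_targets(hist, loyal, lhs, rhs):
    """Loyal customers who buy the rule's antecedent but never the predicted extra."""
    targets = set()
    for c in loyal:
        orders = hist[c]
        buys_lhs = any(lhs <= order for order in orders)
        buys_rhs = any(rhs in order for order in orders)
        if buys_lhs and not buys_rhs:
            targets.add(c)  # candidate for a special deal or coupon
    return targets

def converted(later_orders, rhs):
    """Check back: did the customer buy the extra after the offer?"""
    return any(rhs in order for order in later_orders)

loyal = loyal_customers(history, MIN_ORDERS)
targets = coupon_targets(history, loyal, RULE_LHS, RULE_RHS)
print(sorted(targets))
print(converted([{"burger", "fries"}], RULE_RHS))
```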
To read more about MBA, or for an introduction to the statistical terms used below, visit our blog post!
Smartbridge has already developed Market Basket Analysis capabilities using TIBCO Spotfire. We wanted to expand our technical skills with newer, more flexible tools: R and Microsoft Azure, including Azure Machine Learning Studio and Azure Databricks. We also wanted to explore visualizing the statistical output in Power BI and MicroStrategy.
The first step was to obtain the data set required for the analysis and to set up the environment. For experimental purposes, and to keep the analysis general, we started with a sample grocery data set.
We wrote R code to transform our data into a market basket format. Transaction data typically comes in one of two formats: basket or item. We used the basket format, in which each line of the data file represents one transaction, so no transaction number is needed. In the item format, each row holds a single item along with its transaction number, so the rows must be grouped by transaction to rebuild each basket.
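Reading each format can be sketched in Python with the standard csv module. The sample lines are made up for illustration; the point is that both layouts reduce to the same list of baskets. In R, the arules package's read.transactions() function handles both layouts through its format argument, where the item layout is called "single".

```python
import csv
import io
from collections import defaultdict

# Basket format: one transaction per line, items separated by commas
basket_text = "whole milk,bread\nbottled beer,liquor,red/blush wine\n"

# Item format: one "transaction number,item" pair per line
item_text = (
    "1,whole milk\n"
    "1,bread\n"
    "2,bottled beer\n"
    "2,liquor\n"
    "2,red/blush wine\n"
)

def read_basket(text):
    """Each non-empty CSV row is already a complete transaction."""
    return [set(row) for row in csv.reader(io.StringIO(text)) if row]

def read_items(text):
    """Group (transaction number, item) rows back into one set per transaction."""
    groups = defaultdict(set)
    for tid, item in csv.reader(io.StringIO(text)):
        groups[tid].add(item)
    return list(groups.values())  # dicts keep first-seen transaction order

print(read_basket(basket_text))
print(read_items(item_text))
```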
Below are the steps for generating the association rules in R.
Install the arules package
Generate a set of association rules called mba_arules using the apriori() function
Set the minimum support to 0.001 and the minimum confidence to 0.75
Set the maximum length (the number of items in a rule) to 3, since output with fewer items per rule was easier to explain
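To show what apriori() does with these three parameters, here is a minimal from-scratch sketch of the Apriori algorithm in Python. It is not the arules implementation: it grows frequent itemsets level by level up to the maximum length, prunes candidates that have an infrequent subset, and keeps rules whose confidence meets the threshold. As in arules' apriori(), each rule has a single item on its right-hand side. The six toy transactions are hypothetical; the thresholds are the ones listed above.

```python
from itertools import combinations

def apriori(transactions, min_support, min_confidence, max_len):
    """Return (lhs, rhs, support, confidence, lift) rules, sorted by lift."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets containing every item in itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    items = {i for b in baskets for i in b}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in level}

    k = 1
    while level and k < max_len:
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        level = set()
        for c in candidates:
            s = support(c)
            if s >= min_support:
                frequent[c] = s
                level.add(c)

    # Rules: split each frequent itemset into lhs => single rhs item
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:
            lhs = itemset - {rhs}
            conf = supp / frequent[lhs]
            if conf >= min_confidence:
                lift = conf / frequent[frozenset([rhs])]
                rules.append((lhs, rhs, supp, conf, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)

# Hypothetical toy transactions, loosely echoing the grocery example
transactions = [
    {"liquor", "red/blush wine", "bottled beer"},
    {"liquor", "red/blush wine", "bottled beer"},
    {"liquor", "bottled beer"},
    {"whole milk", "bread"},
    {"whole milk", "bread"},
    {"whole milk"},
]

rules = apriori(transactions, min_support=0.001, min_confidence=0.75, max_len=3)
for lhs, rhs, supp, conf, lift in rules[:10]:  # top 10 rules by lift
    print(sorted(lhs), "=>", rhs, f"supp={supp:.3f} conf={conf:.2f} lift={lift:.2f}")
```

Sorting by lift and taking rules[:10] gives a top-10 view similar to the one we displayed in R; arules itself uses much faster counting structures, so this sketch is only meant for small examples.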
[Figure: Snapshot of the association rules in R, displaying the top 10 rules]
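The first rule in the screenshot shows that a customer who buys liquor and red/blush wine is about 11 times as likely as a typical customer to also buy bottled beer (a lift of about 11). The rule's confidence was 90%: 90% of the transactions containing liquor and red/blush wine also contained bottled beer.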
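The "11 times as likely" figure is the rule's lift: its confidence divided by the baseline support of the right-hand side. The Python sketch below computes support, confidence, and lift with exact fractions on a hypothetical ten-basket example, then rearranges the lift formula to show what the two reported numbers imply about how common bottled beer is overall.

```python
from fractions import Fraction

def support(baskets, itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return Fraction(sum(itemset <= b for b in baskets), len(baskets))

def confidence(baskets, lhs, rhs):
    """support(lhs with rhs) / support(lhs): an estimate of P(rhs | lhs)."""
    return support(baskets, set(lhs) | set(rhs)) / support(baskets, lhs)

def lift(baskets, lhs, rhs):
    """How many times more likely rhs is given lhs than in an average basket."""
    return confidence(baskets, lhs, rhs) / support(baskets, rhs)

# Hypothetical baskets, for illustration only (the 7 milk baskets share one
# set object, which is fine because nothing mutates them)
baskets = [
    {"liquor", "red/blush wine", "bottled beer"},
    {"liquor", "red/blush wine"},
    {"bottled beer"},
] + [{"whole milk"}] * 7

lhs, rhs = {"liquor", "red/blush wine"}, {"bottled beer"}
print("confidence:", confidence(baskets, lhs, rhs))  # prints confidence: 1/2
print("lift:", lift(baskets, lhs, rhs))              # prints lift: 5/2

# Rearranging lift = confidence / support(rhs): a 90% confidence and a lift of
# roughly 11 imply bottled beer appears in about 0.90 / 11, i.e. roughly 8%, of baskets.
implied_beer_support = 0.90 / 11
print(f"implied support of bottled beer: {implied_beer_support:.3f}")
```

A high lift therefore does not require the consequent to be popular; it means the antecedent raises its likelihood well above its baseline.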
We also used Azure Databricks to run the R program. Below are the steps:
Create an Azure Databricks resource and a cluster
Create a notebook with R as its default language and attach it to the cluster created in the previous step
Install the arules package on the cluster, using CRAN as the library source
Execute the R script
With association rules built for the grocery data, we plan to apply the same approach to restaurant data. Check out our follow-up blog post for insights into the output, using trends from Power BI and visualizations from a MicroStrategy dossier.
There’s more to explore at Smartbridge.com!