Tales of Beers and Diapers
March 2, 2011 at 3:37 pm Terence Craig 8 comments
There is an apocryphal story that is often told to illustrate data mining concepts. The story is about beer and diaper sales and usually goes along the lines of:
|INSERT MAJOR RETAILER NAME| found on |INSERT DAY OF THE WEEK| that beer and diaper sales were strongly correlated. Once noticed on |INSERT BI TOOL OF CHOICE|, it was found |PICK ONE|:
- That diapers are too heavy for recently pregnant women so they ask their husbands to pick them up coming home from work and since hubby is off the clock and ready to get his drink on, he also picks up beer.
- That a diaper emergency occurs fairly late in the evening and the husband is sent out while the new mother cares for the baby. Being annoyed, he also picks up a 12 pack to relax.
- That |INSERT SOME EQUALLY GROSS STEREOTYPICAL ASSUMPTION ABOUT THE U.S. WORKING CLASS PARENT|.
The brilliant analyst at |SAME MAJOR RETAILER AS ABOVE| intuits that a simple relocation of beer next to diapers will lead to more purchases of beer and beer sales improve by |INSERT HIGHER %|.
It’s a great story, even though it is almost certainly an urban legend, if for no other reason than that Wal-Mart is usually the retail chain named. Having had direct experience with Wal-Mart and how secretive they are, I can tell you that if the chain were Wal-Mart, you would not have heard about it.
And while stories like this have sold plenty of BI and data mining licenses, the fact is that running correlations over large data sets quickly, and without a highly trained statistician, is no fun at all with most tools. As a result, correlation is badly underused.
This is a shame, because the power of correlation to confirm or, ideally, discover relationships that can positively affect a business is hard to overstate. We have used our correlation engine on things as diverse as determining optimum restocking times for multinational produce retailers and finding the relationships between social media campaigns and actual sales. For this power to be broadly used, correlation tools need to become both easier to use and faster.
When we were redesigning the correlation module in PAF (the PatternBuilders Analytic Framework), we had these goals:
- It had to be fast over large data sets.
- It had to be easy enough to use that an AVERAGE user could do exploratory data analysis (EDA) without knowing what it is. For example: is the new Twitter campaign driving sales of pink cell phones in the Southern region?
- It had to support time-shifted correlations, so that users can discover relationships where there is a significant time gap between cause and effect. For example: does an ad buy only affect sales two months after a campaign starts?
- It had to produce results that are easy to understand.
- It had to lay the foundation for the holy grail: automatic EDA, where the system tells users automatically which relationships are worth paying attention to.
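The time-shifted goal above can be illustrated with a minimal sketch. This is not PAF's implementation; it only shows the general idea of correlating one series against lagged copies of another and picking the lag with the strongest relationship. The function names (`pearson`, `lagged_correlations`, `best_lag`) and the monthly figures are hypothetical, and the sales series is deliberately constructed to follow ad spend with a two-month delay so the answer is known in advance.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlations(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag."""
    results = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` driver points and the first `lag` response
        # points so the two slices line up with the chosen delay.
        shifted_driver = driver[:len(driver) - lag]
        shifted_response = response[lag:]
        results[lag] = pearson(shifted_driver, shifted_response)
    return results


def best_lag(driver, response, max_lag):
    """Return the lag whose correlation has the largest absolute value."""
    scores = lagged_correlations(driver, response, max_lag)
    return max(scores, key=lambda lag: abs(scores[lag]))


# Hypothetical monthly ad spend, made up for illustration only.
ad_spend = [10, 12, 30, 11, 9, 25, 10, 12, 28, 11, 10, 9]
# Hypothetical sales, built to track ad spend from two months earlier.
sales = [70, 74] + [50 + 2 * a for a in ad_spend[:-2]]

if __name__ == "__main__":
    for lag, r in lagged_correlations(ad_spend, sales, max_lag=4).items():
        print(f"lag {lag} months: r = {r:+.3f}")
    print("strongest lag:", best_lag(ad_spend, sales, max_lag=4))
```

Real data will be noisier than this, and scanning many lags raises the odds of finding a strong correlation by chance, so a lag found this way is a lead to investigate rather than a conclusion.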
For complex hypothesis testing, and for avoiding the “correlation is not causation” trap, you will still need people with a strong statistics foundation. But putting sophisticated EDA capabilities within reach of anybody who wants to test their gut observations will empower workers throughout an organization and let a lot of the art of a business (also known as hunches) become empirically based science.
We have posted a demo video of our correlation engine in action; let us know your thoughts. For those of you who are interested, our default correlation algorithm is Pearson’s.
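For readers who want the definition, the standard sample form of Pearson’s correlation coefficient for two series x and y with n paired observations is shown below. This is the textbook formula, not a description of how PAF computes it internally.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Here \bar{x} and \bar{y} are the means of the two series. The value of r ranges from -1 (perfect inverse linear relationship) through 0 (no linear relationship) to +1 (perfect direct linear relationship).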
Entry filed under: Data, General Analytics, General Business, Technology. Tags: analytics, big data, correlation, data mining, PatternBuilders Analytic Framework, retail analytics, statistics, Wal-Mart.
1. PAF Correlation « Big Data Big Analytics | March 3, 2011 at 3:02 pm
[…] last post was about correlation – a powerful tool that requires an easy-to-use UI to be effective along […]
2. Roundup: About 4 Tech Giants, All Things Private, Social Media Stats, Maps, and Big Data! « Big Data Big Analytics | October 20, 2011 at 7:48 am
[…] Beers and Diapers – Correlation […]
3. Big Data, Beer & Diapers: Exploring the 1990s Myth in 2012 | Shopper 360 | July 11, 2012 at 12:14 am
[…] explores the legend in 1998 also here Urban Legend Not true The promise of big data Let the data speak It’s complicated but yes, Virginia… […]
4. ビッグデータは本当に使えるのか | うずまき2nd | December 18, 2012 at 8:38 am
[…] The brilliant analyst at |SAME MAJOR RETAILER AS ABOVE| intuits that a simple relocation of beer next to diapers will lead to more purchases of beer and beer sales improve by |INSERT HIGHER %|. Tales of Beers and Diapers […]
5. 推荐系统算法系列外一篇:关联规则挖掘 | lc42 | June 2, 2013 at 3:48 am
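[…] Association rule mining is a general method for “identifying relationship patterns within a large number of transactions.” The most famous example is the legendary “beer and diapers” story (talking about diapers on Children’s Day is certainly fitting...). Put simply, when many families go out to buy diapers, the man also picks up some beer on the way; this rule was found by studying the shopping lists of a supermarket (Wal-Mart did it). So when stocking shelves, the supermarket can deliberately put some beer near the diapers, which can effectively raise customer spending. Of course, the reasons given above are just explanations imagined after the pattern was found; in fact, Wal-Mart simply discovered that beer and diapers have a fairly strong relationship. That is, from a large number of shopping lists the supermarket derived a figure such as “if a customer bought some baby products (food, diapers, and so on), there is a 60% (number made up) probability that they also bought beer,” and then they could start arranging the shelves. […]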
6. Big Data– על מאגרי מידע גדולים, והקשר בין בירות לחיתולים | הבלוג של רם קדם | June 5, 2014 at 10:22 pm
[…] urban [legend] tells of the company wallmart, which reached another interesting conclusion: in the late evening hours there is a correlation between the purchase of […]
7. Big Data– על מאגרי מידע גדולים, והקשר בין בירות לחיתולים — הבלוג של רם קדם | September 6, 2014 at 6:05 am
[…] urban [legend] tells of the company wallmart, which reached another interesting conclusion: in the late evening hours there is a correlation between the purchase of […]
8. What is ETL? Overview of ETL Process, Tools, Use Cases - Data Integration Info | April 2, 2020 at 8:38 am
[…] There’s an interesting story that’s often mentioned when talking about the power of data. And it’s called the beer and diaper analogy. […]