Hello Everyone!
I'm about to join a company that sells retail products through its own stores, so data on what its customers buy should be readily available.
For example:
The billing data shows that one customer buys soap, shampoo and potatoes; another buys shampoo, soap and tomatoes; and yet another buys shampoo, body-wash and a hammer. From this data we could say that a customer is very likely to buy shampoo together with a body-cleansing product; furthermore, the probability that the cleansing product is a bar of soap is about 67% (2 of 3 bills), and that it is a body-wash about 33% (1 of 3 bills).
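To make the arithmetic in this example concrete, here is a minimal sketch in plain Python. The three baskets are exactly the ones described above; the helper name `confidence` is just my own label for the estimate of P(other product | given product), not something from a particular library.

```python
from collections import Counter
from itertools import combinations

# The three bills from the example, one set of products per bill.
baskets = [
    {"soap", "shampoo", "potatoes"},
    {"shampoo", "soap", "tomatoes"},
    {"shampoo", "body-wash", "hammer"},
]

# How many bills contain each product, and each unordered pair of products.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def confidence(given, other):
    """Estimate P(other is on the bill | given is on the bill)."""
    pair = tuple(sorted((given, other)))
    return pair_counts[pair] / item_counts[given]

p_soap = confidence("shampoo", "soap")            # 2 of 3 shampoo bills
p_body_wash = confidence("shampoo", "body-wash")  # 1 of 3 shampoo bills
print(round(p_soap, 2), round(p_body_wash, 2))
```

With real billing data the same counts would presumably come from grouping rows by bill number rather than from hand-written sets.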
Of course the data above is simplistic, but this is exactly what I aim to do with the products that this company sells. I know that I can later add a complication such as seasonality.
My question is: how do I measure the covariance between products in this setting?
For stocks this is easy, since you can measure a stock against the market as a benchmark. In this case there is no such benchmark, so I need to form product pairs and then see how often they are bought together.
I expect the data will be under the following columns:
Bill no. > Date > Product name > Type > Brand > Sub-brand > Price > Discount
Of course more data could be provided, but I believe these columns are all I'll need for this analysis. I plan to show the results on an interactive graph with slicers that let a user choose brands, sub-brands, and products as a set (not just a pair). If no product set is chosen, the user can pick any single product, and the graph should show the products whose purchases correlate most strongly with it.
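As a rough sketch of what I have in mind, the snippet below treats each product as a 0/1 "was on this bill" indicator per bill. The covariance of two indicators is then P(A and B) - P(A)P(B), and dividing by the two standard deviations gives the ordinary Pearson correlation (the phi coefficient for binary data). The rows are made up for illustration, only the bill number and product name columns are used, and `support`, `phi` and `top_related` are hypothetical helper names of my own.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (bill_no, product_name) rows; the other columns are omitted.
rows = [
    (1, "soap"), (1, "shampoo"), (1, "potatoes"),
    (2, "shampoo"), (2, "soap"), (2, "tomatoes"),
    (3, "shampoo"), (3, "body-wash"), (3, "hammer"),
    (4, "potatoes"), (4, "tomatoes"),
]

# Group the rows into one set of products per bill.
baskets = defaultdict(set)
for bill_no, product in rows:
    baskets[bill_no].add(product)

n = len(baskets)
products = sorted({product for _, product in rows})

def support(*items):
    """Fraction of bills that contain every product in `items`."""
    wanted = set(items)
    return sum(wanted <= basket for basket in baskets.values()) / n

def phi(a, b):
    """Pearson correlation of the two 0/1 purchase indicators."""
    pa, pb, pab = support(a), support(b), support(a, b)
    covariance = pab - pa * pb
    denom = sqrt(pa * (1 - pa) * pb * (1 - pb))
    return covariance / denom if denom else 0.0

def top_related(product, k=3):
    """The k products most positively correlated with `product`."""
    scores = [(other, phi(product, other)) for other in products if other != product]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]

ranking = top_related("shampoo")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because `support` accepts any number of products, the same function would also cover a chosen product set rather than just a pair; filtering by brand, sub-brand or date would presumably happen on the rows before the baskets are built.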
Any advice on how to proceed with this analysis would be much appreciated.
Thanks.