From Shopify to Excel: cohort analysis, last step missing

WillyBill

New Member
Joined
Aug 8, 2017
Messages
5
Hi,

A quick one. I have Shopify, and get the following raw data: customer ID, date of purchase and amount (€)

I would like to perform a cohort analysis

[TABLE="width: 369"]
<colgroup><col><col><col><col></colgroup><tbody>[TR]
[TD="align: center"]Client ID[/TD]
[TD="align: center"]Date [/TD]
[TD="align: center"]Amount[/TD]
[TD="align: center"]Date of 1st purchase[/TD]
[/TR]
[TR]
[TD="align: center"]68[/TD]
[TD="align: center"]Jan-15[/TD]
[TD="align: center"] € 7,979[/TD]
[TD="align: center"]Jan-15[/TD]
[/TR]
[TR]
[TD="align: center"]13[/TD]
[TD="align: center"]Jan-15[/TD]
[TD="align: center"] € 32,836[/TD]
[TD="align: center"]Jan-15[/TD]
[/TR]
[TR]
[TD="align: center"]79[/TD]
[TD="align: center"]Jan-15[/TD]
[TD="align: center"] € 1,780[/TD]
[TD="align: center"]Jan-15[/TD]
[/TR]
[TR]
[TD="align: center"]13[/TD]
[TD="align: center"]Jan-15[/TD]
[TD="align: center"] € 30,024[/TD]
[TD="align: center"]Jan-15[/TD]
[/TR]
[TR]
[TD="align: center"]79[/TD]
[TD="align: center"]Jan-15[/TD]
[TD="align: center"] € 3,000[/TD]
[TD="align: center"]Jan-15[/TD]
[/TR]
</tbody>[/TABLE]

First step, I created another column to get the date of 1st purchase per row using: =MIN(IF($A$2:$A$16288=A2,$B$2:$B$16288))
A being client ID
B being Date of purchase

Second step, I counted the number of unique values (for client IDs) and I used =SUM(--(FREQUENCY(A2:A6,A2:A6)>0))

Third step (where I am stuck), I would like to have a formula translating:
> If date of 1st purchase is Jan-15 and if Date (of purchase) is Jan-15 (variable, could be Feb), then count unique values.

However I struggle to find the formula. The idea is to know the number of customers in Jan-15. Then how many remains in Feb-15 and so on.

Any help would be appreciated, happy to share the excel if that is easier.

Best
 

Excel Facts

Is there a shortcut key for strikethrough?
Ctrl+S is used for Save. Ctrl+5 is used for Strikethrough. Why Ctrl+5? When you use hashmarks to count |||| is 4, strike through to mean 5.
I'm not sure what you mean for step 3. Just explain in plain english what information you are trying to understand. So if someone makes a purchase on January 15th and it was their first purchase... what do you want to know then?
 
Upvote 0
You make a cluster for all new customers for a given month, in the example above it is for the new customers of January 2015.

The customer base in January 2015 is 100%. Then I want to know for the next months how many of these new clients from January remain.

Most likely, the number will decrease as they might not buy something every month

In other words, it helps to measure the retention and also the revenues for a specific group of customers.

However, i cannot manage to find the formula that does that from the raw data given in my 1st post.

I added a new column "date of 1st purchase" to isolate the new customers for a given month. Then, within the customer base of January, I would like to know how many of them are still customers in Feb, March etc

I hope this clarifies

Best & thanks

The end result should look like this

[TABLE="width: 417"]
<colgroup><col><col span="5"></colgroup><tbody>[TR]
[TD][/TD]
[TD="align: right"]Month 0[/TD]
[TD="align: right"]Month 1[/TD]
[TD="align: right"]Month 2[/TD]
[TD="align: right"]Month 3[/TD]
[TD="align: right"]Month 4[/TD]
[/TR]
[TR]
[TD][/TD]
[TD]2015-1[/TD]
[TD]2015-2[/TD]
[TD]2015-3[/TD]
[TD]2015-4[/TD]
[TD]2015-5[/TD]
[/TR]
[TR]
[TD]2015-1[/TD]
[TD="align: right"]3[/TD]
[TD="align: right"]3[/TD]
[TD="align: right"]2[/TD]
[TD="align: right"]3[/TD]
[TD="align: right"]2[/TD]
[/TR]
[TR]
[TD]2015-2[/TD]
[TD="align: right"]0[/TD]
[TD="align: right"]12[/TD]
[TD="align: right"]7[/TD]
[TD="align: right"]8[/TD]
[TD="align: right"]5[/TD]
[/TR]
[TR]
[TD]2015-3[/TD]
[TD="align: right"]0[/TD]
[TD="align: right"]0[/TD]
[TD="align: right"]9[/TD]
[TD="align: right"]6[/TD]
[TD="align: right"]5[/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
Yeah I pretty much follow now. In your example table though, how would month 0 have a 0 value, meaning no first purchases that month and then the following months have positive values? For the second row, how can month 0 have no purchases but then you list 12 after?

I think I follow what you are saying and I would suggest using VBA if possible as you are doing multiple lookups and need to keep track of looked up values. It is easier to code it in vba than to make a complicated formula. If you can use vba, then you can loop through each client and check their first purchase and compare to followup purchases. Just way easier when you can declare variables and store information to variables.
 
Last edited:
Upvote 0
I see thanks for your reply.

I never used VBA, but I guess it is easier, I will try and may post again if further issues.

Thanks
Cheers
 
Upvote 0
See if this example helps


[TABLE="class: grid"]
<tbody>[TR]
[TD="bgcolor: #DCE6F1"][/TD]
[TD="bgcolor: #DCE6F1"]
A
[/TD]
[TD="bgcolor: #DCE6F1"]
B
[/TD]
[TD="bgcolor: #DCE6F1"]
C
[/TD]
[TD="bgcolor: #DCE6F1"]
D
[/TD]
[TD="bgcolor: #DCE6F1"]
E
[/TD]
[TD="bgcolor: #DCE6F1"]
F
[/TD]
[TD="bgcolor: #DCE6F1"]
G
[/TD]
[TD="bgcolor: #DCE6F1"]
H
[/TD]
[TD="bgcolor: #DCE6F1"]
I
[/TD]
[TD="bgcolor: #DCE6F1"]
J
[/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
1
[/TD]
[TD]
Client ID​
[/TD]
[TD]
Month of purchase​
[/TD]
[TD]
Amount​
[/TD]
[TD]
Month of 1St purchase​
[/TD]
[TD][/TD]
[TD][/TD]
[TD]
Month 0​
[/TD]
[TD]
Month 1​
[/TD]
[TD]
Month 2​
[/TD]
[TD]
Month 3​
[/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
2
[/TD]
[TD]
68​
[/TD]
[TD]
1​
[/TD]
[TD]
7979,00​
[/TD]
[TD]
1​
[/TD]
[TD][/TD]
[TD="bgcolor: #BFBFBF"]
1st Purchase​
[/TD]
[TD="bgcolor: #DCE6F1"]
1​
[/TD]
[TD="bgcolor: #DCE6F1"]
2​
[/TD]
[TD="bgcolor: #DCE6F1"]
3​
[/TD]
[TD="bgcolor: #DCE6F1"]
4​
[/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
3
[/TD]
[TD]
13​
[/TD]
[TD]
1​
[/TD]
[TD]
32836,00​
[/TD]
[TD]
1​
[/TD]
[TD][/TD]
[TD="bgcolor: #BFBFBF"]
1​
[/TD]
[TD]
3​
[/TD]
[TD]
2​
[/TD]
[TD]
2​
[/TD]
[TD]
1​
[/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
4
[/TD]
[TD]
79​
[/TD]
[TD]
1​
[/TD]
[TD]
178,00​
[/TD]
[TD]
1​
[/TD]
[TD][/TD]
[TD="bgcolor: #BFBFBF"]
2​
[/TD]
[TD]
0​
[/TD]
[TD]
3​
[/TD]
[TD]
1​
[/TD]
[TD]
2​
[/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
5
[/TD]
[TD]
13​
[/TD]
[TD]
1​
[/TD]
[TD]
30024,00​
[/TD]
[TD]
1​
[/TD]
[TD][/TD]
[TD="bgcolor: #BFBFBF"]
3​
[/TD]
[TD]
0​
[/TD]
[TD]
0​
[/TD]
[TD]
4​
[/TD]
[TD]
2​
[/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
6
[/TD]
[TD]
79​
[/TD]
[TD]
1​
[/TD]
[TD]
3,00​
[/TD]
[TD]
1​
[/TD]
[TD][/TD]
[TD="bgcolor: #BFBFBF"]
4​
[/TD]
[TD]
0​
[/TD]
[TD]
0​
[/TD]
[TD]
0​
[/TD]
[TD]
2​
[/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
7
[/TD]
[TD]
44​
[/TD]
[TD]
2​
[/TD]
[TD]
1000,00​
[/TD]
[TD]
2​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
8
[/TD]
[TD]
22​
[/TD]
[TD]
2​
[/TD]
[TD]
1000,00​
[/TD]
[TD]
2​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
9
[/TD]
[TD]
13​
[/TD]
[TD]
2​
[/TD]
[TD]
500,00​
[/TD]
[TD]
1​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
10
[/TD]
[TD]
79​
[/TD]
[TD]
2​
[/TD]
[TD]
400,00​
[/TD]
[TD]
1​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
11
[/TD]
[TD]
21​
[/TD]
[TD]
2​
[/TD]
[TD]
200,00​
[/TD]
[TD]
2​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
12
[/TD]
[TD]
12​
[/TD]
[TD]
3​
[/TD]
[TD]
110,00​
[/TD]
[TD]
3​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
13
[/TD]
[TD]
22​
[/TD]
[TD]
3​
[/TD]
[TD]
120,00​
[/TD]
[TD]
2​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
14
[/TD]
[TD]
16​
[/TD]
[TD]
3​
[/TD]
[TD]
100,00​
[/TD]
[TD]
3​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
15
[/TD]
[TD]
10​
[/TD]
[TD]
3​
[/TD]
[TD]
100,00​
[/TD]
[TD]
3​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
16
[/TD]
[TD]
79​
[/TD]
[TD]
3​
[/TD]
[TD]
130,00​
[/TD]
[TD]
1​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
17
[/TD]
[TD]
18​
[/TD]
[TD]
3​
[/TD]
[TD]
150,00​
[/TD]
[TD]
3​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
18
[/TD]
[TD]
13​
[/TD]
[TD]
3​
[/TD]
[TD]
120,00​
[/TD]
[TD]
1​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
19
[/TD]
[TD]
13​
[/TD]
[TD]
4​
[/TD]
[TD]
50,00​
[/TD]
[TD]
1​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
20
[/TD]
[TD]
22​
[/TD]
[TD]
4​
[/TD]
[TD]
50,00​
[/TD]
[TD]
2​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
21
[/TD]
[TD]
44​
[/TD]
[TD]
4​
[/TD]
[TD]
50,00​
[/TD]
[TD]
2​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
22
[/TD]
[TD]
16​
[/TD]
[TD]
4​
[/TD]
[TD]
50,00​
[/TD]
[TD]
3​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
23
[/TD]
[TD]
9​
[/TD]
[TD]
4​
[/TD]
[TD]
50,00​
[/TD]
[TD]
4​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
24
[/TD]
[TD]
18​
[/TD]
[TD]
4​
[/TD]
[TD]
50,00​
[/TD]
[TD]
3​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
25
[/TD]
[TD]
20​
[/TD]
[TD]
4​
[/TD]
[TD]
120,00​
[/TD]
[TD]
4​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
26
[/TD]
[TD]
13​
[/TD]
[TD]
4​
[/TD]
[TD]
220,00​
[/TD]
[TD]
1​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
27
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]

Array formula in D2 copied down
=MIN(IF(A$2:A$100=A2,B$2:B$100))
Ctrl+Shift+Enter

Array formula in G3 copied across and down
=SUM(IF(FREQUENCY(IF($D$2:$D$100=$F3,IF($B$2:$B$100=G$2,$A$2:$A$100)),$A$2:$A$100),1))
Ctrl+Shift+Enter

M.
 
Upvote 0

Forum statistics

Threads
1,223,903
Messages
6,175,284
Members
452,630
Latest member
OdubiYouth

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top