DoctorofMadness
New Member
- Joined
- Mar 13, 2023
- Messages
- 6
- Office Version
- 365
- Platform
- Windows
I'm a college student working on a report currently and am having an issue with a problem I've recently run into and it's really throwing me for a loop. I thought I found a solution in this post, but couldn't get it to output anything but 0.
[[ Count how many times in total two values appear together in a row for all the rows ]]
To begin the project I was given a dataset of a little under 18000 transactions, with 19 different items that may have been purchased. The goal of the project is to draw what item types have associations with each other; for instance, if someone buys product 17, what is the confidence we can have that they would also purchase product 4 in the same transaction. The way I have the sheet looks as follows (first 20 transactions)
On the right side of the spreadsheet, the data is just pulled out of the Item_Types column using Textsplit() to remove the commas.
I thought I was onto something with the thread I linked above but cannot get it to output anything except 0, but I also have two theories onto why it doesn't work; each transaction might see the same item precedent or antecedent ( first item or second item ) more than once (see Transaction_ID 19, #5 appears twice).
I also decided to write the formula with 100 columns in the mmult() part, since I thought the large amount of blank space wouldn't affect anything since the way I am interpreting it to work.
=SUM((MMULT(--('Transaction IDs + Item Types'!D2:CY17918=L3),{1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1})>0)
*(MMULT(--('Transaction IDs + Item Types'!D2:CY17918=O3),{1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1})>0))
Let me know if you have any ideas. I am trying to figure a way to write a formula that will check if a row contains two values I am looking for anywhere in the row, while allowing the same number(s) to potentially appear more than once. I also know there are different ways this could be done, one of which using powerpivot, others using RStudio, I just haven't used those tools before and would rather try and do it on software I understand before resorting to learning new tools.
[[ Count how many times in total two values appear together in a row for all the rows ]]
To begin the project I was given a dataset of a little under 18000 transactions, with 19 different items that may have been purchased. The goal of the project is to draw what item types have associations with each other; for instance, if someone buys product 17, what is the confidence we can have that they would also purchase product 4 in the same transaction. The way I have the sheet looks as follows (first 20 transactions)
Transaction_ID | Customer_ID | Item_Types | Item Types | |||||||
1 | 15107144 | 17,8,5,4, | 17 | 8 | 5 | 4 | ||||
2 | 15107169 | 17,9,8, | 17 | 9 | 8 | |||||
3 | 15120097 | 13,8, | 13 | 8 | ||||||
4 | 15128454 | 9,9,8, | 9 | 9 | 8 | |||||
5 | 15128488 | 10,10,7, | 10 | 10 | 7 | |||||
6 | 15131912 | 2, | 2 | |||||||
7 | 15134734 | 3,2,2, | 3 | 2 | 2 | |||||
8 | 15500173 | 13,10,8,7,5,2,2,1, | 13 | 10 | 8 | 7 | 5 | 2 | 2 | 1 |
9 | 15502484 | 5, | 5 | |||||||
10 | 15507087 | 7, | 7 | |||||||
11 | 15508887 | 8, | 8 | |||||||
12 | 15510149 | 12, | 12 | |||||||
13 | 15513135 | 12, | 12 | |||||||
14 | 15514612 | 13, | 13 | |||||||
15 | 15518225 | 13,10,1, | 13 | 10 | 1 | |||||
16 | 15518985 | 8,8,3, | 8 | 8 | 3 | |||||
17 | 15520494 | 13,5, | 13 | 5 | ||||||
18 | 15523811 | 17, | 17 | |||||||
19 | 15524504 | 14,11,10,5,5,2,1, | 14 | 11 | 10 | 5 | 5 | 2 | 1 | |
20 | 15529982 | 12, | 12 |
On the right side of the spreadsheet, the data is just pulled out of the Item_Types column using Textsplit() to remove the commas.
I thought I was onto something with the thread I linked above but cannot get it to output anything except 0, but I also have two theories onto why it doesn't work; each transaction might see the same item precedent or antecedent ( first item or second item ) more than once (see Transaction_ID 19, #5 appears twice).
I also decided to write the formula with 100 columns in the mmult() part, since I thought the large amount of blank space wouldn't affect anything since the way I am interpreting it to work.
=SUM((MMULT(--('Transaction IDs + Item Types'!D2:CY17918=L3),{1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1})>0)
*(MMULT(--('Transaction IDs + Item Types'!D2:CY17918=O3),{1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1})>0))
Let me know if you have any ideas. I am trying to figure a way to write a formula that will check if a row contains two values I am looking for anywhere in the row, while allowing the same number(s) to potentially appear more than once. I also know there are different ways this could be done, one of which using powerpivot, others using RStudio, I just haven't used those tools before and would rather try and do it on software I understand before resorting to learning new tools.