Finding the most frequent combination

excelNewbie22

Well-known Member
Joined
Aug 4, 2021
Messages
534
Office Version
  1. 365
Platform
  1. Windows
hi!
how to find the most frequent combination of 1's and 0's (or just 1's), which bigger then 30,
could be 31 or 40, and etc....
it can skip columns, for instance, 30 and 39 and 40 to 42, or 1-9 and etc,
now, the data below is just a sample, the real data is way more big, like 2000 rows and 160 columns,
(which in this case i'll lookup to find combo which bigger then 120)
the possible combinations is soooooo biggggg, like hundres of millions,
i find it hard for excel to handle, but i thought maybe there's a clever way to approach it?

test
ABCDEFGHIJKLMNOPQRSTUVWXYZAAABACADAEAFAGAHAIAJAKALAMANAOAPAQARAS
1123456789101112131415161718192021222324252627282930313233343536373839404142434445
2111111111111111111111111111111111111111111111
3111111111111011111111111111111111111111111110
4111111111111111111111111111111111111111111111
5111111111111111111111111111111111110111111111
6111111111111111111111111111111111111111111111
7111111111111111111111111111111111111111111110
8111111111101111111111111110111111111111111111
9111111111111111111111111111111111111111111111
10111111111110111111111111111111111111111111111
11111111111111111110111111111111111011111111111
12111111111111111111111111111111111111111111111
13111111111101111111111111111111111110111111111
14111111111111111111111111111111111111111111111
15111111111101111111111111111111111111111111111
16111111111111111111111111111111111111111111111
17111111111111111111111111111111111111111111111
test
 
How about this?

Excel
ABCDEFGHIJKLMNOPQRSTUVWXYZAAABACADAEAFAGAHAIAJAKALAMANAOAPAQARASATAUAV
1123456789101112131415161718192021222324252627282930313233343536373839404142434445CombinationFrequency
21111111111111111111111111111111111111111111111111111111111111111111111111111111118
3111111111111011111111111111111111111111111110111111111111011111111111111111111111
411111111111111111111111111111111111111111111111111111111101111111111111111111101
511111111111111111111111111111111111011111111111111111111111111111111111111111102
6111111111111111111111111111111111111111111111111111111101111111111111110111111112
7111111111111111111111111111111111111111111110111111111110111111111111111111111112
811111111110111111111111111011111111111111111111111111111111111011111111111111101
9111111111111111111111111111111111111111111111111111111111111110111111111111111111
10111111111110111111111111111111111111111111111111111111101111111111111111111111123
1111111111111111111011111111111111101111111111111111111110111111111111111111111101
12111111111111111111111111111111111111111111111
13111111111101111111111111111111111110111111111
14111111111111111111111111111111111111111111111
15111111111101111111111111111111111111111111111
16111111111111111111111111111111111111111111111
17111111111111111111111111111111111111111111111
Sheet1
Cells with Conditional Formatting
CellConditionCell FormatStop If True
A2:AG17Cell Value=0textNO
AU2:AV11Expression=MOD(ROW(),2)=0textYES


VBA Code:
Sub WCMBO()
Dim LR As Long:         LR = Range("A" & Rows.Count).End(xlUp).Row
Dim base() As Variant:  base = Range("A2:AG" & LR).Value
Dim inc() As Variant:   inc = Range("AH2:AS" & LR).Value
Dim SD As Object:       Set SD = CreateObject("Scripting.Dictionary")
Dim tmp As String, tv As String

For i = 1 To UBound(base)
    tmp = Join(Application.Index(base, i, 0), "")
    For j = 1 To UBound(inc, 2)
        tv = tmp & inc(i, j)
        SD(tv) = SD(tv) + 1
    Next j
Next i

Range("AU2").Resize(SD.Count).Value = Application.Transpose(SD.keys())
Range("AV2").Resize(SD.Count).Value = Application.Transpose(SD.items())
End Sub
 
Upvote 0

Excel Facts

What does custom number format of ;;; mean?
Three semi-colons will hide the value in the cell. Although most people use white font instead.
lrobbo314 -thank you for keep trying to help,
but i don't understand how did you get these combinations from this data set?
for example first combination in au2 is: 1111111111111110000000000000000000,
none of the rows have that much zero's,
and also the frequency, 118 times of the above combination doesn't sound right
 
Upvote 0
I don't know what you're talking about when you say "for example first combination in au2 is: 1111111111111110000000000000000000"

The value I posted in AU2 is all 1s.

The code goes row by row. Starting in row 2, it combines the first 33 values, e.g. A2:AG2, into a string. Then it goes one by one. Essentially, the first value will be A2:AG2 + AH2 combined into a string. Then A2:AG2 + AI2, A2:AG2 + AJ2, A2:AG2 + AK2, and so on.

As it is making all of these strings, it is adding them to a dictionary and tallying them.

The combinations from the results I posted have at most 2 zeroes in any individual combination.

EXCEL
AUAV
1CombinationFrequency
21111111111111111111111111111111111118
3111111111101111111111111111111111123
4111111111101111111111111110111111112
5111111111110111111111111111111111112
6111111111111011111111111111111111111
7111111111111111110111111111111111111
811111111111111111111111111111111102
911111111110111111111111111111111101
1011111111111101111111111111111111101
1111111111111111111011111111111111101
Sheet1
Cells with Conditional Formatting
CellConditionCell FormatStop If True
AU2:AV11Expression=MOD(ROW(),2)=0textYES
 
Upvote 0
i see, apparently i did something wrong, when i copied all to excel, it's showed me: 1.11111E+33, so i format it as a number and it gave me 1111111111111110000000000000000000, so apologies, and thank you (how to see it as the full number?)

i still didn't understand how you got to 118, when you say "Then it goes one by one." do you mean A2:AG2 and then A3:AG3 and A4:AG4 all the way down? if not, this is what i meant and after, and from this list of results (A2:AG2,A3:AG3,A4:AG4,A5:AG5,A6:AG6.........) i want to know if there's some combination which exceeds 10%
 
Upvote 0
I am trying to go off of the formulas that you posted in Post #8.

Let's pretend that what I am posting below is just one row worth of data. Each row, will have 12 different combinations. The first 33 characters are always the same. Then, it starts at column 34 and appends it to the end. Then, it keeps that initial base of the first 33 characters, but adds the value from column 35, and so on.

After it gets to column 45, it would move on the the next row, get a new base of the first 33 characters, and repeat the process.

Like I said, what is posted below is an illustration of the steps for just 1 row. This was my understanding of how the combinations are being made.

Book1
ABCDEFGHIJKLMNOPQRSTUVWXYZAAABACADAEAFAGAHAIAJAKALAMANAOAPAQARASATAU
1123456789101112131415161718192021222324252627282930313233343536373839404142434445Combinations
21111111111111011111111111111111111101111111111111111111111011111111111111111111
31111111111111011111111111111111111101111111111111111111111011111111111111111111
41111111111111011111111111111111111101111111111111111111111011111111111111111110
51111111111111011111111111111111111101111111111111111111111011111111111111111111
61111111111111011111111111111111111101111111111111111111111011111111111111111111
71111111111111011111111111111111111101111111111111111111111011111111111111111111
81111111111111011111111111111111111101111111111111111111111011111111111111111111
91111111111111011111111111111111111101111111111111111111111011111111111111111111
101111111111111011111111111111111111101111111111111111111111011111111111111111111
111111111111111011111111111111111111101111111111111111111111011111111111111111111
121111111111111011111111111111111111101111111111111111111111011111111111111111111
131111111111111011111111111111111111101111111111111111111111011111111111111111111
Sheet1
Cell Formulas
RangeFormula
AU2:AU13AU2=TEXTJOIN("",,A2:AG2,INDEX($AH$2:$AS$13,ROW()-1,ROW()-1))
 
Upvote 0
i understand what you say, but "Each row, will have 12 different combinations" isn't what i meant by the example in post #8,
the number of combinations is so much bigger, millions bigger, because the combinations could also be a2:d2 & f2:ah2 = 34,
not just the first 33 & each one of the remaining,
or a2 & d2 & f2 & g2:aj2 =34 and so on

btw, for calculating and getting all possible combinations i often use this site Combinations and Permutations Calculator and for 34/41 the combinations are 22,481,940 !

and like you say, this is just for one row
 
Upvote 0
Ok. I'm starting to better understand what you're going after.

But, I'm still a little confused. You said 34/41 for 22,481,940 combinations. But there are 45 columns. Shouldn't the combination calculation be 34/45 which would be 1,0150,595,910 combinations per row?

Even if you're right and it's 22 million... 22 million x 2,000 rows is 44,963,880,000 different combinations! Or, if my calculation of 34/45 is correct... about 20 trillion combinations. What could possibly be the reason to compare that many combinations?
 
Upvote 0
the 45 columns was just an example, the real data is 41,
(actually have two sets of data, one 41 and one 150 like i mention in post#1)
in my data each column represent a data and i'm trying to find similarity for conclusions,
but no matter how big the combinations are,
if possible by excel i would glad to try it out
 
Upvote 0
Unfortunately, I think this is an unrealistic request. Excel CANNOT handle it. The VBA code was exceedingly slow.

So, I wrote some code in Python. It looked promising. But, I ran 16 rows, with the 34/41 ratio. And that was like 2.5 hours ago, and it's still going. And it's taxing the S#!& out of my computer.

I'll let you know when it finishes. But this doesn't seem realistic. People say Python is slow. So maybe doing it in Java or some variant of C would be feasible.

But, I don't really know Java or C.

Think I've hit the limit of being able to help on this one.
 
Upvote 0
Now, maybe, you can see the reason for my post #9.

I don't see what the goal is, what purpose could it possibly serve? To test what coders/Excel can do? I don't know.

Maybe we should try to find out what the last digit of 22/7 is with no remainder?
 
Upvote 0

Forum statistics

Threads
1,224,818
Messages
6,181,152
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top