One for the mathematicians - finding combinations?

Binraider

Board Regular
Joined
Apr 26, 2005
Messages
79
Hi all,

Let's say we have 5 numbers: 1, -2, -3, -4 & 3. Let's also have a target figure of 1.

If we limit ourselves to summing only two of the first 5 numbers, there are only two combinations that can give 1.

There are probably more solutions if you allow the use of more than 2 of the selection.

What I'd like to do is automatically hunt for such solutions, in a large block of data. To fully implement what I have in mind, I'm looking at finding all possible combinations in approx 200 columns of data, across a long series of days.

The idea is that if certain columns appear repeatedly as possible results over a long series of days, then the subject of those columns could be regarded as suspect and worthy of further investigation.

Where this becomes mind boggling is the number of potential combinations for such numbers of columns - I believe I'm hunting for values out of 200 factorial possibles.

The other thing is the real data set I'm working with includes decimal figures, so I'd probably have to have some sort of rule to allow for this. Perhaps allowing the target figure to be +- 0.05 and look for combinations in the data that match the range, rather than the exact figure.

This sounds like something a clever mathematician might be able to cobble up. Any suggestions? :-)

In terms of tools I have a couple of options on what I can run this in - my office machines have 32-bit Excel & Access 2003, although I could run this in 64-bit Excel/Access 2010 at home. Possibly a good idea given I have 12GB of RAM installed on my home machine, as opposed to 2GB on the work one.

Dave
 
Really fun question, I was somewhat stumped but I have a solution below that works for sums of pairs of values. Maybe someone can build off of it though.

Code:
Sub SolvingCombinations()

Dim Number_Set As Variant
Dim Length_of_Number_Set As Long
Dim Size_of_Groups As Long
Dim Target_Figure As Long
Dim Number_Of_Satisfactions As Long
Dim DynamicArray1() As Long
Dim DynamicArray2() As Long
Dim SolutionArray() As String
Dim strBuf As String
Dim intIndex As Integer

Number_Set = Array(1, -2, -3, -4, 3, 5)

Length_of_Number_Set = UBound(Number_Set) + 1

Size_of_Groups = 2

Target_Figure = 1

Number_Of_Satisfactions = 0

For i = 1 To Length_of_Number_Set

    Primary_Number_In_Question = Number_Set(i - 1)

    For j = 1 To Length_of_Number_Set
    
        Secondary_Number_In_Question = Number_Set(j - 1)
    
        If j <> i Then
        
            SumTotal = Primary_Number_In_Question + Secondary_Number_In_Question
            
            If SumTotal = Target_Figure Then
                
                
                Number_Of_Satisfactions = Number_Of_Satisfactions + 1
                ReDim Preserve DynamicArray1(1 To Number_Of_Satisfactions)
                DynamicArray1(Number_Of_Satisfactions) = i
                ReDim Preserve DynamicArray2(1 To Number_Of_Satisfactions)
                DynamicArray2(Number_Of_Satisfactions) = j
                
            End If
            
        End If
    
    Next j

Next i

ReDim Solution_Array(1 To (Number_Of_Satisfactions / 2))

For k = 1 To (Number_Of_Satisfactions / 2)

    Solution_Array(k) = Number_Set(DynamicArray1(k) - 1) & ", " & Number_Set(DynamicArray2(k) - 1)

Next k

    For intIndex = LBound(Solution_Array) To UBound(Solution_Array)
        strBuf = strBuf & "Solution " & intIndex & " = " & Solution_Array(intIndex) & vbLf
    Next

MsgBox "The pairs of values whose sum = " & Target_Figure & vbLf & strBuf, vbInformation

End Sub
 
Upvote 0

Excel Facts

Copy formula down without changing references
If you have =SUM(F2:F49) in F50; type Alt+' in F51 to copy =SUM(F2:F49) to F51, leaving the formula in edit mode. Change SUM to COUNT.
it is a misrepresentation of the problem to state the number of combinations in a data set of n=200 would need such large numbers of iterations to complete.

Please explain in exactly what way my statement misrepresents the problem -- without introducing unstated assumptions.
 
Upvote 0
It is a misrepresentation only in the fact that it is an analysis based on the worst case scenario.

namely that the target value is equal to sum [n=1 to k](k choose n).

additionally due to the exponential nature of the number of combinations, dismissing any feasibility based on averages ignores what could (or could not) be a significant domain of computationally possible potential target values.

i am not introducing any unstated assumptions. you were however by assuming the worst case target value.
 
Upvote 0
dismissing any feasibility based on averages ignores what could (or could not) be a significant domain of computationally possible potential target values.
Averages of what? Is one sought value more likely than another?

And if the analysis was wrong, perhaps you could correct it.
 
Upvote 0
the key to any algorithm that reduces these large numbers to reasonable sizes is the ability to cut iterations, and reduce comparison time/loop.

reducing comparison time per loop can be done with a few tricks, but is only really useful on data sets with a small average size, comparing to a larger goal, where you can the minimum number of elements needed to sum to the value and not perform a comparison for iterations below that level.

reducing iterations is done by comparing the current total of a certain combination, and then, when that combination exceeds the goal total, skipping any further combinations along that "branch". the most obvious example of this would be if you had a sorted set of numbers in ascending order, and the first element was greater than your goal value. there would be no need to test any other solutions.

i am just reiterating this just to be clear on what i mean.

so with this in mind, the number of data points is secondary (still important) to the relative position of the target value within the afore referenced range of: (the minimum element) to (sum [n=1 to k](k choose n)).

So at one extreme your analysis is correct, and the number of iterations would be an impossible task for the collective computing power on earth today even for relatively small numbers.

but at the other extreme, the number of iterations would be extremely low.

i am sure there exists a rough model of the behaviour in between these two edge cases, but it would not be exact i wouldnt think because the behaviour is reliant on the type of distribution of the data, not only the position of the target. so in terms of a better, and more concrete analysis, i cannot provide that.

so to put it simply there are 2 key variables (not taking into account data distribution).

these are the target value (tv), and the number of elements (n) in the search set.

so as n increases, the maximum tv decreases, and for a data set of 200 randomly distributed values this range would be around 1 to (the maximum element).

which is about the upper bounds of feasibility before the tv falls below the maximum element...ie not increasing n anymore

so from my experience the bounds for feasibility run from n=1 to ~200
and t from sum [k=1 to n](n choose k) at small n to maximum element at n=200

my method of solving these combinations runs at the order of 1,000,000 iterations/s so in the 200 element case it ran for 4 hours =about 15billion iterations (all very rough). that is only due to the ability to cut off iterations because of the relative size of the target value.

i am sure someone could explain it better, but that is my attempt.
 
Upvote 0

Forum statistics

Threads
1,223,920
Messages
6,175,377
Members
452,638
Latest member
Oluwabukunmi

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top