# Tough Problem 6 (VBA)



## Andrew Fergus (Jan 25, 2009)

I have volunteered to submit another tough problem so here goes.

This is one that has puzzled mathematicians for a while and solving this may (or may not) shatter one of the foundations of public key cryptography.  Public key cryptography currently relies on the inability to *efficiently* factor numbers.  The public key component is essentially comprised of two very very very large prime numbers multiplied together to provide a product, which is known, but the factors are not.

The 'tough problem' is to come up with an efficient  routine that will factor a very very large number.

Current wisdom states that such large numbers (these could be 100 digits longer or higher) cannot be factored efficiently - this means the computational power required to solve the problem is worth more than the message itself, or the time required to solve the problem is such that the message is now obsolete (as in many years out of date).  I figure we have enough smart people here with differing backgrounds and (intentionally or not) an interest in maths to give this a go.

As an example, the following number is the product of two large prime numbers:

9,362,562,987,658,701,322,598,867

Can you find the two factors?  And can you find them efficiently?

All we know is there are 2 factors, both of which are prime numbers (which excludes 1) and (in theory) there should only be one possible solution.

Note : you might consider a brute force method, which starts at the square root (say n) of the number and only tests odd numbers from n to 3.  All of us are capable of coding such a brute force method, but can you code an efficient method?

Regards
Andrew


----------



## Andrew Fergus (Jan 25, 2009)

Discussion on various approaches might also be useful.  This challenge is about speed so we may need a more collaborative approach - I'm willing to start the ball rolling.

How does one handle large numbers?  Do you break them up in some way?  Store them as strings?  Or store each digit in an array?  How do you handle the mathematical functions when VBA will not provide the correct answer for values greater than 15 digits?

Do we seek one of the factors (by continually testing candidates to find a remainder of zero) or is there an approach that will find both factors simultaneously?  In seeking a single factor, it makes sense to look for the factor that is less than the square root given you will be testing a smaller population but I believe this is still an inefficient approach.

Can this approach be further refined by using multi-threading and multiple starting points?  For instance, 1 thread starts at sqr and checks lower values, another starts at 3 and checks higher values, and two other threads start at some other point (e.g. sqr/2 or sqr(sqr)) checking higher and lower values.

When testing potential factors, should we first eliminate non-primes?  For example by only testing values that end in 1, 3, 7 or 9 you have cut down the potential divisors by 60%.  Also, any number whose sum of the digits is divisible by 3 is also divisible by 3 and hence not a prime number (e.g. we can tell at an glance 12363 is not a prime, nor 987654321 etc). This should cut down the number of divisors by a further 50% but does introduce additional steps into the process.

Alternatively, should we look at using a base other than base 10?  Something like base 30 for instance, being the product of the lowest 3 prime numbers 2, 3 & 5.  In base 30 the prime numbers take one of the following forms : n+1, n+7, n+11, n+13, n+17, n+19, n+23 and n+29 where n is any number divisible by 30.  In base 30 you only need to test 27% of the potential candidates.

We could use something like base 210 (2*3*5*7) but that would be unwieldy and I suspect this would be a case of diminishing marginal returns.......it's probably better to not got there.

Rather than trying a brute force approach (even with base 30, and multi-threading) is there an elegant approach like Colin's Knights tour solution?

All thoughts and comments appreciated.

Andrew


----------



## SydneyGeek (Jan 25, 2009)

Hi Andrew, 
I'm not a mathematician so I would tend to go brute force -- take the square root and generate primes up to that point, then check for remainders. 
Maybe you could try a smaller set -- the Mersenne primes, given that it's a much smaller set (primes of the form 2^n -1) and all of the truly huge primes are Mersenne. 

Denis


----------



## pgc01 (Jan 26, 2009)

Hi Andrew

Now, this is a really interesting problem.

I've only still had time to think about your first paragraph, the basic part of the problem.



> How does one handle large numbers? Do you break them up in some way? Store them as strings? Or store each digit in an array? How do you handle the mathematical functions when VBA will not provide the correct answer for values greater than 15 digits?


 
I would store the big numbers as strings, instead of arrays.

Whatever the method you choose, I suspect you'll have to do lots of calculations, especially multiplications with big numbers. To be efficient in the operations you should do them with chunks of digits from both operands, using as much as possible the vba capabilities in terms of the operations.

For ex.

12345678*87654321=(1234*10^4+5678)*(8765*10^4+4321)

12345678*87654321=(1234*8765)*10^8+(1234*4321)*10^4+(5678*8765)*10^4+(5678*4321)

This way we have only numbers with a maximum of 8 digts followed by some zeroes which can be manipulated with Longs.


So I see 2 possibilities:

1 - Long: maximum value about 2*10^10.

Divide the numbers in chunks of 4 digits (or 4-5) and manipulate them with Longs.

2 - use the subtype Decimal: maximum value about 79*10^26

Divide the numbers in chunks of 13 digits (or 13-14) and manipulate them with Longs.

I've not tested the efficiency of the operations with Decimals vs. Longs, but even if the implementation is less efficient it may pay to use option n. 2, as for big numbers you'd have to perform less than 1/3 of the operations.

For any of the options it seems a good idea to store the numbers as strings. It makes it very easy to extract a chain of charaters into a number:

lNumber = CLng(Mid(sNumber, 27, 4))

*Conclusion:*

My choice: work with the big integers stored in strings and manipulate them in chunks of 13 digits using the Decimal subtype.

I'm busy this week but I'll try to post something else about the rest of the problem.

Cheers


----------



## Andrew Fergus (Jan 26, 2009)

Hi Pedro

My initial attempt (about a year or two ago) did what you described in that I stored the numbers as strings (without breaking the numbers into chunks). As you say this makes it very easy to extract the digit(s) you want to work with. The method employed worked well for digits up to about 15 digits, but string handling in VBA is slow. Although I like your method of breaking the numbers into manageable chunks.

My latest attempt is to store the number in an array and accessing the number is not an issue given you can use loops like For i = 1 to ubound(MyArray()) : n = MyArray(i).

I already have a solution that uses strings and the following method - but I now want to try this using arrays. But I also want to see what approach others might take.

The method involves deriving both factors simultaneously using the principles of long multiplication (in reverse). For instance, given prime numbers (greater than 2) always end in the digits 1, 3, 7 & 9 then any number that is a product of two prime numbers (>2) will also end in 1, 3, 7 & 9. If not then it is possible either, or both, factors was not prime.

If we know the last digits of both factors is one of (at most) 3 possible combinations (sometimes 2), then our work has already been reduced significantly. For instance, given the number in my original post, the final digits for both factors must be one of the following combinations:
1 & 7
or
3 & 9.

There are no other possible combinations of final factor digits that provide a final product digit of 7 where the 2 factors are prime.

Working out the next digit for each factor is also simpler in that only 10 possible combinations of the digits 0-9 and 0-9 for the 2nd to last digit of each factor will provide the correct last 2 digits of the product (taking into account any value to carry forward). Essentially what I am doing is breaking the problem down to solving one digit at a time - if you manually write out a long multiplication in columns you can work out how to derive the digits in the product.

It's not as bad as it looks - you don't need to test for factors that are as long as the final product given the length of the product = len(factor1) + len(factor2) -1* (*sometimes) so the possible number of combinations to test is a lot lower than anticipated. But is it lower than the number of brute force combinations of just trying to find a single factor? I'm sure someone could work that out......

Regarding storing the numbers as string, multiplication of long numbers stored as strings will require a custom routine that draws on your memory of completing long multiplication questions manually. However, beware that string handling is slow in VBA. If you pursue this option you will also need custom addition, division and subtraction functions. Plus another custom function to work out the sqaure root! There is actually a formula for that.....

Cheers
Andrew


----------



## pgc01 (Feb 1, 2009)

Hi Andrew

As I said I had a very busy week and I was not able to think about your problem. I hope this week I can.

The important part of the problem, in my opinion, will be to decide what algorithm to use, assuming you already have an efficient set of arithmetic functions for big numbers.Since I did not yet have time for that, I started by the easier part, the implementation of those arithmetic functions. The best way to decide between storing the numbers using strings or arrays is to test which is faster. I implemented a multiplication sub using the 2 approaches and the tests say you are right, storing the big numbers in arrays is faster.

I benchmarked the 2 approaches using numbers with all digits equal to 9, the results I got were:

- multiply 1,000,000 times a number with 30 digits "9" by another of 15 digits "9". I get a time of about 52s using strings and 37s using arrays.

- multiply 100,000 times a number with 120 digits "9" by another of 60 digits "9". I get a time of about 16m45s (I thought it would not stop) using strings and 27s using arrays.

The first conclusion is that you are right, the arrays approach is always better. For small numbers it is a bit better but as the size of the numbers augments it's much, much better. So arrays it is.

I'll try to look into what methods can be used to perform efficiently the factorisation of numbers during next week to see if I can find alternatives to what you already suggested.


----------



## Andrew Fergus (Feb 10, 2009)

Hi Pedro

How are you getting on with this?

Andrew


----------



## pgc01 (Feb 10, 2009)

Hi Andrew

I can say that it's going well and not that well(!).

On one hand I didn't advance much in terms of coding. On the other hand I had heard of methods to factorize big numbers, but I had never looked into them. I now had the chance to look at the most efficient ones, like the General Number Field Sieve, the Quadratic Sieve or the Elliptic Curve method. As I had told you I like this theme and so it has been very interesting for me. The results, however, are not good in terms of a code to solve your problem. Any of this methods requires serious study, months of work, and that if you have the necessary mathematics background.

In conclusion, I've enjoyed it until now but in terms of a practical solution I'm back to square one, either the usual method or your idea of finding both the factors at the same time.

I've been thinking about the implementations of the arithmetic operations for big numbers. Addition and subtraction are very simple. I've done some testing on my multiplication function and I think it's OK. I have to think about the division. I don't know if I will implement the square root. I've done some square roots manually, just to remember exactly the algorithm (I think I hadn't done since I was 12), and it shouldn't be that hard. Maybe I'll do it just for fun. It has been a slow progress because it requires some contiguous time free and not just the 5 minutes to answer a question in the excel forum. I've been busy but I haven't forgotten about this problem and I've enjoyed it so far. 

Cheers
Pedro


----------



## Lewiy (Feb 13, 2009)

Given that the greatest minds in the world have no efficient way of solving this problem, I’m thinking it’s probably beyond both me and the reasonable limits of Excel.  However, I do have a very simple way of dealing with smaller semi-primes.
<?xml:namespace prefix = o ns = "urn:schemas-microsoft-comfficeffice" /><o> </o>
This could be construed as cheating to a certain degree as I started by pre-loading a worksheet with a list of the first 101,341 prime numbers obtained from http://www.valeriodistefano.com/gutenberg/00_multilingual_dvd/etext93/prime12.txt  Placing 1-65536 in column A and 65537-101341 in column B.
<o> </o>
The largest of these primes is 1191679 so theoretically you can find the prime factors of semi-primes up to 1,420,098,839,041 although obviously there will be many which are factors of a higher prime than 1191679 and a lower one.
<o> </o>
Then it’s a simple task of running through the primes and dividing the target number by them to see if you get an integer result, which takes a fraction of a second.
<o> </o>
So with the target number in H1:


```

```


```

```


```
Sub FactorMe()<o:p></o:p>
Dim MyNumber As Double<o:p></o:p>
Dim c As Long, i As Long<o:p></o:p>
MyNumber = Cells(1, 8).Value<o:p></o:p>
i = 1<o:p></o:p>
For c = 1 To 65536<o:p></o:p>
    If Int(MyNumber / Cells(c, 1)) = (MyNumber / Cells(c, 1)) Then<o:p></o:p>
        Cells(i, 3) = Cells(c, 1)<o:p></o:p>
        Cells(i, 4) = MyNumber / Cells(c, 1)<o:p></o:p>
        i = i + 1<o:p></o:p>
    End If<o:p></o:p>
    If Cells(c, 2) <> "" Then<o:p></o:p>
        If Int(MyNumber / Cells(c, 2)) = (MyNumber / Cells(c, 2)) Then<o:p></o:p>
            Cells(i, 3) = Cells(c, 2)<o:p></o:p>
            Cells(i, 4) = MyNumber / Cells(c, 2)<o:p></o:p>
            i = i + 1<o:p></o:p>
        End If<o:p></o:p>
    End If<o:p></o:p>
Next c<o:p></o:p>
End Sub
```

<o> </o>
I figure the limit here will be a combination of how many prime numbers you have access to and the numerical storage limits of Excel/VBA.


----------



## Andrew Fergus (Feb 13, 2009)

Hi Lewiy

I like your train of thought given you are only using prime numbers as a divisor.  This sort of process could be used to self-generate prime numbers until all of the rows and columns are full of prime numbers.  Note you only have to test values lower than the square root of your target number.  Also, I suspect the issue (as Pedro mentioned) will be the handling of ever-increasing numbers in VBA.

Andrew


----------



## Phox (Oct 7, 2009)

You could reduce the possible number of primes by looking at the last digit of the composite.

If it ends in 1, just search for primes ending in 1, 3, or 9
If it ends in 3, just search for primes ending in 1 or 7
If it ends in 7, just search for primes ending in 1 or 3
If it ends in 9, just search for primes ending in 1, 3, or 7


----------

