when I test it with my password ******* it comes up with a password 4 lowercase characters long almost immediately which unlocks the document!
Why is this and can anyone suggest a combination I can use which is more secure?
Thanks
First, it's good you are becoming aware that the protection platform in Microsoft products is very weak. In its defense, Microsoft never claims to have reliable protection. In Office, a password is like the lock on your home's front door; its primary purpose is to keep your friends out. If someone really wants to get in, they will get in.
Here's some background on password protection to help answer your quoted inquiry.
When someone "password protects" a sheet in Excel, they generate a 16-bit 2-byte hash. When unprotecting the sheet, that value is compared to the hash. Excel allows for up to 255 password characters in its worksheet protection scheme. Keep in mind, since it is a case-sensitive scheme, there are over 90 acceptable characters, which translate into the multiple trillions of possibilities. Since the combination of possible passwords is much greater than the combination of possible hashes, many passwords might share the same hash.
In a much-popularized example, you can see this for yourself with the word "test". Protect a sheet with the word "test" (without the quotes). Now unprotect it with "zzyw". Are you feeling even worse now than before?
More info if you are really interested:
The MD5 hash being used (someone please correct me if I'm wrong here) is a standard mixing algorithm, executed as follows:
- take the ASCII values of all characters
- shift left the first character 1 bit
- shift left the second 2 bits
- continue for quantity of characters up to 15 bits, with the higher bits rotated
- XOR those values
- XOR the count of characters
- XOR the constant 0xCE4B
As you probably know, XOR is a logical term associated with a mathematical compound statement, sort of an acronym for "exclusive or". In this case, statement "A" is the password value you type in. Statement "B" is the generated hash. The XOR operation returns TRUE when only one of its combinations is TRUE. Here's why that translates to more than one password value possible, seen in the context of a truth table:
A B Result XOR
T T FALSE
T F TRUE
F T TRUE
F F FALSE
By the way, if you wanted to see the actual password, and not just a compatible one as you originally noted, it literally could not be done in this lifetime.
There are 94 standard characters (26 of A-Z; 26 of a-z; 10 of 0-9; and 32 special such as #,%,!, etc). That means, for every character there are 94 possibilities.
Extrapolating that, using the example of an 8-character password, the number of characters to test is
94 x 94 x 94 x 94 x 94 x 94 x 94 x 94
with equals
6,090,000,000,000,000
At the fairly ambitious pace of 100,000 password attempts per second, it would take 1,932 years to recover the exact password.
And that's just with 8 characters; with the 255 max it can take millions of years.
What all this boils down to is, if you don't want your work exposed to others, don't publish it. One thing for sure is, you are in good company…the whole world is in the same boat with this protection issue when using Excel. It's best to understand the issue as you are attemting to do, so you can make your own informed decisions about what and what not to risk putting in your workbook.