Levenshtein Distance & String Transposition

LittleFurry

New Member
Joined
Jun 17, 2019
Messages
2
Hi all

I am new to the MrExcel community and am working on a fuzzy matching assignment. I managed to copy the Levenshtein Distance VBA code from https://stackoverflow.com/questions/4243036/levenshtein-distance-in-vba and it gave me the result I wanted.

For instance if I compared "Osama Ben Ladn" with "Osama Bin Laden", the Levenshtein Distance output is 2, which is correct.

However when I used the code to evaluate "Prem Kumar" with "Kumar Prem", the Levenshtein Distance output is 10. In actual fact, the Levenshtein Distance should be 0 since the individual is the same person, except that the name is transposed.

Appreciate any help from the community to guide me in the right direction and provide VBA code if possible. Thank You very much!
 

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
Hello there and welcome. Whilst I am not aquainted with the Levenshtein Distance, I found this on wiki.
Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.
This suggests that the result of 10 is correct (you would delete the first 5 characters - Prem and a space, and insert them at the end). It looks like you will need a different algorithm.
 
Upvote 0
Hello there and welcome. Whilst I am not aquainted with the Levenshtein Distance, I found this on wiki.
This suggests that the result of 10 is correct (you would delete the first 5 characters - Prem and a space, and insert them at the end). It looks like you will need a different algorithm.

Hi jmacleary

Thanks for your reply! Yes I am aware that the Levenshtein Distance VBA code I have copied online is correct. The only problem is that it will interpret "Prem Kumar" and "Kumar Prem" as a different individual. And yes I do agree with you, a different algorithm is needed.

By any chance, would you know how to define this problem? I have tried searching "swap order of substrings on VBA" and "name transposition" but unfortunately I can't seem to find an answer. Thanks once again!
 
Upvote 0
I know this thread is two months old, but I stumbled across it while researching something else. The term you're looking for is "fuzzy matching". You can search for that, or here are a few thoughts.

First, here's a link to a free Microsoft download to do that:

https://www.microsoft.com/en-us/download/details.aspx?id=15011

Next, I wrote a similar routine to come up with a similarity percentage. It's not as robust as Microsoft's no doubt, but I could maintain it. The basic idea was to find the largest common substring of the 2 strings. See

https://en.wikipedia.org/wiki/Longest_common_subsequence_problem

Remove that from both strings and repeat. Quit when the strings are empty, or when the largest common substring is less than 3 characters. Take the total number of characters removed, divide by the sum of the original sizes of the strings, and there you are.

Sometimes it worked better than other methods, sometimes worse. That's the problem with fuzzy matching, sometimes it's highly dependent on the data, so you have to choose the right method. Good luck!
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,182
Members
453,020
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top