Ranking similarity between multiple fields

Lurkily

New Member
Joined
Nov 30, 2011
Messages
22
I have a somewhat usual problem that I'd like to automate. In our inventories of equipment, different tracking systems sometimes provide different assigned users.

We have four systems - one tracking system that has IT-entered data, one system with user accounts entered instead of makes, the last recorded login, and a self-reporting system where users can enter what equipment they have.

I put together a sheet that grabs all this data, splits it into first and last name (Accounts are firstname.lastname@email.com), and stacks these side-by side so I can easily see if they match. But I'd like to use formatting to highlight matches in large lists, as an inventory can include thousands of items, and eliminating the easy items fast helps make these more manageable.

The problem is, sometimes they do match, mostly, but not quite. Jonathon Jones in the hand-entered tracking might be Jonathan instead, or have the account Jonathon.G.Jones or Jonathon.Jones1. User entered data might have a typo. Sometimes names change but aren't reflected in all the tracking.

I suspect asking for squishy results like 'similarity' in a deterministic program like excel is probably a ridiculous ask . . . but does anybody have a thought about how I might approach that?

Currently, I'm just assigning scores for every perfect match - a point for every match in the 'first name' range to the first entry, second, third, and fourth, and doing the same for last name. If the entry is blank or an error, I output a different unicode hypen (dash, endash, emdash, etc) to force no match to be scored. They usually get close, but can fail to highlight items where matching users don't actually have 100% matching names.

However, I only know enough about Excel to be dangerous; is there a better way? All I can think is to try and filter out common oddities like middle initials and numbered accounts before scoring.
 

Excel Facts

Last used cell?
Press Ctrl+End to move to what Excel thinks is the last used cell.
I found a good VBA example for ranking two strings.
Search for the user "bchatham".

I copied his entire code example and ran the very top function stringSimilarity on the following comparisons in the Immediate Window. I don't know if you would use the return amount and assume that a value over X is considered a close enough match.

VBA Code:
?stringSimilarity("Jonathon.G.Jones","Jonathon Jones")
 0.846153846153846 

?stringSimilarity("Jonathon.G.Jones","Jonathon.Jones")
 0.928571428571429 

?stringSimilarity("Jonathon G Jones","Jonathon Jones")
 1 

?stringSimilarity("Jonathon.Jones1","Jonathon Jones")
 0.88 

?stringSimilarity("Jonathon Jones1","Jonathon Jones")
 0.956521739130435

It seems like replacing the periods with spaces in the first string and comparing with spaces in the second string gives the higher results. Anything over 0.9, then, is close enough?
 
Upvote 0
Solution
Thank you for the time you took. Unfortunately, I can't replicate your results.

I resaved as a macro-enabled book, opened the VBA editor, hit insert class module, pasted the code found there, and closed out VBA. But even simple tests like =stringSimilarity(A1,A2) give me a #name error. The same with strings, like =stringSimilarity("Robert","Robbie")
1651847266028.png

I confess that I'm rather clumsy and have only attempted to use VBA a couple of times in excel. I attempted this with google's help as it seems like a copy/paste solution, but it's not working out for me.

I have to deal with Excel 2016 at work; perhaps the older implementation of VBA (Or newer, that's a 2014 discussion) means the solution isn't suited to this version?

1651847343305.png
 
Upvote 0
This won't work from the "ThisWorkbook" code section. Create a new Module and paste it there.

Those names don't have a lot of similarity.
BobRobert0.285714
RobertRobbie0.4
 
Upvote 0
This won't work from the "ThisWorkbook" code section. Create a new Module and paste it there.

Those names don't have a lot of similarity.
BobRobert0.285714
RobertRobbie0.4
Sorry I never got back to you previously.

I DID make this work -- though only stringSimilarity is working, it's not letting me, for example, use the functions that SEEM to be for comparing one values to multiple values. However, it's definitely enough to make it work.

Oddly, I get an error on a three-character string with a space in the middle. I think it doesn't count the space as a character, and requires more than two characters. I'm using triple-hyphens, triple-emdash, etc for names in different fields that are blank, to ensure that they remain a non-match to everything, but also don't confuse the function.
 
Upvote 0

Forum statistics

Threads
1,224,825
Messages
6,181,190
Members
453,020
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top