I have a somewhat usual problem that I'd like to automate. In our inventories of equipment, different tracking systems sometimes provide different assigned users.
We have four systems - one tracking system that has IT-entered data, one system with user accounts entered instead of makes, the last recorded login, and a self-reporting system where users can enter what equipment they have.
I put together a sheet that grabs all this data, splits it into first and last name (Accounts are firstname.lastname@email.com), and stacks these side-by side so I can easily see if they match. But I'd like to use formatting to highlight matches in large lists, as an inventory can include thousands of items, and eliminating the easy items fast helps make these more manageable.
The problem is, sometimes they do match, mostly, but not quite. Jonathon Jones in the hand-entered tracking might be Jonathan instead, or have the account Jonathon.G.Jones or Jonathon.Jones1. User entered data might have a typo. Sometimes names change but aren't reflected in all the tracking.
I suspect asking for squishy results like 'similarity' in a deterministic program like excel is probably a ridiculous ask . . . but does anybody have a thought about how I might approach that?
Currently, I'm just assigning scores for every perfect match - a point for every match in the 'first name' range to the first entry, second, third, and fourth, and doing the same for last name. If the entry is blank or an error, I output a different unicode hypen (dash, endash, emdash, etc) to force no match to be scored. They usually get close, but can fail to highlight items where matching users don't actually have 100% matching names.
However, I only know enough about Excel to be dangerous; is there a better way? All I can think is to try and filter out common oddities like middle initials and numbered accounts before scoring.
We have four systems - one tracking system that has IT-entered data, one system with user accounts entered instead of makes, the last recorded login, and a self-reporting system where users can enter what equipment they have.
I put together a sheet that grabs all this data, splits it into first and last name (Accounts are firstname.lastname@email.com), and stacks these side-by side so I can easily see if they match. But I'd like to use formatting to highlight matches in large lists, as an inventory can include thousands of items, and eliminating the easy items fast helps make these more manageable.
The problem is, sometimes they do match, mostly, but not quite. Jonathon Jones in the hand-entered tracking might be Jonathan instead, or have the account Jonathon.G.Jones or Jonathon.Jones1. User entered data might have a typo. Sometimes names change but aren't reflected in all the tracking.
I suspect asking for squishy results like 'similarity' in a deterministic program like excel is probably a ridiculous ask . . . but does anybody have a thought about how I might approach that?
Currently, I'm just assigning scores for every perfect match - a point for every match in the 'first name' range to the first entry, second, third, and fourth, and doing the same for last name. If the entry is blank or an error, I output a different unicode hypen (dash, endash, emdash, etc) to force no match to be scored. They usually get close, but can fail to highlight items where matching users don't actually have 100% matching names.
However, I only know enough about Excel to be dangerous; is there a better way? All I can think is to try and filter out common oddities like middle initials and numbered accounts before scoring.