This might be an impossible task, but I'm in a de-duping project and was hoping to write a formula that could plug in a string from before the @ in an email address, say "harrystruman" and then compare that to a range of values in a column or columns, and find the best match or partial match (or even, say top 10 matches) of any substring within the original string.
So searching through the columns of something like below:
A,B,C
First,Middle,Last
Harry,S,Truman
George,W,Bush
Franklin,D,Roosevelt
The thing is, I don't know how to search for all the substrings in "harrystruman" (there are 78 of them in this example: h, ha, har, harry, harrys, etc.). Each one would need to be compared to column A and columns C the example and then...maybe scored as a % match? Or list full matches of a string (Truman = Truman) somewhere? I guess I'd want to look for 100% match or matching 3 or more characters from the search value.
But honestly, don't even know where to get started. Most promising was MATCH, using wildcards for the search value, but in this example there would need to be 78 different search values.
My instinct is to go try this in python so I could loop through the substrings while iterating through a csv, but is there a way in Excel?
Thanks for any advice.
So searching through the columns of something like below:
A,B,C
First,Middle,Last
Harry,S,Truman
George,W,Bush
Franklin,D,Roosevelt
The thing is, I don't know how to search for all the substrings in "harrystruman" (there are 78 of them in this example: h, ha, har, harry, harrys, etc.). Each one would need to be compared to column A and columns C the example and then...maybe scored as a % match? Or list full matches of a string (Truman = Truman) somewhere? I guess I'd want to look for 100% match or matching 3 or more characters from the search value.
But honestly, don't even know where to get started. Most promising was MATCH, using wildcards for the search value, but in this example there would need to be 78 different search values.
My instinct is to go try this in python so I could loop through the substrings while iterating through a csv, but is there a way in Excel?
Thanks for any advice.