Download the tool from here: Download Fuzzy Lookup Add-In for Excel from Official Microsoft Download Center
Today, a look at the Fuzzy Lookup tool released by Microsoft Labs for Excel back in 2013/2014 time frame. The tool is fairly cool for matching imperfect data. This episode shows you the portfolio example that ships with the add-in.
Today, a look at the Fuzzy Lookup tool released by Microsoft Labs for Excel back in 2013/2014 time frame. The tool is fairly cool for matching imperfect data. This episode shows you the portfolio example that ships with the add-in.
Transcript of the video:
Learn Excel from MrExcel podcast episode 2271.
Fuzzy lookup and Excel. Hey welcome back to MrExcel podcast, I’m Bill Jelen.
Now this is an awesome tool that was created by Microsoft Labs all the way back in 2013.
I think they were maybe thinking of adding this technology to Excel but it actually got added to another product and never made it to Excel. But the ad is still out there and to get the ad, you have to go to this, download location. I’ll put that down on the YouTube comments so you don’t have to type it and scroll down here and click download, they have details, this is version 1.3, all the way back from 2014. You need Excel 2010 or newer. And if you want to install it successfully, you have this fuzzy lookup tab but it’s really hard to figure out what’s going on.
So you want to go to file, options, go to add ins, click on fuzzy lookup and this directory here, go to that directory and you will find this awesome stuff right here. The portfolio, sample and the read me. That explains how it works. Without this, you don’t have a chance. Alright. So we’re going to walk through the portfolio report example.
And what we have here, there’s four tables in this example. The first table, table tools design is called portfolio, it’s you know, our portfolio, this is the stocks we know and the shares. And then here is a table of the SMP500. And of course, the names here don’t match the names here which is so typical.
I mean someone misspelled the company name here, it’s just a mess. And then if you go into the instructions or that Word document, it will talk about these two tables here called Transformations and TokenWeight and you can explain, you know, the ink and incorporate it, Corp and Corporation mean the same thing and ignore things like MPV and USD, some junk data that’s here like MPV and you know, some other things. So just know to throw those out.
So, here’s how it works. Go to the fuzzy lookup and we’ll get a little panel over here on the right hand inside. Specify the left table and then the right table, there’s drop downs there. What’s really nice is you can specify two columns in theory, but here it’s just one column, the company in the left table and company in the right column—right table. And then which we want to output. We want everything on that table or not. At the end, this fuzzy lookup similarity, definitely want to ask for that.
And then we’re going to talk about this number matches. Now by default, they do want it and to me, that is super, super risky.
I use fuzzy match in the past. This tool and other tools and using just the top match is really just kind of a disaster, right. There’s no silver bullet here that’s going to solve all our problems. This is really going to help you. It’s going to try and find matches but you really need to ask for them to pry two matches at least or three matches so that way, a human, an intelligent human can go through and figure out which match is correct.
But for right now, I’m just going to click go. And so here’s these two columns are the original table, company and shares. It’s interesting they don’t make it into a table.
And then here is the matching data from the SMP500. So Amazon.com Inc. and stock. ++ and they were able to figure out Amazon.com Inc.
stock. See they threw out the Inc and stock and they were able to figure out Amazon.com Inc. Motorola, great, AT&T, great. Berkshire Hathaway, misspelled. Just terrible misspelled but they got it. Microsoft, Coca-Cola, Dr.
Prepper was matched to Dr. Pepper Snapple group, Autodesk, they threw out the junk at the end, thanks to our transformations table and figured out those Autodesk and then Mellon Bank. This one, it always amazes me because it’s not even close. Mellon and Bank are in the wrong order but I find if we came out here to that far radiant side to find that similarity, so that’s now at 86%. Not the worst one. The worst one here is at 81% where they match AT&T Corp to AT&T Inc only had a 81% similarity. Alright now here’s why I think that you should never do the top match.
I think you always want to come here and ask for at least two or three and see what they come up with. Alright, we’ll have three, what the heck and then we’ll click go. Alright now the difference here is we’re going to get more rows than we had in the original data set. And each row in the original date set might return up to three matches and it’s up to you to go figure out which is the right match. For example, Motorola. There’s actually two companies in the SMP500 that could be a match. Motorola Solutions and Motorola Mobility Holdings. Down here Coca-Cola, you have the Coca-Cola company that’s, you know, one organization and then Coca-Cola Enterprises and you might have wanted a CCE instead of KO.
And then let’s see, that’s the only… oh AT&T. I love this one. This is a great example of why fuzzy match can return the wrong results.
If you look at AT&T Corp and ITT Corporation, that’s only off by one character when you go right down to it. This one, you know, has an extra ampersand and we’re really, really lucky that we got the correct one there. It very could have easily happened that we would have an ITT Corp as the number 1 match. That’s why every time that I go a fuzzy match, I always ask for two and I take a little bit of time just to go through and validate the match that they found is really the best match.
I mean some of these, you know, you’ll find the best math especially at 50% isn’t the same company at all. But sometimes, you know, Coca-Cola and Coca-Cola Enterprises, I might have needed CCE. So, to me it’s always worth the time just to ask for two matches and then manually go through and look at them. And I get it, if you have 10,000 rows, no one’s going to go through and look at 10,000 rows but you know, you want your data to be right.
And this tool is so much easier than going through and manually matching them up which is what we have to resort to a lot of the time. Alright now, thanks to Liam Bastick and SumProduct.com, I’ve seen this tool many years ago back in 2013 and then it was lost, I couldn’t find it. There was actually a podcast, 4, 500 podcasts ago. This thing was a killer, horrible process. 10, 12 steps and step number 7 desperately needed the fuzzy lookup tool and I couldn’t find it. So headed right back to the guy and said, “Hey, you know, I have this installed in my machine but I can’t find anywhere where it will let you download it so I can’t give you the solution because you’re not going to be able to get past step 7.” So thanks to Leanne for knowing that it is out there.
The other thing that slightly fears me is, that strikes some fear to my heart, it’s currently March 4th 2019 when I’m recording this and the help document says that support goes or the license goes through July of 2019 which means that you only have, what, three months left. So, hey if you think you’re ever going to need the fuzzy lookup any time in the rest of your life, go download it now, alright. Because who knows, when July comes, there’s a good chance they won’t extend this and you won’t be able to download it anymore.
It’s an awesome tool thanks to Microsoft Labs, that’s the group of Microsoft that, you know, is in charge with making revenue, they’re in charge with making cool stuff and this is cool stuff.
Well hey, I want to thank you for stopping by. We’ll see you next time for another netcast from MrExcel
Fuzzy lookup and Excel. Hey welcome back to MrExcel podcast, I’m Bill Jelen.
Now this is an awesome tool that was created by Microsoft Labs all the way back in 2013.
I think they were maybe thinking of adding this technology to Excel but it actually got added to another product and never made it to Excel. But the ad is still out there and to get the ad, you have to go to this, download location. I’ll put that down on the YouTube comments so you don’t have to type it and scroll down here and click download, they have details, this is version 1.3, all the way back from 2014. You need Excel 2010 or newer. And if you want to install it successfully, you have this fuzzy lookup tab but it’s really hard to figure out what’s going on.
So you want to go to file, options, go to add ins, click on fuzzy lookup and this directory here, go to that directory and you will find this awesome stuff right here. The portfolio, sample and the read me. That explains how it works. Without this, you don’t have a chance. Alright. So we’re going to walk through the portfolio report example.
And what we have here, there’s four tables in this example. The first table, table tools design is called portfolio, it’s you know, our portfolio, this is the stocks we know and the shares. And then here is a table of the SMP500. And of course, the names here don’t match the names here which is so typical.
I mean someone misspelled the company name here, it’s just a mess. And then if you go into the instructions or that Word document, it will talk about these two tables here called Transformations and TokenWeight and you can explain, you know, the ink and incorporate it, Corp and Corporation mean the same thing and ignore things like MPV and USD, some junk data that’s here like MPV and you know, some other things. So just know to throw those out.
So, here’s how it works. Go to the fuzzy lookup and we’ll get a little panel over here on the right hand inside. Specify the left table and then the right table, there’s drop downs there. What’s really nice is you can specify two columns in theory, but here it’s just one column, the company in the left table and company in the right column—right table. And then which we want to output. We want everything on that table or not. At the end, this fuzzy lookup similarity, definitely want to ask for that.
And then we’re going to talk about this number matches. Now by default, they do want it and to me, that is super, super risky.
I use fuzzy match in the past. This tool and other tools and using just the top match is really just kind of a disaster, right. There’s no silver bullet here that’s going to solve all our problems. This is really going to help you. It’s going to try and find matches but you really need to ask for them to pry two matches at least or three matches so that way, a human, an intelligent human can go through and figure out which match is correct.
But for right now, I’m just going to click go. And so here’s these two columns are the original table, company and shares. It’s interesting they don’t make it into a table.
And then here is the matching data from the SMP500. So Amazon.com Inc. and stock. ++ and they were able to figure out Amazon.com Inc.
stock. See they threw out the Inc and stock and they were able to figure out Amazon.com Inc. Motorola, great, AT&T, great. Berkshire Hathaway, misspelled. Just terrible misspelled but they got it. Microsoft, Coca-Cola, Dr.
Prepper was matched to Dr. Pepper Snapple group, Autodesk, they threw out the junk at the end, thanks to our transformations table and figured out those Autodesk and then Mellon Bank. This one, it always amazes me because it’s not even close. Mellon and Bank are in the wrong order but I find if we came out here to that far radiant side to find that similarity, so that’s now at 86%. Not the worst one. The worst one here is at 81% where they match AT&T Corp to AT&T Inc only had a 81% similarity. Alright now here’s why I think that you should never do the top match.
I think you always want to come here and ask for at least two or three and see what they come up with. Alright, we’ll have three, what the heck and then we’ll click go. Alright now the difference here is we’re going to get more rows than we had in the original data set. And each row in the original date set might return up to three matches and it’s up to you to go figure out which is the right match. For example, Motorola. There’s actually two companies in the SMP500 that could be a match. Motorola Solutions and Motorola Mobility Holdings. Down here Coca-Cola, you have the Coca-Cola company that’s, you know, one organization and then Coca-Cola Enterprises and you might have wanted a CCE instead of KO.
And then let’s see, that’s the only… oh AT&T. I love this one. This is a great example of why fuzzy match can return the wrong results.
If you look at AT&T Corp and ITT Corporation, that’s only off by one character when you go right down to it. This one, you know, has an extra ampersand and we’re really, really lucky that we got the correct one there. It very could have easily happened that we would have an ITT Corp as the number 1 match. That’s why every time that I go a fuzzy match, I always ask for two and I take a little bit of time just to go through and validate the match that they found is really the best match.
I mean some of these, you know, you’ll find the best math especially at 50% isn’t the same company at all. But sometimes, you know, Coca-Cola and Coca-Cola Enterprises, I might have needed CCE. So, to me it’s always worth the time just to ask for two matches and then manually go through and look at them. And I get it, if you have 10,000 rows, no one’s going to go through and look at 10,000 rows but you know, you want your data to be right.
And this tool is so much easier than going through and manually matching them up which is what we have to resort to a lot of the time. Alright now, thanks to Liam Bastick and SumProduct.com, I’ve seen this tool many years ago back in 2013 and then it was lost, I couldn’t find it. There was actually a podcast, 4, 500 podcasts ago. This thing was a killer, horrible process. 10, 12 steps and step number 7 desperately needed the fuzzy lookup tool and I couldn’t find it. So headed right back to the guy and said, “Hey, you know, I have this installed in my machine but I can’t find anywhere where it will let you download it so I can’t give you the solution because you’re not going to be able to get past step 7.” So thanks to Leanne for knowing that it is out there.
The other thing that slightly fears me is, that strikes some fear to my heart, it’s currently March 4th 2019 when I’m recording this and the help document says that support goes or the license goes through July of 2019 which means that you only have, what, three months left. So, hey if you think you’re ever going to need the fuzzy lookup any time in the rest of your life, go download it now, alright. Because who knows, when July comes, there’s a good chance they won’t extend this and you won’t be able to download it anymore.
It’s an awesome tool thanks to Microsoft Labs, that’s the group of Microsoft that, you know, is in charge with making revenue, they’re in charge with making cool stuff and this is cool stuff.
Well hey, I want to thank you for stopping by. We’ll see you next time for another netcast from MrExcel