Excel 2024: Use Fuzzy Match in Power Query


November 18, 2024 - by

Excel 2024: Use Fuzzy Match in Power Query

There's a research and development team at Microsoft known as Microsoft Labs. Almost 10 years ago, it invented a free Fuzzy Match add-in for Excel. A fuzzy matching algorithm looks for words that share a percentage of characters in common.

The figure below shows two data sets that need to be matched. Columns A and B contain the list of employees. Columns D and E contain the names of the employees who filled out a required form. You need to identify the people who haven't yet returned the form. Unfortunately, a VLOOKUP or XLOOKUP won't work since column A uses a 'last name, first name' format while column D contains a nickname and last name. These two data sets can be matched using the Fuzzy Match tool that's now built into Windows versions of Microsoft 365.


Columns G and H show a translation table that will be used by the fuzzy match to help match full first names with their nicknames. The translation table requires two columns, labeled "From" and "To". The fuzzy match will likely match Kris and Kristy because they share many letters. But it will need an entry in the translation table for Bill and William or Bob and Robert.

Before you can perform the match, all three ranges of data have to be converted to a table by selecting each individual range and pressing Ctrl+T. Then Rename each table: Select one cell in a table. Go to the Table Tools tab in the Ribbon and type a meaningful name such as "Census", "Forms", and "Nicknames".

You need to convert each of the three tables to a connection in Excel. From cell A1, select, Data, From Table/Range (as shown at the red arrow in Figure 1). Excel will open the Power Query Editor. The first icon on the Home tab says "Close and Load". Click the drop-down menu below it and choose "Close and Load To" to open the Import Data dialog box.

Choose the fourth item, called Only Create Connection.

Repeat the process of creating a connection for the other two tables, starting in cell D1 and cell G1, respectively. If you created all three connections correctly, you should see three queries listed as "Connection Only" in the Queries & Connections panel on the right side of the Excel window.

Select a blank cell in your worksheet. From the Data tab, select Get Data, Combine Queries, Merge to open the Merge dialog.

There are many subtle settings in this dialog that aren't intuitive. The figure at right shows the 8 steps:

1. From the top drop-down menu, select the Census table.

2. In the small data preview, click on the heading(s) of the fields to be used for the matching. In this case, it's the Employee Name heading.

3. From the second drop-down menu, choose the name of the lookup table. In this case, Forms.

4. In the data preview, click on the heading(s) of the fields to be used for matching, such as Name.

5. Choose a Join Kind of Left Outer. Choose the box for "Use Fuzzy Matching to Perform the Merge".

6. Several special settings are hidden behind the Fuzzy Matching Options drop-down menu. Click the triangle to reveal this section.

7. Scroll to the bottom of the section and set Nicknames as the Transformation Table.

8. Verify that the number of matches found is the same as the number of records in the Forms table.



Tip

If you become comfortable with fuzzy matching, you might change the Join Kind from Left Outer to Left Anti in order to return only the list of people in the first table who do not have a match in the second table.

If there are fewer records than you expect, you might experiment with the "similarity threshhold". By default, the similarity threshold requires an 80% match. You could try changing this to 0.7 or 0.6 to see if it improves the number of matches. But be careful. If you set this too low, you risk the chance of false matches happening. For example, at 0.4 similarity, both Mason Astley and Lucy Astley would be seen as a match.

Click OK to perform the Merge. The grid in the Power Query editor will show columns for Employee Name, Department, and then a column called Forms. The value in each row for Forms simply says "Table" in each row. To the right of the "Forms" heading is an Expand icon with two arrows pointing left and right. Click this icon to choose which fields from the Forms table to return.

By default, the new fields will have the table name followed by a period and then the field name (such as "Forms.Form Received"). To prevent this, uncheck the box for "Use Original Column Name As Prefix".

Once you have the preview shown in the Power Query Editor, go to Home, Close & Load to deliver the results to a new table on a new worksheet. You could optionally use "Close & Load To" and specify a location on an existing worksheet for the table.

At this point, review the results to make sure no false matches were found. If everything looks good, you can sort or filter to remove the records that show a match, leaving the people who haven't turned in a form.

There's an easy audit method to look for people in the Forms table who did not show up in the Merge query. Use Data, Get Data, Combine Queries, Merge. Specify Forms as the first table, using Name as the key field. Specify Merge1 as the second table, using Name as the key field. For the Join Kind, specify Left Anti (Rows Only In the First Table}.

Ever since my Accounting 101 class, I was taught that "close" is never acceptable in accounting. This leads to a reluctance to trust the fuzzy matching algorithm. Yet there are cases where the fuzzy match tool is the only solution short of manually matching records.

As more people turn in their forms, choosing Data, Refresh All will automatically perform the fuzzy match again.


This article is an excerpt from MrExcel 2024 Igniting Excel

Title photo by Jeremiah Lawrence on Unsplash