Identify fuzzy matches in one column

Vbanoob98

Board Regular
Joined
Sep 13, 2019
Messages
128
Hello!

I have a list of 20k names and I am trying to find the fuzzy duplicates on the same column

Like:

Company Waterfall
Company Waterfll
Company Waterfalls

Ideally those 3 would be highlighted, marked with a 1 or something that allows for identification

Any ideas? been looking for weeks and I can find a lot of similar problems but no solutions :S
 

Excel Facts

Create a Pivot Table on a Map
If your data has zip codes, postal codes, or city names, select the data and use Insert, 3D Map. (Found to right of chart icons).
Upvote 0
Here is an option using PQ:

Fuzzy Match of table joined with itself, then concatenating the result.
Probably needs a sort before the concatenate so that all the results are in the same order.

Book5
AB
1NameNew
2WaterfallWaterfall|Waterfll|Waterfalls
3WaterfllWaterfll|Waterfall|Waterfalls
4WaterfallsWaterfalls|Waterfall|Waterfll
5John SmithJohn Smith|John Smyth
6John SmythJohn Smyth|John Smith
7John BrownJohn Brown
Merge1
 
Upvote 0
Here is an option using PQ:

Fuzzy Match of table joined with itself, then concatenating the result.
Probably needs a sort before the concatenate so that all the results are in the same order.

Book5
AB
1NameNew
2WaterfallWaterfall|Waterfll|Waterfalls
3WaterfllWaterfll|Waterfall|Waterfalls
4WaterfallsWaterfalls|Waterfall|Waterfll
5John SmithJohn Smith|John Smyth
6John SmythJohn Smyth|John Smith
7John BrownJohn Brown
Merge1

Thats very close to what I want. Ive tried that approach but I couldn't get that result. I'll try again! Thanks!
 
Upvote 0
It sounds like you got as far as doing the FuzzyNestedJoin.
The additional step that I have now also modified to show a consistent sort is below.

The fuzzy join to the same Table gives you a column with a Table as contents on each row.
This needs to be converted to a List after which you can Remove the Column with the Table values.
Because we are looking for duplicates we want the concatenated output to be in the same order on each line, so we add a List.Sort into the List conversion.

After that click on the Expand button next to the column heading and select Extract Values (and pick a delimiter)

See if the below is enough to configure it for your specific case.
Power Query:
= Table.AddColumn(RenamedColumnOptional, "New", each List.Sort(
    Table.ToList([Table2])))

1658789031352.png



1658790022877.png
 
Last edited:
Upvote 0
Solution

Forum statistics

Threads
1,221,442
Messages
6,159,905
Members
451,601
Latest member
terrynelson55

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top