Identify fuzzy matches in one column

Vbanoob98

Board Regular
Joined
Sep 13, 2019
Messages
128
Hello!

I have a list of 20k names and I am trying to find the fuzzy duplicates on the same column

Like:

Company Waterfall
Company Waterfll
Company Waterfalls

Ideally those 3 would be highlighted, marked with a 1 or something that allows for identification

Any ideas? been looking for weeks and I can find a lot of similar problems but no solutions :S
 
The most likely is that your text has what is the standard csv delimiter for your language, which for most countries is a comma.
Just add an additional step
Transform > Replace Values > Replace Values
Value to find quotation (") mark Replace with "nothing"
 
Upvote 0

Excel Facts

Show numbers in thousands?
Use a custom number format of #,##0,K. Each comma after the final 0 will divide the displayed number by another thousand
The most likely is that your text has what is the standard csv delimiter for your language, which for most countries is a comma.
Just add an additional step
Transform > Replace Values > Replace Values
Value to find quotation (") mark Replace with "nothing"
It doesnt seem to be that. Anyways thats not an issue

The other thing that is odd is that certain names have a match like this:

Name Fuzzy Concat

ABDD ABDD&&ABDDX

But ABDDX appears to be missing on the list. When I filter for those 2 before doing the fuzzy join and the rest of the steps. Both of them show up.
 
Upvote 0
I would need to see your data to get into that sort of detail. Remember also that you put in a Fuzzy Percentage match factor and if you set that too low you will get a lot of matches that may not be helpful.
 
Upvote 0
It sounds like you got as far as doing the FuzzyNestedJoin.
The additional step that I have now also modified to show a consistent sort is below.

The fuzzy join to the same Table gives you a column with a Table as contents on each row.
This needs to be converted to a List after which you can Remove the Column with the Table values.
Because we are looking for duplicates we want the concatenated output to be in the same order on each line, so we add a List.Sort into the List conversion.

After that click on the Expand button next to the column heading and select Extract Values (and pick a delimiter)

See if the below is enough to configure it for your specific case.
Power Query:
= Table.AddColumn(RenamedColumnOptional, "New", each List.Sort(
    Table.ToList([Table2])))

View attachment 70089


View attachment 70090


Hi all. I'm having a similar problem to the one outlined above. I can't quite get the solution working though. Would someone mind uploading an example file or the full code? Many thanks
 
Upvote 0
Hi all. I'm having a similar problem to the one outlined above. I can't quite get the solution working though. Would someone mind uploading an example file or the full code? Many thanks
It is generally better to start a new thread and in a case like this reference this thread.
In the new thread provide detais on what it not working.
Also update your profile to show what version of Excel you are using ie do you have MS 365.

Important for Power Query,
What is the table name of the main table and the table name of the table you are trying to get the matching data from ?
What is the name of the field(s) on each table that you are using to match on ?
Ideally provide an XL2BB sample of each table so we don't have to manually create data to test it and realistic enough so that we can cater for any specific features or patterns in your data.

If you leave a link to your new thread here, I will take a look.
 
Upvote 0
It is generally better to start a new thread and in a case like this reference this thread.
In the new thread provide detais on what it not working.
Also update your profile to show what version of Excel you are using ie do you have MS 365.

Important for Power Query,
What is the table name of the main table and the table name of the table you are trying to get the matching data from ?
What is the name of the field(s) on each table that you are using to match on ?
Ideally provide an XL2BB sample of each table so we don't have to manually create data to test it and realistic enough so that we can cater for any specific features or patterns in your data.

If you leave a link to your new thread here, I will take a look.
Ah, sorry about that. New thread here...


thanks!
 
Upvote 0

Forum statistics

Threads
1,223,997
Messages
6,175,874
Members
452,679
Latest member
darryl47nopra

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top