VBA - Sophisticated approach to find and highlight duplicates

excelstarter1 · Aug 16, 2018

Hello VBA pros,

I have a tricky problem (well it is for me) regarding duplicates in a dataset. Regularly I receive manually generated customer lists from our finance/sales or another department including an overview of customers and their revenues in a specific period (customer name in column A and revenue in column B). Now the issues with those lists are duplicates, e.g. the customer Walmart appears several times as Walmart USA, Walmart (NY), Walmartt (with a typo), USA Walmart, NY Walmart and so on. I want to find all those duplicates and cluster the customers. The simple Excel function would not work. I already tried text functions like LEFT (for 3 characters, then run find duplicates). However, they are not sufficient.

Is there a way with VBA to identify duplicates by comparing the content of each individual cell to the full dataset (e.g. 1000 rows with customer names)? As the duplicate name could appear at the beginning (e.g. Walmart USA), the end (USA Walmart) or the middle (USA, Walmart, NY), it would make sense not to compare the full content of the cell but only 5 or 6 characters in order to take this issue into account.

Would be great if you could help me with a couples of ideas!

Thanks a lot!

Best regards

JoeMo · Aug 16, 2018

I don't regard this as sophisticated, but it may be helpful. This code will prompt the user to enter a customer name (not case sensitive, but can easily be changed to be so). Every instance of that name found in the used range of the active sheet will be highlighted in bold, red font.

Code:

Sub HighlightWordOrPhrase()
Dim wrd As Variant, ct As Long
wrd = InputBox("Enter customer name to highlight")
If wrd = "" Then Exit Sub
Dim c As Range, x As Variant, i As Long
Application.ScreenUpdating = False
For Each c In ActiveSheet.UsedRange
    If Not IsError(c) And Not IsEmpty(c) Then
        x = InStr(1, c.Value, wrd, vbTextCompare)
        If x > 0 Then
            ct = ct + 1
            With c.Characters(x, Len(wrd)).Font
                .Color = vbRed
                .Bold = True
            End With
        End If
    End If
Next c
If ct > 0 Then
    MsgBox ct & " instances of the customer name were found and highlighted"
Else
    MsgBox "No instances of the customer name were found in the activesheet"
End If
Application.ScreenUpdating = True
End Sub

If the customer name is say Ford you can enter it with a leading and trailing space (" Ford ") so you won't pickup something like "Fording the stream was a challenge" if that's an issue.

excelstarter1 · Aug 16, 2018

Thats great, thank you Joe!
Do you have an approach on how to take e.g. the first 5 characters of each cell (for the whole e.g. 500 rows) and compare that string to the whole list?

JoeMo · Aug 16, 2018

You are welcome.

Can you provide some sample data and expected result for your new question? I don't quite grasp exactly what you describe and want to achieve. If you want to highlight the first 5 characters of every cell in the used range in every cell in the used range those characters might appear, the whole sheet may become a highlighted mess.

excelstarter1 · Aug 16, 2018

Sure. Sorry, maybe my request was a bit confusing.
Usually the customers are not ranked like this but mixed (here all the different Walmart customers follow each other; see below).
The idea is to take the first five characters of each cell (here e.g. row 2 = Walma) und use that string (Walma) to check the full list and identify duplicates.
Then repeat the process with the next customer in row 3 and so on.
Ideally I want a new row where all different Walmart customers are allocated to Walmart (or here the string Walma), Nike, Adidas and so on. That last task might be a bit more complex...

[TABLE="width: 500, align: left"]
<tbody>[TR]
[TD]CUSTOMER
[/TD]
[TD]Sales
[/TD]
[/TR]
[TR]
[TD]Walmart
[/TD]
[TD]10
[/TD]
[/TR]
[TR]
[TD]Walmart USA
[/TD]
[TD]20
[/TD]
[/TR]
[TR]
[TD]Walmart, NY
[/TD]
[TD]30
[/TD]
[/TR]
[TR]
[TD]USA Walmart
[/TD]
[TD]15
[/TD]
[/TR]
[TR]
[TD]NY, Walmart, USA
[/TD]
[TD]40
[/TD]
[/TR]
[TR]
[TD]USAWalmart
[/TD]
[TD]10
[/TD]
[/TR]
[TR]
[TD]Nike
[/TD]
[TD]90
[/TD]
[/TR]
[TR]
[TD]USA Nike
[/TD]
[TD]20
[/TD]
[/TR]
[TR]
[TD]Nike, CA
[/TD]
[TD]15
[/TD]
[/TR]
[TR]
[TD]Nike, USA, CA
[/TD]
[TD]10
[/TD]
[/TR]
[TR]
[TD]USA, Nike
[/TD]
[TD]50
[/TD]
[/TR]
[TR]
[TD]Nikee USA
[/TD]
[TD]20
[/TD]
[/TR]
[TR]
[TD]Adidas
[/TD]
[TD]10
[/TD]
[/TR]
[TR]
[TD]Canada Adidas
[/TD]
[TD]20
[/TD]
[/TR]
[TR]
[TD]Adidas USA
[/TD]
[TD]50
[/TD]
[/TR]
</tbody>[/TABLE]

JoeMo · Aug 16, 2018

To prevent a highlighted mess, you could create a new column say before the Customer column then using the first 5 of each cell in the customer list assign a number in that new column and then when the list is complete use that column to sort the whole table so like customer names are grouped. Is that something you could work with? Also, if the search can be confined to just the customer column, that would speed things up a bit.

I have to go offline shortly, but if you reply I will look at this further, if no one else has helped you, when I return.

excelstarter1 · Aug 16, 2018

No worries, thanks for your quick replies!! A big help!

Yes the search is only confined to just the customer column. The numbers/sales can be ignored for the moment.

Well the idea with the first 5 characters was a loop and to start with the first 5 (e.g. Walma), then skip one character and use the next 5 characters (e.g. almar) and so on. I have not figured out the best approach. In the end I want all the customers that have "Walmart" in their name or Nike, Adidas etc. to be in one group.

JoeMo · Aug 16, 2018

After looking at the sample data you posted, I think using the first x characters will not work to achieve what you want. Are there few enough customers that you could provide a list of them using only the key part, not all of them would have to appear in any particular listing you receive? For example, for the customer list you posted the key list would be: Adidas, Nike, Walmart.

If such a list was available it would be possible to sort the data by company like the customer list in post #5 is arranged.

An alternative would be to have the user input a specific customer or customers and the macro would group all entries for that customer or customers starting at the top of the data range.

VBA - Sophisticated approach to find and highlight duplicates

excelstarter1

Board Regular

JoeMo

MrExcel MVP

excelstarter1

Board Regular

JoeMo

MrExcel MVP

excelstarter1

Board Regular

JoeMo

MrExcel MVP

excelstarter1

Board Regular

JoeMo

MrExcel MVP

Similar threads

Share this page

VBA - Sophisticated approach to find and highlight duplicates

Board Regular

MrExcel MVP

Board Regular

MrExcel MVP

Board Regular

MrExcel MVP

Board Regular

MrExcel MVP

Similar threads

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock