Compare text strings to find mispellings

Tarver

Board Regular
Joined
Nov 15, 2012
Messages
113
Office Version
  1. 365
Platform
  1. Windows
  2. MacOS
I'm working with a large list of partners, and data is input by many parties. Unfortunately, not everyone sicks to the naming nomenclature and human error abounds!

Here's a short snippet of that list ...

[TABLE="width: 256"]
<colgroup><col></colgroup><tbody>[TR]
[TD]Harbor Freight[/TD]
[/TR]
[TR]
[TD]Hard Rock Cafe[/TD]
[/TR]
[TR]
[TD]Harley Davidson[/TD]
[/TR]
[TR]
[TD]Harley-Davidson[/TD]
[/TR]
[TR]
[TD]Harmony[/TD]
[/TR]
[TR]
[TD]Hey Cupcakes[/TD]
[/TR]
[TR]
[TD]Hibbett Sports[/TD]
[/TR]
</tbody>[/TABLE]

You can see that either the entries are very different, or there's an occasional entry with a misspelling or variation (like the hyphen instead of a space in Harley Davidson.) I'm looking to find a way to compare the entries in this list with each other, and where these is a one or two-character difference, to highlight or flag it for review.

EXACT doesn't work, because it tells me only that the items in list list are unique -- and they are. I need to see where two adjacent entries in the alphabetically sorted list are similar, but not the same.

Any thoughts?

You help is most appreciated.
 

Excel Facts

Select a hidden cell
Somehide hide payroll data in column G? Press F5. Type G1. Enter. Look in formula bar while you arrow down through G.
The Levenshtein distance tells you how many changes are needed to change one text into another. I wrote a macro to calculate it. To try it, open a copy of your workbook. Press Alt-F11 to open the VBA editor. Press Alt-I-M to add a module. Paste the following code into the window that opens:

Code:
Public Function LevDist(str1 As String, str2 As String)
Dim d() As Long, m As Long, n As Long, i As Long, j As Long, Cost As Long

    m = Len(str1)
    n = Len(str2)
    ReDim d(0 To m, 0 To n)
    
    For i = 1 To m
        d(i, 0) = i
    Next i
    
    For j = 1 To n
        d(0, j) = j
    Next j
    
    For j = 1 To n
        For i = 1 To m
            Cost = IIf(Mid(str1, i, 1) = Mid(str2, j, 1), 0, 1)
            d(i, j) = WorksheetFunction.Min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + Cost)
        Next i
    Next j
    
    LevDist = d(m, n)
    
End Function
Press Alt-Q to close the editor. Now assuming your list starts in A2, put this formula in B2:

=IF(levdist(A2,A1)<=2,"Check this one","")

And it will flag any values that are 2 changes or less different from the previous value. Case counts. There are several variations of this to try, but see if this is along the lines of what you're looking for.
 
Upvote 0

Forum statistics

Threads
1,225,739
Messages
6,186,743
Members
453,370
Latest member
juliewar

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top