Searching a list of text strings against another list

newscientist_au

New Member
Joined
Feb 6, 2014
Messages
10
Hi
I have data in following format: two columns A and B containing only text.
A is a short list comprising just a few hundred text strings.
B is much longer list containing several thousand rows and has text strings that are extended versions of text in column A.

each value in Column A could have potentially several extended versions in column B. Is there any way I could mark in column C if corresponding value in B has a matching substring from A (the non-extended core)?

[TABLE="width: 500"]
<tbody>[TR]
[TD]A[/TD]
[TD]B[/TD]
[TD]Comments (when Matched with AITRNPVFPR)[/TD]
[/TR]
[TR]
[TD]AITRNPVFPR[/TD]
[TD][TABLE="width: 136"]
<tbody>[TR]
[TD]AITRNPVFPR[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[TD][TABLE="width: 65"]
<tbody>[TR]
[TD]exact match[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
[TR]
[TD]DEGGFTRS[/TD]
[TD]AITRNPVFPRFRD[/TD]
[TD][TABLE="width: 65"]
<tbody>[TR]
[TD="width: 65"][/TD]
[/TR]
[TR]
[TD]extended on right[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD]LLLAITRNPVFPR[/TD]
[TD]extended on left[/TD]
[/TR]
[TR]
[TD][/TD]
[TD]LLAITRNPVFPRDDD[/TD]
[TD]extended on both ends[/TD]
[/TR]
[TR]
[TD][/TD]
[TD]AITRNDDDPVFPR[/TD]
[TD]NO match[/TD]
[/TR]
</tbody>[/TABLE]

If I try to search for each value in column A with wildcard (*) on either end, it gives me all the correct values but I cannot do this for hundreds of sequences!
I tried using INDEX/MATCH looking up each value in B but I could only find exact text and miss most cases (as the search string is longer than value in column A).

If you can point me in right direction it would greatly appreciate it!
(I am using excel 2011 on mac but have access to windows based excel 2010 as well.)
Cheers
Sri
 

Excel Facts

What did Pito Salas invent?
Pito Salas, working for Lotus, popularized what would become to be pivot tables. It was released as Lotus Improv in 1989.
Hi,

I can offer you a VBA solution. You will need to go into the VB Editor and select Insert-->Module from the menu and paste this code in.

You will also need to have two worksheets, which I called Sheet1 and Sheet2. Sheet1 has the short forms in column A and the extended versions are in column B.

The macro reads in both columns and compares everything in column A with everything in column B. I don't know how much data you have but that could take some time. I have used arrays to help with the speed.

The results are written into Sheet2 which it clears first to make sure it is empty.

Code:
Sub FindString()

    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim iRow1 As Long
    Dim iRow2 As Long
    Dim iOut As Long
    Dim lr1 As Long
    Dim lr2 As Long
    Dim arr1 As Variant
    Dim arr2 As Variant
    Dim strResult As String
    
    Set ws1 = ThisWorkbook.Worksheets("Sheet1")
    Set ws2 = ThisWorkbook.Worksheets("Sheet2")
    
    With ws1
        lr1 = .Cells(.Rows.Count, "A").End(xlUp).Row
        lr2 = .Cells(.Rows.Count, "B").End(xlUp).Row
        arr1 = .Range("A1").Resize(lr1)
        arr2 = .Range("B1").Resize(lr2)
    End With
    
    With ws2
        .Cells.Clear
        iOut = 0
        For iRow1 = 1 To lr1
            For iRow2 = 1 To lr2
                If arr1(iRow1, 1) = arr2(iRow2, 1) Then
                    strResult = "exact match"
                ElseIf arr1(iRow1, 1) = Left(arr2(iRow2, 1), Len(arr1(iRow1, 1))) Then
                    strResult = "extended on right"
                ElseIf arr1(iRow1, 1) = Right(arr2(iRow2, 1), Len(arr1(iRow1, 1))) Then
                    strResult = "extended on left"
                ElseIf InStr(arr2(iRow2, 1), arr1(iRow1, 1)) Then
                    strResult = "extended on both ends"
                Else
                    strResult = "NO Match"
                End If
                .Range("A1:C1").Offset(iOut) = Array(arr1(iRow1, 1), arr2(iRow2, 1), strResult)
                iOut = iOut + 1
            Next
        Next
        .Columns("A:C").EntireColumn.AutoFit
    End With

End Sub
 
Upvote 0
Hi Rick
Thanks very much for your help.
The script works a charm. It seems to be doing the right thing but somehow I think it is not cycling through all of the sequences in column A.
It gives me a run-time error '1004'
method 'offset' of object 'range' failed.


when i hit the debug button it highlights this
.Range("A1:C1").Offset(iOut) = Array(arr1(iRow1, 1), arr2(iRow2, 1), strResult)


Am I doing something silly and wrong?
Thanks again, i don't know much about VBA, google has been helping me understand the terminology!
 
Upvote 0
Hi Rick
I just noticed that I might be reaching the Row limit of Excel 2011 as i have 14000 rows in Column B and 300 in Column A.
You think this is causing the above mentioned error?

Do you think it is possible to do the matching and discard all the 'no match' entries which make up majority of the entries? This will keep the list small and hopefully excel will like it!
Ta
 
Upvote 0
Hi,

How about if I just remove the "NO Match" cases? That is easy to do.

Another option might be to have the small list displayed across the columns and the long list down the rows.

Code:
Sub FindString()

    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim iRow1 As Long
    Dim iRow2 As Long
    Dim iOut As Long
    Dim lr1 As Long
    Dim lr2 As Long
    Dim arr1 As Variant
    Dim arr2 As Variant
    Dim strResult As String

    Set ws1 = ThisWorkbook.Worksheets("Sheet1")
    Set ws2 = ThisWorkbook.Worksheets("Sheet2")
    
    With ws1
        lr1 = .Cells(.Rows.Count, "A").End(xlUp).Row
        lr2 = .Cells(.Rows.Count, "B").End(xlUp).Row
        arr1 = .Range("A1").Resize(lr1)
        arr2 = .Range("B1").Resize(lr2)
    End With
    
    With ws2
        .Cells.Clear
        iOut = 0
        For iRow1 = 1 To lr1
            For iRow2 = 1 To lr2
                If arr1(iRow1, 1) = arr2(iRow2, 1) Then
                    strResult = "exact match"
                ElseIf arr1(iRow1, 1) = Left(arr2(iRow2, 1), Len(arr1(iRow1, 1))) Then
                    strResult = "extended on right"
                ElseIf arr1(iRow1, 1) = Right(arr2(iRow2, 1), Len(arr1(iRow1, 1))) Then
                    strResult = "extended on left"
                ElseIf InStr(arr2(iRow2, 1), arr1(iRow1, 1)) Then
                    strResult = "extended on both ends"
                Else
                    strResult = "NO Match"
                End If
                If strResult <> "NO Match" Then .Range("A1:C1").Offset(iOut) = Array(arr1(iRow1, 1), arr2(iRow2, 1), strResult)
                iOut = iOut + 1
            Next
        Next
        .Columns("A:C").EntireColumn.AutoFit
    End With

End Sub
 
Upvote 0
HI Rick
Thanks again for the effort.
I tried this new version but I run into the same issue by maxing out the row numbers. It somehow seems to ignore the If strResult <> "NO Match".
In Debug the iOut still counts over 1million.
Am I doing something wrong?
Is it possible to iterate but only place text in when there is a match?
Thanks
 
Upvote 0
Hi,

Sorry, that's my fault. I removed the line but continued with the row count regardless. This should be better:

Code:
Sub FindString()

    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim iRow1 As Long
    Dim iRow2 As Long
    Dim iOut As Long
    Dim lr1 As Long
    Dim lr2 As Long
    Dim arr1 As Variant
    Dim arr2 As Variant
    Dim strResult As String

    Set ws1 = ThisWorkbook.Worksheets("Sheet1")
    Set ws2 = ThisWorkbook.Worksheets("Sheet2")
    
    With ws1
        lr1 = .Cells(.Rows.Count, "A").End(xlUp).Row
        lr2 = .Cells(.Rows.Count, "B").End(xlUp).Row
        arr1 = .Range("A1").Resize(lr1)
        arr2 = .Range("B1").Resize(lr2)
    End With
    
    With ws2
        .Cells.Clear
        iOut = 0
        For iRow1 = 1 To lr1
            For iRow2 = 1 To lr2
                If arr1(iRow1, 1) = arr2(iRow2, 1) Then
                    strResult = "exact match"
                ElseIf arr1(iRow1, 1) = Left(arr2(iRow2, 1), Len(arr1(iRow1, 1))) Then
                    strResult = "extended on right"
                ElseIf arr1(iRow1, 1) = Right(arr2(iRow2, 1), Len(arr1(iRow1, 1))) Then
                    strResult = "extended on left"
                ElseIf InStr(arr2(iRow2, 1), arr1(iRow1, 1)) Then
                    strResult = "extended on both ends"
                Else
                    strResult = "NO Match"
                End If
                If strResult <> "NO Match" Then
                    .Range("A1:C1").Offset(iOut) = Array(arr1(iRow1, 1), arr2(iRow2, 1), strResult)
                    iOut = iOut + 1
                End If
            Next
        Next
        .Columns("A:C").EntireColumn.AutoFit
    End With

End Sub
 
Upvote 0
Hi,

I am pleased we got it working eventually.

Thanks for letting me know.

Regards,
 
Upvote 0

Forum statistics

Threads
1,223,236
Messages
6,170,915
Members
452,366
Latest member
TePunaBloke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top