Faster, more efficent ways to lookup against large data sets

daaanj

New Member
Joined
Apr 25, 2013
Messages
13
All I'm trying to achieve here is to look at 2 data extracts (this weeks against last weeks) and highlight any deltas...

SO, what is on this weeks file that wasn't on last weeks - As they'll need creating/loading
and, what's on last weeks file that isn't on this weeks - As they'll need closing/de-activating

I've created a macro that first opens both sets of data and then correctly formats them ready for loading. I then use a loop to highlight the deltas as described above. I'll then just need to write the remaining of the code to drop the deltas onto an import file.

The problem is that my data sets are both over 100,000 rows :eeek: As you can imagine... it takes way too long!

Is there a quicker/more efficient method?

I've tried using formulas instead of the loop (e.g. insert a vlookup instead) but that obviously takes too long to calculate for a macro (doesn't finish calculating by the time the data wants to be moved)

The data I'm working with looks like this (all in column A, before my macro formats it and there's over 100K rows on both files):

[TABLE="width: 183"]
<tbody>[TR]
[TD]10A846|B0000101|B0000606
[/TD]
[/TR]
[TR]
[TD]10A846|B0001100|B0000606
[/TD]
[/TR]
[TR]
[TD]10A846|B0001109|B0000606
[/TD]
[/TR]
[TR]
[TD]10A846|B0001200|B0000606
[/TD]
[/TR]
[TR]
[TD]10A846|B0001209|B0000606
[/TD]
[/TR]
</tbody>[/TABLE]


The part of the code that's slow is this (full code below that)

Code:
Dim searchvalue
For i = 1 To lastrow 'start in row 1 to last row
        Set searchvalue = NewData.Range("B" & i) 'what to look for
        With Olddata.Range("B1:B" & lastrow2) 'range to look in
            Set c = .Find(searchvalue, LookIn:=xlValues) 'Find what im looking for in the range im looking in
            If Not c Is Nothing Then 'if found
                NewData.Cells(i, "D") = "Found"
                Else 'if not found
                NewData.Cells(i, "D") = "Not Found"
                    GoTo NEXTSEARCH1
            End If
        End With
NEXTSEARCH1:
    Next I

The full code is:

Code:
Sub Macro1()

Dim lastrow As Long, lastrow2 As Long
Dim Toolbook As Workbook
Dim newbook As Workbook
Dim Oldbook As Workbook
Dim customerFilename As String
Dim filter As String
Set Toolbook = Application.ActiveWorkbook

Application.ScreenUpdating = False

'Open new data file and name 'newbook'
MsgBox "Please new data File"
     filter = "Text files (*.csv*),*.csv*"
    Caption = "File Select"
    customerFilename = Application.GetOpenFilename(filter, , Caption)
    Application.Workbooks.Open (customerFilename)
Set newbook = Application.ActiveWorkbook
'Open old data file and name 'oldbook'
MsgBox "Select previous data File"
     filter = "Text files (*.csv*),*.csv*"
    Caption = "File Select"
    customerFilename = Application.GetOpenFilename(filter, , Caption)
    Application.Workbooks.Open (customerFilename)
    Set Oldbook = Application.ActiveWorkbook
    
' Set Sheet names
    Dim Olddata As Worksheet
    Set Olddata = Oldbook.Worksheets(1)
    Dim NewData As Worksheet
    Set NewData = newbook.Worksheets(1)
 
'row counts
    lastrow = NewData.Cells(Rows.Count, 1).End(xlUp).Row
    lastrow2 = Olddata.Cells(Rows.Count, 1).End(xlUp).Row
 
NewData.Range("B1", "B" & lastrow).Value = "=Left(A1, 6)&Mid(A1, 8, 8)"
NewData.Range("C1", "C" & lastrow).Value = "=IF(ISNUMBER(LEFT(B1,1)*1)=TRUE,""1350-Project (CPA)"",""1350-Capital (CPA)"")"

Olddata.Range("B1", "B" & lastrow2).Value = "=Left(A1, 6)&Mid(A1, 8, 8)"
Olddata.Range("C1", "C" & lastrow2).Value = "=IF(ISNUMBER(LEFT(B1,1)*1)=TRUE,""1350-Project (CPA)"",""1350-Capital (CPA)"")"

'Check for deltas
Dim searchvalue
For i = 1 To lastrow 'start in row 1 to last row
        Set searchvalue = NewData.Range("B" & i) 'what to look for
        With Olddata.Range("B1:B" & lastrow2) 'range to look in
            Set c = .Find(searchvalue, LookIn:=xlValues) 'Find what im looking for in the range im looking in
            If Not c Is Nothing Then 'if found
                NewData.Cells(i, "D") = "Found"
                Else 'if not found
                NewData.Cells(i, "D") = "Not Found"
                    GoTo NEXTSEARCH1
            End If
        End With
NEXTSEARCH1:
    Next i
  
Application.ScreenUpdating = True
End Sub
Any help/suggestion really would be appreciated

Thanks
 

Excel Facts

Repeat Last Command
Pressing F4 adds dollar signs when editing a formula. When not editing, F4 repeats last command.
The reason your code runs so slow is that you are reading/writing/finding directly from the worksheet. That's one of the slowest things you can do in VBA. Ideally, you should read all your data at one time, process it from internal arrays, then write it back in one shot. For example, with this layout:

ABCD
ab
bc
cd
df
eg
fi
gj
hk
il
jm

<colgroup><col style="width: 25pxpx"><col><col><col><col></colgroup><thead>
</thead><tbody>
[TD="align: center"]1[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]2[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]3[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]4[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]5[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]6[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]7[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]8[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]9[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]10[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

</tbody>
Sheet1



and this code:

Code:
Sub test1()
Dim OldData As Variant, NewData As Variant, MyDict As Object, OutTab As Variant, i As Long

    Set MyDict = CreateObject("Scripting.Dictionary")
    OldData = Range("A1:A10").Value
    NewData = Range("D1:D10").Value
    OutTab = Range("B1:B10").Value
    
    On Error Resume Next            ' Prevents an error in case of duplicate key
    For i = 1 To UBound(NewData)
        MyDict.Add CStr(NewData(i, 1)), i
    Next i
    On Error GoTo 0
    
    For i = 1 To UBound(OldData)
        If MyDict.exists(CStr(OldData(i, 1))) Then
            OutTab(i, 1) = "Found"
        Else
            OutTab(i, 1) = "Not Found"
        End If
    Next i
    
    Range("B1:B10").Value = OutTab


End Sub
will generate this:

ABCD
aNot Foundb
bFoundc
cFoundd
dFoundf
eNot Foundg
fFoundi
gFoundj
hNot Foundk
iFoundl
jFoundm

<colgroup><col style="width: 25pxpx"><col><col><col><col></colgroup><thead>
</thead><tbody>
[TD="align: center"]1[/TD]

[TD="align: right"][/TD]

[TD="align: center"]2[/TD]

[TD="align: right"][/TD]

[TD="align: center"]3[/TD]

[TD="align: right"][/TD]

[TD="align: center"]4[/TD]

[TD="align: right"][/TD]

[TD="align: center"]5[/TD]

[TD="align: right"][/TD]

[TD="align: center"]6[/TD]

[TD="align: right"][/TD]

[TD="align: center"]7[/TD]

[TD="align: right"][/TD]

[TD="align: center"]8[/TD]

[TD="align: right"][/TD]

[TD="align: center"]9[/TD]

[TD="align: right"][/TD]

[TD="align: center"]10[/TD]

[TD="align: right"][/TD]

</tbody>
Sheet1



I also used a Dictionary object so that you could just use .Exists instead of reading through the entire range each time.

Let me know if you need help adapting this to your workbook.
 
Upvote 0
OMG! that's so fast!!! Thank you so much.

FYI - it processed 160,000 rows in about 10 seconds!.... my previous code did it in about 20 minutes :laugh:

Just one question... I've done it so that it tells me which rows on the olddata appear on the newdata... but I also need to do the reverse. I thought I'd just try pasting the code in twice, but flipping it on the second entry (mydicst.add and outtab becomes olddata and mydict exists becomes newdata)... I assume I needed to first clear the dictionary though first? (and if so, how?)

Alternatively, is there a more sensible way to do the below?

Code:
Dim Oldrange As Variant, Newrange As Variant, MyDict As Object, OutTab As Variant, i As Long
 
'Check old data against new
    Set MyDict = CreateObject("Scripting.Dictionary")
    Oldrange = OldData.Range("B1", "B" & lastrow2).Value
    Newrange = NewData.Range("B1", "B" & lastrow).Value
    OutTab = OldData.Range("D1", "D" & lastrow2).Value
    
    On Error Resume Next            ' Prevents an error in case of duplicate key
    For i = 1 To UBound(Newrange)
        MyDict.Add CStr(Newrange(i, 1)), i
    Next i
    On Error GoTo 0
    
    For i = 1 To UBound(Oldrange)
        If MyDict.exists(CStr(Oldrange(i, 1))) Then
            OutTab(i, 1) = "Found"
        Else
            OutTab(i, 1) = "Not Found"
        End If
    Next i
    
    OldData.Range("D1", "D" & lastrow2).Value = OutTab
    
'Check new data against old
    OutTab = NewData.Range("D1", "D" & lastrow).Value
    
    On Error Resume Next            ' Prevents an error in case of duplicate key
    For i = 1 To UBound(Oldrange)
        MyDict.Add CStr(Oldrange(i, 1)), i
    Next i
    On Error GoTo 0
    
    For i = 1 To UBound(Newrange)
        If MyDict.exists(CStr(Newrange(i, 1))) Then
            OutTab(i, 1) = "Found"
        Else
            OutTab(i, 1) = "Not Found"
        End If
    Next i
    
    NewData.Range("D1", "D" & lastrow).Value = OutTab

Thanks again
Dan
 
Upvote 0
FYI - it processed 160,000 rows in about 10 seconds!.... my previous code did it in about 20 minutes :laugh:

I think that counts as a fair improvement! :-D

I don't see any major improvements to your code. You can prevent the "OutTab = " line if you want. You can probably see that it doesn't really accomplish much beyond defining the range of the right size, each entry in that array is overwritten later.

Code:
    Dim OutData() as Variant
    Redim OutData(1 To lastrow2, 1 To 1)

But 1 read doesn't take long, and it's only 1 line instead of 2. Matter of preference I suppose. And to clear the dictionary, you can do either of the following:

Code:
    Set MyDict = Nothing
    Set MyDict = CreateObject("Scripting.Dictionary")
or

Code:
    MyDict.RemoveAll
Again, just depends on the situation. Might be a performance difference, but I've never tested it, and it would probably be minor.

I'm glad it worked for you! :cool:
 
Upvote 0

Forum statistics

Threads
1,223,246
Messages
6,170,988
Members
452,373
Latest member
TimReeks

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top