The best way to compare 50,000 rows of data to another of 130,000? Vlookup to slow

Mr Denove

Active Member
Joined
Jun 8, 2007
Messages
446
Morning All,

On a daily basis I am looking to create a report that takes 50,000 circa rows of data and then compares it to another list of 130,000 rows of data. If I use a vlookup I might as well go get lunch as it takes so long to run due to the limitations of the pc memory etc.

Can anyone suggest a quick and efficient way to resolve this please?

Thanks in advance
Stuart
 
Hi Aladin,

My vlookup is : =VLOOKUP(A2,'07092017'!$A$2:$C$130000,3,FALSE)

A2 runs to approx 56,000 rows of data which I am comparing to another data extract ie 07092017. If there is a match then to produce the data in Col C which shows a change in the reporting state.
Its looking into a range of data over 130,000 rows and using just a vlookup is taking too long, 40-50 mins at least.

I am not familiar or experienced enough to understand the Scripting Dictionary method which appears to be a faster method.

Thanks for gettting in touch.
Stuart
 
Upvote 0

Excel Facts

Copy formula down without changing references
If you have =SUM(F2:F49) in F50; type Alt+' in F51 to copy =SUM(F2:F49) to F51, leaving the formula in edit mode. Change SUM to COUNT.
Hi Aladin,

My vlookup is : =VLOOKUP(A2,'07092017'!$A$2:$C$130000,3,FALSE)

A2 runs to approx 56,000 rows of data which I am comparing to another data extract ie 07092017. If there is a match then to produce the data in Col C which shows a change in the reporting state.
Its looking into a range of data over 130,000 rows and using just a vlookup is taking too long, 40-50 mins at least.

I am not familiar or experienced enough to understand the Scripting Dictionary method which appears to be a faster method.

Thanks for gettting in touch.
Stuart

If you can sort '07092017'!$A$2:$C$130000 on column A in ascending order and keep it that way...

Try:

=IF(VLOOKUP(A2,'07092017'!$A$2:$A$130000,1,1)=$A2,VLOOKUP(A2,'07092017'!$A$2:$C$130000,3,1),"not found")

This should be 10x faster or so.
 
Upvote 0
I've indicated in the code comments where it is printing out to column D, sheet 2. If nothing's printing out, it's suggesting it can't find the values in column A of sheet 2 in the dictionary, which suggests there may be a formatting or data difference between the two columns (e.g. empty spaces?) I've tried to account for this below:
Code:
Sub AVeryConfidentialMacro()
    
    'Variables
    Dim arr()   As Variant
    Dim dic     As Object
    Dim x       As Long
    
    'Set a variable to be a dictionary object
    Set dic = CreateObject("Scripting.Dictionary")
        
    With Sheets("Sheet1")
        'Find last row with data on column A, sheet1
        x = .Cells(.Rows.count, 1).End(xlUp).row
        'Read data from A2 to last row (x as above) in column C into an array
        'The array size must equal the data range size
        arr = .Cells(1, 2).Resize(x - 1, 3).Value
    End With
    
    'Iterate over the array
    For x = LBound(arr, 1) To UBound(arr, 1)
        'Map the key (columm A) to it's item (column C)
        dic(Trim$(arr(x, 1))) = Trim$(arr(x, 3))
    Next x
    'Clear the array for reuse, all key-item maps are in the dictionary
    Erase arr
    
    With Sheets("Sheet2")
        'Find last row with data on column A, sheet2
        x = .Cells(.Rows.count, 1).End(xlUp).row
        'Read the lookup values in column A, into an array
        'The array size must equal the data range size
        arr = .Cells(2, 1).Resize(x - 1).Value
        
        'Iterate over the array and pass the value into the dictionary and if it maps to a key
        'over-write the value back into the array, the array size matches the output size
        'So no need for separate arrays
        For x = LBound(arr, 1) To UBound(arr, 1)
            Trim$(arr(x, 1)) = dic(CStr(arr(x, 1)))
        Next x
        'Print out results in column D of Sheet2
        .Cells(4, 2).Resize(UBound(arr, 1), UBound(arr, 2)).Value = arr
    End With
    
    'Clear variables
    Erase arr
    Set dic = Nothing


End Sub
Although it's assuming the key value is a string.

Can you give an examples of what the look up value and the look up table are?
 
Upvote 0
Id Key Current State New
VALUE-102 Key Code -484464:445

Similar to above with the actual data swapped. I would be populating the Currrent State New with the state reported on the daily extract I would produce.
 
Upvote 0
Hi Aladin, Im happy to try anything that gives me the quickest result. However the formula above produces a not found on the first result even when I copy and paste in the same value from the reference.

Corrected by adding brackets to the second vlookup

=IF(VLOOKUP(A2,'07092017'!A2:A3,1,1)=D2,(VLOOKUP(A2,'07092017'!A2:A3,1,1)))
 
Last edited:
Upvote 0
Thanks Aladin that appears to have worked in a fraction of the time. Curious why tho?

Yes, that is exactly the expectation.

Recall that the formula is:

=IF(VLOOKUP($D2,'07092017'!$A$2:$A$130000,1,1)=$D2,VLOOKUP($D2,'07092017'!$A$2:$C$130000,3,1),"not found")

if D2 is the value to look up. The look up table must be sorted in ascendinding order all the time, that is, when you add new records to the table, it must be re-sorted.

As to why it's so fast lies in the fact that look up formulas with match-type set to 1 (or TRUE) or ones with LOOKUP operate using a sort of binary search algorithm, which need a few steps to find what it looks for. Such formulas require a sorted match range in ascending order. They effect an approximate match.

The formula we now have is looking exact values...

Given the following sorted match-range...

A
B
D
F

[1] VLOOKUP("D",A1:A4,1,1) will land on D immediately.

[2] VLOOKUP("C",A1:A4,1,1) will land on B immediately, that is, the value which is smaller than or equal to C.

We test in our formula whether the value which is found is equal (exact match) to our look up value and accept it only if that's case (otherwise >> not found).

Note. Exact match (match-type set to 0 or FALSE) must compare the look up value with every value in the match-range until it gets a hit or run out of cells to look in. That's why it's slow.
 
Upvote 0
Hi Aladin, finally got logged back into my pc and tested the formula, works like a charm.
Thank you so much for being so helpful.
Stuart
 
Upvote 0

Forum statistics

Threads
1,224,826
Messages
6,181,192
Members
453,021
Latest member
pingpong7117

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top