Hi everyone, I am migrating data between information systems, and want to compare the source dataset from the legacy system to the migrated dataset in the new system to ensure they match.
To do this, I've imported both datasets into the same workbook, into their respective worksheets, named Source and Migrated. The dataset in both sheets is large: 300 columns x 600,000 rows.
For smaller datasets, I have a macros which will loop through each cell in the data and return a list of mismatch results e.g., where values don't match, the result list would return a list containing values like: Cell: D75 Source value: 1 <> Migrated value: 2, etc.
The problem is to loop through each cell on a dataset this size takes a really long time. I looked at using Access, and it has a 255 column limit that I prefer not to work around. I'm not familiar with PowerQuery enough to do this in a way that will return the clear result. I think this can be done using the scripting dictionary in a much faster way than looping through cells, but I can't get the syntax right.
The following code is sort of doing what I want it to do, but it's currently returning the complete record from each dataset that doesn't match, rather than only the specific value. Rather than returning two records for the mismatch, I'd prefer to have one record for each specific difference e.g., Cell: D75 Source value: 1 <> Migrated value: 2, Cell: JK 525,431 Source value: A <> Migrated value: B, etc.. To do this, I think the code may need to be changed to nest the second dictionary within the first, to then iterate through the items of each dictionary, compare and return the mismatch.
I'm new to using the scripting dictionary and would like to learn this - any guidance on this would be greatly appreciated. Thanks!
To do this, I've imported both datasets into the same workbook, into their respective worksheets, named Source and Migrated. The dataset in both sheets is large: 300 columns x 600,000 rows.
For smaller datasets, I have a macros which will loop through each cell in the data and return a list of mismatch results e.g., where values don't match, the result list would return a list containing values like: Cell: D75 Source value: 1 <> Migrated value: 2, etc.
The problem is to loop through each cell on a dataset this size takes a really long time. I looked at using Access, and it has a 255 column limit that I prefer not to work around. I'm not familiar with PowerQuery enough to do this in a way that will return the clear result. I think this can be done using the scripting dictionary in a much faster way than looping through cells, but I can't get the syntax right.
The following code is sort of doing what I want it to do, but it's currently returning the complete record from each dataset that doesn't match, rather than only the specific value. Rather than returning two records for the mismatch, I'd prefer to have one record for each specific difference e.g., Cell: D75 Source value: 1 <> Migrated value: 2, Cell: JK 525,431 Source value: A <> Migrated value: B, etc.. To do this, I think the code may need to be changed to nest the second dictionary within the first, to then iterate through the items of each dictionary, compare and return the mismatch.
I'm new to using the scripting dictionary and would like to learn this - any guidance on this would be greatly appreciated. Thanks!
VBA Code:
Sub CompareData()
Dim vstr As String
Dim vData As Variant
Dim vitm As Variant
Dim vArr As Variant
Dim v()
Dim currentSht As String
Dim shtResults As String
Dim a As Long
Dim b As Long
Dim c As Long
Application.ScreenUpdating = False
shtResults = "Results"
'If Sheet doesn't exist, create it
If SheetExists(shtResults) = True Then
Application.DisplayAlerts = False
Sheets(shtResults).Delete
Application.DisplayAlerts = True
Sheets.Add
ActiveSheet.Name = shtResults
Else
Sheets.Add
ActiveSheet.Name = shtResults
End If
vData = Sheets("Source").Range("A1:M11").Value
currentSht = "Source"
With CreateObject("Scripting.Dictionary")
.CompareMode = 1
ReDim v(1 To UBound(vData, 2))
For a = 2 To UBound(vData, 1)
For b = 1 To UBound(vData, 2)
vstr = vstr & Chr(2) & vData(a, b)
'Debug.Print vstr
v(b) = "[Cell: " & a & ", " & b & " -- " & currentSht & "] " & vData(a, b)
Next
.Item(vstr) = v
vstr = ""
Next
vData = Sheets("Migrated").Range("A1:M11").Value
currentSht = "Migrated"
For a = 2 To UBound(vData, 1)
For b = 1 To UBound(vData, 2)
vstr = vstr & Chr(2) & vData(a, b)
v(b) = "[Cell: " & a & ", " & b & " -- " & currentSht & "] " & vData(a, b)
Next
If .Exists(vstr) Then
.Item(vstr) = Empty
Else
.Item(vstr) = v
End If
vstr = ""
Next
For Each vitm In .Keys
If IsEmpty(.Item(vitm)) Then
.Remove vitm
End If
Next
vArr = .Items
c = .Count
End With
With Sheets(shtResults).Range("A1").Resize(, UBound(vData, 2))
.Cells.Clear
.Value = vData
If c > 0 Then
.Offset(1).Resize(c).Value = Application.Transpose(Application.Transpose(vArr))
End If
End With
Application.ScreenUpdating = True
End Sub