Compare Dataset Counts Using Multiple Parameters

Huey72

New Member
Joined
Nov 6, 2019
Messages
32
Hi everyone, I'm migrating data from a legacy source information system to a new information system and want to confirm a variety of counts of values from the source data against the counts of the same values from the migrated data to ensure the counts match as a way to verify the records migrated properly. The datasets have millions of records and multiple columns that I want to count and compare results to ensure they match.

As one example, I have a column for Organization and a column for Product in both datasets - the dataset columns are not the same i.e., the Organization column is column F in the source data and column C in the migrated data and Product is column M in the source data and column D in the migrated data. For each unique organization in the Organization column, I want to count each record for each unique Product value in the Product column in each data set and then compare the two counts and indicate where the test passes and where it fails for each unique combination. The comparison needs to work in both directions i.e., source data to migrated data and migrated data to source data, to ensure complete inclusion of any variances.

I've thought about using pivot tables or advanced filters, but I don't think they will be efficient given it's working with two datasets with different structures and needing to filter, compare, and return results on multiple criteria - the product count is just one example, we will look at other columns of data too e.g., count invoices column by organization, sum sales column by organization, etc. so needs to be able to handle a number of columns to test in a dynamic way, and given the number of records, it needs to perform reasonably fast. I was wondering about using dictionaries and if nesting dictionaries and performing counts/sums on the dictionary keys and items might be the fastest/most efficient, but I'm completely new to dictionaries and am struggling to really know if that's a good approach, or if there's a better way?

Here's a basic example of Source Data, Migrated Data, and the desired Result output. I've put in on one sheet for illustration, but for clarity, the source data and migrated data are separate sheets with different column structure.

Any help is greatly appreciated!

Sample.xlsx
ABCDEFGHIJKL
1Source DataMigrated DataResult
2Col FCol MCol CCol D
3
4OrganizationProductOrganizationProductOrganizationProductLegacy CountMigrated CountResultVariance
5Org AProduct AOrg AProduct AOrg AProduct A33Pass-
6Org AProduct AOrg AProduct AOrg AProduct B22Pass-
7Org AProduct AOrg AProduct AOrg BProduct A22Pass-
8Org AProduct BOrg AProduct BOrg BProduct B10Fail-1
9Org AProduct BOrg AProduct BOrg BProduct C01Fail1
10Org BProduct AOrg BProduct AOrg CProduct A20Fail-2
11Org BProduct AOrg BProduct A
12Org BProduct BOrg BProduct C
13Org CProduct A
14Org CProduct A
Example
 

Excel Facts

Return population for a City
If you have a list of cities in A2:A100, use Data, Geography. Then =A2.Population and copy down.
then for million of rows
Power query would be great help.
in there you can do all sort of comparisons, mergings cleaning etc.

you can also group by to verify if totals correspond. if they do - ur job is done.
if they don't then you must do more comprehensive analytical work.

I'm not sure what else to advise you since this will be long, complcated task.

I would advise start learning Power query from Mike Girvin and MR excel.
 
Upvote 0

Forum statistics

Threads
1,224,814
Messages
6,181,120
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top