Comparing 2 excel documents question

Mikelowrey

Board Regular
Joined
Apr 20, 2014
Messages
57
Office Version
  1. 2016
Platform
  1. Windows
Hello,

I have 2 excel documents that I want to compare both have the same information, is just one has been updates where things have been removed. One has 80K entries, and the other has 75K, is there a way for me to see what were those items removed in comparison with the first document?

Thank you!
 
In the example I posted above I'm only matching 3 columns (to save space) which is where the 3 comes from in the SUMPRODUCT formula (cell D3). If you have 15 columns in each row then change the 3 to 15.
 
Upvote 0

Excel Facts

Will the fill handle fill 1, 2, 3?
Yes! Type 1 in a cell. Hold down Ctrl while you drag the fill handle.
@ Mikelowrey
1. What Excel version & operating system are you using?

2. Two different documents. Which sheet in each document?
Eg Each document only has 1 sheet, or it is the first sheet in each document, or it is sheet 'Data' in the larger document and sheet 'New Data' in the smaller document etc

BTW
80,000 rows x 15 columns is quite a bit of data to compare. I suspect the formula solutions suggested so far will take an extremely long time to re-calculate or freeze/crash Excel altogether. Wonder if they have been tested on anything approaching that amount of data?
 
Last edited:
Upvote 0
I have to agree with Peter in formulas taking too long. When I tried my formula on 93,000 rows the SUMPRODUCT was still running after 12 min. without an answer.
I then tried adding a helper column to each list where I concatenated the data in a row to one cell and then used the MATCH function on the helper columns. This worked, but it took 6 mins 17 sec. to run.
 
Upvote 0
with 541522 rows and 11 columns
With PowerQuery merge with JoinKindRightAnti (only show rows with any changes in new table) ca. 1 minute

example M code
Code:
[SIZE=1]let
    Source = Table.NestedJoin(old,{"Fiscal Month", "Profit Center", "Account", "Forecast Version", "YTD Flag", "Actual", "Budget", "Forecast", "Actual People", "Budget Position", "Forecast People"},new,{"Fiscal Month", "Profit Center", "Account", "Forecast Version", "YTD Flag", "Actual", "Budget", "Forecast", "Actual People", "Budget Position", "Forecast People"},"new",JoinKind.RightAnti),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"new"}),
    #"Expanded new" = Table.ExpandTableColumn(#"Removed Other Columns", "new", {"Fiscal Month", "Profit Center", "Account", "Forecast Version", "YTD Flag", "Actual", "Budget", "Forecast", "Actual People", "Budget Position", "Forecast People"}, {"new.Fiscal Month", "new.Profit Center", "new.Account", "new.Forecast Version", "new.YTD Flag", "new.Actual", "new.Budget", "new.Forecast", "new.Actual People", "new.Budget Position", "new.Forecast People"})
in
    #"Expanded new"[/SIZE]
 
Last edited:
Upvote 0
of course the time is relative, depends on the computer parameters
I agree & I suspect Power Query may be a good way to go with this problem. However, I have offered an alternative below should the OP wish to consider it.

I then tried adding a helper column to each list where I concatenated the data in a row to one cell and then used the MATCH function on the helper columns.
That is basically what I am doing, though via vba.
With 80,000 rows & 15 columns in the original file and the second file with about 8,000 of the original rows removed, the code below for me takes about 35 seconds to run. Apart from the differences in machines mentioned by sandy666, the running time may also depend on just what sort of data is in the cells. For my testing I have just used random numerical data.

My code adds an additional column in the original file, marks the rows that are missing in the second file and filters the original sheet to only show the deleted rows. That AutoFilter can easily be removed by using the drop-down at the top if that extra column to 'Select All'.

Assumptions with my code:
- Both files to be open when the code is run (& it needs the workbook names edited to match yours).
- The data in each workbook is in the first worksheet of that workbook.
- You are using Excel through Office 365 so have the TEXTJOIN function available.

Rich (BB code):
Sub Missing_Rows()
  Dim ws1 As Worksheet, ws2 As Worksheet, ws3 As Worksheet
  Dim d As Object
  Dim a As Variant, b As Variant
  Dim lc1 As Long, lc2 As Long
 
  Set ws1 = Workbooks("File 1.xlsx").Sheets(1)
  Set ws2 = Workbooks("File 2.xlsx").Sheets(1)
  lc1 = ws1.Cells(1, Columns.Count).End(xlToLeft).Column
  lc2 = ws2.Cells(1, Columns.Count).End(xlToLeft).Column
  If lc1 = lc2 Then
    Application.ScreenUpdating = False
    ws2.Copy After:=ws1
    Set ws3 = ws1.Parent.Sheets(ws1.Index + 1)
    With ws3.Cells(1, lc2 + 1).Resize(ws3.Cells(Rows.Count, 1).End(xlUp).Row)
      .FormulaR1C1 = "=TEXTJOIN(""|"",False,RC1:RC[-1])"
      .Value = .Value
    End With
    With ws1.Cells(1, lc1 + 2).Resize(ws1.Cells(Rows.Count, 1).End(xlUp).Row)
      .FormulaR1C1 = "=TEXTJOIN(""|"",False,RC1:RC[-2])"
      .Value = .Value
    End With
    With ws1.Cells(1, lc1 + 1).Resize(ws1.Cells(Rows.Count, 1).End(xlUp).Row)
      .FormulaR1C1 = Replace(Replace("=IF(ISNUMBER(MATCH(RC[1],'#'!C%,0)),"""",""Del"")", "#", ws3.Name), "%", lc2 + 1)
      .Value = .Value
      .Offset(, 1).ClearContents
      .Cells(1).Value = "Deleted"
      .AutoFilter Field:=1, Criteria1:="Del"
    End With
    Application.DisplayAlerts = False
    ws3.Delete
    Application.DisplayAlerts = True
    Application.ScreenUpdating = True
  Else
    MsgBox "Different number of columns"
  End If
   Debug.Print "Code took " & Format(Timer - t, "0.000 secs")
End Sub
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,181
Members
453,022
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top