Compressing large amount of rows but adding totals?

Stildawn

Board Regular
Joined
Aug 26, 2012
Messages
200
Hi All

I'm working with a large amount of data, the largest set I've seen in real life so far was over 28k rows.

What I need to do is get it down to a compressed amount of rows based on essentially removing duplicates in a few columns, but counting certain column values into a total.

I have created this test file: VBA Test Data.xlsx

In it we have sheet "Original Data" this is essentially the raw data that I have, however its typically around 8k to 28k rows.

The sheet "Converted Data" is what it needs to compress down to.

Basically I need to do the below:

  • Remove duplicate rows based on columns A / B / C / F
  • However the remaining non duplicate row needs to have a count of all removed duplicates from column E & G
  • End up with totals from column E & G for each unique invoice / product / code etc (like example in sheet "Converted Data"
  • The original data can be in any order, and is often not in a nice grouped order like the example spreadsheet.
This all needs to be done in VBA as part of a bunch of other code to get a final product.

So basically just after your thoughts on how best to tackle this in the most efficient manner, as with the amount of data it can take a while to process etc.

Thanks
 

Excel Facts

How to calculate loan payments in Excel?
Use the PMT function: =PMT(5%/12,60,-25000) is for a $25,000 loan, 5% annual interest, 60 month loan.
Hello! Are you satisfied with this result?
VBA Test Data.xlsx
ABCDEFG
1Invoice #ProductCodeDescriptionQTYCOOPrice
2INV123ABC1236109.10.Product 127CN85,59
3INV123XYZ9876201.20.Product 923KH73,6
4INV798EFG2344202.10.Product 224TW48,25
5INV798LMN5676307.10.Product 526AU88,66
Table1

Made with Power Qwery.
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Invoice #", type text}, {"Product", type text}, {"Code", type text}, {"Description", type text}, {"QTY", Int64.Type}, {"COO", type text}, {"Price", type number}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Invoice #", "Product", "Code", "Description", "COO"}, {{"QTY", each List.Sum([QTY]), type nullable number}, {"Price", each List.Sum([Price]), type nullable number}}),
    #"Reordered Columns" = Table.ReorderColumns(#"Grouped Rows",{"Invoice #", "Product", "Code", "Description", "QTY", "COO", "Price"})
in
    #"Reordered Columns"
 
Upvote 0
Create a pivot table from your data:
  1. Format your range as a table (control+t)
  2. Click "Summarize with Pivot table"
  3. Choose where to place the PT
  4. Drag these fields to the Rows area: Invoice # Product Code Description COO
  5. Drag the QTY and Price fields to the Value area.
  6. Done.
I know the layout is slightly different but the result is identical.
 
Upvote 0
Create a pivot table from your data:
  1. Format your range as a table (control+t)
  2. Click "Summarize with Pivot table"
  3. Choose where to place the PT
  4. Drag these fields to the Rows area: Invoice # Product Code Description COO
  5. Drag the QTY and Price fields to the Value area.
  6. Done.
I know the layout is slightly different but the result is identical.
I need to do it all in VBA.
 
Upvote 0
For a vba approach, try this with a copy of your workbook. I have assumed the worksheet with the original data is active and the results are written to columns I:O

VBA Code:
Sub Consolidate_Data()
  Dim dQ As Object, dP As Object
  Dim a As Variant
  Dim i As Long
  Dim s As String
 
  Set dQ = CreateObject("Scripting.Dictionary")
  Set dP = CreateObject("Scripting.Dictionary")
 
  With Range("A2", Range("G" & Rows.Count).End(xlUp))
    a = .Value
    For i = 1 To UBound(a)
      s = Join(Application.Index(a, i, Array(1, 2, 3, 4)), ";") & ";;" & a(i, 6)
      dQ(s) = dQ(s) + a(i, 5)
      dP(s) = dP(s) + a(i, 7)
    Next i
    .Rows(0).Copy Destination:=.Cells(0, 9)
    With .Offset(, 8).Resize(dQ.Count, 1)
      .Value = Application.Transpose(dQ.Keys)
      .TextToColumns DataType:=xlDelimited, Semicolon:=True, Comma:=False, Space:=False, Other:=False
      .Offset(, 4).Value = Application.Transpose(dQ.items)
      .Offset(, 6).Value = Application.Transpose(dP.items)
    End With
  End With
End Sub

My sample data and code results:

Stildawn VBA Test Data.xlsm
ABCDEFGHIJKLMNO
1Invoice #ProductCodeDescriptionQTYCOOPriceInvoice #ProductCodeDescriptionQTYCOOPrice
2INV123ABC1236109.10.Product 14CN12.68INV123ABC1236109.10.Product 127CN85.59
3INV123ABC1236109.10.Product 16CN19.02INV123XYZ9876201.20.Product 923KH73.6
4INV123ABC1236109.10.Product 16CN19.02INV798EFG2344202.10.Product 224TW48.25
5INV123ABC1236109.10.Product 14CN12.68INV798LMN5676307.10.Product 526AU88.66
6INV123ABC1236109.10.Product 13CN9.51
7INV123ABC1236109.10.Product 12CN6.34
8INV123ABC1236109.10.Product 12CN6.34
9INV123XYZ9876201.20.Product 94KH12.8
10INV123XYZ9876201.20.Product 96KH19.2
11INV123XYZ9876201.20.Product 96KH19.2
12INV123XYZ9876201.20.Product 94KH12.8
13INV123XYZ9876201.20.Product 93KH9.6
14INV798EFG2344202.10.Product 25TW5.6
15INV798EFG2344202.10.Product 24TW12.9
16INV798EFG2344202.10.Product 25TW15.6
17INV798EFG2344202.10.Product 24TW10.6
18INV798EFG2344202.10.Product 26TW3.55
19INV798LMN5676307.10.Product 54AU15.65
20INV798LMN5676307.10.Product 57AU12.85
21INV798LMN5676307.10.Product 53AU1.5
22INV798LMN5676307.10.Product 52AU6.5
23INV798LMN5676307.10.Product 51AU18.95
24INV798LMN5676307.10.Product 54AU14.66
25INV798LMN5676307.10.Product 55AU18.55
Original Data
 
Upvote 0
What about this (needs a reference to the ADO library):

VBA Code:
Sub SummarizeData()
    Dim sql As String
    Dim cn As ADODB.Connection
    Dim Rst As ADODB.Recordset
    Dim wb As String
    Const sheetName As String = "Original_Data"
    wb = ThisWorkbook.Path & "\" & ThisWorkbook.Name
    'Invoice #   Product Code    Description QTY COO Price
    sql = "SELECT [" & sheetName & "$].[Invoice #], [" & sheetName & "$].Product, [" & sheetName & "$].Code, [" & sheetName & "$].Description, Sum([" & sheetName & "$].QTY) AS SumOfQTY, [" & sheetName & "$].COO, Sum([" & sheetName & "$].Price) AS SumOfPrice"
    sql = sql & " FROM [" & sheetName & "$]"
    sql = sql & " GROUP BY [" & sheetName & "$].[Invoice #], [" & sheetName & "$].Product, [" & sheetName & "$].Code, [" & sheetName & "$].Description, [" & sheetName & "$].COO;"

    Set cn = New ADODB.Connection

    '--- Connection ---
    '    With cn
    '        .Provider = "Microsoft.Jet.OLEDB.4.0"
    '        .ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" _
             '            & sheetName & ";Extended Properties=""Excel 12.0;HDR=YES;"""
    '        .Open
    '    End With
    Set cn = CreateObject("ADODB.Connection")
    With cn
        .Provider = "Microsoft.ACE.OLEDB.12.0"
        .ConnectionString = "Data Source=" & wb & ";" & "Extended Properties=""Excel 12.0 Xml;HDR=YES"";"
        .Open
    End With


    Set Rst = New ADODB.Recordset
    Set Rst = cn.Execute(sql)
    Worksheets("Converted Data").Range("A10").CopyFromRecordset Rst

End Sub
 
Upvote 0

Forum statistics

Threads
1,224,737
Messages
6,180,660
Members
452,992
Latest member
TokugawaIesuma

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top