Grouping Cells

Fox21

New Member
Joined
Jul 4, 2021
Messages
10
Office Version
  1. 365
Platform
  1. Windows
I am trying to group duplicate rows according to Column 3(ID Number). I have used collections as a method, The problem is if there is only two duplicates in column three then only one item shows up in the watch window of the collection.I have posted the result that I want on the right hand side of the excel xl2bb. I am fairly new to vba. I also want to write it out onto the worksheet but has issues.Here is what I have so far:

Sub Duplicate()
Dim coll As New Collection, cls As Class1
Dim arrayDup() As Variant
Dim a As Long, b As Long, rw As Long, cl As Long, s As Long, k As Long

rw = Sheet1.Range("A1").CurrentRegion.Rows.Count
cl = Sheet1.Range("A1").CurrentRegion.Columns.Count
arrayDup = Sheet1.Range("A1").CurrentRegion.Value

For a = 2 To UBound(arrayDup) - 1

For b = a To UBound(arrayDup) - 1

If arrayDup(a, 3) = arrayDup(b + 1, 3) Then

Set cls = New Class1
cls.FullName = arrayDup(b + 1, 1)
cls.LastName = arrayDup(b + 1, 2)
cls.IDNumber = arrayDup(b + 1, 3)

coll.Add cls

End If
Next b
Next a

End Sub

arrange (version 1).xlsb
ABCDEFG
1First NameLastNameIDNumber>>>>>First NameLastNameIDNumber
2Seawest6Seawest6
3Kickereast6Kickereast6
4Policenorth9Shiamoody6
5Dansouth3Dansouth3
6Bannorth west3Bannorth west3
7Shiamoody6Policenorth9
8Kakahhoodie10Kakahhoodie10
Sheet1
 

Excel Facts

Whats the difference between CONCAT and CONCATENATE?
The newer CONCAT function can reference a range of cells. =CONCATENATE(A1,A2,A3,A4,A5) becomes =CONCAT(A1:A5)
Can you explain your methodology for the sort order in your mocked up solution. I cannot see a pattern here .
 
Upvote 0
Apologies for any confusion. I want to find duplicates referring to Column C, As soon as I find duplicates I want to sort them accordingly(Like column G including the rows). I tried the sort method and it takes time(17min or so) or excel crashes to rearrange 500k rows. See for example "Shia moody(row7)" on the left hand side, was rearranged and sorted on the right hand side.
 
Upvote 0
Apologies for any confusion. I want to find duplicates referring to Column C, As soon as I find duplicates I want to sort them accordingly(Like column G including the rows). I tried the sort method and it takes time(17min or so) or excel crashes to rearrange 500k rows. See for example "Shia moody(row7)" on the left hand side, was rearranged and sorted on the right hand side.
My Code was an attempt to performing this task. If you can perform this task in fewer steps, it would be of great help. Please ignore my Sample code. My overall idea was to load these records in an array and then search for duplicates by looping within the array. If there is a duplicate found then I want to sort them by adding the duplicates to a collection. Once that is done I want it to write to another worksheet from the collection.
 
Upvote 0
See if this gets you close.
I only outputs those IDs that appear more than once to a new sheet.

VBA Code:
Sub GetDuplicates()
      
    Dim dict As Object
    Dim coll As Collection
    Dim oPeople As clsPeople
    Dim wb As Workbook
    Dim sht As Worksheet
    Dim rng As Range
    Dim i As Long
    Dim dKey As Variant
    Dim outSht As Worksheet
    Dim outRow As Long
   
    Set wb = ThisWorkbook
    Set sht = ActiveSheet
    Set dict = CreateObject("Scripting.Dictionary")
    Set rng = sht.Range("A1").CurrentRegion
   
    For i = 2 To rng.Columns(3).Cells.Count
        Set oPeople = New clsPeople
        oPeople.Firstname = Cells(i, 1).Value
        oPeople.Surname = Cells(i, 2).Value
        oPeople.ID = Cells(i, 3).Value
       
        If dict.exists(Cells(i, 3).Value) Then
            dict(Cells(i, 3).Value).Add oPeople
        Else
            Set coll = New Collection
            coll.Add oPeople
            dict.Add Cells(i, 3).Value, coll
        End If

    Next i
   
    Set outSht = wb.Worksheets.Add(before:=wb.Worksheets(1))
    outRow = 1
    With outSht.Cells(outRow, 1).Resize(1, rng.Rows(1).Cells.Count)
        .Value = rng.Rows(1).Value
        .Font.Bold = True
    End With
   
    For Each dKey In dict
        Set coll = dict(dKey)
        ' only output if duplicates exist
        If coll.Count > 1 Then
            For i = 1 To coll.Count
                outRow = outRow + 1
                Set oPeople = coll(i)
                outSht.Cells(outRow, 1).Value = oPeople.Firstname
                outSht.Cells(outRow, 2).Value = oPeople.Surname
                outSht.Cells(outRow, 3).Value = oPeople.ID
               
            Next i
        End If
   
    Next dKey
   
    outSht.Range("A1").CurrentRegion.EntireColumn.AutoFit

End Sub

The next section is a class object (using the term very loosely) called clsPeople
VBA Code:
Option Explicit

Public Firstname As String
Public Surname As String
Public ID As Long
 
Upvote 0
I just realised that for 500k rows I should have used and array for reading & writing, so this is likely to be too slow.
Let me know how you go and I can revisit it tomorrow if need be.
 
Upvote 0
See if this is any faster.

VBA Code:
Sub GetDuplicatesArrayWIP()
    
    Dim dict As Object
    Dim coll As Collection
    Dim oPeople As clsPeople
    Dim wb As Workbook
    Dim sht As Worksheet
    Dim rng As Range
    Dim i As Long
    Dim dKey As Variant
    Dim outSht As Worksheet
    Dim outRow As Long
    Dim arr As Variant
    Dim outcoll As Collection
    Dim outArr()
    
    Set wb = ThisWorkbook
    Set sht = ActiveSheet
    Set dict = CreateObject("Scripting.Dictionary")
    Set rng = sht.Range("A1").CurrentRegion
    arr = rng
    
    For i = 2 To UBound(arr, 1)
        Set oPeople = New clsPeople
        oPeople.Firstname = arr(i, 1)
        oPeople.Surname = arr(i, 2)
        oPeople.ID = arr(i, 3)
        
        If dict.exists(arr(i, 3)) Then
            dict(arr(i, 3)).Add oPeople
        Else
            Set coll = New Collection
            coll.Add oPeople
            dict.Add arr(i, 3), coll
        End If

    Next i
    
    Set outSht = wb.Worksheets.Add(before:=wb.Worksheets(1))
    outRow = 1
    With outSht.Cells(outRow, 1).Resize(1, rng.Rows(1).Cells.Count)
        .Value = rng.Rows(1).Value
        .Font.Bold = True
    End With
    
    Set outcoll = New Collection
    For Each dKey In dict
        Set coll = dict(dKey)
        ' only output if duplicates exist
        If coll.Count > 1 Then
            For i = 1 To coll.Count
                outcoll.Add coll(i)
            Next i
        End If
    
    Next dKey
    
    ReDim outArr(1 To outcoll.Count, 1 To 3)
    For i = 1 To outcoll.Count
        Set oPeople = outcoll(i)
        outArr(i, 1) = oPeople.Firstname
        outArr(i, 2) = oPeople.Surname
        outArr(i, 3) = oPeople.ID
    Next i
    
    outSht.Range("A2").Resize(UBound(outArr), 3) = outArr
    outSht.Range("A1").CurrentRegion.EntireColumn.AutoFit

End Sub
 
Upvote 0
See if this is any faster.

VBA Code:
Sub GetDuplicatesArrayWIP()
   
    Dim dict As Object
    Dim coll As Collection
    Dim oPeople As clsPeople
    Dim wb As Workbook
    Dim sht As Worksheet
    Dim rng As Range
    Dim i As Long
    Dim dKey As Variant
    Dim outSht As Worksheet
    Dim outRow As Long
    Dim arr As Variant
    Dim outcoll As Collection
    Dim outArr()
   
    Set wb = ThisWorkbook
    Set sht = ActiveSheet
    Set dict = CreateObject("Scripting.Dictionary")
    Set rng = sht.Range("A1").CurrentRegion
    arr = rng
   
    For i = 2 To UBound(arr, 1)
        Set oPeople = New clsPeople
        oPeople.Firstname = arr(i, 1)
        oPeople.Surname = arr(i, 2)
        oPeople.ID = arr(i, 3)
       
        If dict.exists(arr(i, 3)) Then
            dict(arr(i, 3)).Add oPeople
        Else
            Set coll = New Collection
            coll.Add oPeople
            dict.Add arr(i, 3), coll
        End If

    Next i
   
    Set outSht = wb.Worksheets.Add(before:=wb.Worksheets(1))
    outRow = 1
    With outSht.Cells(outRow, 1).Resize(1, rng.Rows(1).Cells.Count)
        .Value = rng.Rows(1).Value
        .Font.Bold = True
    End With
   
    Set outcoll = New Collection
    For Each dKey In dict
        Set coll = dict(dKey)
        ' only output if duplicates exist
        If coll.Count > 1 Then
            For i = 1 To coll.Count
                outcoll.Add coll(i)
            Next i
        End If
   
    Next dKey
   
    ReDim outArr(1 To outcoll.Count, 1 To 3)
    For i = 1 To outcoll.Count
        Set oPeople = outcoll(i)
        outArr(i, 1) = oPeople.Firstname
        outArr(i, 2) = oPeople.Surname
        outArr(i, 3) = oPeople.ID
    Next i
   
    outSht.Range("A2").Resize(UBound(outArr), 3) = outArr
    outSht.Range("A1").CurrentRegion.EntireColumn.AutoFit

End Sub
Okay. Will do. Thanks a Million :)
 
Upvote 0

Forum statistics

Threads
1,225,757
Messages
6,186,850
Members
453,379
Latest member
gabriellegonzalez

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top