3 lists into 1, sorted and then split into 2 lists again

MixedUpExcel

Board Regular
Joined
Apr 7, 2015
Messages
222
Office Version
  1. 365
Platform
  1. Windows
Hi,

Please note the following request will be to action this only in Excel - I'm on a works pc and don't have other software available to do this.

I have 3 lists of product codes.

The codes are in a random order and can be duplicated across the the 3 columns.

Each product code has an associated code which needs to remain in the column next to the product code it was originally next to.

Example below:

[TABLE="class: grid, width: 500, align: center"]
<tbody>[TR]
[TD]Product Code[/TD]
[TD]Assoc. Code[/TD]
[TD][/TD]
[TD]Product Code[/TD]
[TD]Assoc. Code[/TD]
[TD][/TD]
[TD]Product Code[/TD]
[TD]Associ. Code[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]ABC123[/TD]
[TD]AAA[/TD]
[TD][/TD]
[TD]JHG135[/TD]
[TD]HHH[/TD]
[TD][/TD]
[TD]RKG134[/TD]
[TD]OOO[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]XYS456[/TD]
[TD]BBB[/TD]
[TD][/TD]
[TD]ABC123[/TD]
[TD]III[/TD]
[TD][/TD]
[TD]EKT526[/TD]
[TD]PPP[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]ABC123[/TD]
[TD]CCC[/TD]
[TD][/TD]
[TD]JFH783[/TD]
[TD]JJJ[/TD]
[TD][/TD]
[TD]QWE987[/TD]
[TD]QQQ[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]MST564[/TD]
[TD]DDD[/TD]
[TD][/TD]
[TD]MST563[/TD]
[TD]KKK[/TD]
[TD][/TD]
[TD]ADD654[/TD]
[TD]RRR[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]LRT753[/TD]
[TD]EEE[/TD]
[TD][/TD]
[TD]PRT852[/TD]
[TD]LLL[/TD]
[TD][/TD]
[TD]JHE483[/TD]
[TD]SSS[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]ABC123[/TD]
[TD]FFF[/TD]
[TD][/TD]
[TD]LRT753[/TD]
[TD]MMM[/TD]
[TD][/TD]
[TD]ABC123[/TD]
[TD]TTT[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]MST564[/TD]
[TD]GGG[/TD]
[TD][/TD]
[TD]ABC963[/TD]
[TD]NNN[/TD]
[TD][/TD]
[TD]PRT852[/TD]
[TD]VVV[/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]

I basically need to get column 4 & 5 under columns 1 & 2 and column 7 & 8 under columns 1 & 2 (making it a single column with an adjacent column with the Associated Products.

I then need to sort the product code (column 1) in alphabetical order with column 2 staying next to the code it's originally next to, like it would in a manual sort.

The end result would be:

[TABLE="width: 128"]
<colgroup><col width="64" span="2" style="width:48pt"> </colgroup><tbody>[TR]
[TD="class: xl63, width: 64"]ABC123[/TD]
[TD="class: xl63, width: 64"]AAA[/TD]
[/TR]
[TR]
[TD="class: xl63"]ABC123[/TD]
[TD="class: xl63"]CCC[/TD]
[/TR]
[TR]
[TD="class: xl63"]ABC123[/TD]
[TD="class: xl63"]FFF[/TD]
[/TR]
[TR]
[TD="class: xl63"]ABC123[/TD]
[TD="class: xl63"]III[/TD]
[/TR]
[TR]
[TD="class: xl63"]ABC123[/TD]
[TD="class: xl63"]TTT[/TD]
[/TR]
[TR]
[TD="class: xl63"]ABC963[/TD]
[TD="class: xl63"]NNN[/TD]
[/TR]
[TR]
[TD="class: xl63"]ADD654[/TD]
[TD="class: xl63"]RRR[/TD]
[/TR]
[TR]
[TD="class: xl63"]EKT526[/TD]
[TD="class: xl63"]PPP[/TD]
[/TR]
[TR]
[TD="class: xl63"]JFH783[/TD]
[TD="class: xl63"]JJJ[/TD]
[/TR]
[TR]
[TD="class: xl63"]JHE483[/TD]
[TD="class: xl63"]SSS[/TD]
[/TR]
[TR]
[TD="class: xl63"]JHG135[/TD]
[TD="class: xl63"]HHH[/TD]
[/TR]
[TR]
[TD="class: xl63"]LRT753[/TD]
[TD="class: xl63"]EEE[/TD]
[/TR]
[TR]
[TD="class: xl63"]LRT753[/TD]
[TD="class: xl63"]MMM[/TD]
[/TR]
[TR]
[TD="class: xl63"]MST563[/TD]
[TD="class: xl63"]KKK[/TD]
[/TR]
[TR]
[TD="class: xl63"]MST564[/TD]
[TD="class: xl63"]DDD[/TD]
[/TR]
[TR]
[TD="class: xl63"]MST564[/TD]
[TD="class: xl63"]GGG[/TD]
[/TR]
[TR]
[TD="class: xl63"]PRT852[/TD]
[TD="class: xl63"]LLL[/TD]
[/TR]
[TR]
[TD="class: xl63"]PRT852[/TD]
[TD="class: xl63"]VVV[/TD]
[/TR]
[TR]
[TD="class: xl63"]QWE987[/TD]
[TD="class: xl63"]QQQ[/TD]
[/TR]
[TR]
[TD="class: xl63"]RKG134[/TD]
[TD="class: xl63"]OOO[/TD]
[/TR]
[TR]
[TD="class: xl63"]XYS456[/TD]
[TD="class: xl63"]BBB[/TD]
[/TR]
</tbody>[/TABLE]

All this is straight forward to do with vba coding.

The problem I have is Column 1 = 500,000 products; Column 4 = 500,000 products; Column 7 = 400,000 products. Total is more than 1,048,576 rows.

Once it is all sorted, it will then need to be split over 2 (4 including associated codes) columns (due to exceeding the Excel Row limit)

Suggestions please?
 

Excel Facts

Remove leading & trailing spaces
Save as CSV to remove all leading and trailing spaces. It is faster than using TRIM().
Have you checked for duplicates? Make strings of Col A & ", " & Col B, then remove any duplicates and then split by "," repeat for other columns, then review your reduced data sets.

Or with coding, find a way to read the 3 pairs of columns into a single 2 column array (suspect some redim requirements - only the 2nd dimension can be changed so you may also need to use Application.Transpose for input/output).

There is plenty of online code showing how to sort an array (usually a bubble sort)

Then print the contents out with some kind of "IF x >= Rows.Count - 1000 then move to next column" as each loop iteration finishes.

I'm guessing you'll need to research how to "stitch" arrays together (try searching for write non continguous ranges to same array or something like that) and sorting an array.

You can probably use Application.Index with the array to determine it's mid point and then write out across 4 columns.
 
Upvote 0
When I've run into the need for more than 1+ million rows (whoever thought that would happen?) I head to either ACCESS or an array processing language like APL then, as needed, move the results back to Excel.
My personal preference is APL because it's so each to handle arrays (i.e., VERY large matrices).
 
Upvote 0
Hi JackDanIce and kweaver,

Thank you both for your suggestions.

I'll look into both and see what I can do.

Thanks.
 
Upvote 0
Use PowerQuery, built into Excel, and load the entire data set into the Data Model.
Total rows in the Data Model is only limited by available memory.
 
Upvote 0
I think PowerQuery only because DataModel increase size of the excel file
DataModel can be used after any transformation of data in PQ (if necessary)
 
Upvote 0
With that much data, you really should look into a more robust tool than Excel. However, just for fun I wrote up a macro that should do what you want.

Rich (BB code):
Sub CombineAndSort()
Dim MyCols As Variant, MyResult As Range, r As Variant, sl As Variant
Dim MyData As Variant, i As Long, rw As Long, Output() As Variant
Dim MaxRows As Long
t = Timer
    MyCols = Array(Sheets("Sheet3").Range("A:B"), _
                   Sheets("Sheet3").Range("D:E"), _
                   Sheets("Sheet3").Range("G:H"))
                   
    Set MyResult = Sheets("Sheet4").Range("A:A")
    MaxRows = 500000
    
    On Error GoTo Oops:
    Set sl = CreateObject("System.Collections.SortedList")
    
    On Error Resume Next
    For Each r In MyCols
        lr = r.Resize(1, 1).Offset(Rows.Count - 1).End(xlUp).Row
        MyData = r.Resize(lr, 2).Value
        For i = 2 To UBound(MyData)
            sl.Add MyData(i, 1) & "|" & MyData(i, 2), 1
        Next i
    Next r
    
    rw = 0
    
    ReDim Output(0 To MaxRows, 1 To 2)
    For i = 0 To sl.Count - 1
        rw = rw + 1
        r = Split(sl.getkey(i), "|")
        Output(rw, 1) = r(0)
        Output(rw, 2) = r(1)
        If rw = MaxRows Then
            Output(0, 1) = "Product Code"
            Output(0, 2) = "Associ. Code"
            MyResult.Resize(MaxRows + 1, 2).Value = Output
            ReDim Output(0 To MaxRows, 1 To 2)
            rw = 0
            Set MyResult = MyResult.Offset(, 3)
        End If
    Next i
    
    If rw > 0 Then
        Output(0, 1) = "Product Code"
        Output(0, 2) = "Associ. Code"
        MyResult.Resize(MaxRows + 1, 2).Value = Output
    End If
    Set sl = Nothing
    Debug.Print Timer - t
    Exit Sub
Oops:
    Debug.Print Err.Description
End Sub
Change the parts in red to match your sheet. I ran this on some sample data I created that has 500,000 and 500,000 and 400,000 rows, like yours. It took nearly an hour. Your mileage may vary.
 
Upvote 0
Per Eric W: I, too, created some sample data with 1,500,000 entries but I did this in APL. I ran your sort and split the data and it was about 2 seconds, if that.
 
Upvote 0
kweaver: That just goes to show that using the right tool can have enormous importance, which is what I stated. However, in thinking about it, even in VBA, I thought that using a SortedArray might not be the right tool. It has to sort the elements every time you insert something. I thought using an ArrayList might be faster, since you'd only need to sort it once. And in fact, that turned out to be the case. This version of the macro ran in about 24 seconds. Not quite up to your APL speed, but far better than an hour.

Code:
Sub CombineAndSort()
Dim MyCols As Variant, MyResult As Range, r As Variant, al As Variant
Dim MyData As Variant, i As Long, rw As Long, Output() As Variant
Dim MaxRows As Long, ta As Variant
t = Timer
    MyCols = Array(Sheets("Sheet3").Range("A:B"), _
                   Sheets("Sheet3").Range("D:E"), _
                   Sheets("Sheet3").Range("G:H"))
                   
    Set MyResult = Sheets("Sheet4").Range("A:A")
    MaxRows = 500000
    
    On Error GoTo Oops:
    Set al = CreateObject("System.Collections.ArrayList")
    
    On Error Resume Next
    For Each r In MyCols
        lr = r.Resize(1, 1).Offset(Rows.Count - 1).End(xlUp).Row
        MyData = r.Resize(lr, 2).Value
        For i = 2 To UBound(MyData)
            al.Add MyData(i, 1) & "|" & MyData(i, 2)
        Next i
    Next r
    
    rw = 0
    
    ReDim Output(0 To MaxRows, 1 To 2)
    al.Sort
    ta = al.toarray
    For i = 0 To al.Count - 1
        rw = rw + 1
        r = Split(ta(i), "|")
        Output(rw, 1) = r(0)
        Output(rw, 2) = r(1)
        If rw = MaxRows Then
            Output(0, 1) = "Product Code"
            Output(0, 2) = "Associ. Code"
            MyResult.Resize(MaxRows + 1, 2).Value = Output
            ReDim Output(0 To MaxRows, 1 To 2)
            rw = 0
            Set MyResult = MyResult.Offset(, 3)
        End If
    Next i
    
    If rw > 0 Then
        Output(0, 1) = "Product Code"
        Output(0, 2) = "Associ. Code"
        MyResult.Resize(MaxRows + 1, 2).Value = Output
    End If
    Set al = Nothing
    Debug.Print Timer - t
    Exit Sub
Oops:
    Debug.Print Err.Description
End Sub
 
Upvote 0
Hi All,

Sorry for the late response - back at work today to check them out.

Thank you for your suggestions.

As it looks like some of the suggestions would be add-on's.. I will have to look at them as a last resort as I would need to get the administrator to install those for me.

Eric,

Thank you for the 2 pieces of code.

I ran the first one and it did appear to be taking a while, so I stopped that and ran your second piece. :)

Thank you for that - the second piece of code was so quick.

I had created a less tidy piece of code which was basically just splitting columns, adding to new columns, sorting ... did that a few times and it got me where I wanted but yours is much better.

Thank you again to all, really appreciate the suggestions and solutions.

Simon
 
Upvote 0

Forum statistics

Threads
1,223,908
Messages
6,175,306
Members
452,633
Latest member
DougMo

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top