Create term base from large data set

chillzen

New Member
Joined
Feb 1, 2022
Messages
8
Office Version
  1. 2016
Platform
  1. Windows
  2. MacOS
  3. Mobile
  4. Web
Hello,

I am stuck trying to solve what may be a very simple problem. I understand logically what needs to be done I just don't have the knowledge for the best way to handle this much data and not make excel crash. I am currently limited with no access to utilize Pandas to solve this even though that seems to be the best option. If this can be done in Excel possibly using VBA that would be great. I am creating a 2-column term base for two languages (A and B).

Notes:
Columns A:N = Lang A
Columns O:AP = Lang B
Each row is a set of a word in LangA and its synonyms with LangB and its grammatical case variations
~26K rows total
Total of 67,431 LangA terms | 285,974 Lang B
Not all Language A terms have Language B equivalents and vise versa but still need to be returned (to fill in later with the translations)
Average of 2.5 LangA terms in each row and avg 10.5 LangB
Output needs to be 2 columns (may be split into multiple sheets/files if needed since it will obviously be an extremely large number of rows generated)

Below is the best sample I can provide that should help work towards a solution:

Raw Data
A TermsA TermsB TermsB Terms
A1A2B1B2
A3A4B3
A5B4B5
A6B6
A7
B7B8
B9

Desired Output
Term ATerm B
A1B1
A1B2
A2B1
A2B2
A3B3
A4B3
A5B4
A5B5
A6B6
A7
B7
B8
B9
 
OK, try this version. (And thank you for using XL2BB!)

VBA Code:
Sub Flatten()
Dim SrcSheet As Worksheet, TrgSheet As Worksheet, OutData() As String, MaxRows As Long
Dim MyData As Variant, OutCol As Long, ctr As Long, r As Long, c1 As Long, c2 As Long, c As Long

    Set SrcSheet = Sheets("Sheet16")
    Set TrgSheet = Sheets("Sheet17")
        
    MaxRows = Rows.Count
    c1 = 0
    For c = 1 To 42
        c2 = SrcSheet.Cells(Rows.Count, c).End(xlUp).Row
        If c2 > c1 Then c1 = c2
    Next c
    MyData = SrcSheet.Range("A1:AQ" & c1).Value
    
    OutCol = 1
    ReDim OutData(1 To MaxRows, 1 To 2)
    OutData(1, 1) = "Term A"
    OutData(1, 2) = "Term B"
    ctr = 2
    
    For r = 2 To UBound(MyData)
    
        c1 = 0
        Do
            c1 = c1 + 1
        Loop Until MyData(r, c1) <> "" Or c1 = 14
        
        Do
            c2 = 14
            Do
                c2 = c2 + 1
            Loop Until MyData(r, c2) <> "" Or c2 = 42
            Do
                OutData(ctr, 1) = MyData(r, c1)
                OutData(ctr, 2) = MyData(r, c2)
                ctr = ctr + 1
                If ctr = MaxRows + 1 Then
                    TrgSheet.Cells(1, OutCol).Resize(MaxRows, 2).Value = OutData
                    OutCol = OutCol + 3
                    ReDim OutData(1 To MaxRows, 1 To 2)
                    OutData(1, 1) = "Term A"
                    OutData(1, 2) = "Term B"
                    ctr = 2
                End If
                Do
                    c2 = c2 + 1
                Loop Until MyData(r, c2) <> "" Or c2 > 42
            Loop Until c2 > 42
            Do
                c1 = c1 + 1
            Loop Until MyData(r, c1) <> "" Or c1 > 14
        Loop Until c1 > 14
        
    Next r
    If ctr > 2 Then TrgSheet.Cells(1, OutCol).Resize(ctr - 1, 2).Value = OutData
    
End Sub
 
Upvote 0
Solution

Excel Facts

Save Often
If you start asking yourself if now is a good time to save your Excel workbook, the answer is Yes
OK, try this version. (And thank you for using XL2BB!)

VBA Code:
Sub Flatten()
Dim SrcSheet As Worksheet, TrgSheet As Worksheet, OutData() As String, MaxRows As Long
Dim MyData As Variant, OutCol As Long, ctr As Long, r As Long, c1 As Long, c2 As Long, c As Long

    Set SrcSheet = Sheets("Sheet16")
    Set TrgSheet = Sheets("Sheet17")
       
    MaxRows = Rows.Count
    c1 = 0
    For c = 1 To 42
        c2 = SrcSheet.Cells(Rows.Count, c).End(xlUp).Row
        If c2 > c1 Then c1 = c2
    Next c
    MyData = SrcSheet.Range("A1:AQ" & c1).Value
   
    OutCol = 1
    ReDim OutData(1 To MaxRows, 1 To 2)
    OutData(1, 1) = "Term A"
    OutData(1, 2) = "Term B"
    ctr = 2
   
    For r = 2 To UBound(MyData)
   
        c1 = 0
        Do
            c1 = c1 + 1
        Loop Until MyData(r, c1) <> "" Or c1 = 14
       
        Do
            c2 = 14
            Do
                c2 = c2 + 1
            Loop Until MyData(r, c2) <> "" Or c2 = 42
            Do
                OutData(ctr, 1) = MyData(r, c1)
                OutData(ctr, 2) = MyData(r, c2)
                ctr = ctr + 1
                If ctr = MaxRows + 1 Then
                    TrgSheet.Cells(1, OutCol).Resize(MaxRows, 2).Value = OutData
                    OutCol = OutCol + 3
                    ReDim OutData(1 To MaxRows, 1 To 2)
                    OutData(1, 1) = "Term A"
                    OutData(1, 2) = "Term B"
                    ctr = 2
                End If
                Do
                    c2 = c2 + 1
                Loop Until MyData(r, c2) <> "" Or c2 > 42
            Loop Until c2 > 42
            Do
                c1 = c1 + 1
            Loop Until MyData(r, c1) <> "" Or c1 > 14
        Loop Until c1 > 14
       
    Next r
    If ctr > 2 Then TrgSheet.Cells(1, OutCol).Resize(ctr - 1, 2).Value = OutData
   
End Sub
Eric,

You sir are a genius, returned with 509K pairs and is exactly what I needed!

Much appreciated. Now to analyze your code and learn what each piece means!
 
Upvote 0
Glad I could help! Thanks for the update. ?

Here's the macro with some comments to make it easier to decipher:

VBA Code:
Sub Flatten()
Dim SrcSheet As Worksheet, TrgSheet As Worksheet, OutData() As String, MaxRows As Long
Dim MyData As Variant, OutCol As Long, ctr As Long, r As Long, c1 As Long, c2 As Long, c As Long
    
' Define the input and output sheets
    Set SrcSheet = Sheets("Sheet16")
    Set TrgSheet = Sheets("Sheet17")
        
    MaxRows = Rows.Count            ' How many rows do we want per output block?
    
' This finds the bottom row of our range.  It looks at columns 1 to 42 (A to AP) and finds
' the bottom row of each one.  It keeps the largest value.
    c1 = 0
    For c = 1 To 42
        c2 = SrcSheet.Cells(Rows.Count, c).End(xlUp).Row
        If c2 > c1 Then c1 = c2
    Next c
    MyData = SrcSheet.Range("A1:AQ" & c1).Value     ' Now read the entire range (plus 1 column) for faster processing
    
' Set up some parameters
    OutCol = 1                                      ' This is the first column (A) that we'll write to
    ReDim OutData(1 To MaxRows, 1 To 2)             ' Define the output array
    OutData(1, 1) = "Term A"                        ' Put in the headers
    OutData(1, 2) = "Term B"
    ctr = 2                                         ' This is the current output row
    
    For r = 2 To UBound(MyData)                     ' Check each row of the input array
    
        c1 = 0                                      ' Set column 1 to 0 (English words)
        Do                                          ' Increment column 1 until we find a non-blank cell, or
            c1 = c1 + 1                             ' until we hit the last cell in the range (column N).
        Loop Until MyData(r, c1) <> "" Or c1 = 14
        
        Do
            c2 = 14                                 ' Set column 2 to column N
            Do
                c2 = c2 + 1                         ' Increment column 2 until we find a non-blank cell, or
            Loop Until MyData(r, c2) <> "" Or c2 = 42   ' until we hit the last cell in the range (column AP).
            
' Perform this loop until we run out of words on the current row (r)
            Do
                OutData(ctr, 1) = MyData(r, c1)     ' Save the English word
                OutData(ctr, 2) = MyData(r, c2)     ' Save the Russion word
                ctr = ctr + 1                       ' increment the output row
                
                If ctr = MaxRows + 1 Then           ' Have we filled up the output array?
                    TrgSheet.Cells(1, OutCol).Resize(MaxRows, 2).Value = OutData    ' If so, print it out
                    OutCol = OutCol + 3                     ' Set the next output column
                    ctr = 2                                 ' and reset the output row
                End If
                
                Do                                          ' We now increment column 2 until we find a
                    c2 = c2 + 1                             ' non-blank cell, or run out of columns
                Loop Until MyData(r, c2) <> "" Or c2 > 42
            Loop Until c2 > 42
            
            Do                                              ' We now increment column 1 until we find a
                c1 = c1 + 1                                 ' non-blank cell, or run out of columns
            Loop Until MyData(r, c1) <> "" Or c1 > 14
        Loop Until c1 > 14
        
    Next r
    
' Are there any rows in the output array we haven't printed yet?
    If ctr > 2 Then TrgSheet.Cells(1, OutCol).Resize(ctr - 1, 2).Value = OutData
    
End Sub
 
Upvote 0

Forum statistics

Threads
1,225,738
Messages
6,186,725
Members
453,368
Latest member
positivemind

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top