Comparing intersections of two lists of concatenated input columns

jahsquare · May 11, 2018

This is something that user Fazza was generously helping with:

See below link for sample input/output data.

Basically I have a table including the following columns: Names, Places, and Year.
Values in the Names & Places columns may be concatenated, e.g. with " " like "Tim Rob Frank" or "US CA MX".

What I need is an X-Y output table showing the de-concatenated intersection of Names and Year, or Names and Places, i.e. how many times are Tim and MX in the same row. In practice there are additional columns and I want to be able to select column headers for comparison from data validation cells (see input file).

Fazza helped with the code below, which accomplishes generation of an output table for Names and Year. What I'm hoping to do is modify this to work with arbitrary inputs. I.e the script would check for column headers present in two data validation cells, and generate the output table accordingly.

The catch is that if the user selects Year, the output table columns should be generated as below. But if they select any other column (i.e. some text-based column that may be concatenated with the same delimiter, say " ") the columns should be split the same as the rows. Note there may be blank cells in any column.

As a secondary problem, Names are part of a hierarchy (see flattened hierarchy on sheet 2 of input file). In some cases I will need to sum up the hierarchy so that I can compare e.g. all level "2"s.

Currently I import my data values to a template of the full Names hierarchy, and use the formula below. But it would be extremely useful if I could instead just generate the needed output from the input (with respect to the full hierarchy located in a separate sheet/book), because importing to the template and repeatedly calculating this formula across thousands of records takes a very long time.
={ SUM(OFFSET(E2,,,IFERROR(MATCH(0,N(B2<(B3:B$260742)),),))) } [Where E2 is the current row value, and B2 is the rank.]

Any help is much appreciated (even if it ignores the hierarchy issue).. Thanks!

Code:

Sub test()

    Dim i As Long, j As Long, k As Long
    Dim lngMinYear As Long
    Dim lngMaxYear As Long
    
    Dim a As Variant
    Dim aIn As Variant
    Dim aOut As Variant
    Dim dic As Variant
    Dim wksNew As Excel.Worksheet
    
    '-----------------------------------
    
    Application.ScreenUpdating = False
    Application.EnableEvents = False
    
    With Cells.Find(What:="Names").CurrentRegion
        aIn = .Value2
        lngMinYear = Application.WorksheetFunction.Min(.Offset(, 1).Resize(, 1))
        lngMaxYear = Application.WorksheetFunction.Max(.Offset(, 1).Resize(, 1))
    End With
    
    ReDim aOut(1 To 60000, lngMinYear - 1 To lngMaxYear)
    
    Set dic = CreateObject("Scripting.Dictionary") 'for destination array row numbers stored by name
    
    For i = LBound(aIn, 1) + 1 To UBound(aIn, 1) 'loop through input data rows
        a = Split(aIn(i, 1), " ") 'Array list of names for record i
        For j = LBound(a) To UBound(a) 'Loop though names on record i
            If Not dic.exists(a(j)) Then 'if haven't already got a record number for this name
                k = k + 1 'data for this name will be in row k of output array
                dic.Add a(j), k 'store row number for this name in dictionary object
                aOut(k, lngMinYear - 1) = a(j) 'store name in first column of output array
            End If
            aOut(dic(a(j)), aIn(i, 2)) = aOut(dic(a(j)), aIn(i, 2)) + 1 'accumulate count
        Next j
    Next i
    Erase a
    Set dic = Nothing
    Set aIn = Nothing
    
    'Create output
    Set wksNew = Application.Workbooks.Add(Template:=xlWBATWorksheet).Worksheets(1)
    With wksNew
        .Range("A2").Resize(k, lngMaxYear - lngMinYear + 2).Value2 = aOut
        .Range("A1").Value2 = "Name"
        For j = lngMinYear To lngMaxYear
            .Cells(1, j - lngMinYear + 2).Value2 = j
        Next j
    End With
    Erase aOut

here is a workbook with some sample input/output data. There are 6 sheets:

1) The input data. Same as before with added column for Places, as well as three data validation fields for horizontal axis input (column header), vertical axis (column header), and fill-hierarchy choice (yes/no).

2) A flattened hierarchy of names, with a sort key

3) Output for input: Date/Names/No

4) Output for input: Date/Names/Yes

5) Output for input: Places/Names/No

6) Output for intput: Places/Names/Yes

jahsquare · May 11, 2018

*Really these could just be two separate vba scripts instead of one that handles both input types (Names + Year) and (Names + Places), i.e. if I can modify the script above to handle (Names + Places), and use them independently.

Comparing intersections of two lists of concatenated input columns

jahsquare

Board Regular

jahsquare

Board Regular

Similar threads

Share this page

Comparing intersections of two lists of concatenated input columns

jahsquare

Board Regular

jahsquare

Board Regular

Similar threads

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock