This is something that user Fazza was generously helping with:
See below link for sample input/output data.
Basically I have a table including the following columns: Names, Places, and Year.
Values in the Names & Places columns may be concatenated, e.g. with " " like "Tim Rob Frank" or "US CA MX".
What I need is an X-Y output table showing the de-concatenated intersection of Names and Year, or Names and Places, i.e. how many times are Tim and MX in the same row. In practice there are additional columns and I want to be able to select column headers for comparison from data validation cells (see input file).
Fazza helped with the code below, which accomplishes generation of an output table for Names and Year. What I'm hoping to do is modify this to work with arbitrary inputs. I.e the script would check for column headers present in two data validation cells, and generate the output table accordingly.
The catch is that if the user selects Year, the output table columns should be generated as below. But if they select any other column (i.e. some text-based column that may be concatenated with the same delimiter, say " ") the columns should be split the same as the rows. Note there may be blank cells in any column.
As a secondary problem, Names are part of a hierarchy (see flattened hierarchy on sheet 2 of input file). In some cases I will need to sum up the hierarchy so that I can compare e.g. all level "2"s.
Currently I import my data values to a template of the full Names hierarchy, and use the formula below. But it would be extremely useful if I could instead just generate the needed output from the input (with respect to the full hierarchy located in a separate sheet/book), because importing to the template and repeatedly calculating this formula across thousands of records takes a very long time.
={ SUM(OFFSET(E2,,,IFERROR(MATCH(0,N(B2<(B3:B$260742)),),))) } [Where E2 is the current row value, and B2 is the rank.]
Any help is much appreciated (even if it ignores the hierarchy issue).. Thanks!
here is a workbook with some sample input/output data. There are 6 sheets:
1) The input data. Same as before with added column for Places, as well as three data validation fields for horizontal axis input (column header), vertical axis (column header), and fill-hierarchy choice (yes/no).
2) A flattened hierarchy of names, with a sort key
3) Output for input: Date/Names/No
4) Output for input: Date/Names/Yes
5) Output for input: Places/Names/No
6) Output for intput: Places/Names/Yes
See below link for sample input/output data.
Basically I have a table including the following columns: Names, Places, and Year.
Values in the Names & Places columns may be concatenated, e.g. with " " like "Tim Rob Frank" or "US CA MX".
What I need is an X-Y output table showing the de-concatenated intersection of Names and Year, or Names and Places, i.e. how many times are Tim and MX in the same row. In practice there are additional columns and I want to be able to select column headers for comparison from data validation cells (see input file).
Fazza helped with the code below, which accomplishes generation of an output table for Names and Year. What I'm hoping to do is modify this to work with arbitrary inputs. I.e the script would check for column headers present in two data validation cells, and generate the output table accordingly.
The catch is that if the user selects Year, the output table columns should be generated as below. But if they select any other column (i.e. some text-based column that may be concatenated with the same delimiter, say " ") the columns should be split the same as the rows. Note there may be blank cells in any column.
As a secondary problem, Names are part of a hierarchy (see flattened hierarchy on sheet 2 of input file). In some cases I will need to sum up the hierarchy so that I can compare e.g. all level "2"s.
Currently I import my data values to a template of the full Names hierarchy, and use the formula below. But it would be extremely useful if I could instead just generate the needed output from the input (with respect to the full hierarchy located in a separate sheet/book), because importing to the template and repeatedly calculating this formula across thousands of records takes a very long time.
={ SUM(OFFSET(E2,,,IFERROR(MATCH(0,N(B2<(B3:B$260742)),),))) } [Where E2 is the current row value, and B2 is the rank.]
Any help is much appreciated (even if it ignores the hierarchy issue).. Thanks!
Code:
Sub test()
Dim i As Long, j As Long, k As Long
Dim lngMinYear As Long
Dim lngMaxYear As Long
Dim a As Variant
Dim aIn As Variant
Dim aOut As Variant
Dim dic As Variant
Dim wksNew As Excel.Worksheet
'-----------------------------------
Application.ScreenUpdating = False
Application.EnableEvents = False
With Cells.Find(What:="Names").CurrentRegion
aIn = .Value2
lngMinYear = Application.WorksheetFunction.Min(.Offset(, 1).Resize(, 1))
lngMaxYear = Application.WorksheetFunction.Max(.Offset(, 1).Resize(, 1))
End With
ReDim aOut(1 To 60000, lngMinYear - 1 To lngMaxYear)
Set dic = CreateObject("Scripting.Dictionary") 'for destination array row numbers stored by name
For i = LBound(aIn, 1) + 1 To UBound(aIn, 1) 'loop through input data rows
a = Split(aIn(i, 1), " ") 'Array list of names for record i
For j = LBound(a) To UBound(a) 'Loop though names on record i
If Not dic.exists(a(j)) Then 'if haven't already got a record number for this name
k = k + 1 'data for this name will be in row k of output array
dic.Add a(j), k 'store row number for this name in dictionary object
aOut(k, lngMinYear - 1) = a(j) 'store name in first column of output array
End If
aOut(dic(a(j)), aIn(i, 2)) = aOut(dic(a(j)), aIn(i, 2)) + 1 'accumulate count
Next j
Next i
Erase a
Set dic = Nothing
Set aIn = Nothing
'Create output
Set wksNew = Application.Workbooks.Add(Template:=xlWBATWorksheet).Worksheets(1)
With wksNew
.Range("A2").Resize(k, lngMaxYear - lngMinYear + 2).Value2 = aOut
.Range("A1").Value2 = "Name"
For j = lngMinYear To lngMaxYear
.Cells(1, j - lngMinYear + 2).Value2 = j
Next j
End With
Erase aOut
here is a workbook with some sample input/output data. There are 6 sheets:
1) The input data. Same as before with added column for Places, as well as three data validation fields for horizontal axis input (column header), vertical axis (column header), and fill-hierarchy choice (yes/no).
2) A flattened hierarchy of names, with a sort key
3) Output for input: Date/Names/No
4) Output for input: Date/Names/Yes
5) Output for input: Places/Names/No
6) Output for intput: Places/Names/Yes
Last edited: