Lambda to count pairs and output array based on criteria

Phantom1

New Member
Joined
Sep 26, 2018
Messages
22
Office Version
  1. 365
  2. 2016
Platform
  1. Windows
I have the following sample data

IDCourse
1000a
1000b
1000c
1001a
1001b
1001d
1002a
1002c
1002d
1003a
1003e
1003f
1004a
1004c
1004f
1005a
1005d
1005f
and need to output with a single formula the following result

abcdefTotal
a62331318
b2211006
c3131019
d3113019
e1000113
f3011139
Total186993954
which is the count of persons for each pair of courses.

I tried the following formula
=LET(
ids, A2:A19,
courses, B2:B19,
uniqueCourses, UNIQUE(courses),
courseRows, INDEX(uniqueCourses, SEQUENCE(ROWS(uniqueCourses), 1)),
courseCols, TRANSPOSE(courseRows),
countMat, MAP(courseRows, courseCols, LAMBDA(cRow,cCol,
SUM( (courses=cRow) * (courses=cCol)))),
countMat)
but got a bunch of #N/A errors.


Any help, suggestions or new formula, will be appreciate.
Thanks in advance
 
Last edited by a moderator:
In that case you might have to go with Power Query or VBA
 
Upvote 0

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
Assuming your data starts in A1

VBA Code:
Sub jec()
 Dim dic As Object, cDic As Object, ar, t, k
 Dim j As Long, jj As Long, x As Long, tot As Long
 
 ar = Range("A1").CurrentRegion
 Set dic = CreateObject("scripting.dictionary")
 Set cDic = CreateObject("scripting.dictionary")
 
 For j = 2 To UBound(ar)
    cDic(ar(j, 2)) = ar(j, 2)
    If Not dic.exists(ar(j, 1)) Then dic(ar(j, 1)) = Array(CreateObject("scripting.dictionary"))
    dic(ar(j, 1))(0)(ar(j, 2)) = ar(j, 2)
 Next
 
 cDic("Total") = 1
 t = cDic.keys
 
 ReDim sq(UBound(t) + 1, UBound(t))
 For j = 0 To UBound(t) - 1
    For jj = 0 To UBound(t)
       For Each k In dic.keys
         If dic(k)(0).exists(t(j)) And dic(k)(0).exists(t(jj)) Then x = x + 1
       Next
       tot = tot + x
       sq(j, jj) = x: x = 0
       If jj = UBound(t) Then
          sq(j, jj) = tot
          sq(jj, j) = tot
       End If
    Next
    tot = 0
 Next
          
 With Range("F2")
   .Resize(UBound(t) + 1) = Application.Transpose(t)
   .Offset(-1, 1).Resize(, UBound(t) + 1) = t
   .Offset(, 1).Resize(UBound(sq) + 1, UBound(sq, 2) + 1) = sq
   .Offset(UBound(t), UBound(t) + 1) = Application.Sum(Application.Index(sq, UBound(t) + 1, 0))
 End With
End Sub
 
Upvote 0
Because i have my data in a table that has more than 100 000 rows and 30 columns the functions above take a lot time to calculate. Is there any workthrough?
Also can we add an optional argument in lambda to filter by criteria on values of ids or other columns before counting? This will be very handy as it minimizes the data to process.
Thanks again

Yes, you could use the FILTER function to reduce the amount of data first. What other columns are in your table that might be useful for filtering? Is there an Academic Year column? Or how about a course Start Date and End Date? You might want to provide some additional sample data showing the column(s) that you want to filter by.

As an example, assuming your data is formatted as a structured Excel table (named Table1), you could try something along these lines:

Excel Formula:
=LET(
    data, FILTER(Table1[[ID]:[Course]],(Table1[Start Date]>DATE(2024,8,31))*(Table1[End Date]<DATE(2025,7,1))),
    id, TAKE(data,,1),
    cr, TAKE(data,,-1),
    u, UNIQUE(TOROW(cr,1),1),
    n, COLUMNS(u),
    k, MMULT(--(UNIQUE(TOCOL(id,1))=TOROW(id)),--(u=cr)),
    m, MAKEARRAY(n,n,LAMBDA(r,c,SUM(INDEX(k,,r)*INDEX(k,,c)))),
    t, MMULT(m,EXPAND(1,n,,1)),
    VSTACK(HSTACK("Course",u,"Total"),HSTACK(TOCOL(u),m,t),HSTACK("Total",TOROW(t),SUM(t)))
)

Note: MMULT was used instead of COUNTIFS from my previous example, because COUNTIFS does not work with arrays.
 
Upvote 0
Yes, you could use the FILTER function to reduce the amount of data first. What other columns are in your table that might be useful for filtering? Is there an Academic Year column? Or how about a course Start Date and End Date? You might want to provide some additional sample data showing the column(s) that you want to filter by.

As an example, assuming your data is formatted as a structured Excel table (named Table1), you could try something along these lines:

Excel Formula:
=LET(
    data, FILTER(Table1[[ID]:[Course]],(Table1[Start Date]>DATE(2024,8,31))*(Table1[End Date]<DATE(2025,7,1))),
    id, TAKE(data,,1),
    cr, TAKE(data,,-1),
    u, UNIQUE(TOROW(cr,1),1),
    n, COLUMNS(u),
    k, MMULT(--(UNIQUE(TOCOL(id,1))=TOROW(id)),--(u=cr)),
    m, MAKEARRAY(n,n,LAMBDA(r,c,SUM(INDEX(k,,r)*INDEX(k,,c)))),
    t, MMULT(m,EXPAND(1,n,,1)),
    VSTACK(HSTACK("Course",u,"Total"),HSTACK(TOCOL(u),m,t),HSTACK("Total",TOROW(t),SUM(t)))
)

Note: MMULT was used instead of COUNTIFS from my previous example, because COUNTIFS does not work with arrays.

Sorry for not be more specific. My data are in Excel Table similar to this


AreaIDLevelCourseGroup
1010001aMorning
1010001bMorning
1010011aMorning
1010011dAfternoon
1110021aAfternoon
1110022bAfternoon
1110022dAfternoon
1110032aMorning
1110032bAfternoon
1110032dMorning
1110032cAfternoon
1110032eMorning
but with approximately 120000 rows and 40 columns.

Examples of Outputs
With no filters
CourseabdceTotal
a4331112
b3321110
d3231110
c111115
e111115
Total1210105542

With filters (Level=2, Group=Afternoon)
CoursebdcTotal
b2114
d1102
c1012
Total4228

Hope that is clear now.
My main problem with previous answers is speed of calculation and filters but that wasn't asked to be honest.
By the way i could not make the formula mentioned above to work. (Tried to enter cell references for filtering values)
 
Upvote 0
In that case you might have to go with Power Query or VBA
Excellent VBA coding.
Thank you very much. I have tried with VBA and did not succeed the result.
I solved the problem with Power query but the challenge here is to make a one cell formula which will be then added in name manager.
By this, every user can have a macro free workbook and just write the named function, provide array of ids, array of courses and arrays of filtering criteria and get the desired output.
 
Upvote 0
I see your point, but a slow workbook cause of these heavy dynamic array formula’s….
 
Upvote 0
Bottom line is there's just too much data for standard functions to handle efficiently. Moreover, it's not just the size of the dataset that affects efficiency, but also the total number of unique courses. The final output will be a grid consisting of n^2 cells. If you have 100 unique courses, for example, there'll be a total of 10,000 iterations/calculations. With 1,000 unique courses, it'll be 1,000,000.

As a test, I generated a randomized dataset consisting of 120,000 records, with 50 different areas, 6,000 unique ID's, levels from 1 to 10, and 100 unique courses. COUNTIFS on its own did not output the results in a reasonable amount of time (I waited 3 minutes, then gave up and force-closed Excel). However, the new PIVOTBY function successfully output the same results in approx. 6 seconds. After re-writing my original formula using PIVOTBY instead of COUNTIFS (as well as a binary tree method instead of MAKEARRAY), I was able to output the final results for the entire dataset in just under 1 minute (58 seconds). It's not ideal, I know, but when filters are applied, the total calculation time can be significantly reduced to less than 5 or 10 seconds, depending on the amount of data returned by the filter criteria.

If you have access to the new PIVOTBY function and want to give it a try, start by defining the following 4 custom LAMBDA functions in Name Manager:

COUNTPAIRS
Excel Formula:
=LAMBDA(field1,field2,[include],
    LET(
        a, DROP(PIVOTBY(field1,field2,field1,ROWS,1,0,,0,,include),,1),
        u, TAKE(a,1),
        d, DROP(a,1),
        x, IF(d="",0,d),
        y, TRANSPOSE(x),
        m, BiROW(y,LAMBDA(r,BiCOL(x,LAMBDA(c,MMULT(r,c))))),
        t, MMULT(m,IF({1},1,TOCOL(u))),
        VSTACK(HSTACK("Pairs",u,"Total"),HSTACK(TOCOL(u),m,t),HSTACK("Total",TOROW(t),SUM(t)))
    )
)

INCLλ
Excel Formula:
=LAMBDA(field,criteria,[opt],
    IF(criteria="",IF({1},1,field),IF(opt=1,field>=criteria,IF(opt=2,field<=criteria,field=criteria)))
)

BiROW
Excel Formula:
=LAMBDA(array,function,
    IF(
        ROWS(array)=1,
        function(array),
        VSTACK(
            BiROW(TAKE(array,ROWS(array)/2),function),
            BiROW(DROP(array,ROWS(array)/2),function)
        )
    )
)

BiCOL
Excel Formula:
=LAMBDA(array,function,
    IF(
        COLUMNS(array)=1,
        function(array),
        HSTACK(
            BiCOL(TAKE(array,,COLUMNS(array)/2),function),
            BiCOL(DROP(array,,COLUMNS(array)/2),function)
        )
    )
)

name_manager.png


Once defined, COUNTPAIRS can be combined with INCLλ to set the filter criteria using cells in the worksheet. For example:

Excel Formula:
=COUNTPAIRS(B1:B120001,D1:D120001,INCLλ(A1:A120001,H1,1)*INCLλ(A1:A120001,J1,2)*INCLλ(C1:C120001,H2)*INCLλ(E1:E120001,H3))

pivotby_permutations.png


Please note, you don't have to use the INCLλ function if you don't want to... you could also just filter by any standard criteria method (e.g. ISNUMBER-XMATCH to filter by a list of ID's).

Also note, due to the way I wrote the COUNTPAIRS function (by setting the optional [field_headers] argument of PIVOTBY to 1 - Yes but don't show), all of the referenced fields should include the header row. You can change this to 0 - No, if desired, so that you would only need to reference the data range for each field.

The example shown above took approx. 6 seconds to return the results. This is the best I could come up with for the problem at hand. Perhaps others may know of a better solution. Python might be a possibility, but that's currently outside of my knowledgebase. ;)
 
Upvote 0
Bottom line is there's just too much data for standard functions to handle efficiently. Moreover, it's not just the size of the dataset that affects efficiency, but also the total number of unique courses. The final output will be a grid consisting of n^2 cells. If you have 100 unique courses, for example, there'll be a total of 10,000 iterations/calculations. With 1,000 unique courses, it'll be 1,000,000.

As a test, I generated a randomized dataset consisting of 120,000 records, with 50 different areas, 6,000 unique ID's, levels from 1 to 10, and 100 unique courses. COUNTIFS on its own did not output the results in a reasonable amount of time (I waited 3 minutes, then gave up and force-closed Excel). However, the new PIVOTBY function successfully output the same results in approx. 6 seconds. After re-writing my original formula using PIVOTBY instead of COUNTIFS (as well as a binary tree method instead of MAKEARRAY), I was able to output the final results for the entire dataset in just under 1 minute (58 seconds). It's not ideal, I know, but when filters are applied, the total calculation time can be significantly reduced to less than 5 or 10 seconds, depending on the amount of data returned by the filter criteria.

If you have access to the new PIVOTBY function and want to give it a try, start by defining the following 4 custom LAMBDA functions in Name Manager:

COUNTPAIRS
Excel Formula:
=LAMBDA(field1,field2,[include],
    LET(
        a, DROP(PIVOTBY(field1,field2,field1,ROWS,1,0,,0,,include),,1),
        u, TAKE(a,1),
        d, DROP(a,1),
        x, IF(d="",0,d),
        y, TRANSPOSE(x),
        m, BiROW(y,LAMBDA(r,BiCOL(x,LAMBDA(c,MMULT(r,c))))),
        t, MMULT(m,IF({1},1,TOCOL(u))),
        VSTACK(HSTACK("Pairs",u,"Total"),HSTACK(TOCOL(u),m,t),HSTACK("Total",TOROW(t),SUM(t)))
    )
)

INCLλ
Excel Formula:
=LAMBDA(field,criteria,[opt],
    IF(criteria="",IF({1},1,field),IF(opt=1,field>=criteria,IF(opt=2,field<=criteria,field=criteria)))
)

BiROW
Excel Formula:
=LAMBDA(array,function,
    IF(
        ROWS(array)=1,
        function(array),
        VSTACK(
            BiROW(TAKE(array,ROWS(array)/2),function),
            BiROW(DROP(array,ROWS(array)/2),function)
        )
    )
)

BiCOL
Excel Formula:
=LAMBDA(array,function,
    IF(
        COLUMNS(array)=1,
        function(array),
        HSTACK(
            BiCOL(TAKE(array,,COLUMNS(array)/2),function),
            BiCOL(DROP(array,,COLUMNS(array)/2),function)
        )
    )
)

View attachment 117928

Once defined, COUNTPAIRS can be combined with INCLλ to set the filter criteria using cells in the worksheet. For example:

Excel Formula:
=COUNTPAIRS(B1:B120001,D1:D120001,INCLλ(A1:A120001,H1,1)*INCLλ(A1:A120001,J1,2)*INCLλ(C1:C120001,H2)*INCLλ(E1:E120001,H3))

View attachment 117929

Please note, you don't have to use the INCLλ function if you don't want to... you could also just filter by any standard criteria method (e.g. ISNUMBER-XMATCH to filter by a list of ID's).

Also note, due to the way I wrote the COUNTPAIRS function (by setting the optional [field_headers] argument of PIVOTBY to 1 - Yes but don't show), all of the referenced fields should include the header row. You can change this to 0 - No, if desired, so that you would only need to reference the data range for each field.

The example shown above took approx. 6 seconds to return the results. This is the best I could come up with for the problem at hand. Perhaps others may know of a better solution. Python might be a possibility, but that's currently outside of my knowledgebase. ;)

Oh my gosh, I can't believe I missed the obvious, lol. There is no need to use an iterative function like MAKEARRAY, REDUCE or MAP, or either of the custom BiROW/BiCOL functions shown in my previous reply. MMULT can handle it in one go, and it's highly efficient!

Assuming the PIVOTBY function is available in your version of Excel, try defining the following 2 custom LAMBDA functions in Name Manager:

COUNTPAIRS
Excel Formula:
=LAMBDA(field1,field2,[include],
    LET(
        a, DROP(PIVOTBY(field1,field2,field1,ROWS,1,0,,0,,include),,1),
        u, TAKE(a,1),
        d, DROP(a,1),
        x, IF(d="",0,d),
        y, TRANSPOSE(x),
        m, MMULT(y,x),
        t, MMULT(m,IF({1},1,TOCOL(u))),
        VSTACK(HSTACK("Pairs",u,"Total"),HSTACK(TOCOL(u),m,t),HSTACK("Total",TOROW(t),SUM(t)))
    )
)

INCLλ
Excel Formula:
=LAMBDA(field,criteria,[opt],
    IF(
        TYPE(criteria)=64,
        IF(opt<0,ISNA(XMATCH(field,SORT(TOCOL(criteria,3)),,2)),ISNUMBER(XMATCH(field,SORT(TOCOL(criteria,3)),,2))),
        IF(
            criteria="",
            IF({1},1,field),
            IF(opt=1,field>=criteria,IF(opt=2,field<=criteria,IF(opt<0,field<>criteria,field=criteria)))
        )
    )
)

Then, you can use the COUNTPAIRS function as demonstrated in my previous reply:

Excel Formula:
=COUNTPAIRS(B1:B120001,D1:D120001,INCLλ(A1:A120001,H1,1)*INCLλ(A1:A120001,J1,2)*INCLλ(C1:C120001,H2)*INCLλ(E1:E120001,H3))

I also added more functionality to the INCLλ function, so it can now filter by a list of values to include (or exclude if the [opt] argument is set to -1).

When tested with my sample dataset of 120,000 records with 6,000 unique ID's and 100 unique courses, the results were returned for the entire dataset (without filters) in approx. 7 seconds, and when the same filter criteria shown in my previous example was applied, it returned the results in less than a second!

If you have an active MS365 subscription, but don't have the new PIVOTBY function yet, you should be getting it soon. Cheers!
 
Upvote 0
Solution
Oh my gosh, I can't believe I missed the obvious, lol. There is no need to use an iterative function like MAKEARRAY, REDUCE or MAP, or either of the custom BiROW/BiCOL functions shown in my previous reply. MMULT can handle it in one go, and it's highly efficient!

Assuming the PIVOTBY function is available in your version of Excel, try defining the following 2 custom LAMBDA functions in Name Manager:

COUNTPAIRS
Excel Formula:
=LAMBDA(field1,field2,[include],
    LET(
        a, DROP(PIVOTBY(field1,field2,field1,ROWS,1,0,,0,,include),,1),
        u, TAKE(a,1),
        d, DROP(a,1),
        x, IF(d="",0,d),
        y, TRANSPOSE(x),
        m, MMULT(y,x),
        t, MMULT(m,IF({1},1,TOCOL(u))),
        VSTACK(HSTACK("Pairs",u,"Total"),HSTACK(TOCOL(u),m,t),HSTACK("Total",TOROW(t),SUM(t)))
    )
)

INCLλ
Excel Formula:
=LAMBDA(field,criteria,[opt],
    IF(
        TYPE(criteria)=64,
        IF(opt<0,ISNA(XMATCH(field,SORT(TOCOL(criteria,3)),,2)),ISNUMBER(XMATCH(field,SORT(TOCOL(criteria,3)),,2))),
        IF(
            criteria="",
            IF({1},1,field),
            IF(opt=1,field>=criteria,IF(opt=2,field<=criteria,IF(opt<0,field<>criteria,field=criteria)))
        )
    )
)

Then, you can use the COUNTPAIRS function as demonstrated in my previous reply:

Excel Formula:
=COUNTPAIRS(B1:B120001,D1:D120001,INCLλ(A1:A120001,H1,1)*INCLλ(A1:A120001,J1,2)*INCLλ(C1:C120001,H2)*INCLλ(E1:E120001,H3))

I also added more functionality to the INCLλ function, so it can now filter by a list of values to include (or exclude if the [opt] argument is set to -1).

When tested with my sample dataset of 120,000 records with 6,000 unique ID's and 100 unique courses, the results were returned for the entire dataset (without filters) in approx. 7 seconds, and when the same filter criteria shown in my previous example was applied, it returned the results in less than a second!

If you have an active MS365 subscription, but don't have the new PIVOTBY function yet, you should be getting it soon. Cheers!
I am so grateful !!! Thanks again for spending time on this. Even with your previous post, got the desired output in about 8 seconds. I am sure the latest is a light speed solution. Excellent work. Congrats to you and all the rest who reply to this forum and transfer knowledge to novice people like me. Cheers!
 
Upvote 0
Thank you all for your advice and solutions.

I have another challenge for you but since the problem here is solved (more than enough), i think it would be more appropriate to make a new post for the next problem , one step ahead of this.
See new post is named "Lambda or VBA to solve the pairs webinar problem".
Thanks again and hope to see you there!!!
 
Upvote 0

Forum statistics

Threads
1,222,759
Messages
6,168,052
Members
452,160
Latest member
Bekerinik

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top