I have this very very big list of almost 700K rows and about 30 columns of data. I have a somewhat working formula to do what I want but it doesn't really work 100% of the time.
The data is of all purchases done during a set period and I have a formula that determines what type of purchase it is based on 5 categories( lets call them difficulties 1, 2, 3, 4 and 5). Now as each user can be on present in a lot of rows I need to find our which is their highest level of difficulty in the list.
The data for each line looks like this (row 1 and 3 the top row with headers 2 and 4 are the data for row 2):
[TABLE="class: grid, width: 2000, align: left"]
<tbody>[TR]
[TD]User ID (G)[/TD]
[TD]X (H)[/TD]
[TD]X(I)[/TD]
[TD]X (J)[/TD]
[TD]X (K)[/TD]
[TD]X (L)[/TD]
[TD]X (M)[/TD]
[TD]X (N)[/TD]
[TD]X (O)[/TD]
[TD]X (P)[/TD]
[TD]Difficulty (Q)[/TD]
[TD]Combination G2+Difficulty 1 (R)[/TD]
[TD]Combination G2+Difficulty 2 (S)[/TD]
[TD]Combination G2+Difficulty 3 (T)[/TD]
[TD]Combination G2+Difficulty 4 (U)[/TD]
[TD]Combination G2+Difficulty 5 (V)[/TD]
[TD]Combination G2+ Q2 (W)[/TD]
[/TR]
[TR]
[TD]Test1[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]Dif4[/TD]
[TD]Test1-Dif1[/TD]
[TD]Test1-Dif2[/TD]
[TD]Test1-Dif3[/TD]
[TD]Test1-Dif4[/TD]
[TD]Test1-Dif5[/TD]
[TD]Test1-Dif4[/TD]
[/TR]
[TR]
[TD]IF(R2=W2;G2;"0") (X)[/TD]
[TD]IF(S2=W2;G2;"0") (Y)[/TD]
[TD]IF(T2=W2;G2;"0") (Z)[/TD]
[TD]IF(U2=W2;G2;"0") (AA)[/TD]
[TD]IF(V2=W2;G2;"0") (AB)[/TD]
[TD]Formula to determine highest difficulty (AC)[/TD]
[TD]X (AD)[/TD]
[TD]X (AE)[/TD]
[TD]Name of Difficulty 1 (AF)[/TD]
[TD]Name of Difficulty 2 (AG)[/TD]
[TD]Name of Difficulty 3 (AH)[/TD]
[TD]Name of Difficulty 4 (AI)[/TD]
[TD]Name of Difficulty 5 (AJ)[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]0[/TD]
[TD]0[/TD]
[TD]0[/TD]
[TD]Test1[/TD]
[TD]0[/TD]
[TD]Dif1[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]Dif1[/TD]
[TD]Dif2[/TD]
[TD]Dif3[/TD]
[TD]Dif4[/TD]
[TD]Dif5[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
[TABLE="width: 1792"]
<tbody>[TR]
[TD="width: 64, align: right"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64, align: right"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64, align: right"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64, align: right"][/TD]
[TD="width: 64, align: right"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64, align: right"][/TD]
[TD="width: 64, align: right"][/TD]
[/TR]
</tbody>[/TABLE]
Written an X for data that is irrelevant for this specific formula (but necessary for me in the final analysis) and also started on column G as the ones before are also irrelevant.
The formula I have create looks like this:
=IF(Q2=$AF$2;$AF$2;(IF(ISNA(IF(G2=X2;$AF$2;INDEX(Q$2:Q$624765;MATCH(G2;X$2:X$624765;0))));(IF(ISNA(IF(G2=Y2;$AG$2;INDEX(Q$2:Q$624765;MATCH(G2;Y$2:Y$624765;0))));(IF(ISNA(IF(G2=Z2;$AH$2;INDEX(Q2:Q$624765;MATCH(G2;Z$2:Z$624765;0))));(IF(ISNA(IF(G2=AA2;$AI$2;INDEX(Q2:Q$624765;MATCH(G2;AA$2:AA$624765;0))));IF(Q2=$AJ$2;$AJ$2;INDEX(Q2:Q$624765;MATCH(G2;AB$2:AB$624765;0)));IF(G2=AA2;$AI$2;INDEX(Q2:Q$624765;MATCH(G2;AA$2:AA$624765;0)))));IF(G2=Z2;$AH$2;INDEX(Q2:Q$624765;MATCH(G2;Z$2:Z$624765;0)))));IF(G2=Y2;$AG$2;INDEX(Q$2:Q$624765;MATCH(G2;Y$2:Y$624765;0)))));IF(G2=X2;$AF$2;INDEX(Q$2:Q$624765;MATCH(G2;X$2:X$624765;0))))))
In short it checks if the User ID is visible in the columns X-AB in any of the 700K rows by first checking Dif1 then 2 and so on. The IF(ISNA) does so that when there result will be N/A as the user ID isn't visible in any rows in column X it will go to the next column and if the user ID is in that column it will then fill the cell with the name of that difficulty level.
The help I need then are one of 2 things.
1. First if you can help me simplify this is some way then please help because this formula takes about 1h to calculate for the whole sheet.
2. Second is that it sometimes returns REF error but I cant find why, only for about 1% of the total. But it does look like it does it only when the difficulty for the row is either 4 or 5. But doesn't always return REF in those situations.
The data is of all purchases done during a set period and I have a formula that determines what type of purchase it is based on 5 categories( lets call them difficulties 1, 2, 3, 4 and 5). Now as each user can be on present in a lot of rows I need to find our which is their highest level of difficulty in the list.
The data for each line looks like this (row 1 and 3 the top row with headers 2 and 4 are the data for row 2):
[TABLE="class: grid, width: 2000, align: left"]
<tbody>[TR]
[TD]User ID (G)[/TD]
[TD]X (H)[/TD]
[TD]X(I)[/TD]
[TD]X (J)[/TD]
[TD]X (K)[/TD]
[TD]X (L)[/TD]
[TD]X (M)[/TD]
[TD]X (N)[/TD]
[TD]X (O)[/TD]
[TD]X (P)[/TD]
[TD]Difficulty (Q)[/TD]
[TD]Combination G2+Difficulty 1 (R)[/TD]
[TD]Combination G2+Difficulty 2 (S)[/TD]
[TD]Combination G2+Difficulty 3 (T)[/TD]
[TD]Combination G2+Difficulty 4 (U)[/TD]
[TD]Combination G2+Difficulty 5 (V)[/TD]
[TD]Combination G2+ Q2 (W)[/TD]
[/TR]
[TR]
[TD]Test1[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]Dif4[/TD]
[TD]Test1-Dif1[/TD]
[TD]Test1-Dif2[/TD]
[TD]Test1-Dif3[/TD]
[TD]Test1-Dif4[/TD]
[TD]Test1-Dif5[/TD]
[TD]Test1-Dif4[/TD]
[/TR]
[TR]
[TD]IF(R2=W2;G2;"0") (X)[/TD]
[TD]IF(S2=W2;G2;"0") (Y)[/TD]
[TD]IF(T2=W2;G2;"0") (Z)[/TD]
[TD]IF(U2=W2;G2;"0") (AA)[/TD]
[TD]IF(V2=W2;G2;"0") (AB)[/TD]
[TD]Formula to determine highest difficulty (AC)[/TD]
[TD]X (AD)[/TD]
[TD]X (AE)[/TD]
[TD]Name of Difficulty 1 (AF)[/TD]
[TD]Name of Difficulty 2 (AG)[/TD]
[TD]Name of Difficulty 3 (AH)[/TD]
[TD]Name of Difficulty 4 (AI)[/TD]
[TD]Name of Difficulty 5 (AJ)[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]0[/TD]
[TD]0[/TD]
[TD]0[/TD]
[TD]Test1[/TD]
[TD]0[/TD]
[TD]Dif1[/TD]
[TD]x[/TD]
[TD]x[/TD]
[TD]Dif1[/TD]
[TD]Dif2[/TD]
[TD]Dif3[/TD]
[TD]Dif4[/TD]
[TD]Dif5[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
[TABLE="width: 1792"]
<tbody>[TR]
[TD="width: 64, align: right"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64, align: right"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64, align: right"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64, align: right"][/TD]
[TD="width: 64, align: right"][/TD]
[TD="width: 64"][/TD]
[TD="width: 64, align: right"][/TD]
[TD="width: 64, align: right"][/TD]
[/TR]
</tbody>[/TABLE]
Written an X for data that is irrelevant for this specific formula (but necessary for me in the final analysis) and also started on column G as the ones before are also irrelevant.
The formula I have create looks like this:
=IF(Q2=$AF$2;$AF$2;(IF(ISNA(IF(G2=X2;$AF$2;INDEX(Q$2:Q$624765;MATCH(G2;X$2:X$624765;0))));(IF(ISNA(IF(G2=Y2;$AG$2;INDEX(Q$2:Q$624765;MATCH(G2;Y$2:Y$624765;0))));(IF(ISNA(IF(G2=Z2;$AH$2;INDEX(Q2:Q$624765;MATCH(G2;Z$2:Z$624765;0))));(IF(ISNA(IF(G2=AA2;$AI$2;INDEX(Q2:Q$624765;MATCH(G2;AA$2:AA$624765;0))));IF(Q2=$AJ$2;$AJ$2;INDEX(Q2:Q$624765;MATCH(G2;AB$2:AB$624765;0)));IF(G2=AA2;$AI$2;INDEX(Q2:Q$624765;MATCH(G2;AA$2:AA$624765;0)))));IF(G2=Z2;$AH$2;INDEX(Q2:Q$624765;MATCH(G2;Z$2:Z$624765;0)))));IF(G2=Y2;$AG$2;INDEX(Q$2:Q$624765;MATCH(G2;Y$2:Y$624765;0)))));IF(G2=X2;$AF$2;INDEX(Q$2:Q$624765;MATCH(G2;X$2:X$624765;0))))))
In short it checks if the User ID is visible in the columns X-AB in any of the 700K rows by first checking Dif1 then 2 and so on. The IF(ISNA) does so that when there result will be N/A as the user ID isn't visible in any rows in column X it will go to the next column and if the user ID is in that column it will then fill the cell with the name of that difficulty level.
The help I need then are one of 2 things.
1. First if you can help me simplify this is some way then please help because this formula takes about 1h to calculate for the whole sheet.
2. Second is that it sometimes returns REF error but I cant find why, only for about 1% of the total. But it does look like it does it only when the difficulty for the row is either 4 or 5. But doesn't always return REF in those situations.