Need formula to match

yukonexcel

New Member
Joined
Jul 19, 2024
Messages
6
Office Version
  1. 365
Platform
  1. Windows
Hi,

I'll try to explain this as best I can... I can't share the sheet as it contains sensitive information.

I have a table with over a thousand rows full of text, and I am interested in analyzing it based on the final 3 columns. All cells in those columns contain one word. There are only 4 unique words in the first column, and 4 different unique words in the second column. The third column contains one of 30 different words.

I want to find the most commonly reoccurring word from the third column that matches every combination of the first two columns. Based on the first two columns, there are 16 different combinations. I would like to summarize the large data table in a separate table with 16 rows showing the most common third column word for each combination.

For example: Column A contains values 1 to 4 (text words) and Column B contains values 5 to 8 reoccurring in different combinations hundreds of times. Column C has values 9 to 39 distributed throughout. I would like to know the most commonly reoccurring third column word that matches the combination 1-5, 1-6, 1-7, 1-8, 2-5, 2-6 etc. etc.

Is this possible? What formulas will I need? I hope I didn't butcher the question... Thank you so much!
 

Excel Facts

Waterfall charts in Excel?
Office 365 customers have access to Waterfall charts since late 2016. They were added to Excel 2019.
Hello yukonexcel

You can achieve this analysis using a combination of Excel functions such as UNIQUE, FILTER, MODE, and COUNTIF. Here’s a step-by-step guide on how to summarize your data to find the most commonly occurring word in the third column for each combination of the first two columns.

Step 1: Prepare Your Data​

Assume your data is in columns A, B, and C, starting from row 2 (A2:C1001).

Step 2: Create a Summary Table​

Create a new sheet for your summary table. In this new sheet, list all the unique combinations of the first two columns.
  1. List Unique Combinations:
    • In column A of the new sheet, list all unique values from column A of your data.
    • In column B, list all unique values from column B of your data.
    • You can use the UNIQUE function to get these lists:
    excel

    =UNIQUE(Sheet1!A2:A1001)
    =UNIQUE(Sheet1!B2:B1001)
  2. Create All Combinations:
    • Combine the unique values to create all possible combinations. You can manually create these combinations or use a formula to automate it.

Step 3: Find the Most Common Word in Column C for Each Combination​

In the summary table, use a formula to find the most common word in column C for each combination of columns A and B.
  1. Helper Column:
    • Use a helper column to filter the data based on the combination of columns A and B.
    • Assume your summary table starts from row 2, with combinations in columns A and B.
    • In cell C2 of the summary sheet, use the following formula to filter the data:
    excel

    =TEXTJOIN(",", TRUE, FILTER(Sheet1!C2:C1001, (Sheet1!A2:A1001=A2) * (Sheet1!B2:B1001=B2)))
    This formula will create a comma-separated list of all values in column C that match the combination in A2 and B2.
  2. Find the Most Common Word:
    • Use the MODE function to find the most common word in the filtered list.
    • In cell D2 of the summary sheet, use the following formula:
    excel

    =MODE(SPLIT(C2, ","))
    This formula will split the comma-separated list and find the most common word.
  3. Copy the Formulas:
    • Copy the formulas in columns C and D down to cover all combinations in your summary table.

Example Summary Table Layout​

Your summary table might look something like this:
Column AColumn BFiltered Data (Helper)Most Common Word
159, 9, 10, 11, 99
1612, 12, 13, 1212
1714, 14, 1514
1816, 17, 16, 1616
2518, 18, 19, 1818
............

Note​

  • The TEXTJOIN function is used to create a list of values that match the criteria.
  • The SPLIT function is used to split the list into individual values.
  • The MODE function is used to find the most common value in the list.
This approach should help you summarize your data and find the most commonly occurring word in the third column for each combination of the first two columns.


hope this helps
plettieri
 
Upvote 0
Hello yukonexcel

You can achieve this analysis using a combination of Excel functions such as UNIQUE, FILTER, MODE, and COUNTIF. Here’s a step-by-step guide on how to summarize your data to find the most commonly occurring word in the third column for each combination of the first two columns.

Step 1: Prepare Your Data​

Assume your data is in columns A, B, and C, starting from row 2 (A2:C1001).

Step 2: Create a Summary Table​

Create a new sheet for your summary table. In this new sheet, list all the unique combinations of the first two columns.
  1. List Unique Combinations:
    • In column A of the new sheet, list all unique values from column A of your data.
    • In column B, list all unique values from column B of your data.
    • You can use the UNIQUE function to get these lists:
    excel

    =UNIQUE(Sheet1!A2:A1001)
    =UNIQUE(Sheet1!B2:B1001)
  2. Create All Combinations:
    • Combine the unique values to create all possible combinations. You can manually create these combinations or use a formula to automate it.

Step 3: Find the Most Common Word in Column C for Each Combination​

In the summary table, use a formula to find the most common word in column C for each combination of columns A and B.
  1. Helper Column:
    • Use a helper column to filter the data based on the combination of columns A and B.
    • Assume your summary table starts from row 2, with combinations in columns A and B.
    • In cell C2 of the summary sheet, use the following formula to filter the data:
    excel

    =TEXTJOIN(",", TRUE, FILTER(Sheet1!C2:C1001, (Sheet1!A2:A1001=A2) * (Sheet1!B2:B1001=B2)))
    This formula will create a comma-separated list of all values in column C that match the combination in A2 and B2.
  2. Find the Most Common Word:
    • Use the MODE function to find the most common word in the filtered list.
    • In cell D2 of the summary sheet, use the following formula:
    excel

    =MODE(SPLIT(C2, ","))
    This formula will split the comma-separated list and find the most common word.
  3. Copy the Formulas:
    • Copy the formulas in columns C and D down to cover all combinations in your summary table.

Example Summary Table Layout​

Your summary table might look something like this:
Column AColumn BFiltered Data (Helper)Most Common Word
159, 9, 10, 11, 99
1612, 12, 13, 1212
1714, 14, 1514
1816, 17, 16, 1616
2518, 18, 19, 1818
............

Note​

  • The TEXTJOIN function is used to create a list of values that match the criteria.
  • The SPLIT function is used to split the list into individual values.
  • The MODE function is used to find the most common value in the list.
This approach should help you summarize your data and find the most commonly occurring word in the third column for each combination of the first two columns.


hope this helps
plettieri
Thank you plettieri, that was very thorough and very quick!

I am getting a #VALUE! error on the helper column however...

My formula looks like: =TEXTJOIN(",",TRUE,FILTER(DATA!F:F,DATA!D:D=CODING!J19)*(DATA!E:E=CODING!K19))

I'm wondering if it is because the data is in a table? Do I necessarily need to use the table format language in this formula?
 
Upvote 0
Maybe:

Book4
ABCDEFGH
1AWWord27Col1Col2Most Common
2BWWord20AWWord6
3DYWord11AXWord16
4DYWord8AYWord3
5CYWord19AZWord1
6BYWord12BWWord12
7DXWord25BXWord3
8CWWord25BYWord27
9DZWord25BZWord16
10DYWord24CWWord23
11DWWord25CXWord28
12BXWord9CYWord1
13DYWord12CZWord8
14AZWord1DWWord3
15CYWord20DXWord26
16BZWord16DYWord20
17AXWord18DZWord6
18AZWord26
19CXWord9
20AYWord12
Sheet4
Cell Formulas
RangeFormula
A1:A1000A1=INDEX({"A","B","C","D"},RANDARRAY(1000,,1,4,1))
B1:B1000B1=INDEX({"W","X","Y","Z"},RANDARRAY(1000,,1,4,1))
C1:C1000C1=INDEX("Word"&SEQUENCE(30),RANDARRAY(1000,,1,30,1))
H2:H17H2=INDEX($C$1:$C$1000,MODE(IF(($A$1:$A$1000=F2)*($B$1:$B$1000=G2),MATCH($C$1:$C$1000,$C$1:$C$1000,0))))
Dynamic array formulas.
 
Upvote 0
Maybe:

Book4
ABCDEFGH
1AWWord27Col1Col2Most Common
2BWWord20AWWord6
3DYWord11AXWord16
4DYWord8AYWord3
5CYWord19AZWord1
6BYWord12BWWord12
7DXWord25BXWord3
8CWWord25BYWord27
9DZWord25BZWord16
10DYWord24CWWord23
11DWWord25CXWord28
12BXWord9CYWord1
13DYWord12CZWord8
14AZWord1DWWord3
15CYWord20DXWord26
16BZWord16DYWord20
17AXWord18DZWord6
18AZWord26
19CXWord9
20AYWord12
Sheet4
Cell Formulas
RangeFormula
A1:A1000A1=INDEX({"A","B","C","D"},RANDARRAY(1000,,1,4,1))
B1:B1000B1=INDEX({"W","X","Y","Z"},RANDARRAY(1000,,1,4,1))
C1:C1000C1=INDEX("Word"&SEQUENCE(30),RANDARRAY(1000,,1,30,1))
H2:H17H2=INDEX($C$1:$C$1000,MODE(IF(($A$1:$A$1000=F2)*($B$1:$B$1000=G2),MATCH($C$1:$C$1000,$C$1:$C$1000,0))))
Dynamic array formulas.

Maybe:

Book4
ABCDEFGH
1AWWord27Col1Col2Most Common
2BWWord20AWWord6
3DYWord11AXWord16
4DYWord8AYWord3
5CYWord19AZWord1
6BYWord12BWWord12
7DXWord25BXWord3
8CWWord25BYWord27
9DZWord25BZWord16
10DYWord24CWWord23
11DWWord25CXWord28
12BXWord9CYWord1
13DYWord12CZWord8
14AZWord1DWWord3
15CYWord20DXWord26
16BZWord16DYWord20
17AXWord18DZWord6
18AZWord26
19CXWord9
20AYWord12
Sheet4
Cell Formulas
RangeFormula
A1:A1000A1=INDEX({"A","B","C","D"},RANDARRAY(1000,,1,4,1))
B1:B1000B1=INDEX({"W","X","Y","Z"},RANDARRAY(1000,,1,4,1))
C1:C1000C1=INDEX("Word"&SEQUENCE(30),RANDARRAY(1000,,1,30,1))
H2:H17H2=INDEX($C$1:$C$1000,MODE(IF(($A$1:$A$1000=F2)*($B$1:$B$1000=G2),MATCH($C$1:$C$1000,$C$1:$C$1000,0))))
Dynamic array formulas.
That is exactly what I am looking for! I'm getting a #N/A error though... Would blank cells or cells that don't match exactly cause this?
 
Upvote 0
There are a lot of reasons you could get an #N/A error. Are you getting it on all rows or just some? If just some, are you sure that the word combination actually occurs somewhere? In fact, now that I think about it, you'd get an #N/A error if the letter combination only occurs once. You can fix that with this version of the formula:

Excel Formula:
=INDEX($C$1:$C$1000,MODE(IF(($A$1:$A$1000=F2)*($B$1:$B$1000=G2),{1,1}*MATCH($C$1:$C$1000,$C$1:$C$1000,0))))

Other than that, I'm just guessing. Is the data in an Excel table? It should still work, but the table nomenclature might require some tweaking. Other than that, can you provide some sample data that causes the error? Dummy up the words and provide a small sample, 10-20 rows.
 
Upvote 0
There are a lot of reasons you could get an #N/A error. Are you getting it on all rows or just some? If just some, are you sure that the word combination actually occurs somewhere? In fact, now that I think about it, you'd get an #N/A error if the letter combination only occurs once. You can fix that with this version of the formula:

Excel Formula:
=INDEX($C$1:$C$1000,MODE(IF(($A$1:$A$1000=F2)*($B$1:$B$1000=G2),{1,1}*MATCH($C$1:$C$1000,$C$1:$C$1000,0))))

Other than that, I'm just guessing. Is the data in an Excel table? It should still work, but the table nomenclature might require some tweaking. Other than that, can you provide some sample data that causes the error? Dummy up the words and provide a small sample, 10-20 rows.
That worked! It still gave the error code if there was a blank though. Is there any workaround for that?
 
Upvote 0
How about

Excel Formula:
=IFERROR(INDEX($C$1:$C$1000,MODE(IF(($A$1:$A$1000=F2)*($B$1:$B$1000=G2),{1,1}*MATCH($C$1:$C$1000,$C$1:$C$1000,0)))),"No match found.")
 
Upvote 0
How about

Excel Formula:
=IFERROR(INDEX($C$1:$C$1000,MODE(IF(($A$1:$A$1000=F2)*($B$1:$B$1000=G2),{1,1}*MATCH($C$1:$C$1000,$C$1:$C$1000,0)))),"No match found.")
First, thanks for your help Eric W, I really appreciate it.

Your formula works great, I think it's my data that's causing me issues. Most of the combinations output a most common third column word. My issue is that for some combinations, the most common third column value is blank.

Last question: is it possible to tweak the formula to only pull from the rows that have a value in the third column? Say that there are 50 instances of one combination, and 30 of the third column cells are blank. Can I run the same analysis and pull the most common word from the 20 rows that have a value in the third column?

You're a life saver!
 
Upvote 0
Sure:

Excel Formula:
=IFERROR(INDEX($C$1:$C$1000,MODE(IF(($A$1:$A$1000=F2)*($B$1:$B$1000=G2)*($C$1:$C$1000<>""),{1,1}*MATCH($C$1:$C$1000,$C$1:$C$1000,0)))),"No word found")
 
Upvote 0
Solution

Forum statistics

Threads
1,224,735
Messages
6,180,640
Members
452,992
Latest member
TokugawaIesuma

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top