Finding the top phrases containing 1, 2, 3 or 4 words in a list and how many occurrences

lunap04

New Member
Joined
Sep 26, 2020
Messages
3
Office Version
  1. 2019
Platform
  1. MacOS
Hi, I have a list of around 40K rows with keywords in a column and I'm working on a way to go through this list and see which keyword cluster would be relevant to have.

I would like to get a list of:

- which same word is repeated across this list and how many times
- which 2 words (close to each other) are repeated and how many occurrences
- which 3 words (close to each other) are repeated and how many occurrences
- which 4 words (close to each other) are repeated and how many occurrences

In a nutshell, it should be something similar to what this tool does Text Analyzer - Text analysis Tool - Counts Frequencies of Words, Characters, Sentences and Syllables but in excel.

Thanks a lot in advance to anyone who will offer a solution.
 

Excel Facts

Pivot Table Drill Down
Double-click any number in a pivot table to create a new report showing all detail rows that make up that number
Hi, I have a list of around 40K rows with keywords in a column and I'm working on a way to go through this list and see which keyword cluster would be relevant to have.

I would like to get a list of:

- which same word is repeated across this list and how many times
- which 2 words (close to each other) are repeated and how many occurrences
- which 3 words (close to each other) are repeated and how many occurrences
- which 4 words (close to each other) are repeated and how many occurrences

In a nutshell, it should be something similar to what this tool does Text Analyzer - Text analysis Tool - Counts Frequencies of Words, Characters, Sentences and Syllables but in excel.

Thanks a lot in advance to anyone who will offer a solution.
a snap of the data you are working with will be nice to show.
 
Upvote 0
a snap of the data you are working with will be nice to show.

Hi @padapinto, I'm trying to categorise/cluster a large number of words and phrases in a column but I don't know which cluster or category to use for each of these words/phrases.

I was trying to find a way to go through this list and create groups or clusters based on how many times a single word (or ideally 2, 3, and 4 words) appears throughout the list. Ideally, it would be good to see which 1, 2, 3, and 4 words appear the most throughout this list (so to find out which clusters to focus on) and then get a list of words that are in that cluster.

Attached a screenshot of the data I'm working to give you an idea.
 

Attachments

  • Screenshot 2020-09-26 at 19.52.58.png
    Screenshot 2020-09-26 at 19.52.58.png
    78.8 KB · Views: 20
Upvote 0
I could say use Power Query but you are on MacOS
anyway use XL2BB to post your example , not a picture
 
Upvote 0
Hi @padapinto, I'm trying to categorise/cluster a large number of words and phrases in a column but I don't know which cluster or category to use for each of these words/phrases.

I was trying to find a way to go through this list and create groups or clusters based on how many times a single word (or ideally 2, 3, and 4 words) appears throughout the list. Ideally, it would be good to see which 1, 2, 3, and 4 words appear the most throughout this list (so to find out which clusters to focus on) and then get a list of words that are in that cluster.

Attached a screenshot of the data I'm working to give you an idea.
I HAVE USED FORMULAS to get the results of combined 2, and 3 words but first you need to break the phrase using column to text.
you can check extended office website for text to column then after that you apply my formulas. to highlight the most phrases with 2 words, i have used a formula in conditional formats to get that =$J119=$J$111:$J$118. when writing conditional format, you just click the cells that are involved thus my formula is relating to row 119 because the table is in that area. you can see the snapshot
queryBROKEN WORDS1 WORD2 WORDS3 WORDS4 WORDS2 WORDS3 WORDS4 WORDS2 WORDS3 WORDS4 WORDS
zoom background tv setszoombackgroundSTVsetszoomzoom backgroundSzoom backgroundS TVzoom backgroundS TVsets421
zoom background quiz showzoombackgroundSquizshowzoomzoom backgroundSzoom backgroundS quizzoom backgroundS quizshow411
zoom bacgrounds game showzoombackgroundSgameshowzoomzoom backgroundSzoom backgroundS gamezoom backgroundS gameshow411
zoom background funnyzoombackgroundSTVzoomzoom backgroundSzoom backgroundS TVzoom backgroundS TV421
youtube channel virginyoutubechannelvirginyoutubeyoutube channelyoutube channel virginyoutube channel virgin111
youtube on virgin tvyoutubeonvirgintvyoutubeyoutube onyoutube on virginyoutube on virgintv111
your accountyouraccountyouryour accountyour accountyour account111
young onesyoungonesyoungyoung onesyoung onesyoung ones111
TOTALS
421
1 WORD2 WORDS3 WORDS4 WORDS2 WORDS3 WORDS4 WORDS
=B111=B111&" "&C111=G111&" "&" "&D111=CONCATENATE(H111,E111)=COUNTIF($G$111:$G$118,G111)=COUNTIF($H$111:$H$118,H111)=COUNTIF($I$111:$I$118,I111)
=B112=B112&" "&C112=G112&" "&" "&D112=CONCATENATE(H112,E112)=COUNTIF($G$111:$G$118,G112)=COUNTIF($H$111:$H$118,H112)=COUNTIF($I$111:$I$118,I112)
=B113=B113&" "&C113=G113&" "&" "&D113=CONCATENATE(H113,E113)=COUNTIF($G$111:$G$118,G113)=COUNTIF($H$111:$H$118,H113)=COUNTIF($I$111:$I$118,I113)
=B114=B114&" "&C114=G114&" "&" "&D114=CONCATENATE(H114,E114)=COUNTIF($G$111:$G$118,G114)=COUNTIF($H$111:$H$118,H114)=COUNTIF($I$111:$I$118,I114)
=B115=B115&" "&C115=G115&" "&" "&D115=CONCATENATE(H115,E115)=COUNTIF($G$111:$G$118,G115)=COUNTIF($H$111:$H$118,H115)=COUNTIF($I$111:$I$118,I115)
=B116=B116&" "&C116=G116&" "&" "&D116=CONCATENATE(H116,E116)=COUNTIF($G$111:$G$118,G116)=COUNTIF($H$111:$H$118,H116)=COUNTIF($I$111:$I$118,I116)
=B117=B117&" "&C117=G117&" "&" "&D117=CONCATENATE(H117,E117)=COUNTIF($G$111:$G$118,G117)=COUNTIF($H$111:$H$118,H117)=COUNTIF($I$111:$I$118,I117)
=B118=B118&" "&C118=G118&" "&" "&D118=CONCATENATE(H118,E118)=COUNTIF($G$111:$G$118,G118)=COUNTIF($H$111:$H$118,H118)=COUNTIF($I$111:$I$118,I118)
=MAX(J111:J118)=MAX(K111:K118)=MAX(L111:L118)
SEPERATE.png
 
Upvote 0

Forum statistics

Threads
1,223,240
Messages
6,170,951
Members
452,368
Latest member
jayp2104

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top