Text Analysis

AndrewD04

New Member
Joined
Aug 24, 2017
Messages
40
Office Version
  1. 365
Hi, Does anyone know how to solve this.
I have a lot of cells with free text in and I want excel to list every unique word and how often that word appears
 

Excel Facts

Pivot Table Drill Down
Double-click any number in a pivot table to create a new report showing all detail rows that make up that number
Hi Andrew,

One way would be to use a Dictionary object to count each unique word. The trickier part of the problem is define what you want to consider a unique "word" for the purpose of this application.

The simplest definition to code would be to split the text strings up using just the space character as the delimiter. Beyond that you would need to consider the handling of punctuation, special characters, numbers and case sensitivity. Your decision on how you want those handled could make the coding needed to parse "words" significantly more complex.
 
Last edited:
Upvote 0
... The trickier part of the problem is define what you want to consider a unique "word" for the purpose of this application.

The simplest definition to code would be to split the text strings up using just the space character as the delimiter. Beyond that you would need to consider the handling of punctuation, special characters, numbers and case sensitivity. Your decision on how you want those handled could make the coding needed to parse "words" significantly more complex.

Regular expressions might make this a bit easier: use a capturing group, perhaps "(\w+)". That removes punctuation, but "boy's" becomes two words: "boy" and the single letter "s".

Some examples of the tricky parts that Jerry Sullivan mentioned, depending on the body of text and your purpose:

Proper nouns: "Don", the river or the man's name versus "don", meaning to put on. "Mike", the name, and "mike" for a microphone. Is "Los Angeles" one word or two?

Plurals: If you decide to treat the singular and plural forms of nouns as the same word, making "doll" and "dolls" count as repeats of one word, how do you handle "man" and "men", "child" and "children", "index" and "indices", etc? The rules for plurals are strange. In "cherry" the final "y" becomes "ie" in "cherries", but "key" simply adds an "s" after the "y". One "knife" becomes two "knives" and one "dwarf" becomes two "dwarfs"—unless you're Tolkien. The plurals of "ax" and "axis" are both "axes".
 
Last edited:
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,183
Members
453,020
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top