Analyzing list of text fields

Marty446

New Member
Joined
Dec 6, 2018
Messages
1
Good morning/afternoon/evening,


I work for an insurance company and I'm running into the following problem:


I'm trying to standardise 'free-text clauses' in our insurance system, clauses for insurance products which, for one reason or another, were not entered using a standard clause.


The difficulty here is that, for over 20 years now, people have been using (slightly) different text to describe similar problems, or (slightly) similar text to describe different problems.


I want to analyse my list of cells containing text (some 17000 rows).

I aim to find out:

- how to group the texts by similarity, and how to adjust the definition of 'similarity', e.g. 60% similar, 70% similar etc.

- how to ascertain percentages of similar text, e.g. 120 groups of similar text, group one comprising 4% of the data, group 2 comprising 6% etc.

- how to group the data based on key words, e.g. group all text clauses which contain the words 'X', 'Y', 'Z'.

- how to exclude certain words or phrases, like 'and', 'the client has indicated' etc., from the formulas used for the above, so that they are ignored when calculating similarity.

Due to the sensitive nature of the data I cannot post any examples of the data I am working with.


Any and all tips will be greatly appreciated, thanks in advance.
 

There's an add-in called 'Fuzzy Lookup Add-In for Excel' from Microsoft.
I've never used it, but maybe it could help.

https://www.microsoft.com/en-us/download/details.aspx?id=15011
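
If working outside Excel is an option, the four goals in the question could also be sketched in Python using only the standard library. This is a rough illustration, not a tested solution for the real data: the sample clauses are invented, the stop-phrase list and the function names (`normalize`, `group_by_similarity`, `group_shares`, `contains_all`) are my own assumptions, and the greedy grouping compares each text only to the first member of each existing group, so results depend on row order. `difflib.SequenceMatcher` is character-based; a different measure may suit insurance clauses better.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stop phrases to ignore when comparing texts.
STOP_PHRASES = ["the client has indicated", "and", "the"]

def normalize(text, stop_phrases=STOP_PHRASES):
    """Lower-case, drop punctuation and stop phrases, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    cleaned = " ".join(cleaned.split())
    # Remove longer phrases first so single words do not break them up.
    for phrase in sorted(stop_phrases, key=len, reverse=True):
        cleaned = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", cleaned)
    return " ".join(cleaned.split())

def similarity(a, b):
    """Character-based similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()

def group_by_similarity(texts, threshold=0.7):
    """Greedy grouping: a text joins the first group whose representative
    (the group's first member) is at least `threshold` similar."""
    groups = []  # list of (normalized representative, [original texts])
    for text in texts:
        key = normalize(text)
        for representative, members in groups:
            if similarity(key, representative) >= threshold:
                members.append(text)
                break
        else:
            groups.append((key, [text]))
    return [members for _, members in groups]

def group_shares(groups):
    """Percentage of all texts that falls into each group."""
    total = sum(len(g) for g in groups)
    return [round(100 * len(g) / total, 1) for g in groups]

def contains_all(text, keywords):
    """True if every keyword occurs as a whole word in the text."""
    words = set(normalize(text, stop_phrases=[]).split())
    return all(k.lower() in words for k in keywords)

# Invented sample clauses, standing in for the real (confidential) data.
clauses = [
    "The client has indicated that flood damage is excluded.",
    "Flood damage is excluded",
    "Flood damage excluded.",
    "Cover applies only to the main residence.",
]

groups = group_by_similarity(clauses, threshold=0.7)
for members, share in zip(groups, group_shares(groups)):
    print(f"{share}% of rows:", members)

flood_clauses = [c for c in clauses if contains_all(c, ["flood", "excluded"])]
print(flood_clauses)
```

Raising or lowering `threshold` (0.6, 0.7, ...) changes how strict 'similar' is. Each row is compared against every existing group, so on some 17000 rows with many groups this could be slow.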
 
