Identify approximate text matches

thekaoboy (New Member, joined Sep 7, 2006, 48 messages)
I have a long list of company names and I want to eliminate approximate matches.

For example, my list includes the following:

CENTAURUS ENERGY MASTER FUND L.P.
Centaurus Financial
Centaurus Financial Inc.
Centech Group

In this case, I would want to identify "Centaurus Financial" and "Centaurus Financial Inc." as the same but the others should be considered different. I realize that I will have to review the list manually afterwards.

I have the Fuzzy Lookup add-in from Microsoft, but since I am comparing a list to itself, every item on the list produces a 100% match with itself, and each approximate match is reported twice. For example, "Centaurus Financial" gets fuzzy-matched with "Centaurus Financial Inc." and vice versa, whereas I want to be able to flag the two companies as close but not exact.

I have always appreciated this forum's help. Thank you!
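
For anyone reading along, here is a rough sketch of the self-comparison issue described above, done outside Excel in Python rather than with the Fuzzy Lookup add-in. It uses the standard library's `difflib.SequenceMatcher`; the function name `near_duplicate_pairs`, the 0.85 threshold, and the look-ahead window of 5 are all assumptions picked for illustration, not tuned values. Each name is only compared with the names that follow it in sorted order, so a name is never matched with itself and each pair shows up once.

```python
from difflib import SequenceMatcher

def near_duplicate_pairs(names, threshold=0.85, window=5):
    """Return (name_a, name_b, score) for names that are close but not identical.

    threshold and window are illustrative assumptions, not tuned values.
    """
    # Drop exact duplicates and sort so that similar names sit near each other.
    ordered = sorted(set(names), key=str.lower)
    pairs = []
    for i, a in enumerate(ordered):
        # Only look ahead: no self-matches and no mirrored (b, a) pairs.
        for b in ordered[i + 1 : i + 1 + window]:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs

names = [
    "CENTAURUS ENERGY MASTER FUND L.P.",
    "Centaurus Financial",
    "Centaurus Financial Inc.",
    "Centech Group",
]
for a, b, score in near_duplicate_pairs(names):
    print(a, "|", b, "|", score)
```

Limiting the comparison to a small window of sorted neighbours keeps the work manageable on a very long list, at the cost of missing variants that differ near the start of the name. The flagged pairs would still need a manual review.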
 

First of all, I think you have a better chance of doing this in Word than in Excel, since Word is more capable than Excel at dealing with text.
You need to find the “patterns” that appear frequently, such as the words that make company names look different (e.g., Inc, Co, Ltd). Are they always located at the end, or could they be at the beginning or in the middle?
So if you can post more samples of your actual data, that would be a good start.
And how many rows of data do you have?
 
Thanks for your response. Yes, the variation in the name usually comes at the end, and there are several of these variations (e.g., LLC vs. L.LC., or INC vs. Inc vs. Inc.), but there can also be variations in the middle (e.g., A&B vs. A & B vs. A and B). My list is over 200,000 lines long.
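
One possible way to handle the variations listed above is to reduce every name to a normalized key and then group names that share a key. The sketch below is in Python, not Excel, and is only a starting point: the `SUFFIXES` set and the helper names `normalize` and `group_by_key` are hypothetical and would need extending after looking at the real data.

```python
import re
from collections import defaultdict

# Hypothetical, incomplete set of legal-form suffixes; extend after reviewing the data.
SUFFIXES = {"inc", "llc", "lp", "ltd", "co", "corp"}

def normalize(name):
    """Build a comparison key: lower case, unified '&', no dots, no trailing suffix."""
    key = name.lower()
    key = key.replace("&", " and ")        # A&B and A & B both become "a and b"
    key = key.replace(".", "")             # L.L.C. -> llc, Inc. -> inc
    key = re.sub(r"[^\w\s]", " ", key)     # any other punctuation becomes a space
    words = key.split()
    # Strip legal-form words from the end, but never empty the name entirely.
    while len(words) > 1 and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)

def group_by_key(names):
    """Return only the keys shared by more than one original name."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}

sample = [
    "CENTAURUS ENERGY MASTER FUND L.P.",
    "Centaurus Financial",
    "Centaurus Financial Inc.",
    "Centech Group",
]
print(group_by_key(sample))
```

Because this is a single pass with a dictionary lookup per name, it should scale to a list of a few hundred thousand lines. Names that end up under the same key are candidates for review, not confirmed duplicates.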
 
Well, it looks like the ‘pattern’ will be too complicated (for me). I’m sorry, I don’t think I can help you here. Maybe someone else here can.
That being said, this forum specializes in Excel. You may want to post your problem on a similar forum that has a Word section; you can search for one on Google. (I could provide a link, but I’m not sure about the rules here. Is it OK to mention another forum?)
 