Hi all,
This is a tough one to explain, but I’ll do my best as I’ve not found anyone who can offer a solution.
I have a list of measures that a number of organizations are monitoring and I've been asked to find out which organizations are measuring similar things. There are thousands of measures, so I’ve used the fuzzy match plug-in to identify similarities in the text. I’ve set the threshold at 0.6.
The problem I’ve now got is that I need to categorically state who measures similar topics. However, the way the fuzzy match works means that some similar measures might be matched for some organizations and not others, because of differences in how the measures are worded.
In my example (see below), we’ve got one organization measuring ‘Banana supply’ and the fuzzy match shows that another 3 are measuring the same sort of thing. However, further down the list I’ve got an organization measuring ‘Supply of local bananas’ and the match against this measure has revealed that a further organization is measuring something along the same lines. So, if I wanted to state who has an interest in counting bananas I should be listing a total of 5 organizations, which means I need to somehow amalgamate the two groups.
This is a very simple example. I’m guessing that I won’t be able to avoid doing a manual check at some point, but if anyone can give me any hints as to how I can cut hours out of this task I’d really, really appreciate it.
Let me know if this isn’t clear and I’ll try and clarify.
[TABLE="width: 636"]
[TR][TD]Organisation[/TD][TD]Measure[/TD][TD]Matched organisation[/TD][TD]Similar measure[/TD][/TR]
[TR][TD]Org 1[/TD][TD]Banana supply[/TD][TD]Org 2[/TD][TD]Supply of bananas[/TD][/TR]
[TR][TD][/TD][TD][/TD][TD]Org 3[/TD][TD]Banana Volume[/TD][/TR]
[TR][TD][/TD][TD][/TD][TD]Org 4[/TD][TD]The number of bananas[/TD][/TR]
[TR][TD]Org 2[/TD][TD]Supply of local bananas[/TD][TD]Org 1[/TD][TD]Banana supply[/TD][/TR]
[TR][TD][/TD][TD][/TD][TD]Org 3[/TD][TD]Banana Volume[/TD][/TR]
[TR][TD][/TD][TD][/TD][TD]Org 4[/TD][TD]The number of bananas[/TD][/TR]
[TR][TD][/TD][TD][/TD][TD]Org 5[/TD][TD]Supply of yellow bananas[/TD][/TR]
[/TABLE]
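For what it's worth, amalgamating the groups as described above looks like a "connected components" problem: treat every (organisation, measure) pair as a node, treat every fuzzy match as a link, and merge anything that is linked directly or through a chain of matches. Below is a rough Python sketch of that idea, not a tested solution for the real data. It assumes the match results have been exported from the spreadsheet as pairs, and the names `MATCHES` and `merge_groups` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical export of the fuzzy-match output: each entry links one
# (organisation, measure) to another (organisation, measure) it matched.
MATCHES = [
    (("Org 1", "Banana supply"), ("Org 2", "Supply of bananas")),
    (("Org 1", "Banana supply"), ("Org 3", "Banana Volume")),
    (("Org 1", "Banana supply"), ("Org 4", "The number of bananas")),
    (("Org 2", "Supply of local bananas"), ("Org 1", "Banana supply")),
    (("Org 2", "Supply of local bananas"), ("Org 3", "Banana Volume")),
    (("Org 2", "Supply of local bananas"), ("Org 4", "The number of bananas")),
    (("Org 2", "Supply of local bananas"), ("Org 5", "Supply of yellow bananas")),
]


def merge_groups(pairs):
    """Merge matched pairs into groups using a simple union-find."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own group, then walk to the root,
        # shortening the path along the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # join the two groups

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


if __name__ == "__main__":
    for group in merge_groups(MATCHES):
        organisations = sorted({org for org, _ in group})
        print(organisations)
```

With the banana example this gives a single group covering Org 1 to Org 5. One caveat: because matches are chained, a loose threshold such as 0.6 can pull unrelated measures into the same group through one weak link, so a manual check of the larger groups is probably still worthwhile.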