Hello,
I am working on a project that relies on two workers' individual reviews of the same movie. There is a concern that some of the second reviews are too similar to the first because the second worker is somehow copying the first worker's review. I have about 14,000 entries to audit, and I was hoping there would be an easy way to compare the two fields and return a percentage of similarity based on words or phrases (preferably words), regardless of the order they are in.
Since the suspicion is that the second set of notes is the copy, I think it would be best to count the words in the second set and find how many of them also appear in the first set.
How I think it should display (text colors are only showing the comparison in the example):
[TABLE="class: grid, width: 500, align: left"]
[TR]
[TD]Notes 1[/TD]
[TD]Notes 2[/TD]
[TD]Notes 2 similar to Notes 1 (%)[/TD]
[/TR]
[TR]
[TD]This movie is great[/TD]
[TD]This movie is great[/TD]
[TD]100%[/TD]
[/TR]
[TR]
[TD]This movie was a jam packed action movie[/TD]
[TD]this was a great action movie[/TD]
[TD](5 of 6) = 83%[/TD]
[/TR]
[TR]
[TD]Drama movie was intense with a full filling experience[/TD]
[TD]too long of a movie[/TD]
[TD](2 of 5) = 40%[/TD]
[/TR]
[/TABLE]
Row 1: 4 out of 4 words from the second notes are also in the first, so it would be 100%.
Row 2: 5 out of 6 words from the second notes are also in the first, so it would be 83%.
Row 3: 2 out of 5 words from the second notes are also in the first, so it would be 40%.
I hope this makes sense. Please let me know if this is possible or if there is a better alternative.
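To make the counting rule concrete, here is a rough Python sketch of what I have in mind. It is only an illustration, not a finished solution. The function names `words` and `similarity` are made up for this example, and it assumes that matching ignores upper/lower case and punctuation, and that a word repeated in the second notes is counted each time it appears.

```python
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(notes1, notes2):
    """Share of the words in notes2 that also occur anywhere in notes1."""
    first = set(words(notes1))      # word order in notes1 does not matter
    second = words(notes2)
    if not second:                  # assumption: empty second notes score 0
        return 0.0
    matched = sum(1 for w in second if w in first)
    return matched / len(second)


rows = [
    ("This movie is great", "This movie is great"),
    ("This movie was a jam packed action movie", "this was a great action movie"),
    ("Drama movie was intense with a full filling experience", "too long of a movie"),
]

for notes1, notes2 in rows:
    print(f"{similarity(notes1, notes2):.0%}")
```

Run on the three example rows, this prints 100%, 83% and 40%, which matches the table.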
Thank you for your time.