A different kind of compare

jayhouse

New Member
Joined
Jan 2, 2017
Messages
13
Hello,

I am working on a project that relies on 2 workers individual reviews of the same movie. There is a concern that some of the second reviews are too similar to the first because the second worker is somehow copying the first workers review. I have about 14 thousand entries to audit and i was hoping that there would be an easy way to compare the two fields and return a percentage of similarity based on words or phrases (preferably words) but not the order that they are in.
Since the thought is on the second set of notes are the copy, i would think it would be best to count the number of words in the second set and find how many of them are the same in the first set.


How i think it should display (text colors are only showing the comparison in the example)

[TABLE="class: grid, width: 500, align: left"]
<tbody>[TR]
[TD]Notes 1[/TD]
[TD]Notes 2[/TD]
[TD]Notes 2 similar to Notes 1 (%)[/TD]
[/TR]
[TR]
[TD]This movie is great[/TD]
[TD]This movie is great[/TD]
[TD]100%[/TD]
[/TR]
[TR]
[TD]This movie was a jam packed action movie[/TD]
[TD]this was a great action movie[/TD]
[TD](5 of 6) = 83%[/TD]
[/TR]
[TR]
[TD]Drama movie was intense with a full filling experience[/TD]
[TD]too long of a movie[/TD]
[TD](2 of 5) = 40%[/TD]
[/TR]
</tbody>[/TABLE]
4 out of 4 words from the second notes are the same as the first so it would be 100%

5 out of 6 words from the second notes are the same as the first so it would be 83%

2 out of 5 words from the second notes are the same as the first so it would be 40%




I hope this makes sense, please let me know if this is possible or if there is a better alternative.

Thank you for your time.
 

Excel Facts

Best way to learn Power Query?
Read M is for (Data) Monkey book by Ken Puls and Miguel Escobar. It is the complete guide to Power Query.
Try the following code. This will use Column A as your "Notes 1" column, Column B as your "Notes 2" column, and return the results in column C.

Code:
Public Sub CompareReviews()
Dim LR          As Long, _
    i           As Long
    
Dim ary1        As Variant, _
    ary2        As Variant, _
    aryitem     As Variant

Dim aryCompare  As Variant, _
    cntCompare  As Long
    
Application.ScreenUpdating = False
    
LR = Range("A" & Rows.Count).End(xlUp).Row
For i = 2 To LR
    ary1 = Split(Range("A" & i).Value, " ")
    ary2 = Split(Range("B" & i).Value, " ")
    For Each aryitem In ary2
        aryCompare = Application.Match(aryitem, ary1, 0)
        If Not IsError(aryCompare) Then
            'Value found
            cntCompare = cntCompare + 1
        End If
    Next aryitem
    Range("C" & i).Value = cntCompare / (UBound(ary2) + 1)
    Erase ary1
    Erase ary2
    cntCompare = 0
Next i

Application.ScreenUpdating = True

End Sub

Using your sample data, I got the following results:

Excel 2013/2016
ABC
1Notes 1Notes 2Result
2This movie is greatThis movie is great100.00%
3This movie was a jam packed action moviethis was a great action movie83.33%
4Drama movie was intense with a full filling experiencetoo long of a movie40.00%
Sheet1
 
Last edited:
Upvote 0
Thank you for responding Mr Kowz, also thank you for writing the macro. I am less than a novice on them but i do know how to implement them.

It does work for the first 7 or so entries but then i am prompted with "Run-time error '6': Overflow".
Selecting Debug takes me to the highlighted portion

Code:
Range("C" & i).Value = cntCompare / (UBound(ary2) + 1)

What should i do?
 
Upvote 0
When the code errors and highlights that line, what values are being returned for cntCompare and UBound(ary2)? You should be able to see this by hovering these words.
 
Upvote 0
It came across an entry where there was no second review. Try this adjusted code - this will skip rows where column A or column B is empty.

Code:
Public Sub CompareReviews()
Dim LR          As Long, _
    i           As Long
    
Dim ary1        As Variant, _
    ary2        As Variant, _
    aryitem     As Variant

Dim aryCompare  As Variant, _
    cntCompare  As Long
    
Application.ScreenUpdating = False
    
LR = Range("A" & Rows.Count).End(xlUp).Row
For i = 2 To LR
    If Range("A" & i).Value <> "" And Range("B" & i).Value <> "" Then
        ary1 = Split(Range("A" & i).Value, " ")
        ary2 = Split(Range("B" & i).Value, " ")
        For Each aryitem In ary2
            aryCompare = Application.Match(aryitem, ary1, 0)
            If Not IsError(aryCompare) Then
                'Value found
                cntCompare = cntCompare + 1
            End If
        Next aryitem
        Range("C" & i).Value = cntCompare / (UBound(ary2) + 1)
        Erase ary1
        Erase ary2
        cntCompare = 0
    End If
Next i

Application.ScreenUpdating = True

End Sub
 
Upvote 0
Nevermind, i found the error. There were some blank entries for the notes.
When i remove these, then the macro works!

Thank you for your time and help!
 
Upvote 0
Nevermind, i found the error. There were some blank entries for the notes.
When i remove these, then the macro works!

Thank you for your time and help!

Wonderful - thanks for the feedback. Have a good day!
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,182
Members
453,020
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top