Compare two strings, and find difference?

EtherBoo · Aug 6, 2010

This may not be possible with Excel, but this is what I'm trying to do:

A1 = Hello John, my name is Steve, how was your day?
B1 = Hello John, my name is Steve, was day?

I need C1 to return:
how your

Is this possible?

oneglory · Feb 12, 2016

AlphaFrog said:

This compares two URLs folder-to-folder (items separated by forward slashes). This is similar to the WORDDIF function above.

I only tested it with the example you gave. Let me know how well it works.

Code:

Function URLDIF(rngA As Range, rngB As Range) As String

    Dim WordsA As Variant, WordsB As Variant
    Dim ndxA As Long, bJoin As Boolean, strTemp As String

    WordsA = Split(rngA.Text, "/")
    WordsB = Split(rngB.Text, "/")

    For ndxA = LBound(WordsA) To UBound(WordsA)
        If UBound(WordsB) < ndxA Then Exit For
        If StrComp(WordsA(ndxA), WordsB(ndxA), vbTextCompare) = 0 Then
            WordsA(ndxA) = vbNullString
        Else
            bJoin = True
        End If
    Next ndxA

    If bJoin Then URLDIF = Join(WordsA, "/")

End Function

Thanks!

I haven't messed with it yet, but when I initially plugged it into my Excel workbook the output was:

COL A (address):
https://www.domain.com/dir1/dir2/dir...?pageid=P02057

COL B (canonical):
https://www.domain.com/folder1/dir2/dir3/dir4/dir5/dir6/about-us

DIFF (COL C, function output):
///dir1//////about-us?pageid=P02057
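
The run of empty slashes in the column C output appears because URLDIF blanks the matching folders and then joins every slot, including the blank ones. Below is a minimal sketch of a variant that keeps only the folders that differ, assuming that is the output wanted. The name URLDIFTEXT and its plain String parameters are hypothetical (not from the thread); called from a worksheet, cell references are passed by value. Unlike URLDIF, it also returns any extra trailing folders in the first URL even when nothing before them differs.

```vba
' Hypothetical variant of URLDIF: returns only the folders of urlA that
' differ from the folder at the same position in urlB.
Function URLDIFTEXT(ByVal urlA As String, ByVal urlB As String) As String

    Dim partsA As Variant, partsB As Variant
    Dim i As Long, result As String

    partsA = Split(urlA, "/")
    partsB = Split(urlB, "/")

    For i = LBound(partsA) To UBound(partsA)
        If i > UBound(partsB) Then
            ' urlA has more folders than urlB: keep the extras
            result = result & "/" & partsA(i)
        ElseIf StrComp(partsA(i), partsB(i), vbTextCompare) <> 0 Then
            ' Same position, different text: keep the folder from urlA
            result = result & "/" & partsA(i)
        End If
    Next i

    ' Drop the leading separator (Mid$ returns "" when nothing was kept)
    URLDIFTEXT = Mid$(result, 2)

End Function
```

In a cell this would be used as =URLDIFTEXT(A1,B1).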

cheshire · Feb 18, 2016

Hi Alpha Frog.

I am comparing some data on types of differences: additions and deletions. I've used your code and simply switched the WORDDIF arguments accordingly.

However, I have come to some obstacles.

1. Sometimes both the addition and deletion cell will print the same name.
2. I want to delete the numbers with parentheticals "(1)" and "AC"
3. Ultimately it would be nice to print a new column with: additions (italicized) and deletions (strike through)

Here are some examples:
-------------------------
WHEN IT WORKS PERFECTLY!

B6 =
Med Col of Georgia (1) (AC) Morehouse (1) (AC)

D6 =
Med Col of Georgia (1) (AC) Meharry (1) (AC)

G6 =WORDDIF(D6,B6)
Meharry

H6 =WORDDIF(B6,D6)
Morehouse
-------------------------

Obstacle 1: Sometimes both the addition and deletion cell will print the same name. (obstacle 2 embedded)

B29 =
Indiana (1) (AC)
Loyola-Stritch (1) (AC) U Wash-Seattle (1) (AC)
UCLA/Drew Med Schl (1) (AC)

D29 =
Indiana (1) (AC)
Loyola-Stritch (1) (AC) UCLA/Drew Med Schl (1) (AC)

G29 =WORDDIF(D29,B29)
UCLA/Drew

H29 =WORDDIF(B29,D29)
U Wash-Seattle (AC)
UCLA/Drew (1)
-------------------------

AlphaFrog said:

Here's a custom function called WORDDIF that may do what you want.

To install the custom function...

Alt+F11 to open the VBA Editor
From the VBA menu, select Insert\ Module
Paste the code below in the VBA Edit window

Back in Excel, put this formula in C1
=WORDDIF(A1,B1)

Code:

Function WORDDIF(rngA As Range, rngB As Range) As String
    
    Dim WordsA As Variant, WordsB As Variant
    Dim ndxA As Long, ndxB As Long, strTemp As String
        
    WordsA = Split(rngA.Text, " ")
    WordsB = Split(rngB.Text, " ")
    
    For ndxB = LBound(WordsB) To UBound(WordsB)
        For ndxA = LBound(WordsA) To UBound(WordsA)
            If StrComp(WordsA(ndxA), WordsB(ndxB), vbTextCompare) = 0 Then
                WordsA(ndxA) = vbNullString
                Exit For
            End If
        Next ndxA
    Next ndxB
    
    For ndxA = LBound(WordsA) To UBound(WordsA)
        If WordsA(ndxA) <> vbNullString Then strTemp = strTemp & WordsA(ndxA) & " "
    Next ndxA
    
    WORDDIF = Trim(strTemp)

End Function

AlphaFrog · Feb 18, 2016

cheshire said:
However, I have come to some obstacles.

1. Sometimes both the addition and deletion cell will print the same name.
2. I want to delete the numbers with parentheticals "(1)" and "AC"
3. Ultimately it would be nice to print a new column with: additions (italicized) and deletions (strike through)

Based on your examples:
1. In the data, the difference in the number of lines (linefeed characters) created words that "looked" the same, but one included a hidden linefeed. The code below strips linefeed characters.
2. Modified code excludes all parentheticals
3. That's just a column font format. It has nothing to do with the code.

Code:

Function WORDDIF(rngA As Range, rngB As Range) As String

    'Remove linefeed characters and exclude parentheticals

    Dim WordsA As Variant, WordsB As Variant
    Dim ndxA As Long, ndxB As Long, strTemp As String

    WordsA = Split(Replace(rngA.Text, vbLf, " "), " ")  'remove linefeed
    WordsB = Split(Replace(rngB.Text, vbLf, " "), " ")  'remove linefeed

    For ndxB = LBound(WordsB) To UBound(WordsB)
        For ndxA = LBound(WordsA) To UBound(WordsA)
            If StrComp(WordsA(ndxA), WordsB(ndxB), vbTextCompare) = 0 Then
                WordsA(ndxA) = vbNullString
                Exit For
            End If
        Next ndxA
    Next ndxB

    For ndxA = LBound(WordsA) To UBound(WordsA)
        'Exclude parentheticals
        If Not WordsA(ndxA) Like "(*)" Then strTemp = strTemp & WordsA(ndxA) & " "
    Next ndxA

    WORDDIF = Application.Trim(strTemp)

End Function

cheshire · Feb 19, 2016

Hi AlphaFrog,

Once again cheers for the amazing code (I forgot to mention that last time).

The code looks great. I have a few changes I would like to make:

3. I meant that instead of having a deletions or additions column (which is great to visually see), I was wondering if there's a code that synthesizes the information, so that the output is: all the school names, with deletions (strike through), additions (italicized), and repeats (no change = normal).

4. Each school should be separated (in the additions/deletions columns). Right now they just come out as a list without any separator. So either each school gets its own line within the cell, or they are separated by a comma. Perhaps a comma could replace the double parenthetical.

Examples to go with the above:

3: Synthesized information:

B6 =
Med Col of Georgia (1) (AC) Morehouse (1) (AC)

D6 =
Med Col of Georgia (1) (AC) Meharry (1) (AC)

Deletions
G6 =WORDDIF(D6,B6)
Meharry

Additions
H6 =WORDDIF(B6,D6)
Morehouse

I6 = {new code or something}
Med Col of Georgia, Morehouse, Meharry*
*Note: I couldn't figure out how to strike through Meharry, so I underlined it.
Also note that there are commas separating the schools in the final list. Perhaps this could be done with replacing the set of parenthetical by commas.

4. Schools separated

B91 =
Columbia (1) (AC)
Northwestern (1) (AC) Southern Calif (1) (AC)
U Chicago-Pritzker (1) (AC) U Michigan (1) (AC)
Wash U St Louis (1) (AC)

D91 =
Northwestern (1) (AC) Southern Calif (1) (AC)
U Chicago-Pritzker (1) (AC) U Michigan (1) (AC)

H91 =WORDDIF(B91,D91)
Columbia Wash U St Louis

{improved version}
H91 =WORDDIF(B91,D91)
Columbia
Wash U St Louis

OR

Columbia, Wash U St Louis
Note the spaces in the first example and the comma in the second.

Cheers for all your assistance.

Also, I've done a bit of coding back in the day. While I am a bit rusty, it would be nice to learn how to code for Excel. If you have any pointers or resources, that would be helpful.

AlphaFrog · Feb 19, 2016

3. A User Defined Function (UDF) like this cannot affect the text formatting of its result. A macro could be created to compare two columns of data and output a column of formatted multi-line results, but that would go beyond the scope of this thread which is centered around the WORDDIF function. It would be best if you started a new thread for a request like that.

4. The WORDDIF function is designed to do a word-to-word comparison. Your data is more of a multi-word (school) phrase-to-phrase comparison. Based on how this function is structured, getting the school separator to go in the right place for all scenarios is more involved than just replacing the parentheticals with a separator. It can be done, but it would be better accomplished with a whole new function (or macro as suggested in #3) that's designed to compare phrases or multi-word schools specific to your data. Again, beyond the scope of this thread.
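
For illustration only, here is a rough sketch of what a phrase-to-phrase comparison could look like for data shaped like the examples in this thread. It assumes every school ends with "(1) (AC)" exactly as posted, so it splits on "(AC)" and strips "(1)"; real data with other parentheticals would need a different splitting rule. The names PHRASEDIF and SplitSchools are hypothetical, and the function takes plain strings (cell references are passed by value when it is called from a worksheet).

```vba
' Hypothetical helper: turns "Columbia (1) (AC) Northwestern (1) (AC)"
' into an array of school names. Assumes each school ends with "(1) (AC)".
Private Function SplitSchools(ByVal s As String) As Variant
    Dim parts As Variant, i As Long
    s = Replace(s, vbLf, " ")                 ' treat linefeeds as spaces
    parts = Split(s, "(AC)")                  ' one piece per school
    For i = LBound(parts) To UBound(parts)
        parts(i) = Trim$(Replace(parts(i), "(1)", ""))
    Next i
    SplitSchools = parts
End Function

' Returns the schools found in textA but not in textB, joined by sep.
Function PHRASEDIF(ByVal textA As String, ByVal textB As String, _
                   Optional ByVal sep As String = ", ") As String
    Dim partsA As Variant, partsB As Variant
    Dim i As Long, j As Long, found As Boolean, result As String

    partsA = SplitSchools(textA)
    partsB = SplitSchools(textB)

    For i = LBound(partsA) To UBound(partsA)
        If Len(partsA(i)) > 0 Then            ' skip the empty trailing piece
            found = False
            For j = LBound(partsB) To UBound(partsB)
                If StrComp(partsA(i), partsB(j), vbTextCompare) = 0 Then
                    found = True
                    Exit For
                End If
            Next j
            If Not found Then
                If Len(result) > 0 Then result = result & sep
                result = result & partsA(i)
            End If
        End If
    Next i

    PHRASEDIF = result
End Function
```

With the B91/D91 data posted above, =PHRASEDIF(B91,D91) should give "Columbia, Wash U St Louis"; passing CHAR(10) as the third argument would put each school on its own line in a cell with wrap text enabled.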

Training / Books / Sites as of 2/12/2016

cheshire · Feb 25, 2016

Hi AlphaFrog,

Do you have a code that can spit out the similarities?

Djani · Mar 14, 2016

Dear AlphaFrog,

My apologies for bringing this topic also back to life, but I have got a question that is related to this topic.

Down below you will see two pictures; one of the old month, one of the current month. Each one is from a different workbook. The format/structure is standard and doesn't change whatsoever, but I would like to compare these two and list the differences.

Previous month

Current month

There are in total 64 combinations (8 versions and 8 models). Each "MATCH" represents 8 of these combinations. While comparing, I would only like to have the new data that is different in comparison to the old one. If there is a difference, I would like to know from which "MATCH" this is (1 to 8). Ideally it should be something like this:

I hope this covers everything you need to know. If anything is unclear, please let me know!

Many thanks in advance,

Djani

jpaokx · Mar 17, 2016

Hi,

For my purposes, the code given earlier works well.

Function CHARDIF(rngA As Range, rngB As Range) As String

    Dim strA As String, strB As String
    Dim i As Long, strTemp As String

    strA = rngA.Value
    strB = rngB.Value
    If Len(strA) > Len(strB) Then
        strTemp = strA
        strA = strB
        strB = strTemp
        strTemp = ""
    End If

    For i = 1 To Len(strB)
        If i > Len(strA) Then
            strTemp = strTemp & Mid(strB, i, 1)
        ElseIf UCase(Mid(strA, i, 1)) <> UCase(Mid(strB, i, 1)) Then
            strTemp = strTemp & Mid(strB, i, 1)
        End If
    Next i

    CHARDIF = strTemp

End Function

However, I need a simple way to find the difference between two strings regardless of where it occurs.
For example,

if I have A1=ABC and B2=ABCDEF, then the above code will give DEF. It will give me DEF even if A1=ABCF.

What I'd like to see is that, if A1=HI MISS SMITH and B1=HI MS SMITH, then C1=IS (which is the part that differs between these two strings).
I don't know if that helps, but I am looking for a way to compare two strings and see that their differences is either a space " " or a comma ",". After searching on the internet, the above code was the closest thing that I could find.
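
One way to get a position-independent result like the "IS" example is to skip whatever the two strings share at the start and at the end, and return what is left in the middle of the longer string. The sketch below is only that idea written out; the name MIDDIF is hypothetical, it compares case-insensitively like CHARDIF, and it only handles a single differing stretch (two separate edits in one string would come back as one block that includes the matching text between them).

```vba
' Hypothetical sketch: returns the middle part of the longer string that is
' left after removing the common prefix and common suffix of both strings.
Function MIDDIF(ByVal strA As String, ByVal strB As String) As String

    Dim lenA As Long, lenB As Long, minLen As Long
    Dim pre As Long, suf As Long

    lenA = Len(strA)
    lenB = Len(strB)
    minLen = IIf(lenA < lenB, lenA, lenB)

    ' Length of the common prefix
    Do While pre < minLen
        If UCase$(Mid$(strA, pre + 1, 1)) <> UCase$(Mid$(strB, pre + 1, 1)) Then Exit Do
        pre = pre + 1
    Loop

    ' Length of the common suffix, never overlapping the prefix
    Do While suf < minLen - pre
        If UCase$(Mid$(strA, lenA - suf, 1)) <> UCase$(Mid$(strB, lenB - suf, 1)) Then Exit Do
        suf = suf + 1
    Loop

    ' What remains in the longer string (strA when lengths are equal)
    If lenA >= lenB Then
        MIDDIF = Mid$(strA, pre + 1, lenA - pre - suf)
    Else
        MIDDIF = Mid$(strB, pre + 1, lenB - pre - suf)
    End If

End Function
```

For HI MISS SMITH against HI MS SMITH this leaves IS, and for ABCF against ABCDEF it leaves DE rather than the DEF that CHARDIF reports.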

Any help will be great!

AlphaFrog · Mar 17, 2016

jpaokx said:
I don't know if that helps, but I am looking for a way to compare two strings and see that their differences is either a space " " or a comma ",".

If space\comma differences is all you're looking for, a formula could count spaces\commas in both strings and return the count differences if any. Otherwise, can you elaborate?
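
As a rough illustration of the counting idea, the sketch below counts one character in each string and returns the difference. The names CountChar and CHARCOUNTDIF are hypothetical; the counting trick is the usual length-before minus length-after-removal.

```vba
' Hypothetical helper: number of times ch occurs in s.
Function CountChar(ByVal s As String, ByVal ch As String) As Long
    If Len(ch) = 0 Then Exit Function          ' nothing to count
    CountChar = (Len(s) - Len(Replace(s, ch, ""))) \ Len(ch)
End Function

' Difference in the count of ch between the two strings
' (positive when textA has more, 0 when the counts match).
Function CHARCOUNTDIF(ByVal textA As String, ByVal textB As String, _
                      Optional ByVal ch As String = " ") As Long
    CHARCOUNTDIF = CountChar(textA, ch) - CountChar(textB, ch)
End Function
```

In a worksheet this would be =CHARCOUNTDIF(A1,B1) for spaces and =CHARCOUNTDIF(A1,B1,",") for commas. A result of 0 only says the counts match, not that the characters sit in the same places.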

jpaokx · Mar 19, 2016

It wouldn't work in my case...I am trying to clean some entries and I'll need to make various checks.

I applied two functions.
a) =LEN(A1)-LEN(SUBSTITUTE(B2," ",""))
b) =LEN(G2)-LEN(H2)

I filtered both for values above 0, but it would only work if the characters are the same.

For example, let's say that I have one case as "Peter Smith" and the second as "peter Smiith". Here, we have a difference of an extra space and an "i".
However, I need to filter cases where both strings have the same characters except for the space. For example "Peter Smith" (so here we have a difference of an extra space only).
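
A check like the one described here could be sketched as follows: remove the spaces from both strings and see whether what remains is the same. The name SPACEONLYDIF is hypothetical, and ignoring upper/lower case is an assumption based on the "Peter"/"peter" example.

```vba
' Hypothetical sketch: True when the two strings differ, but only in their
' spaces (case is ignored). Identical strings return False.
Function SPACEONLYDIF(ByVal strA As String, ByVal strB As String) As Boolean

    ' No difference at all: nothing to flag
    If StrComp(strA, strB, vbTextCompare) = 0 Then Exit Function

    ' Same text once every space is removed: only the spacing differs
    SPACEONLYDIF = (StrComp(Replace(strA, " ", ""), _
                            Replace(strB, " ", ""), vbTextCompare) = 0)

End Function
```

Filtering a helper column of =SPACEONLYDIF(A1,B1) on TRUE would then list only the pairs whose sole difference is spacing; a pair that also has an extra "i" returns FALSE.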

The previous VBA code was working for me to some extent, but I need this to be modified so that it can give me only the part that both strings differ.
Any ideas would be great!

Thanks in advance!
