Compare two strings, and find difference?

EtherBoo

New Member
Joined
Oct 26, 2006
Messages
37
This may not be possible with excel, but this is what I'm trying to do

A1 = Hello John, my name is Steve, how was your day?
B1 = Hello John, my name is Steve, was day?

I need C1 to return:
how your

Is this possible?
 
This compares two URLs folder-to-folder (items separated by forward slashes). This is similar to the WORDDIF function above.

I only tested it with the example you gave. Let me know how well it works.

Code:
[COLOR=darkblue]Function[/COLOR] URLDIF(rngA [COLOR=darkblue]As[/COLOR] Range, rngB [COLOR=darkblue]As[/COLOR] Range) [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]String[/COLOR]
    
    [COLOR=darkblue]Dim[/COLOR] WordsA [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR], WordsB [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] ndxA [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR], bJoin [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Boolean[/COLOR], strTemp As [COLOR=darkblue]String[/COLOR]
        
    WordsA = Split(rngA.Text, "/")
    WordsB = Split(rngB.Text, "/")
        
    [COLOR=darkblue]For[/COLOR] ndxA = [COLOR=darkblue]LBound[/COLOR](WordsA) [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](WordsA)
        [COLOR=darkblue]If[/COLOR] [COLOR=darkblue]UBound[/COLOR](WordsB) < ndxA [COLOR=darkblue]Then[/COLOR] [COLOR=darkblue]Exit[/COLOR] [COLOR=darkblue]For[/COLOR]
        [COLOR=darkblue]If[/COLOR] [COLOR=darkblue]StrComp[/COLOR](WordsA(ndxA), WordsB(ndxA), vbTextCompare) = 0 [COLOR=darkblue]Then[/COLOR]
            WordsA(ndxA) = vbNullString
        [COLOR=darkblue]Else[/COLOR]
            bJoin = [COLOR=darkblue]True[/COLOR]
        [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
    [COLOR=darkblue]Next[/COLOR] ndxA
    
    [COLOR=darkblue]If[/COLOR] bJoin [COLOR=darkblue]Then[/COLOR] URLDIF = Join(WordsA, "/")
    
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Function[/COLOR]

Thanks!

I haven't messed with it yet, but when I initially plugged it into my excel the output was:

COL A (address):
https://www.domain.com/dir1/dir2/dir...?pageid=P02057

COL B (canonical):
https://www.domain.com/folder1/dir2/dir3/dir4/dir5/dir6/about-usDIFF (COL C, function output)///dir1//////about-us?pageid=P02057
 
Upvote 0

Excel Facts

Can you sort left to right?
To sort left-to-right, use the Sort dialog box. Click Options. Choose "Sort left to right"
Hi Alpha Frog.

I am comparing some data on types of differences: additions and deletions. I've used your code and simply switched the WORDDIF arguments accordingly.

However, I have come to some obstacles.

1. Sometimes both the addition and deletion cell will print the same name.
2. I want to delete the numbers with parentheticals "(1)" and "AC"
3. Ultimately it would be nice to print a new column with: additions (italicized) and deletions (strike through)

Here are some examples:
-------------------------
WHEN IT WORKS PERFECTLY!

B6=[TABLE="width: 110"]
<tbody>[TR]
[TD="class: xl64, width: 110"]Med Col of Georgia (1) (AC) Morehouse (1) (AC)
[/TD]
[/TR]
</tbody>[/TABLE]

D6=
[TABLE="width: 110"]
<tbody>[TR]
[TD="class: xl66, width: 110"]Med Col of Georgia (1) (AC) Meharry (1) (AC)[/TD]
[/TR]
</tbody>[/TABLE]

G6 =WORDDIF(D6,B6)[TABLE="width: 70"]
<tbody>[TR]
[TD="class: xl66"]Meharry[/TD]
[/TR]
</tbody>[/TABLE]

H6 = =WORDDIF(B6,D6)
[TABLE="width: 82"]
<tbody>[TR]
[TD="class: xl66, width: 82"]Morehouse[/TD]
[/TR]
</tbody>[/TABLE]
-------------------------

Obstacle 1: Sometimes both the addition and deletion cell will print the same name. (obstacle 2 embedded)

B29= [TABLE="width: 110"]
<tbody>[TR]
[TD="class: xl66, width: 110"]Indiana (1) (AC)
Loyola-Stritch (1) (AC) U Wash-Seattle (1) (AC)
UCLA/Drew Med Schl (1) (AC)[/TD]
[/TR]
</tbody>[/TABLE]

D29=[TABLE="width: 110"]
<tbody>[TR]
[TD="class: xl66, width: 110"]Indiana (1) (AC)
Loyola-Stritch (1) (AC) UCLA/Drew Med Schl (1) (AC)[/TD]
[/TR]
</tbody>[/TABLE]

G29=WORDDIF(D29,B29)
[TABLE="width: 70"]
<tbody>[TR]
[TD="class: xl66, width: 70"]UCLA/Drew[/TD]
[/TR]
</tbody>[/TABLE]

H29=WORDDIF(B29,D29)
[TABLE="width: 82"]
<tbody>[TR]
[TD="class: xl66, width: 82"]U Wash-Seattle (AC)
UCLA/Drew (1)[/TD]
[/TR]
</tbody>[/TABLE]
-------------------------

Here's a custom function called WORDDIF that may do what you want.

To install the custom function...
  • Alt+F11 to open the VBA Editor
  • From the VBA menu, select Insert\ Module
  • Paste the code below in the VBA Edit window

Back in Excel, put this formula in C1
=WORDDIF(A1,B1)
Code:
Function WORDDIF(rngA As Range, rngB As Range) As String
    
    Dim WordsA As Variant, WordsB As Variant
    Dim ndxA As Long, ndxB As Long, strTemp As String
        
    WordsA = Split(rngA.Text, " ")
    WordsB = Split(rngB.Text, " ")
    
    For ndxB = LBound(WordsB) To UBound(WordsB)
        For ndxA = LBound(WordsA) To UBound(WordsA)
            If StrComp(WordsA(ndxA), WordsB(ndxB), vbTextCompare) = 0 Then
                WordsA(ndxA) = vbNullString
                Exit For
            End If
        Next ndxA
    Next ndxB
    
    For ndxA = LBound(WordsA) To UBound(WordsA)
        If WordsA(ndxA) <> vbNullString Then strTemp = strTemp & WordsA(ndxA) & " "
    Next ndxA
    
    WORDDIF = Trim(strTemp)

End Function
 
Upvote 0
However, I have come to some obstacles.

1. Sometimes both the addition and deletion cell will print the same name.
2. I want to delete the numbers with parentheticals "(1)" and "AC"
3. Ultimately it would be nice to print a new column with: additions (italicized) and deletions (strike through)

Based on your examples:
1. In the data, the difference in number of lines (linefeed characters) created words that "looked" the same but one included a hidden a linefeed. The code below strips linefeed characters.
2. Modified code excludes all parentheticals
3. That's just a column font format. It has nothing to do with the code.

Code:
[COLOR=darkblue]Function[/COLOR] WORDDIF(rngA [COLOR=darkblue]As[/COLOR] Range, rngB [COLOR=darkblue]As[/COLOR] Range) [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]String[/COLOR]
    
    [COLOR=green]'Remove linefeed characters and exclude parentheticals[/COLOR]
    
    [COLOR=darkblue]Dim[/COLOR] WordsA [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR], WordsB [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] ndxA [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR], ndxB [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR], strTemp As [COLOR=darkblue]String[/COLOR]
            
    WordsA = Split(Replace(rngA.Text, vbLf, " "), " ")  [COLOR=green]'remove linefeed[/COLOR]
    WordsB = Split(Replace(rngB.Text, vbLf, " "), " ")  [COLOR=green]'remove linefeed[/COLOR]
    
    [COLOR=darkblue]For[/COLOR] ndxB = [COLOR=darkblue]LBound[/COLOR](WordsB) [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](WordsB)
        [COLOR=darkblue]For[/COLOR] ndxA = [COLOR=darkblue]LBound[/COLOR](WordsA) [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](WordsA)
            [COLOR=darkblue]If[/COLOR] [COLOR=darkblue]StrComp[/COLOR](WordsA(ndxA), WordsB(ndxB), vbTextCompare) = 0 [COLOR=darkblue]Then[/COLOR]
                WordsA(ndxA) = vbNullString
                [COLOR=darkblue]Exit[/COLOR] [COLOR=darkblue]For[/COLOR]
            [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
        [COLOR=darkblue]Next[/COLOR] ndxA
    [COLOR=darkblue]Next[/COLOR] ndxB
    
    [COLOR=darkblue]For[/COLOR] ndxA = [COLOR=darkblue]LBound[/COLOR](WordsA) [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](WordsA)
        [COLOR=green]'Exclude parentheticals[/COLOR]
        [COLOR=darkblue]If[/COLOR] [COLOR=darkblue]Not[/COLOR] WordsA(ndxA) [COLOR=darkblue]Like[/COLOR] "(*)" [COLOR=darkblue]Then[/COLOR] strTemp = strTemp & WordsA(ndxA) & " "
    [COLOR=darkblue]Next[/COLOR] ndxA
    
    WORDDIF = Application.Trim(strTemp)
    
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Function[/COLOR]
 
Upvote 0
Hi AlphaFrog,

Once again cheers for the amazing code (I forgot to mention that last time).

The code looks great. I have a few changes I would like to make:

3. I meant that instead of having a deletions or additions column (which is great to visually see), I was wondering if there's a code that synthesizes the information, so that the output is: all the school names, with deletions (strike through), additions (italicized), and repeats (no change = normal).

4. Each school is separated (for the additions/deletions column). I think they just come as a list without any separation technique. So either each has its own line within the cell or separated by a column. Perhaps a comma could replace the double parenthetical.

Examples to go with the above:

3: Synthesized information:

B6=[TABLE="class: cms_table, width: 110"]
<tbody>[TR]
[TD="class: cms_table_xl64, width: 110"]Med Col of Georgia (1) (AC) Morehouse (1) (AC)[/TD]
[/TR]
</tbody>[/TABLE]

D6=
[TABLE="class: cms_table, width: 110"]
<tbody>[TR]
[TD="class: cms_table_xl66, width: 110"]Med Col of Georgia (1) (AC) Meharry (1) (AC)[/TD]
[/TR]
</tbody>[/TABLE]

Deletions
G6 =WORDDIF(D6,B6)[TABLE="class: cms_table, width: 70"]
<tbody>[TR]
[TD="class: cms_table_xl66"]Meharry[/TD]
[/TR]
</tbody>[/TABLE]

Additions
H6 = =WORDDIF(B6,D6)
[TABLE="class: cms_table, width: 82"]
<tbody>[TR]
[TD="class: cms_table_xl66, width: 82"]Morehouse[/TD]
[/TR]
</tbody>[/TABLE]

I6 = {new code or something}

[TABLE="width: 110"]
<tbody>[TR]
[TD="class: xl64, width: 110"]Med Col of Georgia, Morehouse, Meharry*[/TD]
[/TR]
</tbody>[/TABLE]
*Note: I couldn't figure out how to strike through Meharry, so I underlined it.
Also note that there are commas separating the schools in the final list. Perhaps this could be done with replacing the set of parenthetical by commas.

4. Schools separated

B91 =
[TABLE="width: 110"]
<tbody>[TR]
[TD="class: xl66, width: 110"]Columbia (1) (AC)
Northwestern (1) (AC) Southern Calif (1) (AC)
U Chicago-Pritzker (1) (AC) U Michigan (1) (AC)
Wash U St Louis (1) (AC)[/TD]
[/TR]
</tbody>[/TABLE]

D91 =
[TABLE="width: 110"]
<tbody>[TR]
[TD="class: xl66, width: 110"]Northwestern (1) (AC) Southern Calif (1) (AC)
U Chicago-Pritzker (1) (AC) U Michigan (1) (AC)[/TD]
[/TR]
</tbody>[/TABLE]

H91 ==WORDDIF(B91,D91)
[TABLE="width: 82"]
<tbody>[TR]
[TD="class: xl66, width: 82"]Columbia Wash U St Louis[/TD]
[/TR]
</tbody>[/TABLE]

{improved version}
H91 ==WORDDIF(B91,D91)
[TABLE="width: 82"]
<tbody>[TR]
[TD="class: xl66, width: 82"]Columbia
Wash U St Louis[/TD]
[/TR]
</tbody>[/TABLE]
OR
[TABLE="width: 82"]
<tbody>[TR]
[TD="class: xl66, width: 82"]Columbia, Wash U St Louis
[/TD]
[/TR]
</tbody>[/TABLE]
Note the spaces in the first example and the comma in the second.

Cheers for all your assistance.

Also, I've done a bit of coding back in the day. While I am a bit rusty, it would be nice to learn how to code for Excel. If you have any pointers or resources, that would be helpful.

Based on your examples:
1. In the data, the difference in number of lines (linefeed characters) created words that "looked" the same but one included a hidden a linefeed. The code below strips linefeed characters.
2. Modified code excludes all parentheticals
3. That's just a column font format. It has nothing to do with the code.

Code:
[COLOR=darkblue]Function[/COLOR] WORDDIF(rngA [COLOR=darkblue]As[/COLOR] Range, rngB [COLOR=darkblue]As[/COLOR] Range) [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]String[/COLOR]
    
    [COLOR=green]'Remove linefeed characters and exclude parentheticals[/COLOR]
    
    [COLOR=darkblue]Dim[/COLOR] WordsA [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR], WordsB [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] ndxA [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR], ndxB [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR], strTemp As [COLOR=darkblue]String[/COLOR]
            
    WordsA = Split(Replace(rngA.Text, vbLf, " "), " ")  [COLOR=green]'remove linefeed[/COLOR]
    WordsB = Split(Replace(rngB.Text, vbLf, " "), " ")  [COLOR=green]'remove linefeed[/COLOR]
    
    [COLOR=darkblue]For[/COLOR] ndxB = [COLOR=darkblue]LBound[/COLOR](WordsB) [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](WordsB)
        [COLOR=darkblue]For[/COLOR] ndxA = [COLOR=darkblue]LBound[/COLOR](WordsA) [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](WordsA)
            [COLOR=darkblue]If[/COLOR] [COLOR=darkblue]StrComp[/COLOR](WordsA(ndxA), WordsB(ndxB), vbTextCompare) = 0 [COLOR=darkblue]Then[/COLOR]
                WordsA(ndxA) = vbNullString
                [COLOR=darkblue]Exit[/COLOR] [COLOR=darkblue]For[/COLOR]
            [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
        [COLOR=darkblue]Next[/COLOR] ndxA
    [COLOR=darkblue]Next[/COLOR] ndxB
    
    [COLOR=darkblue]For[/COLOR] ndxA = [COLOR=darkblue]LBound[/COLOR](WordsA) [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](WordsA)
        [COLOR=green]'Exclude parentheticals[/COLOR]
        [COLOR=darkblue]If[/COLOR] [COLOR=darkblue]Not[/COLOR] WordsA(ndxA) [COLOR=darkblue]Like[/COLOR] "(*)" [COLOR=darkblue]Then[/COLOR] strTemp = strTemp & WordsA(ndxA) & " "
    [COLOR=darkblue]Next[/COLOR] ndxA
    
    WORDDIF = Application.Trim(strTemp)
    
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Function[/COLOR]
 
Upvote 0
3. A User Defined Function (UDF) like this cannot affect the text formatting of its result. A macro could be created to compare two columns of data and output a column of formatted multi-line results, but that would go beyond the scope of this thread which is centered around the WORDDIF function. It would be best if you started a new thread for a request like that.

4. The WORDDIF function is designed to do a word-to-word comparison. Your data is more of a multi-word (school) phrase-to-phrase comparison. Based on how this function is structured, getting the school separator to go in the right place for all scenarios is more involved than just replacing the Parentheticals with a separator. It can be done, but it would be better accomplished with a whole new function (or macro as suggested in #3) that's designed to compare phases or multi-word schools specific to your data. Again, beyond the scope of this thread.

Training / Books / Sites as of 2/12/2016
 
Upvote 0
Hi AlphaFrog,

Do you have a code that can spit out the similarities?

3. A User Defined Function (UDF) like this cannot affect the text formatting of its result. A macro could be created to compare two columns of data and output a column of formatted multi-line results, but that would go beyond the scope of this thread which is centered around the WORDDIF function. It would be best if you started a new thread for a request like that.

4. The WORDDIF function is designed to do a word-to-word comparison. Your data is more of a multi-word (school) phrase-to-phrase comparison. Based on how this function is structured, getting the school separator to go in the right place for all scenarios is more involved than just replacing the Parentheticals with a separator. It can be done, but it would be better accomplished with a whole new function (or macro as suggested in #3) that's designed to compare phases or multi-word schools specific to your data. Again, beyond the scope of this thread.

Training / Books / Sites as of 2/12/2016
 
Upvote 0
Dear AlphaFrog,

My apologies for bringing this topic also back to life, but I have got a question that is related to this topic.

Down below you will see two pictures; one of the old month, one of the current month. Each one is from a different workbook. The format/structure is standard and doesn't change whatsoever, but I would like to compare these two together and list the files of différences.

Previous month
xofqlk.png


Current month
29fyow8.png


There are in total 64 combinations (8 versions and 8 models). Each "MATCH" represents 8 of these combinations. While comparing, I would only like to have the new data that is different in comparison to the old one. If there is a difference, I would like to know from which "MATCH" this is (1 to 8). Ideally it should be something like this:


n69p2g.png



I hope this encompasses everything what you should know. If there are any things unclear, please let me know!


Many thanks in advance,

Djani
 
Upvote 0
Hi,

For my purposes, the code given earlier works well.

Function CHARDIF(rngA As Range, rngB As Range) As String

Dim strA As String, strB As String
Dim i As Long, strTemp As String

strA = rngA.Value
strB = rngB.Value
If Len(strA) > Len(strB) Then
strTemp = strA
strA = strB
strB = strTemp
strTemp = ""
End If

For i = 1 To Len(strB)
If i > Len(strA) Then
strTemp = strTemp & Mid(strB, i, 1)
ElseIf UCase(Mid(strA, i, 1)) <> UCase(Mid(strB, i, 1)) Then
strTemp = strTemp & Mid(strB, i, 1)
End If
Next i

CHARDIF = strTemp

End Function


However, I need a simple way to find the difference from two strings no matter of their position.
For example,

if I have A1=ABC and B2=ABCDEF, then the above code will give DEF. It will give me DEF even if A1=ABCF.

What I'd like to see is that, if A1=HI MISS SMITH AND B1= HI MS SMITH, then I'd like to see C1=IS (which is the part that differs between these two strings.
I don't know if that helps, but I am looking for a way to compare two strings and see that their differences is either a space " " or a comma ",". After searching on the internet, the above code was the closest thing that I could find.

Any help will be great!
 
Upvote 0
I don't know if that helps, but I am looking for a way to compare two strings and see that their differences is either a space " " or a comma ",".

If space\comma differences is all you're looking for, a formula could count spaces\commas in both strings and return the count differences if any. Otherwise, can you elaborate?
 
Upvote 0
It wouldn't work in my case...I am trying to clean some entries and I'll need to make various checks.


I applied two functions.
a) =LEN(A1)-LEN(SUBSTITUTE(B2," ",""))
b) =LEN(G2)-LEN(H2)


I filtered both when value above 0, but it would only work if the characters are the same.


For example, let say that I have one case as "Peter Smith" and the second as "peter Smiith". Here, we have a difference of an extra space and an "i".
However, I need to filter cases when both strings have the same characters except the space. For example "Peter Smith" (e.g. so, here we have a difference of an extra space only).


The previous VBA code was working for me to some extent, but I need this to be modified so that it can give me only the part that both strings differ.
Any ideas would be great!


Thanks in advance!
 
Upvote 0

Forum statistics

Threads
1,223,923
Messages
6,175,404
Members
452,640
Latest member
steveridge

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top