Remove a text string in column B based on column C

Pete2020

Board Regular
Joined
Apr 25, 2020
Messages
68
Office Version
  1. 2016
Platform
  1. Windows
I am very thankful to the Mr. Excel forum in providing right solutions.

my data consists 70k rows and would be helpful if you Provide a solution through VBA
  1. Match every comma separated string of column C with Column B at each row level (C2 with B2)
  2. If found - it should be remove the word along with Space and special characters or comma infront of it.
  3. The output should be populated in Column D
Search and remove.xlsx
ABCD
1School IDClient NameLocation - match words and remove from column BExpected Output
2SC-001Bachpan Play School, LoharaMIDC Area,Darwha Road, Lohara, YavatmalBachpan Play Schoo1
3SC-002Citizen Technical Institute WashimWashim, Maharashtra, IndiaCitizen Technical Institute
4SC-003Absolute Spoken English, wardha, maharashtra, indiaHindinagar, Wardha, Maharashtra, IndiaAbsolute Spoken English
5SC-004Kasturba Nursing college, KHS, SevagramSevagram, Maharashtra, IndiaKasturba Nursing college
6SC-005CP Goenka International School, Oshiwara | Andheri WestOshiwara, Andheri West, Mumbai, MaharashtraCP Goenka International School
7SC-006Sri Sri Ravishankar Vidya Mandir (SSRVM), Mulund, MumbaiSarvodaya Nagar, Mulund West, Mumbai, MaharashtraSri Sri Ravishankar Vidya Mandir (SSRVM)
8SC-007Saleem Zakerya Urdu High School PusegaonPusegaon Main RdSaleem Zakerya Urdu High School
Sheet1
 

Excel Facts

Can you AutoAverage in Excel?
There is a drop-down next to the AutoSum symbol. Open the drop-down to choose AVERAGE, COUNT, MAX, or MIN
Hi Pete2020

Since you have 2016 version, you might consider a Power Query alternative.
  1. Data - Get & Transform Data -> From Table/Range
  2. Add 3 Custom Columns using the following formulas
    1. "List Client Names" = Text.SplitAny(Text.Proper([Client Name]), ", |;")
    2. "List Location" = Text.SplitAny(Text.Proper([Location]), ", |;")
    3. "Remaining" = List.RemoveItems([List Client Names],[List Location])
  3. Extract the "Remaining" by clicking the arrows pointing up
  4. Delete the helper columns
  5. Save and load to Excel
Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    ListOfName = Table.AddColumn(Source, "List Client Names", each Text.SplitAny(Text.Proper([Client Name]), ", |;")),
    ListOfLocation = Table.AddColumn(ListOfName, "List Location", each Text.SplitAny(Text.Proper([Location]), ",  |;")),
    DifferenceOfLists = Table.AddColumn(ListOfLocation, "Remaining", each List.RemoveItems([List Client Names],[List Location])),
    ExtractRemaining = Table.TransformColumns(DifferenceOfLists, {"Remaining", each Text.Combine(List.Transform(_, Text.From), " "), type text}),
    RemoveHelperListColumns = Table.RemoveColumns(ExtractRemaining,{"List Client Names", "List Location"})
in
    RemoveHelperListColumns

Book2
ABCD
1School IDClient NameLocationExpected Output
2SC-001Bachpan Play School, LoharaMIDC Area,Darwha Road, Lohara, YavatmalBachpan Play Schoo1
3SC-002Citizen Technical Institute WashimWashim, Maharashtra, IndiaCitizen Technical Institute
4SC-003Absolute Spoken English, wardha, maharashtra, indiaHindinagar, Wardha, Maharashtra, IndiaAbsolute Spoken English
5SC-004Kasturba Nursing college, KHS, SevagramSevagram, Maharashtra, IndiaKasturba Nursing college
6SC-005CP Goenka International School, Oshiwara | Andheri WestOshiwara, Andheri West, Mumbai, MaharashtraCP Goenka International School
7SC-006Sri Sri Ravishankar Vidya Mandir (SSRVM), Mulund, MumbaiSarvodaya Nagar, Mulund West, Mumbai, MaharashtraSri Sri Ravishankar Vidya Mandir (SSRVM)
8SC-007Saleem Zakerya Urdu High School PusegaonPusegaon Main RdSaleem Zakerya Urdu High School
9
10
11
12
13School IDClient NameLocationRemaining
14SC-001Bachpan Play School, LoharaMIDC Area,Darwha Road, Lohara, YavatmalBachpan Play School
15SC-002Citizen Technical Institute WashimWashim, Maharashtra, IndiaCitizen Technical Institute
16SC-003Absolute Spoken English, wardha, maharashtra, indiaHindinagar, Wardha, Maharashtra, IndiaAbsolute Spoken English
17SC-004Kasturba Nursing college, KHS, SevagramSevagram, Maharashtra, IndiaKasturba Nursing College Khs
18SC-005CP Goenka International School, Oshiwara | Andheri WestOshiwara, Andheri West, Mumbai, MaharashtraCp Goenka International School
19SC-006Sri Sri Ravishankar Vidya Mandir (SSRVM), Mulund, MumbaiSarvodaya Nagar, Mulund West, Mumbai, MaharashtraSri Sri Ravishankar Vidya Mandir (Ssrvm)
20SC-007Saleem Zakerya Urdu High School PusegaonPusegaon Main RdSaleem Zakerya Urdu High School
Sheet3
 
Upvote 0
Thank You for giving a solution through Power Query.I am not much comfortable.

Can some one help me out in giving a VBA for my Problem mentioned above
 
Upvote 0
Hi,
Check that and let me know it that's what you need.

VBA Code:
Sub ReplaceValues()
    Dim lastRow&
    Dim arr() As String
    Dim i&
    Dim el
    Dim str$
    Dim str2split$
    
    
    lastRow = Range("A" & Cells.Rows.Count).End(xlUp).Row
    
    For i = 2 To lastRow
        str = vbNullString
        str = CStr(Cells(i, "B"))
        
        Cells(i, "D") = str
        
        str2split = CStr(Cells(i, "C"))
        arr() = Split(str2split, ",")
        
        For Each el In arr
            str = Replace(str, ",", "", 1)
            str = Replace(str, " |", "", 1)
            str = Replace(str, el, "", 1)
        Next el
        
        Cells(i, "D") = str
    
    Next i
    
    MsgBox "Done"
    
End Sub
 
Upvote 0
Looks awesome, It exactly fits my requirement and also macro is quite understandable to edit as per the requirements.

Thanks You So Much
 
Upvote 0
Match every comma separated string of column C with Column B at each row level (C2 with B2)
Then in your expected results ..
  1. Why has 'KHS' been removed from row 5 when it is not listed in column C?
  2. Why has 'Muland' been removed in row 7 as the comma separated value in column C is 'Muland West' not 'Muland'?
  3. Why has anything been removed from row 8. There are no commas in C8 so the value to search for would be 'Pusegaon Main Rd' which does no appear in B8?

Are you sure that the suggested code does what you want? I ran the code with this sample data but I don't think that is the result you would want?
In the first one, two 'part-words' are removed and the remaining 'part-words' are joined.
In the second one, Kasturba is not removed despite being in the column C list.

Pete2020 2020-05-06 1.xlsm
BCD
9Indian Western High School IndiaIndia, Westnern High School
10Kasturba Nursing college, KHS, SevagramSevagram, Maharashtra, India, KasturbaKasturba Nursing college KHS
Sheet1 (2)
 
Upvote 0
Thanks Peter.I Just had a look. It looks that macro required a tweak.

Please help me fixing the issue.
 
Upvote 0
Please help me fixing the issue.
Well, what was the response to my three numbered questions?
Were they errors with your expected results or were the requirements not specified correctly?
 
Upvote 0
hen in your expected results ..
  1. Why has 'KHS' been removed from row 5 when it is not listed in column C?
  2. Why has 'Muland' been removed in row 7 as the comma separated value in column C is 'Muland West' not 'Muland'?
  3. Why has anything been removed from row 8. There are no commas in C8 so the value to search for would be 'Pusegaon Main Rd' which does no appear in B8?
Answers to Above
1. KHS should not be removed
2. Muland should not be removed, The macro should consider 'Muland West" as single word.
3. Nothing to be removed from C8 as there is no comma

In the first one, two 'part-words' are removed and the remaining 'part-words' are joined. - I agree
In the second one, Kasturba is not removed despite being in the column C list. - I agree

You are right - Word match should be based on comma.It should also consider spaces within a word for exact match against column B

Thanks Peter for your Great Observations
 
Upvote 0
See how this goes.

VBA Code:
Sub Remove_Unwanted()
  Dim RX As Object
  Dim a As Variant
  Dim i As Long
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.IgnoreCase = True
  a = Range("B2", Range("C" & Rows.Count).End(xlUp)).Value
  For i = 1 To UBound(a)
    RX.Pattern = "[^A-Za-z\)]*\b(" & Replace(Replace(a(i, 2), ", ", ","), ",", "|") & ")\b[^A-Za-z\(]*"
    a(i, 1) = RX.Replace(a(i, 1), "")
  Next i
  Range("D2").Resize(UBound(a)).Value = a
End Sub

My sample data and results.
Pete2020 2020-05-06 1.xlsm
BCD
1Client NameLocation - match words and remove from column BOutput
2Bachpan Play School, LoharaMIDC Area,Darwha Road, Lohara, YavatmalBachpan Play School
3Citizen Technical Institute WashimWashim, Maharashtra, IndiaCitizen Technical Institute
4Absolute Spoken English, wardha, maharashtra, indiaHindinagar, Wardha, Maharashtra, IndiaAbsolute Spoken English
5Kasturba Nursing college, KHS, SevagramSevagram, Maharashtra, IndiaKasturba Nursing college, KHS
6CP Goenka International School, Oshiwara | Andheri WestOshiwara, Andheri West, Mumbai, MaharashtraCP Goenka International School
7Sri Sri Ravishankar Vidya Mandir (SSRVM), Mulund, MumbaiSarvodaya Nagar, Mulund, Mumbai, MaharashtraSri Sri Ravishankar Vidya Mandir (SSRVM)
8Saleem Zakerya Urdu High School PusegaonPusegaon Main RdSaleem Zakerya Urdu High School Pusegaon
9Indian Western High School IndiaIndia, WestIndian Western High School
10Kasturba Nursing college, KHS, SevagramSevagram, Maharashtra, India, KasturbaNursing college, KHS
Sheet1 (2)
 
Upvote 0

Forum statistics

Threads
1,223,958
Messages
6,175,642
Members
452,663
Latest member
MEMEH

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top