VBA to extract all words that start with (CB and end with )

sjinvestigator

New Member
Joined
Aug 11, 2021
Messages
9
Office Version
  1. 2016
Platform
  1. Windows
Hi All,

How would one go about extracting all text that starts with "(CB" and ends with ")" from a column of cells with a bunch of text and paste the result in the next column? For example, if I have this in A2:

* Margarine Wholesale (27), Margarine Retail (241), Margarine Retail (287), Margarine Bulk (1781) (CB28412)
* Margarine Wholesale (281), Margarine Retail (42), Margarine Retail (13), Margarine Bulk (27) (CB28412)
* Margarine Wholesale (19), Margarine Retail (281), Margarine Retail (411), Margarine Bulk (3814) (CB82474)

I'd like to update B2 with:
(CB28412),(CB82474)

...excluding all other text found in A2. Repeat for as many rows in column A as there is data. I'd also like it to avoid duplicates if the same (CBxxxx number appears more than once in the string.

I found an excelent post with a solution for a very similar ask on this forum and tried to modify the code as follows:
VBA Code:
Sub Testing()
  Dim N As Long, X As Long, Cell As Range, Arr1 As Variant, Arr2 As Variant
  Columns("B").Clear
  For Each Cell In Range("A1", Cells(Rows.Count, "A"))
    TotalString = ""
    Arr1 = Split(Cell, "(")
    For X = 1 To UBound(Arr1)
      If UCase(Arr1(X)) Like "CB*" Then
        Arr2 = Split(Arr1(X), "(CB")
        N = N + 1
        TotalString = TotalString & "," & "(" & Arr2(0) ' & "KB"
        Cells(Cell.Row, "B") = TotalString
        End If
    Next
  Next
End Sub

However, when I run it, it extracts the (CB numbers correctly but also leaves in some other text that I don't want. It also doesn't filter duplicates.

Could anyone please offer me some advice? Thank you so much!

Mini sheet:
Scancode_Example.xlsx
AB
1Scancodes & TypeOutput I want
2* Margarine Wholesale (27), Margarine Retail (241), Margarine Retail (287), Margarine Bulk (1781) (CB28412) * Margarine Wholesale (281), Margarine Retail (42), Margarine Retail (13), Margarine Bulk (27) (CB28412) * Margarine Wholesale (19), Margarine Retail (281), Margarine Retail (411), Margarine Bulk (3814) (CB82474)(CB28412),(CB82474)
3* Margarine Wholesale (11), Margarine Retail (27), Margarine Retail (14), Margarine Bulk (15) (CB91254) * Margarine Wholesale (6), Margarine Retail (399), Margarine Retail (400), Margarine Bulk (51) (CB49185)(CB91254),(CB49185)
Data
 

Excel Facts

Links? Where??
If Excel says you have links but you can't find them, go to Formulas, Name Manager. Look for old links to dead workbooks & delete.
Try this:
The data starts at A2.
Results in B2 down

VBA Code:
Sub extracting_codes_2()
  Dim dic As Object, c As Range
  Dim sText As Variant, i As Long
  
  Set dic = CreateObject("Scripting.Dictionary")
  For Each c In Range("A2", Range("A" & Rows.Count).End(3))
    Set dic = CreateObject("Scripting.Dictionary")
    sText = Split(c.Value, "(CB", , vbTextCompare)
    For i = 1 To UBound(sText)
      dic("(CB" & Split(sText(i), ")")(0) & ")") = Empty
    Next
    c.Offset(0, 1).Value = Join(dic.keys, ",")
  Next
End Sub
 
Upvote 0
I'm using Regular expression to retrieve the CB number and Dictionary to eliminate duplicate
VBA Code:
Sub Test()

Dim CB As Variant, MatchCB As Variant, key As Variant
Dim cell As Range, rng As Range
Dim ws As Worksheet
Dim DictCB As Object, RegEx As Object

Set DictCB = CreateObject("Scripting.Dictionary")
Set RegEx = CreateObject("VBscript.RegExp")
Set ws = ThisWorkbook.ActiveSheet

For Each cell In ws.Range("A2", ws.Cells(Rows.Count, "A").End(xlUp))
    With RegEx
        .Pattern = "\(CB\d+\)"
        .Global = True
        Set MatchCB = .Execute(cell.Value)
    End With
    For Each CB In MatchCB
        If Not DictCB.Exists(CB.Value) Then
            DictCB.Add CB.Value, Nothing
        End If
    Next
    For Each key In DictCB.keys
        ws.Range("B" & cell.Row) = ws.Range("B" & cell.Row) & ", " & key
    Next
    ws.Range("B" & cell.Row) = Right(ws.Range("B" & cell.Row), Len(ws.Range("B" & cell.Row)) - 1)
    DictCB.RemoveAll
Next

End Sub
 
Upvote 0
Try this:
The data starts at A2.
Results in B2 down

VBA Code:
Sub extracting_codes_2()
  Dim dic As Object, c As Range
  Dim sText As Variant, i As Long
 
  Set dic = CreateObject("Scripting.Dictionary")
  For Each c In Range("A2", Range("A" & Rows.Count).End(3))
    Set dic = CreateObject("Scripting.Dictionary")
    sText = Split(c.Value, "(CB", , vbTextCompare)
    For i = 1 To UBound(sText)
      dic("(CB" & Split(sText(i), ")")(0) & ")") = Empty
    Next
    c.Offset(0, 1).Value = Join(dic.keys, ",")
  Next
End Sub

Muchas gracias, this is a really elegant solution!! Thank you so much!!!

Is there any way this can be modified to where it can handle truncated data? For instance, I sometimes run across an entry like this:

* Margarine Wholesale (11), Margarine Retail (27), Margarine Retail (14), Margarine Bulk (15) (CB91254) * Margarine Wholesale (6), Margarine Retail (399), Margarine Retail (400), Margarine Bulk (51) (CB


...Where it just suddenly ends in (CB but doesn't have the encloses parenthesis. I get a "subscript out of range" error when it hits those. It'd be perfect if it could either ignore the truncated "(CB" or else change it to "(CB)", whichever would be easy.

Updated minisheet:

Closer.xlsm
AB
1Sample DataOutput
2Margarine Wholesale (27), Margarine Retail (241), Margarine Retail (287), Margarine Bulk (1781) (CB28412) * Margarine Wholesale (281), Margarine Retail (42), Margarine Retail (13), Margarine Bulk (27) (CB28412) * Margarine Wholesale (19), Margarine Retail (281), Margarine Retail (411), Margarine Bulk (3814) (CB82474)(CB28412),(CB82474)
3* Margarine Wholesale (11), Margarine Retail (27), Margarine Retail (14), Margarine Bulk (15) (CB91254) * Margarine Wholesale (6), Margarine Retail (399), Margarine Retail (400), Margarine Bulk (51) (CB49185)(CB91254),(CB49185)
4* Margarine Wholesale (11), Margarine Retail (27), Margarine Retail (14), Margarine Bulk (15) (CB91254) * Margarine Wholesale (6), Margarine Retail (399), Margarine Retail (400), Margarine Bulk (51) (CB(CB91254)
Sheet2


Other than that, this is absolutely perfect for me. I can't believe you did it in so few lines of code, thank you very much!!

SJ
 
Upvote 0
Cross-posting (posting the same question in more than one forum), read rule #13 of the Forum Rules.
If you have posted the question at more places, please provide links to those as well.


Where it just suddenly ends in (CB but doesn't have the encloses parenthesis. I get a "subscript out of range" error when it hits those. It'd be perfect if it could either ignore the truncated "(CB" or else change it to "(CB)", whichever would be easy.

Try this:
VBA Code:
Sub extracting_codes_2()
  Dim dic As Object, c As Range
  Dim sText As Variant, i As Long
  
  Set dic = CreateObject("Scripting.Dictionary")
  For Each c In Range("A2", Range("A" & Rows.Count).End(3))
    Set dic = CreateObject("Scripting.Dictionary")
    sText = Split(c.Value, "(CB", , vbTextCompare)
    For i = 1 To UBound(sText)
      If InStr(1, sText(i), ")") > 0 Then dic("(CB" & Split(sText(i), ")")(0) & ")") = Empty
    Next
    c.Offset(0, 1).Value = Join(dic.keys, ",")
  Next
End Sub
 
Upvote 0
Solution
This question has been cross-posted at the ExcelForum forum.

Thanks very much for the reply. My apologies, but this is not true. I did cross post another issue -- which is much more complex ---using a similar dataset to the ExcelForum, but definitely NOT this question. This is a second issue I'm working on simultaneously to the other project. However, I'll go update my other post on MrExcel with the link to that post.... but honestly, this is the only place I've ever posted this question about extracting all words that start with (CB and end with ).
 
Upvote 0
Cross-posting (posting the same question in more than one forum), read rule #13 of the Forum Rules.
If you have posted the question at more places, please provide links to those as well.




Try this:
VBA Code:
Sub extracting_codes_2()
  Dim dic As Object, c As Range
  Dim sText As Variant, i As Long
 
  Set dic = CreateObject("Scripting.Dictionary")
  For Each c In Range("A2", Range("A" & Rows.Count).End(3))
    Set dic = CreateObject("Scripting.Dictionary")
    sText = Split(c.Value, "(CB", , vbTextCompare)
    For i = 1 To UBound(sText)
      If InStr(1, sText(i), ")") > 0 Then dic("(CB" & Split(sText(i), ")")(0) & ")") = Empty
    Next
    c.Offset(0, 1).Value = Join(dic.keys, ",")
  Next
End Sub

This worked perfectly! I am so, so grateful for your assistance. This will make some of my tasks MUCH easier in the coming months and will save me an enormous amount of time. I cannot tell you how much this is going to help me. Thanks again for taking the time out of your day to help me, I sincerely appreciate. ¡Muchas gracias!

Best Wishes,

SJ
 
Upvote 0
I'm using Regular expression to retrieve the CB number and Dictionary to eliminate duplicate
VBA Code:
Sub Test()

Dim CB As Variant, MatchCB As Variant, key As Variant
Dim cell As Range, rng As Range
Dim ws As Worksheet
Dim DictCB As Object, RegEx As Object

Set DictCB = CreateObject("Scripting.Dictionary")
Set RegEx = CreateObject("VBscript.RegExp")
Set ws = ThisWorkbook.ActiveSheet

For Each cell In ws.Range("A2", ws.Cells(Rows.Count, "A").End(xlUp))
    With RegEx
        .Pattern = "\(CB\d+\)"
        .Global = True
        Set MatchCB = .Execute(cell.Value)
    End With
    For Each CB In MatchCB
        If Not DictCB.Exists(CB.Value) Then
            DictCB.Add CB.Value, Nothing
        End If
    Next
    For Each key In DictCB.keys
        ws.Range("B" & cell.Row) = ws.Range("B" & cell.Row) & ", " & key
    Next
    ws.Range("B" & cell.Row) = Right(ws.Range("B" & cell.Row), Len(ws.Range("B" & cell.Row)) - 1)
    DictCB.RemoveAll
Next

End Sub
Thank you for taking the time to provide this code. I got some inconsistent results when I tried it on large datasets but it did work very well on some of my smaller files. This is interesting code that I will study in more detail and is a contrast from the other. Thanks again for the help!
 
Upvote 0
This worked perfectly! I am so, so grateful for your assistance. This will make some of my tasks MUCH easier in the coming months and will save me an enormous amount of time. I cannot tell you how much this is going to help me. Thanks again for taking the time out of your day to help me, I sincerely appreciate. ¡Muchas gracias!

Best Wishes,

SJ

I'm glad to help you. Thanks for the feedback.

Best wishes too
 
Upvote 0
Thank you for taking the time to provide this code. I got some inconsistent results when I tried it on large datasets but it did work very well on some of my smaller files. This is interesting code that I will study in more detail and is a contrast from the other. Thanks again for the help!
Glad to hear that your problem is solved with @DanteAmor help
 
Upvote 0

Forum statistics

Threads
1,224,822
Messages
6,181,165
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top