Substring Search !

Eddny

New Member
Joined
Jun 26, 2018
Messages
26
Hello!

So let's say I have:

Dim MainString as string, Substring as string
Dim Result as Boolean

MainString ="ABCDE|ABCDF|CGHMN|QRSTU|CJKLM"

Question: Is there an easier (faster) way to check whether MainString contains all elements of Substring without using loops?

Example 1: If Substring ="QRSTU|ABCDE|CGHMN" then Result = True since all elements of Substring are found in MainString.

Example 2: If Substring ="ABCDF|QRSTU|HPXYZ" then Result = False since not all elements of Substring are found in MainString.

Is there a way to perform these checks to see whether all the elements in Substring exist in MainString without using loops? I have thought about the Instr function, the Like operator, etc, However, it appears in each case I will have to use a loop.
Any help would be appreciated.

Eddny
 
Last edited:
If I have understood the requirement correctly there should still be no need for the dictionary or the loop. Try something like this.
Rich (BB code):
Sub CheckAllSubInAnyMain()
  Dim MainString As String, Substring As String
  Dim MainString1 As String, MainString2 As String, MainString3 As String, MainString4 As String
  Dim Result As Boolean
  Dim RX As Object
  
  MainString1 = "ABCDE|ABCDF|CGHMN|QRSTU|CJKLM"
  MainString2 = "ABCDX|OMPHY|EDCRF|QRSTU|UPLKM"
  MainString3 = "ABCDY|ABCDF|THYFE|QRSTU|VBNMK|DCSXE"
  MainString4 = "ABCDZ|ABCDF|CGHMN|QRSTU|HUJNB"

  MainString = Join(Array(MainString1, MainString2, MainString3, MainString4), "|")
  Substring = "QRSTU|ABCDE|CGHMN"
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = MainString
  Result = Replace(RX.Replace(Substring, ""), "|", "") = vbNullString
End Sub

Peter - You are very close to understanding the requirement. Let me clarify a bit further.
I want to test whether all sections of Substring can be found in any one of Mainstring1, Mainstring2, Mainstring3, or Mainstring4.

Since the delimiter or separator for sections "within" each Substring is already "|", joining the Substrings themselves to form MainString would require a different delimiter or separator, for example, "/".

Therefore, this line in your code:
Code:
 MainString = Join(Array(MainString1, MainString2, MainString3, MainString4), [B]"|"[/B])
Should be changed to:
Code:
 MainString = Join(Array(MainString1, MainString2, MainString3, MainString4), "/")

So, I am thinking the solution may require one more level of a nested Replace method?
Please let me know, if you have any questions.
I appreciate your assistance.
 
Last edited:
Upvote 0

Excel Facts

Excel Can Read to You
Customize Quick Access Toolbar. From All Commands, add Speak Cells or Speak Cells on Enter to QAT. Select cells. Press Speak Cells.
I want to test whether all sections of Substring can be found in any one of Mainstring1, Mainstring2, Mainstring3, or Mainstring4.
So, are you saying that all sections of Substring must be found within, say, Mainstring1 or all found within, say, Mainstring3 for a "True" result?

So If 2 sections are found in Mainstring1 and 1 section is found in Mainstring4 then that should be a Fail/False result?

Some simple examples might help clarify.
 
Upvote 0
So, are you saying that all sections of Substring must be found within, say, Mainstring1 or all found within, say, Mainstring3 for a "True" result?

So If 2 sections are found in Mainstring1 and 1 section is found in Mainstring4 then that should be a Fail/False result?

Some simple examples might help clarify.

Peter - That is correct. If all sections of Substring are found in MainString1, or all sections of Substring are found in MainString2, or all sections of Substring are found in MainString3, or all sections of Substring are found in MainString4, all sections of Substring are found in MainStringX, then Result = True.

Otherwise, if all setions of Substring are not found in any Mainstring then Result = False. So if 2 sections of Substring are found in MainString1 and 1 section is found in MainString4, then since it is False for MainString1 and False for MainString4, Result = False.

In summary:
When there are full matches for all sections of Substring in MainString1, or there are full matches for all sections of Substring in MainString2, or there are full matches for all sections of Substring in MainString3, or there are full matches for all sections of Substring in MainString4, or there are full matches for all sections of Substring in MainStringX, then Result = True. Anything else, Result = False.

Example 1:
MainString1 = "ABCDE|ABCDF|CGHMN|QRSTU|CJKLM"
MainString2 = "ABCDX|OMPHY|EDCRF|QRSTU|UPLKM"
MainString3 = "ABCDY|ABCDF|THYFE|QRSTU|VBNMK|DCSXE"
MainString4 = "ABCDZ|ABCDF|CGHMN|QRSTU|HUJNB"

Substring = "QRSTU|ABCDE|CGHMN"

In Example 1, all sections of Substring can be found/matched in MainString1, so Result = True.


Example 2:
MainString1 = "ABCDE|ABCDF|CGHMN|QRSTU|CJKLM"
MainString2 = "ABCDX|OMPHY|EDCRF|QRSTU|UPLKM"
MainString3 = "ABCDY|ABCDF|THYFE|QRSTU|VBNMK|DCSXE"
MainString4 = "ABCDZ|ABCDF|CGHMN|QRSTU|HUJNB"
MainString5 = "CGHMN|ABCDE|ABHMN|QRSTU|HYRDE|FVDSE|PRCYZ"

Substring = "QRSTU|ABCDE|CGHMN"

In Example 2, all sections of Substring can be found/matched in MainString1 and MainString5, so Result = True.


Example 3:
MainString1 = "ABCDE|ABCDF|CGHMN|QRSTU|CJKLM"
MainString2 = "ABCDX|OMPHY|EDCRF|QRSTU|UPLKM"
MainString3 = "ABCDY|ABCDF|THYFE|QRSTU|VBNMK|DCSXE"
MainString4 = "ABCDZ|ABCDF|CGHMN|QRSTU|HUJNB"
MainString5 = "CGHMN|ABCDE|ABHMN|QRSTU|HYRDE|FVDSE|PRCYZ"

Substring = "QRSTU|ABCDE|CGHMN|FGHNO"

In Example 3, all sections of Substring cannot be found/matched in MainString1, or MainString2, or MainString3, or MainString4, or MainString5, so Result = False.

Please let me know if you have additional questions.
 
Last edited:
Upvote 0
See if this is any use.
Code:
Sub CheckAllSubInOneMain()
  Dim MainString As String, Substring As String, TestString As String
  Dim MainString1 As String, MainString2 As String, MainString3 As String, MainString4 As String, MainString5 As String
  Dim SubstringParts As Long
  Dim Result As Boolean
  Dim RX As Object
  
  MainString1 = "ABCDE|ABCDF|CGHMN|QRSTU|CJKLM"
  MainString2 = "ABCDX|OMPHY|EDCRF|QRSTU|UPLKM"
  MainString3 = "ABCDY|ABCDF|THYFE|QRSTU|VBNMK|DCSXE"
  MainString4 = "ABCDZ|ABCDF|CGHMN|QRSTU|HUJNB"
  MainString5 = "CGHMN|ABCDE|ABHMN|QRSTU|HYRDE|FVDSE|PRCYZ"
  
  Substring = "QRSTU|ABCDE|CGHMN"
 
  MainString = "@" & Join(Array(MainString1, MainString2, MainString3, MainString4, MainString5), "@") & "@"
  SubstringParts = UBound(Split(Substring, "|")) + 1
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = Substring
  TestString = RX.Replace(MainString, "%")
  RX.Pattern = Replace("@([^@]*?%){#}[^@]*?@", "#", SubstringParts)
  Result = RX.Test(TestString)
End Sub
 
Upvote 0
See if this is any use.
Code:
Sub CheckAllSubInOneMain()
  Dim MainString As String, Substring As String, TestString As String
  Dim MainString1 As String, MainString2 As String, MainString3 As String, MainString4 As String, MainString5 As String
  Dim SubstringParts As Long
  Dim Result As Boolean
  Dim RX As Object
  
  MainString1 = "ABCDE|ABCDF|CGHMN|QRSTU|CJKLM"
  MainString2 = "ABCDX|OMPHY|EDCRF|QRSTU|UPLKM"
  MainString3 = "ABCDY|ABCDF|THYFE|QRSTU|VBNMK|DCSXE"
  MainString4 = "ABCDZ|ABCDF|CGHMN|QRSTU|HUJNB"
  MainString5 = "CGHMN|ABCDE|ABHMN|QRSTU|HYRDE|FVDSE|PRCYZ"
  
  Substring = "QRSTU|ABCDE|CGHMN"
 
  MainString = "@" & Join(Array(MainString1, MainString2, MainString3, MainString4, MainString5), "@") & "@"
  SubstringParts = UBound(Split(Substring, "|")) + 1
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = Substring
  TestString = RX.Replace(MainString, "%")
  RX.Pattern = Replace("@([^@]*?%){#}[^@]*?@", "#", SubstringParts)
  Result = RX.Test(TestString)
End Sub

Peter - The code is working nicely so far. In the line below, is there a limit on the number of Mainstrings I can use or the limit is however many Mainstrings the Join() and Array() functions allow to concatenate into Mainstring?
Code:
 MainString = "@" & Join(Array(MainString1, MainString2, MainString3, MainString4, MainString5), "@") & "@"
Thanks!
 
Last edited:
Upvote 0
I'm not sure. You will just have to test and see how you go I think.

Peter -

I tested the code for varying string lengths and it's been working flawlessly so far.

With regards to your other code below that takes in 2 substrings and produces one unique string where each section of each input substring appears only once in the output string, can it be modified to take in multiple substrings (instead of 2 substrings) and produce one unique string?

Code:
Sub Make_String_v2()
  Dim Substring1 As String, Substring2 As String, CombinedUniqueString As String
  Dim RX As Object
  
  Substring1 = "ABCDE|ABCDF|CGHMN|QRSTU|CJKLM"
  Substring2 = "ABCDE|OXYZ|QRSTU|KUBYV"
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = Substring1
  CombinedUniqueString = Replace(Application.Trim(Replace(Substring1, "|", " ") & " " & RX.Replace(Replace(Substring2, "|", " "), "")), " ", "|")
End Sub

Example 1: If ...
Substring1 = "ABCDE|ABCDF|CGHMN|QRSTU|CJKLM"
Substring2 = "ABCDE|OXYZ|QRSTU|KUBYV"
Substring3 = "LUHYZ|OXYZ|QRSTU|KUBYV"

Then CombinedUniqueString = ABCDE|ABCDF|CGHMN|QRSTU|CJKLM|OXYZ|KUBYV|LUHYZ


Example 2: If ...
Substring1 = "ABCDE|ABCDF|CGHMN|QRSTU|CJKLM"
Substring2 = "ABCDE|OXYZ|QRSTU|KUBYV|JKLMN|BDEFG"
Substring3 = "LUHYZ|OXYZ|QRSTU|KUBYV"
Substring4 = "DEFGZ|OXYZ|QRSTU|KUBYV"
Substring5 = "LUHYZ|OXYZ|QRSTU|CDGHIJ|MNOPU"

Then CombinedUniqueString = ABCDE|ABCDF|CGHMN|QRSTU|CJKLM|OXYZ|KUBYV|JKLMN|BDEFG|LUHYZ|DEFGZ|CDGHIJ|MNOPU


Thanks for all your assistance!
 
Last edited:
Upvote 0
Peter -

I tested the code for varying string lengths and it's been working flawlessly so far.
Good news so far then!


With regards to your other code below that takes in 2 substrings and produces one unique string where each section of each input substring appears only once in the output string, can it be modified to take in multiple substrings (instead of 2 substrings) and produce one unique string?
I would use a different approach. However, before suggesting specific code, I'm wondering how those various substrings will actually be getting into your code? Will they ..
- Come from a range of worksheet cells? If so can you give some details of the layout - column(s), row(s) etc
- Come from other vba code? If so, would those substrings be in an array or separate variables or something else? perhaps you could post the code.
- Come from user input via an InputBox or similar? If so, details please.
- Come from somewhere else? If so, details please.
 
Upvote 0
Good news so far then!


I would use a different approach. However, before suggesting specific code, I'm wondering how those various substrings will actually be getting into your code? Will they ..
- Come from a range of worksheet cells? If so can you give some details of the layout - column(s), row(s) etc
- Come from other vba code? If so, would those substrings be in an array or separate variables or something else? perhaps you could post the code.
- Come from user input via an InputBox or similar? If so, details please.
- Come from somewhere else? If so, details please.


Peter -

They are from a simple one-dimensional array.
The source of the various substrings is an array (that is, Mydict.Items), which also originally came from Range("L2:L1000000").

So the substrings were originally loaded into the Dictionary Items (Mydict.Items) using the code below:

Code:
Dim Mydict As Scripting.Dictionary
Dim a as range

Set Mydict = New Scripting.Dictionary

For each a in Range("L2:L1000000")
      If a.value="" then 
         Exit for
      Else
         Mydict.add a.row, Trim(a.value)
      End if
Next a

Please let me know if you need additional clarification.


Thanks!
 
Last edited:
Upvote 0
Sorry, more questions.

Your posted code is creating a dictionary where each entry consists of a row number as Key and a substring as Item.

1. Are you using this dictionary for something else other that to collect the substrings?
2. Are you using the row numbers in the dictionary for something?
3. Is it possible/likely that any of the substrings in column L are duplicates? I ask because the way you have constructed your dictionary means that any duplicate substrings in column L will also be duplicated as Items in your dictionary because they will have different Keys (row numbers).
4. Do you really have anything like 1,000,000 rows of data in column L? Can we just go from L2 down to wherever the last entry in column L is?
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,178
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top