Hey guys,
I have the running code that is able to identify patterns in a string and return corresponding phrases according to what I've set. The code is as follow:
The specific part of the code that I want to clarify are highlighted in red. As the dataset that I've been working on is pretty inconsistent, there are certain strings that my code could not identify. Hence, I've set out to also include a substitute function into the code so was to more accurately pick out the trends.
The issue is that because certain phrases may be worded differently. For example, "LAYERPACK" could be worded either as "LAYERPACKED", "LAYER PACK", or "LAYERPACK", is there a similar way that I could do to identify like how I could for patterns with REGEX?
Lastly, it takes a pretty substantial amount of time to run this code as the database is pretty huge (over 10,000 rows). Hence, if you could also help optimize the code, that'll be great!
Thanks in advance!!
I have the running code that is able to identify patterns in a string and return corresponding phrases according to what I've set. The code is as follow:
Code:
Sub Split_product()
'The following code will extract specific phrases from a single cell if it satisfies the conditions set
Dim r, s, re
Dim reWeight, reWeight2, reGrade, reGrade2, rePack, mc
Dim LastRow As Long
'Clearing all the data in the Weight, Grade and Packaging column
LastRow = Range("A" & Rows.Count).End(xlUp).Row
Range("B2:D" & LastRow).Clear
'Setting a function which can execute Regular Expression (RegExp)
Set reWeight = CreateObject("VBScript.RegExp")
Set reWeight2 = CreateObject("VBScript.RegExp")
Set reGrade = CreateObject("VBScript.RegExp")
Set reGrade2 = CreateObject("VBScript.RegExp")
Set rePack = CreateObject("VBScript.RegExp")
'Any pattern involving digits (? = optional) followed by non-digit character (like ".""-"), and digit followed by (k=optional)g (space?)(up?)
reWeight.ignorecase = True: reWeight.Pattern = "\d+?(-\d+)?[^0-9]?\d+?(-\d+)?[k]?g\s?[u]?[p]?"
'If the phrase have Unsized or Line_Run it'll be recognized. Ignorecase = Recognizes both lower and capital letters
reWeight2.ignorecase = True: reWeight2.Pattern = "Unsized|Line_Run"
'Recognize Grade, but only return the one in the () and returns any letter from A to Z + any additional letter (? - maybe)
reGrade.ignorecase = True: reGrade.Pattern = "Grade\s+[A-Z][A-Z]?"
'If the phrase have "No Grade", return it
reGrade2.ignorecase = True: reGrade2.Pattern = "No Grade"
[COLOR=#ff0000] 'If the pattern have "following packaging phrases", it'll recognize and return[/COLOR]
[COLOR=#ff0000] rePack.ignorecase = True: rePack.Pattern = "Block|LP|IQF|Tray|IWP|VAC|Cartridge|Bag"[/COLOR]
'Setting the range of where the phrase you want to pick apart is located at. The code will go through every line
For r = 2 To Cells(Rows.Count, 1).End(xlUp).Row
s = Cells(r, 1).Value
[COLOR=#ff0000] 'Cleaning the dataset in order to identify the correct patterns in the strings[/COLOR]
[COLOR=#ff0000] s = Replace(s, "V/P", "VAC")[/COLOR]
[COLOR=#ff0000] s = Replace(s, "INOVACAO", "INNOVATION")[/COLOR]
[COLOR=#ff0000] s = Replace(s, "LAYERPACKED|LAYERPACK", "LP")[/COLOR]
[COLOR=#ff0000] s = Replace(s, "L/P", "LP")[/COLOR]
[COLOR=#ff0000] s = Replace(s, "INDIVIDUALLY WRAPPED PACK", "IWP")[/COLOR]
[COLOR=#ff0000] s = Replace(s, " PACK ", " BAG ")[/COLOR]
[COLOR=#ff0000] s = Replace(s, ",", ".")[/COLOR]
'Identifying the pattern and returning the corresponding values
Set mc = reWeight.Execute(s)
If mc.Count > 0 Then Cells(r, "B") = mc(0)
Set mc = reWeight2.Execute(s)
If mc.Count > 0 Then Cells(r, "B") = mc(0)
Set mc = reGrade.Execute(s)
If mc.Count > 0 Then Cells(r, "C") = mc(0)
Set mc = reGrade2.Execute(s)
If mc.Count > 0 Then Cells(r, "C") = mc(0)
Set mc = rePack.Execute(s)
If mc.Count > 0 Then Cells(r, "D") = mc(0)
Next
End Sub
The specific part of the code that I want to clarify are highlighted in red. As the dataset that I've been working on is pretty inconsistent, there are certain strings that my code could not identify. Hence, I've set out to also include a substitute function into the code so was to more accurately pick out the trends.
The issue is that because certain phrases may be worded differently. For example, "LAYERPACK" could be worded either as "LAYERPACKED", "LAYER PACK", or "LAYERPACK", is there a similar way that I could do to identify like how I could for patterns with REGEX?
Lastly, it takes a pretty substantial amount of time to run this code as the database is pretty huge (over 10,000 rows). Hence, if you could also help optimize the code, that'll be great!
Thanks in advance!!