Extract a string of text from a cell

RattlingCarp3048

Board Regular
Joined
Jan 12, 2022
Messages
202
Office Version
  1. 365
Platform
  1. Windows
this one is tripping me up big time. via formula or VBA. Column P contains numerous rows ranging from 1-1500 in any given month. Each cell in column P contains a paragraph of information. Somewhere within each paragraph is a brief statement "x of x cores". i need to extract that statement from each cell. unfortunately "x of x cores" can be any combination of "zero of one core" or "two of four cores" or "four of four cores" or "zero of three cores" etc all the way up to four of four and the numbers are always spelled out.

any ideas how to make this happen?

Column P random paragraphextracted result
;jlasijfaoij asolifdj ; ashdfa hfosdf ;oa ufoa two of two cores aosihfo ahsodfh sofah ashdf otwo of two cores
klaslfjas sldkjf asjd asljfoij fofheoiwehf J;OASDHF OW j;asld % aofh # of % one of three oiaskdn aoshf a oiasofione of three cores
oiafj;as oasfan aofa oai foajf oaiaf two of four cores aosifal oais f oai f oaif oj;oiashodfu %two of four
 

Excel Facts

Best way to learn Power Query?
Read M is for (Data) Monkey book by Ken Puls and Miguel Escobar. It is the complete guide to Power Query.
How about this?

EXCEL
AB
1Column P random paragraphextracted result
2;jlasijfaoij asolifdj ; ashdfa hfosdf ;oa ufoa two of two cores aosihfo ahsodfh sofah ashdf otwo of two cores
3klaslfjas sldkjf asjd asljfoij fofheoiwehf J;OASDHF OW j;asld % aofh # of % one of three cores oiaskdn aoshf a oiasofione of three cores
4oiafj;as oasfan aofa oai foajf oaiaf two of four cores aosifal oais f oai f oaif oj;oiashodfu %two of four cores
Data
Cell Formulas
RangeFormula
B2:B4B2=cores(A2)


VBA Code:
Function cores(s As String)
With CreateObject("VBScript.RegExp")
    .Pattern = "\w+\sof\s\w+\scores"
    cores = .Execute(s)(0)
End With
End Function
 
Upvote 0
Book1
AB
1;jlasijfaoij asolifdj ; ashdfa hfosdf ;oa ufoa two of two cores aosihfo ahsodfh sofah ashdf otwo of two cores
2klaslfjas sldkjf asjd asljfoij fofheoiwehf J;OASDHF OW j;asld % aofh # of % one of three cores oiaskdn aoshf a oiasofione of three cores
3oiafj;as oasfan aofa oai foajf oaiaf two of four cores aosifal oais f oai f oaif oj;oiashodfu %two of four cores
Sheet4
Cell Formulas
RangeFormula
B1:B3B1=TEXTJOIN(" ",,INDEX(TEXTSPLIT(A1," "),SEQUENCE(4,1,MATCH("cores",TEXTSPLIT(A1," "),0)-3)))
 
Upvote 0
Your second example is missing the word cores, assuming it's missing:

Book2
AB
1;jlasijfaoij asolifdj ; ashdfa hfosdf ;oa ufoa two of two cores aosihfo ahsodfh sofah ashdf otwo of two cores
2klaslfjas sldkjf asjd asljfoij fofheoiwehf J;OASDHF OW j;asld % aofh # of % one of three cores oiaskdn aoshf a oiasofione of three cores
3oiafj;as oasfan aofa oai foajf oaiaf two of four cores aosifal oais f oai f oaif oj;oiashodfu %two of four cores
Sheet1
Cell Formulas
RangeFormula
B1:B3B1=TEXTAFTER(TEXTBEFORE(A1,"cores")," ",-4) & "cores"
 
Upvote 0
Is it possible there is "core" (or any words that has "core" in it, like "score"), that do not follow the pattern "x of x core?. For example:
This is the core of his message.
The score gap is too high.

And to everyone, I think it should be core instead of cores because there is "zero of one core".
 
Upvote 0
Is it possible there is "core" (or any words that has "core" in it, like "score"), that do not follow the pattern "x of x core?. For example:
This is the core of his message.
The score gap is too high.

And to everyone, I think it should be core instead of cores because there is "zero of one core".
Yes, the word "core" or "cores" may be used else where outside of this pattern. I will test all of these tomorrow when I get back to work.
 
Upvote 0
Yes, the word "core" or "cores" may be used else where outside of this pattern. I will test all of these tomorrow when I get back to work.

I like the RegEx approach that @lrobbo314 gave you above. If you have core & cores present, you need to make a slight change to his script (line 3):

from this:
VBA Code:
.Pattern = "\w+\sof\s\w+\score"

to this:
VBA Code:
.Pattern = "\w+\sof\s\w+\score[s]"
 
Upvote 0
Yes, the word "core" or "cores" may be used else where outside of this pattern.

Another option:
1. I assume in the pattern "x of x core" the first x won't be higher than the second x, so this code will reject something like "three of two cores" because three is higher than two.
2. I put the strings "zero one two three four" in the code, so it will reject something like: "something of this core"

VBA Code:
Sub RattlingCarp3048_1()
Dim i As Long, j As Long, k As Long
Dim tx As String
Dim va, vx, ary, vb
    ary = Split("zero one two three four")
    ReDim vx(1 To 15, 1 To 1)
    For i = 0 To UBound(ary)
        For j = i To UBound(ary)
            k = k + 1
            vx(k, 1) = ary(i) & " of " & ary(j) & " core"
        Next
    Next
    va = Range("A1", Cells(Rows.Count, "A").End(xlUp))
    ReDim vb(1 To UBound(va, 1), 1 To 1)
    
    For i = 1 To UBound(va, 1)
        tx = LCase(va(i, 1))
        For Each x In vx
            If InStr(tx, x) > 0 Then
                vb(i, 1) = x
            End If
        Next
    Next
Range("B1").Resize(UBound(vb, 1), 1) = vb
End Sub
Example:
Book1
AB
1data
2One of one core random paragraphone of one core
3Y got what two of three cores he bid fortwo of three core
4two of four cores somethingtwo of four core
5random paragraph zero of zero corezero of zero core
6three of one core
7something of this core
8This is the cores of his message.
9The score gap is too high.
Sheet2
 
Upvote 0
ok... i've tried everyone's suggestions and this is everything i found. it looks like our "standard" format isnt so standard after all and we have some clean-up on our end to do. until then, is there a way to account for as many of these variations as possible? the # of extra spaces seems to vary between 1 and 6 as far as i can tell. im not sure if the extra spaces are true typo's or if they are just how excel exports is exporting the data.
X of x core
X of x core(s)
X of x core(s)
X of x core(s)
X out of x core(s)
X out of x core(s)
X out of x core(s)
Some have this statement more than once.

@Akuini worked the best. if found all the cases with the correct standard format. Is there a way to catch as many of the variations as possible?

@lrobbo314 which gave me this error...
1715790225210.png

so i tried it again with @Burrgogi suggested edit which gave me this...
1715790308913.png

so i added the end sub which gave me this.. (i also tried replacing the End Function with End Sub but same result)
1715790349943.png


@Cubist formula works as long as there are none of the variations i mentioned above.

1715791367116.png


@Scott Huish formula didnt return anything. I tried different variations of "core", "cores", and "cores,"... error every time.
1715791818119.png
 
Upvote 0
ok... i've tried everyone's suggestions and this is everything i found. it looks like our "standard" format isnt so standard after all and we have some clean-up on our end to do. until then, is there a way to account for as many of these variations as possible? the # of extra spaces seems to vary between 1 and 6 as far as i can tell. im not sure if the extra spaces are true typo's or if they are just how excel exports is exporting the data.
X of x core
X of x core(s)
X of x core(s)
X of x core(s)
X out of x core(s)
X out of x core(s)
X out of x core(s)
Some have this statement more than once.

@Akuini worked the best. if found all the cases with the correct standard format. Is there a way to catch as many of the variations as possible?

@lrobbo314 which gave me this error...
View attachment 111436
so i tried it again with @Burrgogi suggested edit which gave me this...
View attachment 111437
so i added the end sub which gave me this.. (i also tried replacing the End Function with End Sub but same result)
View attachment 111438

@Cubist formula works as long as there are none of the variations i mentioned above.

View attachment 111440

@Scott Huish formula didnt return anything. I tried different variations of "core", "cores", and "cores,"... error every time.
View attachment 111446
disregard the part about all the variations.... i have a solution to make those spacing corrections on our end.

if there is a way to account for the statement being more than once i think that would give me the best outcome.
 
Upvote 0

Forum statistics

Threads
1,223,885
Messages
6,175,184
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top