I have a column of Text data that has more than 300 words per cell. I am using MS Office 2021. I like to extract the words that start with capitalization from the cells. What is the formula for that?. Or is there any VBA script?
Cell Formulas | ||
---|---|---|
Range | Formula | |
B2:B7 | B2 | =IF(EXACT(LEFT(A2,1),UPPER(LEFT(A2,1))),A2,"") |
E2:E7 | E2 | =EXACT(LEFT(A2,1),UPPER(LEFT(A2,1))) |
I have a column of Text data that has more than 300 words per cell. I am using MS Office 2021. I like to extract the words that start with capitalization from the cells. What is the formula for that?. Or is there any VBA script?
Function StrExtract(Str As String) As String
'Updateby Extendoffice
Application.Volatile
Dim xStrList As Variant
Dim xRet As String
Dim I As Long
If Len(Str) = 0 Then Exit Function
xStrList = Split(Str, " ")
If UBound(xStrList) >= 0 Then
For I = 0 To UBound(xStrList)
If xStrList(I) = StrConv(xStrList(I), vbProperCase) Then
xRet = xRet & xStrList(I) & " "
End If
Next
StrExtract = Left(xRet, Len(xRet) - 1)
End If
End Function
Sub test()
Dim a
Dim I&, II&
a = Cells(1).CurrentRegion.Resize(, 1)
ReDim b(1 To UBound(a))
With CreateObject("VBScript.Regexp")
.Pattern = "([A-Z])\w+\W"
.Global = True
For I = 1 To UBound(a)
Set x = .Execute(a(I, 1))
For II = 0 To x.Count - 1
b(I) = IIf(b(I) = "", x(II), b(I) & "," & x(II))
Next
Next
Cells(1, 2).Resize(UBound(b)) = Application.Transpose(b)
End With
End Sub
You could always use some edited cells that contained, say, 20-30 words and ensure they included any of the unusual situations.Using XLBB to paste the mini cell makes the cells go way long, Since each cell contains more than 300 words. So Am posting a screenshot. If a long cell is ok to be pasted here, I will paste the mini cell.
Cool idea, I didn't think of that, Ok, Now doing the same.You could always use some edited cells that contained, say, 20-30 words and ensure they included any of the unusual situations.
Give this a try. I have assumed data in column A starting in row 2 and results in column B.I want the extracted words seperated by comma in the adjacent cells.
... If there are consecutive words that capital letters, I want them treated as single data, and be put as together.
Sub GetCapitalised_1()
Dim RX As Object
Dim a As Variant
Dim i As Long
Set RX = CreateObject("VBScript.RegExp")
RX.Global = True
RX.Pattern = "( [^A-Z][^ ]*)+"
With Range("A2", Range("A" & Rows.Count).End(xlUp))
a = .Value
For i = 1 To UBound(a)
a(i, 1) = Replace(RX.Replace(a(i, 1), " "), " ", ", ")
Next i
.Offset(, 1).Value = a
End With
End Sub
brvnbld.xlsm | ||||
---|---|---|---|---|
A | B | |||
1 | Data | Results | ||
2 | Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. | Lorem Ipsum, Lorem Ipsum, It, It, Letraset, Lorem Ipsum, Aldus PageMaker, Lorem Ipsum. | ||
3 | ||||
4 | One capitalised words here | One | ||
5 | There are many variations of passages of Lorem Ipsum available. | There, Lorem Ipsum | ||
Sheet1 |
But the Consecutive words that start with capital lettters must be taken as a single unit.An Option
VBA Code:Sub test() Dim a Dim I&, II& a = Cells(1).CurrentRegion.Resize(, 1) ReDim b(1 To UBound(a)) With CreateObject("VBScript.Regexp") .Pattern = "([A-Z])\w+\W" .Global = True For I = 1 To UBound(a) Set x = .Execute(a(I, 1)) For II = 0 To x.Count - 1 b(I) = IIf(b(I) = "", x(II), b(I) & "," & x(II)) Next Next Cells(1, 2).Resize(UBound(b)) = Application.Transpose(b) End With End Sub
Thanks you, The brackets do not matter,. It can avoid them,. The consecutive words if extracted then fine.Give this a try. I have assumed data in column A starting in row 2 and results in column B.
VBA Code:Sub GetCapitalised_1() Dim RX As Object Dim a As Variant Dim i As Long Set RX = CreateObject("VBScript.RegExp") RX.Global = True RX.Pattern = "( [^A-Z][^ ]*)+" With Range("A2", Range("A" & Rows.Count).End(xlUp)) a = .Value For i = 1 To UBound(a) a(i, 1) = Replace(RX.Replace(a(i, 1), " "), " ", ", ") Next i .Offset(, 1).Value = a End With End Sub
Here is my sample data and results
brvnbld.xlsm
A B 1 Data Results 2 Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. Lorem Ipsum, Lorem Ipsum, It, It, Letraset, Lorem Ipsum, Aldus PageMaker, Lorem Ipsum. 3 4 One capitalised words here One 5 There are many variations of passages of Lorem Ipsum available. There, Lorem Ipsum Sheet1
Note that my code would not handle an example like this one of yours:
View attachment 68172