Segregating Sentences based on Identifying First Word

madhuchelliah

Board Regular
Joined
Nov 22, 2017
Messages
226
Office Version
  1. 2019
Platform
  1. Windows
Hello all, i have multiple sentences in column B, range is dynamic which has to be segregated based on identifying it's 1st word and has to be copied to column C and stacked under each other cell. The 1st word of each sentence falls under any of the format as follows.
1. Third character of the word is Hyphen (-)
2. Third character of the word is Alphabet A (A)
3. Contains two or more Hyphens (AA-BCC-VVVVVV)
4. Contains forward slash (AA/BCC)
5. Contains Dot(s) (AA.BCC)
6. Starts with RR (RRXXXXX)
7. Starts with TBD
The macro has to identify the individual sentences based on the above formats and separate it to column C in diiferent cells. All the cells may not have the multiple sentences and all cells may not have sentences with all the formats mentioned above. It no multiple sentence found just copy the one sentence as it is to coumn C. Please see the example for better clarity. Thank you.

Book2.xlsm
BC
3
4
5Input in Column BExpected output in Column C
611-2525CD XXXX, AAAA, BBBBBB, CCCC 32A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE AB-CD-EFR2 V158VV, RR65RR, TTTTT358T ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU AA.BB.CCC CCCC, CCCC3444D, DDDD. EEEEEEEE RR25-5487 V158VV, RR65RR, TTTTT358T TBD A2552DDD, CCCC, C5C25D3D11-2525CD XXXX, AAAA, BBBBBB, CCCC 32A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE
732A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE
8AB-CD-EFR2 V158VV, RR65RR, TTTTT358T
9ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU
10AA.BB.CCC CCCC, CCCC3444D, DDDD. EEEEEEEE
11RR25-5487 V158VV, RR65RR, TTTTT358T
12TBD A2552DDD, CCCC, C5C25D3D
13
14
15
16
1711-2525CD XXXX, AAAA, BBBBBB, CCCC 32A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE11-2525CD XXXX, AAAA, BBBBBB, CCCC
1832A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE
19
20
21
22
23
24
25ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU
26
27
28
Sheet1 (3)
 

Excel Facts

Workdays for a market open Mon, Wed, Friday?
Yes! Use "0101011" for the weekend argument in NETWORKDAYS.INTL or WORKDAY.INTL. The 7 digits start on Monday. 1 means it is a weekend.
Your thread has had quite a few people look at it but no answers. I suspect that other readers might be like me and not understand well enough what you want.
I understand the 7 numbered points in your post.
I don't understand ..
- How you determine what a "sentence" is and where a sentence starts or stops.
- Why the result you have shown in cell C7 is also included as part of the result in C6

Is your data in column B really spread out with all those blank cells?
 
Upvote 0
I don't understand ..
- How you determine what a "sentence" is and where a sentence starts or stops.
- Why the result you have shown in cell C7 is also included as part of the result in C6

Is your data in column B really spread out with all those blank cells?
Peter, Any word that matches with the 7 mentioned formats is the starting (1st Word) of a sentence. So based on next sentence's starting point, the end of the previous sentence has to be determined.

Result shown in C7 should not be in C6, it is a mistake i fixed it now.

Data in column B is spread out through the column B but it will give space to add the result in column C. No need to insert new row to display the output. The result wont exceed the available space at any instance. Later the empty rows will be removed. Hope it clears your doubts. Let me know if any additional input required. Thank you.

Book2.xlsm
BC
5Input in Column BExpected output in Column C
611-2525CD XXXX, AAAA, BBBBBB, CCCC 32A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE AB-CD-EFR2 V158VV, RR65RR, TTTTT358T ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU AA.BB.CCC CCCC, CCCC3444D, DDDD. EEEEEEEE RR25-5487 V158VV, RR65RR, TTTTT358T TBD A2552DDD, CCCC, C5C25D3D11-2525CD XXXX, AAAA, BBBBBB, CCCC
732A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE
8AB-CD-EFR2 V158VV, RR65RR, TTTTT358T
9ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU
10AA.BB.CCC CCCC, CCCC3444D, DDDD. EEEEEEEE
11RR25-5487 V158VV, RR65RR, TTTTT358T
12TBD A2552DDD, CCCC, C5C25D3D
13
14
15
16
1711-2525CD XXXX, AAAA, BBBBBB, CCCC 32A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE11-2525CD XXXX, AAAA, BBBBBB, CCCC
1832A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE
19
20
21
22
23
24
25ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU
Sheet1 (3)
 
Upvote 0
Any word that matches with the 7 mentioned formats is the starting (1st Word) of a sentence.
In that case
  • why does C6 contain "11-2525CD XXXX, AAAA, BBBBBB, CCCC" given that the red word fits rule2?
    Shouldn't that be split into "11-2525CD XXXX," and "AAAA, BBBBBB, CCCC"?
  • why does C8 contain "AB-CD-EFR2 V158VV, RR65RR, TTTTT358T" given that the red word fits rule 6?
    Shouldn't that be split into "AB-CD-EFR2 V158VV," and "RR65RR, TTTTT358T"?
  • why does C11 contain "RR25-5487 V158VV, RR65RR, TTTTT358T" given that the red word fits rule 6?
    Shouldn't that be split into "RR25-5487 V158VV," and "RR65RR, TTTTT358T"?
 
Upvote 0
In that case
  • why does C6 contain "11-2525CD XXXX, AAAA, BBBBBB, CCCC" given that the red word fits rule2?
    Shouldn't that be split into "11-2525CD XXXX," and "AAAA, BBBBBB, CCCC"?
  • why does C8 contain "AB-CD-EFR2 V158VV, RR65RR, TTTTT358T" given that the red word fits rule 6?
    Shouldn't that be split into "AB-CD-EFR2 V158VV," and "RR65RR, TTTTT358T"?
  • why does C11 contain "RR25-5487 V158VV, RR65RR, TTTTT358T" given that the red word fits rule 6?
    Shouldn't that be split into "RR25-5487 V158VV," and "RR65RR, TTTTT358T"?
Thanks for pointing out peter. Those are dummy data, but didn't realized that will impact those 7 points. Corrected now. Sorry for the trouble.

Book2.xlsm
BC
5Input in Column BExpected output in Column C
611-2525CD XXXX, AABA, BBBBBB, CCCC 32A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE AB-CD-EFR2 V158VV, KK65RR, TTTTT358T ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU AA.BB.CCC CCCC, CCCC3444D, DDDD. EEEEEEEE RR25-5487 V158VV, KK65RR, TTTTT358T TBD A2552DDD, CCCC, C5C25D3D11-2525CD XXXX, AABA, BBBBBB, CCCC
732A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE
8AB-CD-EFR2 V158VV, KK65RR, TTTTT358T
9ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU
10AA.BB.CCC CCCC, CCCC3444D, DDDD. EEEEEEEE
11RR25-5487 V158VV, KK65RR, TTTTT358T
12TBD A2552DDD, CCCC, C5C25D3D
13
14
15
16
1711-2525CD XXXX, AABA, BBBBBB, CCCC 32A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE11-2525CD XXXX, AABA, BBBBBB, CCCC
1832A2315 CCCC, CCCCDDDD, DDDD. EEEEEEEE
19
20
21
22
23
24
25ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU ABC/DEF/25-256 GGG, HHHHH, TTTTT, UUUUUUU
Sheet1 (3)
 
Upvote 0
I have assumed your data is in Column B. I have also assumed that each data value has enough blank rows after it so that its sentences will fit in Column C without overlapping the sentences from the next value's sentences below it (if this is not the case, let me know and I'll adjust the code to move the data around so everything fits in Column C without overlaps). With that said, give this macro a try...
VBA Code:
Sub Sentences()
  Dim X As Long, Cell As Range, Arr As Variant
  For Each Cell In Columns("B").SpecialCells(xlConstants)
    Arr = Split(Application.Trim(Replace(Cell.Value, vbLf, " ")))
    For X = 1 To UBound(Arr)
      If Arr(X) Like "*-*" Or _
         Arr(X) Like "??A*" Or _
         Arr(X) Like "*-*-*" Or _
         Arr(X) Like "*/*" Or _
         Arr(X) Like "*?.?*" Or _
         Arr(X) Like "RR*" Or _
         Arr(X) Like "TBD*" Then
        Arr(X) = Chr$(1) & Arr(X)
      End If
    Next
    Arr = Split(Join(Arr), " " & Chr$(1))
    Cell.Offset(, 1).Resize(1 + UBound(Arr)) = Application.Transpose(Arr)
  Next
End Sub
 
Upvote 0
Solution
I have assumed your data is in Column B. I have also assumed that each data value has enough blank rows after it so that its sentences will fit in Column C without overlapping the sentences from the next value's sentences below it (if this is not the case, let me know and I'll adjust the code to move the data around so everything fits in Column C without overlaps). With that said, give this macro a try...
VBA Code:
Sub Sentences()
  Dim X As Long, Cell As Range, Arr As Variant
  For Each Cell In Columns("B").SpecialCells(xlConstants)
    Arr = Split(Application.Trim(Replace(Cell.Value, vbLf, " ")))
    For X = 1 To UBound(Arr)
      If Arr(X) Like "*-*" Or _
         Arr(X) Like "??A*" Or _
         Arr(X) Like "*-*-*" Or _
         Arr(X) Like "*/*" Or _
         Arr(X) Like "*?.?*" Or _
         Arr(X) Like "RR*" Or _
         Arr(X) Like "TBD*" Then
        Arr(X) = Chr$(1) & Arr(X)
      End If
    Next
    Arr = Split(Join(Arr), " " & Chr$(1))
    Cell.Offset(, 1).Resize(1 + UBound(Arr)) = Application.Transpose(Arr)
  Next
End Sub
Brilliant piece of macro. Working as expected and a new learning for me. Thanks for your time.
 
Upvote 0
Another approach that I think also does what you want.
I have assumed that all the non-blank cells from B6 down contain at least one word that matches one of the 7 rules.
If that is not the case, what would you want in column C for that row?

VBA Code:
Sub ExtractSections()
  Dim RX As Object
  Dim c As Range
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = "(^| )((\S\S\-)|(\S\SA)|([^\- ]+\-[^\- ]+\-)|(\S+\/)|(\S+\.\S)|(RR)|(TBD))"
  For Each c In Range("B6", Range("B" & Rows.Count).End(xlUp)).SpecialCells(xlConstants)
    c.Offset(, 1).Resize(RX.Execute(Replace(c.Value, vbLf, " ")).Count).Value = Application.Transpose(Split(Mid(RX.Replace(Replace(c.Value, vbLf, " "), "|$2"), 2), "|"))
  Next c
End Sub
 
Upvote 0

Forum statistics

Threads
1,223,909
Messages
6,175,312
Members
452,634
Latest member
cpostell

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top