Split text with <div> <b> and/or <p>

twelin

New Member
Joined
May 10, 2019
Messages
15
Hi,
I have hundreds of texts in Excel that i need to split by their "html splits". Problem is that all methods are used and sometimes there are multiple breaks.

Example 1:
The fund is actively managed, whereby the fund manager takes active, independent investment decisions. The objective is to exceed the returns in the underlying market (defined as the benchmark index) over the long term. <div><br><div>The fund invests in interest-bearing securities denominated in euro. The fund's assets shall be invested in interest-bearing securities issued or guaranteed by a state, municipality or by a government or municipal authority or by some intergovernmental agency in which one or several states are members, as well as in covered bonds. The average residual fixed interest duration is between two and ten years.<br></div><div><br></div><div>These shall have a credit rating of no less than BBB- (investment grade) according to Standard & Poor's or Baa3 according to Moody's rating scale. The fund may also invest in securities that lack an official credit rating if we consider it as having an equivalent credit quality.</div></div>

Example 2:
<p>The fund is actively managed, whereby the fund manager takes active, independent investment decisions. The objective is to exceed the returns in the underlying market (defined as the benchmark index) over the long term.</p><div>The fund invests in equities that are primarily issued by companies in Asia, New Zealand and Australia or in companies that have a minimum of 50% of their business activity or market in Asia, and thereby have a currency exposure in these markets.<br><div><br></div><div>The thematic investment strategy implies that investments occur primarily in companies associated with long-term investment themes that have been carefully selected, analysed and are expected to experience a long period of growth.<br></div>

Could anyone come up with a VBA to solve this? My goal is to split by alla "breaks" (div, br and div)
 

Excel Facts

Ambidextrous Undo
Undo last command with Ctrl+Z or Alt+Backspace. If you use the Undo icon in the QAT, open the drop-down arrow to undo up to 100 steps.
in the same cell or to different rows like here?
raw
The fund is actively managed, whereby the fund manager takes active, independent investment decisions. The objective is to exceed the returns in the underlying market (defined as the benchmark index) over the long term. <div><br><div>The fund invests in interest-bearing securities denominated in euro. The fund's assets shall be invested in interest-bearing securities issued or guaranteed by a state, municipality or by a government or municipal authority or by some intergovernmental agency in which one or several states are members, as well as in covered bonds. The average residual fixed interest duration is between two and ten years.<br></div><div><br></div><div>These shall have a credit rating of no less than BBB- (investment grade) according to Standard & Poor's or Baa3 according to Moody's rating scale. The fund may also invest in securities that lack an official credit rating if we consider it as having an equivalent credit quality.</div></div>
<p>The fund is actively managed, whereby the fund manager takes active, independent investment decisions. The objective is to exceed the returns in the underlying market (defined as the benchmark index) over the long term.</p><div>The fund invests in equities that are primarily issued by companies in Asia, New Zealand and Australia or in companies that have a minimum of 50% of their business activity or market in Asia, and thereby have a currency exposure in these markets.<br><div><br></div><div>The thematic investment strategy implies that investments occur primarily in companies associated with long-term investment themes that have been carefully selected, analysed and are expected to experience a long period of growth.<br></div>
raw
The fund is actively managed, whereby the fund manager takes active, independent investment decisions. The objective is to exceed the returns in the underlying market (defined as the benchmark index) over the long term.
The fund invests in interest-bearing securities denominated in euro. The fund's assets shall be invested in interest-bearing securities issued or guaranteed by a state, municipality or by a government or municipal authority or by some intergovernmental agency in which one or several states are members, as well as in covered bonds. The average residual fixed interest duration is between two and ten years.
These shall have a credit rating of no less than BBB- (investment grade) according to Standard & Poor's or Baa3 according to Moody's rating scale. The fund may also invest in securities that lack an official credit rating if we consider it as having an equivalent credit quality.
The fund is actively managed, whereby the fund manager takes active, independent investment decisions. The objective is to exceed the returns in the underlying market (defined as the benchmark index) over the long term.
The fund invests in equities that are primarily issued by companies in Asia, New Zealand and Australia or in companies that have a minimum of 50% of their business activity or market in Asia, and thereby have a currency exposure in these markets.
The thematic investment strategy implies that investments occur primarily in companies associated with long-term investment themes that have been carefully selected, analysed and are expected to experience a long period of growth.
 
Upvote 0
Data in A1, try this at B1:
=TEXTJOIN("",TRUE,MID(TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(ROW($A$1:$A$20)-1)*LEN(A1),LEN(A1))),IFERROR(SEARCH(">",TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(ROW($A$1:$A$20)-1)*LEN(A1),LEN(A1))))+1,1),1000))

Limitations: No. of sentences in a cell < 20, No of characters in a sentence < 1000.
 
Upvote 0
Data in A1, try this at B1:
=TEXTJOIN("",TRUE,MID(TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(ROW($A$1:$A$20)-1)*LEN(A1),LEN(A1))),IFERROR(SEARCH(">",TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(ROW($A$1:$A$20)-1)*LEN(A1),LEN(A1))))+1,1),1000))

Limitations: No. of sentences in a cell < 20, No of characters in a sentence < 1000.
Edited:
Limitations: No. of "html splits" in a cell < 20, No of characters between "html splits" < 1000.
 
Upvote 0
Spliting into rows, at C1;

=INDEX(MID(TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(ROW($A$1:$A$20)-1)*LEN(A1),LEN(A1))),IFERROR(SEARCH(">",TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(ROW($A$1:$A$20)-1)*LEN(A1),LEN(A1))))+1,1),1000),AGGREGATE(15,6,IF(MID(TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(ROW($A$1:$A$20)-1)*LEN(A1),LEN(A1))),IFERROR(SEARCH(">",TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(ROW($A$1:$A$20)-1)*LEN(A1),LEN(A1))))+1,1),1000)<>"",ROW($A$1:$A$20),""),ROW($A$1:$A$20)),)
 
Upvote 0
Can you confirm exactly what results you want (& where) for the two samples you gave?
Hi, sorry for not being clear. What I really would like is to get all sentences between the breaks in columns to the right, like this:
(One problem is that not all cells starts or ends with a break (<p> or <div>))
Original
The fund is actively managed, whereby the fund manager takes active, independent investment decisions. The objective is to exceed the returns in the underlying market (defined as the benchmark index) over the long term. <div><br><div>The fund invests in interest-bearing securities denominated in euro. The fund's assets shall be invested in interest-bearing securities issued or guaranteed by a state, municipality or by a government or municipal authority or by some intergovernmental agency in which one or several states are members, as well as in covered bonds. The average residual fixed interest duration is between two and ten years.<br></div><div><br></div><div>These shall have a credit rating of no less than BBB- (investment grade) according to Standard & Poor's or Baa3 according to Moody's rating scale. The fund may also invest in securities that lack an official credit rating if we consider it as having an equivalent credit quality.</div></div>The fund is actively managed, whereby the fund manager takes active, independent investment decisions. The objective is to exceed the returns in the underlying market (defined as the benchmark index) over the long term.The fund invests in interest-bearing securities denominated in euro. The fund's assets shall be invested in interest-bearing securities issued or guaranteed by a state, municipality or by a government or municipal authority or by some intergovernmental agency in which one or several states are members, as well as in covered bonds. The average residual fixed interest duration is between two and ten years.These shall have a credit rating of no less than BBB- (investment grade) according to Standard & Poor's or Baa3 according to Moody's rating scale. The fund may also invest in securities that lack an official credit rating if we consider it as having an equivalent credit quality.
 
Upvote 0
Edited:
Limitations: No. of "html splits" in a cell < 20, No of characters between "html splits" < 1000.
Thank you KP117, but none of these solutions seems to do the job for me. The original text is left in A1, i paste your formula into b1 or c1, but nothing happens?
 
Upvote 0
Hi, sorry for not being clear. What I really would like is to get all sentences between the breaks in columns to the right, like this:
(One problem is that not all cells starts or ends with a break (<p> or <div>))
Original
The fund is actively managed, whereby the fund manager takes active, independent investment decisions. The objective is to exceed the returns in the underlying market (defined as the benchmark index) over the long term. <div><br><div>The fund invests in interest-bearing securities denominated in euro. The fund's assets shall be invested in interest-bearing securities issued or guaranteed by a state, municipality or by a government or municipal authority or by some intergovernmental agency in which one or several states are members, as well as in covered bonds. The average residual fixed interest duration is between two and ten years.<br></div><div><br></div><div>These shall have a credit rating of no less than BBB- (investment grade) according to Standard & Poor's or Baa3 according to Moody's rating scale. The fund may also invest in securities that lack an official credit rating if we consider it as having an equivalent credit quality.</div></div>The fund is actively managed, whereby the fund manager takes active, independent investment decisions. The objective is to exceed the returns in the underlying market (defined as the benchmark index) over the long term.The fund invests in interest-bearing securities denominated in euro. The fund's assets shall be invested in interest-bearing securities issued or guaranteed by a state, municipality or by a government or municipal authority or by some intergovernmental agency in which one or several states are members, as well as in covered bonds. The average residual fixed interest duration is between two and ten years.These shall have a credit rating of no less than BBB- (investment grade) according to Standard & Poor's or Baa3 according to Moody's rating scale. The fund may also invest in securities that lack an official credit rating if we consider it as having an equivalent credit quality.
That hardly counts for ..
Can you confirm exactly what results you want (& where) for the two samples you gave?
However, see if this works for you. Assumes data in A2:Ax with results in columns B, C, D. ...

VBA Code:
Sub Split_Text()
  Dim RX As Object
  Dim a As Variant
  Dim i As Long
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = "(<.*?>)+"
  a = Range("A2", Range("A" & Rows.Count).End(xlUp)).Value
  For i = 1 To UBound(a)
    a(i, 1) = RX.Replace("<>" & a(i, 1), "|")
  Next i
  With Range("B2").Resize(UBound(a))
    .Value = a
    .TextToColumns DataType:=xlDelimited, Tab:=False, Semicolon:=False, Comma:=False, Space:=False, Other:=True, OtherChar:="|", FieldInfo:=Array(1, 9)
  End With
End Sub
 
Upvote 0
Solution
Thank you KP117, but none of these solutions seems to do the job for me. The original text is left in A1, i paste your formula into b1 or c1, but nothing happens?
Try this at B1

=INDEX(MID(TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(COLUMN($A$1:$T$20)-1)*LEN(A1),LEN(A1))),IFERROR(SEARCH(">",TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(COLUMN($A$1:$T$20)-1)*LEN(A1),LEN(A1))))+1,1),1000),,IFERROR(SMALL(IF(MID(TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(COLUMN($A$1:$T$20)-1)*LEN(A1),LEN(A1))),IFERROR(SEARCH(">",TRIM(MID(SUBSTITUTE(A1,"<",REPT(" ",LEN(A1))),1+(COLUMN($A$1:$T$20)-1)*LEN(A1),LEN(A1))))+1,1),1000)<>"",COLUMN($A$1:$T$20)),COLUMN($A$1:$T$20)),""))
 
Upvote 0

Forum statistics

Threads
1,223,911
Messages
6,175,323
Members
452,635
Latest member
laura12345

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top