Extract PDF File name from HTML long text

AndrewP70

New Member
Joined
Feb 17, 2016
Messages
26
Hello All, Could I please have some help with the following. I have a file with 100,00 lines of html coding as per below. I am wanting to extract the PDF file name into a separate column if possible. All pdf files are preceded by the path Images/Datasheets/ if this helps.

<img src="Images/skf_snip.jpg">&nbsp;<b>22215 E</b>&nbsp;Spherical Roller Bearing<br><br><b><em>75.00 mm ID x 130.00 mm OD x 31.00 mm Width<br></b></em><br>• Cylindrical Bore<br>• Steel Cage<br>• Standard Clearance<br><br>Spherical roller bearings have two rows of symmetrical rollers, a common sphered outer ring raceway and two inner ring raceways inclined at an angle to the bearing axis. The centre point of the sphere in the outer ring raceway is at the bearing axis.<br><br><a href="" onclick="javascript:void window.open('Images/Datasheets/22215E.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Export as PDF</a>

Thanks in advance.
 
Yes, I realised that & removed my post. :)

Did you mean in the same cell?

You could try this user-defined function. To implement ..
1. Right click the sheet name tab and choose "View Code".
2. In the Visual Basic window use the menu to Insert|Module
3. Copy and Paste the code below (can use the icon at the top right of the code pane below) into the main right hand pane that opens at step 2.
4. Close the Visual Basic window.
5. Enter the formula as shown in the screen shot below and copy down.
6. Your workbook will need to be saved as a macro-enabled workbook (*.xlsm)

VBA Code:
Function PDFName(s As String) As String
  Dim RX As Object, M As Object

  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Ignorecase = True
  RX.Pattern = "(Images/Datasheets/)([^<]+?\.pdf)"
  For Each M In RX.Execute(s)
    PDFName = PDFName & vbLf & M.SubMatches(1)
  Next M
  PDFName = Mid(PDFName, 2)
End Function

AndrewP70 2020-04-17 1.xlsm
AB
1<img src="Images/skf_snip.jpg">&nbsp;<b>22215 E</b>&nbsp;Spherical Roller Bearing<br><br><b><em>75.00 mm ID x 130.00 mm OD x 31.00 mm Width<br></b></em><br>• Cylindrical Bore<br>• Steel Cage<br>• Standard Clearance<br><br>Spherical roller bearings have two rows of symmetrical rollers, a common sphered outer ring raceway and two inner ring raceways inclined at an angle to the bearing axis. The centre point of the sphere in the outer ring raceway is at the bearing axis.<br><br><a href="" onclick="javascript:void window.open('Images/Datasheets/22215E.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Export as PDF</a>22215E.pdf
2<img src="Images/skf_snip.jpg">&nbsp;<b>WSTUC 012-CPSS-DFH</b>&nbsp;<span style="color:dodgerBlue"><b>Blue Range Food Line Ball Bearing Unit</b></span><br><br>• 3/4" Shaft Diameter<br>• Composite Take-Up Housing<br>• Stainless Steel (AISI grade 420) Insert Bearing<br>• Food grade grease NSA approved<br>• Additional Back Seal<br><br>The blue range units have been developed for compliance with food safety regulations – with food grade and corrosion resistant components, using blue materials for optical detectability. Each element has been designed to balance the highest performance and superior hygienic standards considering EHEDG guidelines.<br><br><a href="" onclick="javascript:void window.open('Images/Datasheets/BLUELINE.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Product Range</a>&nbsp;<a href="" onclick="javascript:void window.open('Images/Datasheets/WSTUC 012-CPSS-DFH.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Item Data Sheet</a>BLUELINE.pdf WSTUC 012-CPSS-DFH.pdf
Extract pdf name
Cell Formulas
RangeFormula
B1:B2B1=PDFName(A1)
Hi Peter,

This worked brilliantly. Many Many Thanks.

Could I also ask if there is an easy way to remove from all cells the blue range as per below? I now have the image names in a separate column and would like to delete all the data after <br><br>so I am only left with some basic html coding for the description.

As Usual,
Many Thanks,

<img src="Images/skf_snip.jpg">&nbsp;<b>P2BTC 106-TPSS</b>&nbsp;<span style="color:dodgerBlue"><b>Blue Range Food Line Ball Bearing Unit</b></span><br><br>• 1 3/8" Shaft Diameter<br>• Composite Pillow Block Tapped Base Housing<br>• Stainless Steel (AISI grade 420) Insert Bearing<br>• Food grade grease NSA approved<br><br>The blue range units have been developed for compliance with food safety regulations – with food grade and corrosion resistant components, using blue materials for optical detectability. Each element has been designed to balance the highest performance and superior hygienic standards considering EHEDG guidelines.<br><br><a href="" onclick="javascript:void window.open('Images/Datasheets/BLUELINE.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Product Range</a>&nbsp;<a href="" onclick="javascript:void window.open('Images/Datasheets/P2BTC 106-TPSS.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Item Data Sheet</a>
 
Last edited by a moderator:
Upvote 0
Could I also ask if there is an easy way to remove from all cells the blue range as per below?
Are you looking to ..
a) Physically remove that blue text from the original data, or
b) Leave the original data where it is and have the non-blue text placed in another column?

If a) we will need to deal with the pdf file names in a different way as currently they are formulas so if you change the original data then the formula results may (will) change. However, that is easy enough to do if a) is what you want. Just need clarity of what you have, where and what you want, where.
 
Upvote 0
Are you looking to ..
a) Physically remove that blue text from the original data, or
b) Leave the original data where it is and have the non-blue text placed in another column?

If a) we will need to deal with the pdf file names in a different way as currently they are formulas so if you change the original data then the formula results may (will) change. However, that is easy enough to do if a) is what you want. Just need clarity of what you have, where and what you want, where.
Hi,

a) Physically remove that blue text from the original data

Thanks,
 
Upvote 0
Hi,

a) Physically remove that blue text from the original data

Thanks,
Hi, apologies for not clarifying. I have the pdf's names in a separate file to the long descriptions as per previous. I would like to totally remove all data after (and including) <br><br><a href= for all items from the long description.

Many Thanks
 
Upvote 0
I think this would do both. Is that any use?
Assumes original data in column A. pdf names are put in column B and column A is amended as requested (I think).

VBA Code:
Sub Process_Data()
  Dim RX As Object, M As Object
  Dim a As Variant
  Dim i As Long
  Dim s As String
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Ignorecase = True
  RX.Pattern = "(Images/Datasheets/)([^<]+?\.pdf)"
  a = Range("A1", Range("A" & Rows.Count).End(xlUp)).Value
  For i = 1 To UBound(a)
    s = vbNullString
    For Each M In RX.Execute(a(i, 1))
      s = s & vbLf & M.SubMatches(1)
    Next M
    a(i, 1) = Mid(s, 2)
  Next i
  Range("B1").Resize(UBound(a)).Value = a
  Columns("A").Replace What:="<br><br><a href=*", replacement:="", LookAt:=xlPart, MatchCase:=False
End Sub

My sample data:
AndrewP70 2020-04-19 1.xlsm
A
1<img src="Images/skf_snip.jpg">&nbsp;<b>22215 E</b>&nbsp;Spherical Roller Bearing<br><br><b><em>75.00 mm ID x 130.00 mm OD x 31.00 mm Width<br></b></em><br>• Cylindrical Bore<br>• Steel Cage<br>• Standard Clearance<br><br>Spherical roller bearings have two rows of symmetrical rollers, a common sphered outer ring raceway and two inner ring raceways inclined at an angle to the bearing axis. The centre point of the sphere in the outer ring raceway is at the bearing axis.<br><br><a href="" onclick="javascript:void window.open('Images/Datasheets/22215E.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Export as PDF</a>
2Other data
3
4<img src="Images/skf_snip.jpg">&nbsp;<b>WSTUC 012-CPSS-DFH</b>&nbsp;<span style="color:dodgerBlue"><b>Blue Range Food Line Ball Bearing Unit</b></span><br><br>• 3/4" Shaft Diameter<br>• Composite Take-Up Housing<br>• Stainless Steel (AISI grade 420) Insert Bearing<br>• Food grade grease NSA approved<br>• Additional Back Seal<br><br>The blue range units have been developed for compliance with food safety regulations – with food grade and corrosion resistant components, using blue materials for optical detectability. Each element has been designed to balance the highest performance and superior hygienic standards considering EHEDG guidelines.<br><br><a href="" onclick="javascript:void window.open('Images/Datasheets/BLUELINE.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Product Range</a>&nbsp;<a href="" onclick="javascript:void window.open('Images/Datasheets/WSTUC 012-CPSS-DFH.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Item Data Sheet</a>
Extract pdf name



.. and results:
AndrewP70 2020-04-19 1.xlsm
AB
1<img src="Images/skf_snip.jpg">&nbsp;<b>22215 E</b>&nbsp;Spherical Roller Bearing<br><br><b><em>75.00 mm ID x 130.00 mm OD x 31.00 mm Width<br></b></em><br>• Cylindrical Bore<br>• Steel Cage<br>• Standard Clearance<br><br>Spherical roller bearings have two rows of symmetrical rollers, a common sphered outer ring raceway and two inner ring raceways inclined at an angle to the bearing axis. The centre point of the sphere in the outer ring raceway is at the bearing axis.22215E.pdf
2Other data
3
4<img src="Images/skf_snip.jpg">&nbsp;<b>WSTUC 012-CPSS-DFH</b>&nbsp;<span style="color:dodgerBlue"><b>Blue Range Food Line Ball Bearing Unit</b></span><br><br>• 3/4" Shaft Diameter<br>• Composite Take-Up Housing<br>• Stainless Steel (AISI grade 420) Insert Bearing<br>• Food grade grease NSA approved<br>• Additional Back Seal<br><br>The blue range units have been developed for compliance with food safety regulations – with food grade and corrosion resistant components, using blue materials for optical detectability. Each element has been designed to balance the highest performance and superior hygienic standards considering EHEDG guidelines.BLUELINE.pdf WSTUC 012-CPSS-DFH.pdf
Extract pdf name
 
Upvote 0
I think this would do both. Is that any use?
Assumes original data in column A. pdf names are put in column B and column A is amended as requested (I think).

VBA Code:
Sub Process_Data()
  Dim RX As Object, M As Object
  Dim a As Variant
  Dim i As Long
  Dim s As String
 
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Ignorecase = True
  RX.Pattern = "(Images/Datasheets/)([^<]+?\.pdf)"
  a = Range("A1", Range("A" & Rows.Count).End(xlUp)).Value
  For i = 1 To UBound(a)
    s = vbNullString
    For Each M In RX.Execute(a(i, 1))
      s = s & vbLf & M.SubMatches(1)
    Next M
    a(i, 1) = Mid(s, 2)
  Next i
  Range("B1").Resize(UBound(a)).Value = a
  Columns("A").Replace What:="<br><br><a href=*", replacement:="", LookAt:=xlPart, MatchCase:=False
End Sub

My sample data:
AndrewP70 2020-04-19 1.xlsm
A
1<img src="Images/skf_snip.jpg">&nbsp;<b>22215 E</b>&nbsp;Spherical Roller Bearing<br><br><b><em>75.00 mm ID x 130.00 mm OD x 31.00 mm Width<br></b></em><br>• Cylindrical Bore<br>• Steel Cage<br>• Standard Clearance<br><br>Spherical roller bearings have two rows of symmetrical rollers, a common sphered outer ring raceway and two inner ring raceways inclined at an angle to the bearing axis. The centre point of the sphere in the outer ring raceway is at the bearing axis.<br><br><a href="" onclick="javascript:void window.open('Images/Datasheets/22215E.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Export as PDF</a>
2Other data
3
4<img src="Images/skf_snip.jpg">&nbsp;<b>WSTUC 012-CPSS-DFH</b>&nbsp;<span style="color:dodgerBlue"><b>Blue Range Food Line Ball Bearing Unit</b></span><br><br>• 3/4" Shaft Diameter<br>• Composite Take-Up Housing<br>• Stainless Steel (AISI grade 420) Insert Bearing<br>• Food grade grease NSA approved<br>• Additional Back Seal<br><br>The blue range units have been developed for compliance with food safety regulations – with food grade and corrosion resistant components, using blue materials for optical detectability. Each element has been designed to balance the highest performance and superior hygienic standards considering EHEDG guidelines.<br><br><a href="" onclick="javascript:void window.open('Images/Datasheets/BLUELINE.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Product Range</a>&nbsp;<a href="" onclick="javascript:void window.open('Images/Datasheets/WSTUC 012-CPSS-DFH.pdf','','width=700,height=500,toolbar=0,menubar=0,location=0,status=1,scrollbars=1,resizable=1,left=0,top=0');return false;"><img src="/Images/Datasheets/snip_pdf.jpg">&nbsp;Item Data Sheet</a>
Extract pdf name



.. and results:
AndrewP70 2020-04-19 1.xlsm
AB
1<img src="Images/skf_snip.jpg">&nbsp;<b>22215 E</b>&nbsp;Spherical Roller Bearing<br><br><b><em>75.00 mm ID x 130.00 mm OD x 31.00 mm Width<br></b></em><br>• Cylindrical Bore<br>• Steel Cage<br>• Standard Clearance<br><br>Spherical roller bearings have two rows of symmetrical rollers, a common sphered outer ring raceway and two inner ring raceways inclined at an angle to the bearing axis. The centre point of the sphere in the outer ring raceway is at the bearing axis.22215E.pdf
2Other data
3
4<img src="Images/skf_snip.jpg">&nbsp;<b>WSTUC 012-CPSS-DFH</b>&nbsp;<span style="color:dodgerBlue"><b>Blue Range Food Line Ball Bearing Unit</b></span><br><br>• 3/4" Shaft Diameter<br>• Composite Take-Up Housing<br>• Stainless Steel (AISI grade 420) Insert Bearing<br>• Food grade grease NSA approved<br>• Additional Back Seal<br><br>The blue range units have been developed for compliance with food safety regulations – with food grade and corrosion resistant components, using blue materials for optical detectability. Each element has been designed to balance the highest performance and superior hygienic standards considering EHEDG guidelines.BLUELINE.pdf WSTUC 012-CPSS-DFH.pdf
Extract pdf name
Sorry, but how do I apply the code? Same as previous (less the formula)? Not very experienced with Macro's and their use.

Many Thanks
 
Upvote 0
Sorry, but how do I apply the code?
To implement ..
1. With your workbook active press Alt+F11 to bring up the vba window.
2. In the Visual Basic window use the menu to Insert|Module
3. Copy and Paste the code below into the main right hand pane that opens at step 2.
4. Close the Visual Basic window.
5. Press Alt+F8 to bring up the Macro dialog
6. Select the macro & click ‘Run’
 
Upvote 0
To implement ..
1. With your workbook active press Alt+F11 to bring up the vba window.
2. In the Visual Basic window use the menu to Insert|Module
3. Copy and Paste the code below into the main right hand pane that opens at step 2.
4. Close the Visual Basic window.
5. Press Alt+F8 to bring up the Macro dialog
6. Select the macro & click ‘Run’

Absolutely brilliant. I cannot express my gratitude.

Many thanks for your time and effort.

Andrew
 
Upvote 0

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top