Extract the URL and display text of a HTML hyperlink tag in a simple string variable

AOB

Well-known Member
Joined
Dec 15, 2010
Messages
669
Office Version
  1. 365
  2. 2016
  3. 2013
Platform
  1. Windows
I'm trying to extract the URL and display text of a HTML hyperlink tag in the form of a simple string variable.

So the string variable will hold a value along the lines of :
HTML:
<a href="https://www.somewebsite.com">Visit Some Website</a>

And I want to pull out the URL ("https://www.somewebsite.com") and the display text ("Visit Some Website") into two separate string variables somehow.

Bearing in mind, I don't have control over the source HTML tag that fills the original string variable (i.e. there may be whitespace/s between the <a tag and the href etc. - so I need something consistent & reliable to account for that)

Pretty sure a couple of regex's are the answer here but struggling to come up with the pattern(s) if anybody might have any pointers?

Thanks!
 
Last edited:

Excel Facts

Copy PDF to Excel
Select data in PDF. Paste to Microsoft Word. Copy from Word and paste to Excel.
Maybe:
VBA Code:
Sub test()
Dim h As String, strLink As String, strText As String
h = "<a href=""https://www.somewebsite.com"">Visit Some Website</a>"
strLink = Split(h, """")(1)
strText = Replace(Split(h, ">")(1), "</a", "")

MsgBox strLink & ", " & strText

End Sub
 
  • Like
Reactions: AOB
Upvote 0
Thanks @Scott Huish - yeah I was thinking along those same lines but was worried that I couldn't necessarily guarantee the link would be wrapped in double-quotes (it could be single quote / apostrophe) and also possible there could be other single/double-quotes within the tag that could potentially mess me up.

I've actually just copped that you can get submatches from the original regex that I was using to extract the full tags from the full HTML I'm trying to parse so that's actually ended up the easiest / most accurate way of splitting out the links and display texts.

This is that regex :

VBA Code:
Set objRegexHTMLHyperlinks = CreateObject("VBScript.RegExp")
With objRegexHTMLHyperlinks
    .MultiLine = False
    .Global = True
    .IgnoreCase = True
    .Pattern = "<a\s+(?:[^>]*?\s+)?href=(["" '])(.*?)(["" '])>(.*?)</a>"
End With

And then when I execute that against the full text into a match collection, I get submatches for each match with each grouping :
  1. Group 1 is the single/double-quote directly following the href
  2. Group 2 is the text directly following the single/double-quote (i.e. the URL)
  3. Group 3 is the single/double-quote directly following the URL (closing the string)
  4. Group 4 is the text between the > and the </a> (i.e. the display text)
So it looks like I have this working now, although I have to stress test that pattern now against various possibilities that might arise, although at first glance it seems pretty good?
 
Upvote 0
Solution

Forum statistics

Threads
1,224,823
Messages
6,181,175
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top