Macros or something to Extract URLs from pile of data

jcaldon

New Member
Joined
Mar 19, 2012
Messages
5
Hi there,

This is my first post, so I am bit embarrassed to start off here with a question and asking for help when I barely know anyone here. But I am in urgent need of this help and I hope everyone here on the forum is OK with me starting off with a question.

Firstly, the only excel thing I know of is "sorting". Yeah, am an 'excel ignorant' of the worst sort and I've recently began to understand the difference between knowing excel and not, in terms of productivity. Kindly bear with my ignorance level while you read through my question.

What I am looking to do is to get excel to get me values from a data dump.
Basically I have long lines of data that has many things in them including URLs and text. I get this data from search engine results. So I go to Google, enter a keyword, and look into the source code when Google returns results. This source code shows a mix of URLs and plenty of text. I want to make a list of those URLs, while the text is of no use to me.

Therefore, I need XL to filter the URLs from the dump and list it for me.
Fortunately, I know that all URLs in that pile of data start with a HTML tag
HTML:
<h3 class="r"><a href="
and ends with this tag
HTML:
" target=_blank class=l
**** without the ****

So a typical URL would be buried like this in the data pile:
****
HTML:
<h3 class="r"><a href="
http://www.campertravel.com.au/4wd-hire.asp
HTML:
" target=_blank class=l
****

So I need to get excel to locate that URL between these 2 tags and make a list for me. Typically there about 100 to 200 such URLs in that data dump in a long long line of gibberish.

I would want to be able to run a macro or something on that worksheet where it can import a list of URLs from that pile of data.

Again, I am sorry if my definition of my requirement is not clear. Please let me know if anyone can help.

Thanks and regards,
JC
 
Last edited:

Excel Facts

Difference between two dates
Secret function! Use =DATEDIF(A2,B2,"Y")&" years"&=DATEDIF(A2,B2,"YM")&" months"&=DATEDIF(A2,B2,"MD")&" days"
Tyr this.


Excel 2007<TABLE style="BORDER-RIGHT: #a6aab6 1px solid; BORDER-TOP: #a6aab6 1px solid; BORDER-LEFT: #a6aab6 1px solid; BORDER-BOTTOM: #a6aab6 1px solid; BORDER-COLLAPSE: collapse; BACKGROUND-COLOR: #ffffff" cellPadding=2 rules=all><COLGROUP><COL style="BACKGROUND-COLOR: #e0e0f0" width=25><COL><COL></COLGROUP><THEAD><TR style="COLOR: #161120; BACKGROUND-COLOR: #e0e0f0; TEXT-ALIGN: center"><TH></TH><TH>A</TH><TH>B</TH></TR></THEAD><TBODY><TR><TD style="COLOR: #161120; TEXT-ALIGN: center">1</TD><TD>http://www.campertravel.com.au/4wd-hire.asp" target=_blank class=l</TD><TD>http://www.campertravel.com.au/4wd-hire.asp</TD></TR></TBODY></TABLE>


Sheet1


<TABLE style="BORDER-RIGHT: black 2px solid; PADDING-RIGHT: 0.4em; BORDER-TOP: black 2px solid; PADDING-LEFT: 0.4em; PADDING-BOTTOM: 0.4em; BORDER-LEFT: black 2px solid; PADDING-TOP: 0.4em; BORDER-BOTTOM: black 2px solid; BORDER-COLLAPSE: collapse; BACKGROUND-COLOR: #ffffff" cellPadding=2 rules=all width="85%"><TBODY><TR><TD style="PADDING-RIGHT: 6px; PADDING-LEFT: 6px; PADDING-BOTTOM: 6px; PADDING-TOP: 6px">Worksheet Formulas<TABLE style="BORDER-RIGHT: #a6aab6 1px solid; BORDER-TOP: #a6aab6 1px solid; BORDER-LEFT: #a6aab6 1px solid; BORDER-BOTTOM: #a6aab6 1px solid; BORDER-COLLAPSE: collapse; BACKGROUND-COLOR: #ffffff; TEXT-ALIGN: center" cellPadding=2 rules=all width="100%"><THEAD><TR style="COLOR: #161120; BACKGROUND-COLOR: #e0e0f0"><TH width=10>Cell</TH><TH style="PADDING-LEFT: 5px; TEXT-ALIGN: left">Formula</TH></TR></THEAD><TBODY><TR><TH style="COLOR: #161120; BACKGROUND-COLOR: #e0e0f0" width=10>B1</TH><TD style="TEXT-ALIGN: left">=MID(A1,FIND("http",A1),FIND(" ",A1,FIND("http",A1))-FIND("http",A1)-1)</TD></TR></TBODY></TABLE></TD></TR></TBODY></TABLE>

Well now you changed it, lol...but maybe this will help.
 
Last edited:
Upvote 0
@jprofer,

Thanks a bunch for your prompt response. I was going to change the URL to something that didnt make sense and the board says I can't make that change since it's past 10 mins from posting or something like that. So, no am not looking to spam this board.

I tried your formula and it works. But it seems to fetch just the first URL from that data pile. What do I have to do to get XL to get me all the URLs from that pile. I've pasted the pile on this google document to help you have a look at it:
https://docs.google.com/spreadsheet/ccc?key=0Aj0-vuUqBBSVdEt5U0FVYlhVbVNWVFA3M3d6Ujc3TGc

So there must be atleast about 100 URLs in that cell with gibberish text and html tags to go along with them. I am only interested in URLs that begin and end with those unique HTML tags. I cannot paste those tags here since it seems to be messing up the format of my post here :(

Thanks a ton for your help!
JC
<a href="%20%3C/b%3E[/CODE]%3Cb%3E%20and%20ends%20with%20
Code:
" target="_blank" class="l
Thanks" a="" ton="" for="" your="" help!
 
Last edited:
Upvote 0
I was going to suggest getting the URLs from the search results directly, but looks like you found a solution.:)
 
Upvote 0
Thanks Norie :)

Perhaps you can help me with my next hurdle for the day :P
What I need: I am making a list of unique URLs that I am getting from the search results so that I am able to compare them with my older database of URLs. Basically, I don't want to upset website owners by mailing them if a contact has been made already. The older database has all the URLs that I have contacted so far.

The only problem I have is that URLs are not unique even if they are from the same website. Here are some URLs from a website and you will see how they are different:
www.mywebsite.com/mycategory
http://mywebsite.com
blog.mywebsite.com

What I've tried already: I tried using the match and the vlookup function and they both seem to be using exact matches to compare. As you can tell, this is not of help to me.

Is there a way that excel can compare newer URLs from a list to the URLs in my database and differentiate between URLs of different websites to URLs of the same website? I would greatly appreciate your help.

Many thanks,
JC
 
Upvote 0

Forum statistics

Threads
1,221,814
Messages
6,162,131
Members
451,743
Latest member
matt3388

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top