Retrieving Info from an HTML site (source)

lee2smooth03

New Member
Joined
Nov 13, 2011
Messages
20
Hello All,

I am working on a statistics project that I could really use some help on. In short, I need to analyze data from a given website to draw the academic conclusions.

The second caveat is that I'm extremely new (and extremely green) about using VBA so I'm not particularly familiar with every method/property/event/etc. that may be available for use.

Here's the scheme of the algorithm that I want to write:

- go to the site
- find the data to be extracted from the source code
- extract the data from the site
- deliver the extracted data back to excel and drop it in a reference cell (to be added to the database)

Some of the lingo that I've seen include terms like "parsing" and there's even a property (or method...or function...IDK) called getElementById() or it's called getElementByTag(), but whenever this is applied to my defined object, I get an error.

Something tells me that I'm close, but I could really use some help in getting over this hump. Also, I'd like to open the page using Google Chrome, but so far, my code only seems to work for Internet Explorer

My (anemic) Code Thus Far
==========================================================

Dim pracPage As Object
Set pracPage = CreateObject("internetexplorer.application")

With pracPage
.Visible = True
.navigate "www.someRandomWebPage.com"

While pracPage.busy
DoEvents
Wend

.Toolbar = True
End With
==========================================================
By the way, I'm working on with the following software:
- window's 7 home premium
- MS Excel 2010 (VBA7)

Thanks for anything that helps (and even things that don't) ;)
 

Excel Facts

Excel Joke
Why can't spreadsheets drive cars? They crash too often!
Here is what I think you should try first:


  • Go to an empty sheet
  • Do a Data | Get External Data | New Web Query
  • Fill in the form
    • URL
    • Tables only or all
    • Once you know where your data is on the web page, you can refine this
  • Run the query
  • This will download the URL to the spread sheet
  • You may use this data as if it were any other sheet
  • To download again do a refresh

With Range("Prices!A1").QueryTable ' With
Sheets(PRICES_WORKSHEET).QueryTables ' fill in prices from Internet
.BackgroundQuery = False 'errors to be captured
.Refresh ' refresh query
End With
 
Upvote 0
lee2

The code you posted is sort of a 'skeleton' of what you would need to do if you were going to automate IE to do this.

Can you post more details, eg URL, the data you want to get.

However there are other ways to do this.

The one tlowry suggested might be worth a try.:)

PS Just seen the mention of Google Chrome. Is there a reason you want/need to use it?
 
Upvote 0
Thank you both for replying so quickly

First, let me apologize if my initial question was ambiguous in any fashion. I did not post the link that I'm working with because I'm not at liberty to share. However, I spent the first half of the evening reviewing similar websites and here's what I've come up with.

Norie:

you asked for a url; the following is an page that has similar code to the kind that I want to extract:


this page works well for a number of reasons:
1.) in the source code, the Document types look to be the same:

HTML:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

2.) The inconsistency of the Query Table feature (property/method)
==========================================================

I have played around with the Query Table before when trying to import data from the same page. My problem with importing from the outside source is that the advancement of the page seems to determine what the function can read (or see) and the information dump is usually sloppy in my opinion. My other problem is that during the selection of the information, the query tries to grab the entire page instead of the tables that contain data

My intent is to have the code import to the page as cleanly as possible; that's why I'm trying to gain more control over what comes into the page and where it is destined

On the given URL, I would like for my code to get the market values of the DOW, NASDAQ, and S&P 500 on an hourly basis. The problem is that importing data using "Get External Data" group seems to miss these values. On the contrary, I could import the fixed mortgage rates at the bottom of the page and they appeared just fine (and tidy) in the designated cell. Since I could not import the values that I wanted using the G.E.D., I want to resort to code. Using the code I would navigate to the following lines:

HTML:
<li id="dow" selected>
		<a href="/data/markets/dow/">
		<span class="cnncol1">Dow</span>
		<span class="cnncol2"><span stream="last_599362" streamFormat="ToHundredth">12,078.98</span></span>
		<span class="cnncol3"><span stream="change_599362"><span class="negData">-74.70</span></span></span>
		<span class="cnncol4"><span stream="changePct_599362"><span class="negData">-0.61%</span></span></span></a>
	</li>
	<li id="nasdaq">
		<a href="/data/markets/nasdaq/">
		<span class="cnncol1">Nasdaq</span>
		<span class="cnncol2"><span stream="last_579435" streamFormat="ToHundredth">2,657.22</span></span>
		<span class="cnncol3"><span stream="change_579435"><span class="negData">-21.53</span></span></span>
		<span class="cnncol4"><span stream="changePct_579435"><span class="negData">-0.80%</span></span></span></a>
	</li>

To retrieve the values after "streamFormat=" tags...

This is what I want to know how to do in excel using getElementById() and getElementByTag() functions (or methods or properties..etc)

:eeek:
 
Upvote 0
I did this in Excel 2003
Its a mixture of javascript and VBA inside excel

It doesn't really do what you want - it only works with google finance
but it could be adjusted to work with other web pages
so maybe that will meet your needs

Code:
Option Explicit
'*******************************************************
' code almost an exact copy of
' http://stackoverflow.com/questions/6627652/parsing-json-in-excel-vba
'*******************************************************
Private ScriptEngine As Object
'*******************************************************
Private Sub cmdGetQuote_Click()
    ' on an excel spreadsheet ("sheet1") I made a commandbutton
    ' and renamed it to cmdGetQuote
    ' this is the Click routine for that button
    
    ' the click routine calls getQuote
    getQuote
End Sub
'*******************************************************
Public Function DecodeJsonString(ByVal JsonString As String)
    ' Json is a way to layout data, sort of xml, but more like array notation
    ' this function decodes the json data and makes it into a javascript associative array
    ' that's right - javascript
    ' associative arrays are like vb dictionary collections
    Set DecodeJsonString = ScriptEngine.Eval("(" + JsonString + ")")
End Function
'*******************************************************
Public Function GetProperty(ByVal JsonObject As Object, ByVal propertyName As String) As Variant
    ' run the javascript getProperty method
    ' basically pass it a key and get a value
    GetProperty = ScriptEngine.Run("getProperty", JsonObject, propertyName)
End Function
'*******************************************************
Public Function GetObjectProperty(ByVal JsonObject As Object, ByVal propertyName As String) As Object
    ' not used in ths process
    ' kept only because its part of the complete code from the web site
    ' its used to set values, which we obviously aren't doing
    Set GetObjectProperty = ScriptEngine.Run("getProperty", JsonObject, propertyName)
End Function
'*******************************************************
Public Function GetKeys(ByVal JsonObject As Object) As String()
    
    ' get a list of each Key in the Json object/dictionary
    ' put them in a vb string array
    
    Dim Length As Integer
    Dim KeysArray() As String
    Dim KeysObject As Object
    Dim index As Integer
    Dim key As Variant
    
    ' get a list of each Key in the Json object/dictionary
    Set KeysObject = ScriptEngine.Run("getKeys", JsonObject)
    Length = GetProperty(KeysObject, "length")
    ReDim KeysArray(Length - 1)
    index = 0
    For Each key In KeysObject
        KeysArray(index) = key
        index = index + 1
    Next
    GetKeys = KeysArray
    
End Function
'*******************************************************
Public Sub InitScriptEngine()
    
    ' IMPORTANT ! ! !
    ' IMPORTANT ! ! !
    ' IMPORTANT ! ! !
    ' THIS MAY FAIL ! ! !
    Set ScriptEngine = CreateObject("MSScriptControl.ScriptControl") ' -- "msscript.ocx"
    ScriptEngine.Language = "JScript"
    ' create 2 custom functions in javascript
    ScriptEngine.AddCode "function getProperty(jsonObj, propertyName) { return jsonObj[propertyName]; } "
    ScriptEngine.AddCode "function getKeys(jsonObj) { var keys = new Array(); for (var i in jsonObj) { keys.push(i); } return keys; } "
End Sub
'*******************************************************
Sub getQuote()
    
    Dim draw_header_row As Boolean
    draw_header_row = True
    
    Dim wks As Worksheet
    Dim r As Integer
    Dim c As Integer
    
    ' IMPORTANT ! ! !
    ' putting it in Sheet2
    Set wks = Worksheets("sheet2")
    r = 1
    c = 1
    
    InitScriptEngine
    
    Dim JsonObject As Object
    Dim Keys() As String
    Dim key As Integer
    Dim Value As Variant
    
    ' IMPORTANT ! ! !
    ' IMPORTANT ! ! !
    ' select Tools / References / Microsoft XML, v6.0
    ' earlier versions may work ?
    Dim XMLHttpRequest As MSXML2.xmlhttp
    Set XMLHttpRequest = New MSXML2.xmlhttp
    
    ' waht we get back form the web page
    Dim response As String
    
    ' the abbreviations I got from google
    'INDEXDJX:.DJI = DOW
    'INDEXNASDAQ:.IXIC = NASDAQ
    'INDEXSP:.INX = S&P 500
    Dim symbol As Variant
    symbol = Array("INDEXDJX:.DJI", "INDEXNASDAQ:.IXIC", "INDEXSP:.INX")
    
    Dim index As Integer
    Dim uri As String
    
    ' type this into google and you'll see the way a page normally displays
    ' http://www.google.com/finance?q=IBM
    ' type this into google and you'll see the Json data structure that we're getting back
    ' http://www.google.com/finance/info?q=IBM
    For index = 0 To UBound(symbol)
        ' we're going to do this for each symbol in our symbol array
        ' build the url
        uri = "http://www.google.com/finance/info?q=" & CStr(symbol(index))
        ' setup the web request
        XMLHttpRequest.Open "GET", uri, False
        ' send the request to the web
        XMLHttpRequest.Send
        ' get a response from google
        ' google has decided to send their responses as Jason data structures
        ' Json is a web standard, sort of like xml, but sort of like vb dictionaries
        ' it comes to us as a string, just as xml is a string
        ' but we need to make it into a dictionary/Json object
        ' just as you would make an xml string into a xml object
        
        ' IMPORTANT ! ! !
        ' IMPORTANT ! ! !
        ' you can put any url instead of google finance
        ' you could even put
        ' uri = "http://www.cnn.com/"
        ' if the web page you want doesn't send Json, then its no big deal
        ' this response would have just been one very long string of html
        response = XMLHttpRequest.responseText
        Debug.Print response
        ' we have to do this, I'm tired explaing about Json
        response = Mid(response, 6, Len(response) - 7)
        Debug.Print response
        
        ' construct a dictionary/Json object from the string
        Set JsonObject = DecodeJsonString(response)
        
        ' get the Keys in the dictionary/Json object
        Keys = GetKeys(JsonObject)
        
        If draw_header_row Then
            ' put a header on the first row of the spreadsheet
            draw_header_row = False
            For key = 0 To UBound(Keys)
                wks.Cells(r, c).Value = Keys(key)
                c = c + 1
            Next
        End If
        
        r = r + 1
        c = 1
        ' now look up each value in the dictionary/Json object
        ' and fill in the spreadsheet
        For key = 0 To UBound(Keys)
            Value = GetProperty(JsonObject, Keys(key))
            wks.Cells(r, c).Value = Value
            c = c + 1
        Next
        
    Next
    
End Sub
'*******************************************************
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,275
Messages
6,171,121
Members
452,381
Latest member
Nova88

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top