Copying data from a web page to excel using vba (absolute novice in this area asking here)

opinionated86

New Member
Joined
Feb 2, 2016
Messages
12
Hi I've googled around a ton and have tried out a few examples but just can't make head nor tails of any of this, I'm self taught with both vba and excel and this just isn't my area so sorry if I'm being stupid.
I want to feed the macro a permalink to an item page and for it to give me back certain data (eg the title of the item) from the page, the pages are all layed out the same so the data is in the same place.
Here is an example of one of the pages:
https://catalogue.leedsbeckett.ac.u...1=GENERAL^SUBJECT^GENERAL^^&user_id=WEBSERVER
*Terrible permalink i know but that's nothing to do with me*

I can see that I can get the label for the title using inspect and I feel like there should be a way to navigate to that using vba but i just can't figure it out.
 
Last edited:

Excel Facts

Why does 9 mean SUM in SUBTOTAL?
It is because Sum is the 9th alphabetically in Average, Count, CountA, Max, Min, Product, StDev.S, StDev.P, Sum, VAR.S, VAR.P.
Hi;

Assuming that the URL's are in Column A, starting from A1 to downwards, the below code will get the details from and list on the sheet from Column B to Column H, listing; title, author, publisher, published date, number of pages, ISBN and item info.

Code:
Sub Get_Title()
    'Haluk - 23/01/2018
    
    Dim URL As String
    Dim objHTTP As Object, HTMLfile As Object
    
    Set objHTTP = CreateObject("MSXML2.XMLHTTP")
    For i = 1 To Cells(Rows.Count, 1).End(xlUp).Row
        URL = Range("A" & i).Text
        
        objHTTP.Open "GET", URL, False
        objHTTP.Send
        
        Set HTMLfile = CreateObject("HTMLFILE")
        
        If objHTTP.ReadyState = 4 Then
            If objHTTP.Status = 404 Then
                MsgBox "Page not found"
                GoTo SafeExit:
            End If
            If objHTTP.Status = 200 Then
                HTMLfile.Body.innerHTML = objHTTP.responseText
                Set dds = HTMLfile.getElementsByTagName("dd")
                For Each dd In dds
                    If dd.ClassName = "title" Then
                        Range("B" & i) = dd.innerText
                    End If
                    If dd.ClassName = "author" Then
                        Range("C" & i) = dd.innerText
                    End If
                    If dd.ClassName = "publisher" Then
                        Range("D" & i) = dd.innerText
                    End If
                    If dd.ClassName = "publishing_date" Then
                        Range("E" & i) = dd.innerText
                    End If
                    If dd.ClassName = "pages" Then
                        Range("F" & i) = dd.innerText
                    End If
                    If dd.ClassName = "pages" Then
                        Range("F" & i) = dd.innerText
                    End If
                    If dd.ClassName = "isbn" Then
                        Range("G" & i) = dd.innerText
                    End If
                    If dd.ClassName = "copy_info" Then
                        Range("H" & i) = dd.innerText
                    End If
                 Next
            End If
        End If
    Next
    Columns("G:G").NumberFormat = "0"
    ActiveSheet.Columns.AutoFit
    Columns("A:A").ColumnWidth = 25
SafeExit:
    Set objHTTP = Nothing
    Set HTMLfile = Nothing
End Sub
 
Upvote 0
Hi;

Assuming that the URL's are in Column A, starting from A1 to downwards, the below code will get the details from and list on the sheet from Column B to Column H, listing; title, author, publisher, published date, number of pages, ISBN and item info.

Code:
Sub Get_Title()
    'Haluk - 23/01/2018
    
    Dim URL As String
    Dim objHTTP As Object, HTMLfile As Object
    
    Set objHTTP = CreateObject("MSXML2.XMLHTTP")
    For i = 1 To Cells(Rows.Count, 1).End(xlUp).Row
        URL = Range("A" & i).Text
        
        objHTTP.Open "GET", URL, False
        objHTTP.Send
        
        Set HTMLfile = CreateObject("HTMLFILE")
        
        If objHTTP.ReadyState = 4 Then
            If objHTTP.Status = 404 Then
                MsgBox "Page not found"
                GoTo SafeExit:
            End If
            If objHTTP.Status = 200 Then
                HTMLfile.Body.innerHTML = objHTTP.responseText
                Set dds = HTMLfile.getElementsByTagName("dd")
                For Each dd In dds
                    If dd.ClassName = "title" Then
                        Range("B" & i) = dd.innerText
                    End If
                    If dd.ClassName = "author" Then
                        Range("C" & i) = dd.innerText
                    End If
                    If dd.ClassName = "publisher" Then
                        Range("D" & i) = dd.innerText
                    End If
                    If dd.ClassName = "publishing_date" Then
                        Range("E" & i) = dd.innerText
                    End If
                    If dd.ClassName = "pages" Then
                        Range("F" & i) = dd.innerText
                    End If
                    If dd.ClassName = "pages" Then
                        Range("F" & i) = dd.innerText
                    End If
                    If dd.ClassName = "isbn" Then
                        Range("G" & i) = dd.innerText
                    End If
                    If dd.ClassName = "copy_info" Then
                        Range("H" & i) = dd.innerText
                    End If
                 Next
            End If
        End If
    Next
    Columns("G:G").NumberFormat = "0"
    ActiveSheet.Columns.AutoFit
    Columns("A:A").ColumnWidth = 25
SafeExit:
    Set objHTTP = Nothing
    Set HTMLfile = Nothing
End Sub


Thanks you so much for this dude, simple, easy to follow and adapt, exactly what i was looking for but couldn't find!
 
Upvote 0
opinionated86

SORRY to hijack your thread again, I too am trying to get data from a web page,


Ive used this but keep getting an error here
objHTTP.Send

Not sure what I am doing wrong
 
Upvote 0
Ok man I've used that to beautiful affect, now i'm trying it on another website page and it doesn't seem to work:

https://cla.co.uk/check-permissions/search-publications?query=9781446207208&licenceType=132

Guessing it's because it requires a search field to be filled out and as such isn't able to access info without search being done live, so to speak, what follows is my adaption of your lovely work to try and get the info I want which is the publisher, all i'm getting is the same publisher (HarperCollins) for every isbn on the list (which is what i'm searching by):

Code:
Sub Get_alt_pub()    'Haluk - 23/01/2018
    
    Dim URL As String
    Dim objHTTP As Object, HTMLfile As Object
    
    Set objHTTP = CreateObject("MSXML2.XMLHTTP")
    For i = 3 To Cells(Rows.Count, 7).End(xlUp).Row
        URL = "https://cla.co.uk/check-permissions/search-publications?query=" & Range("H" & i).Text & "licenceType=132"
        
        objHTTP.Open "GET", URL, False
        objHTTP.Send
        
        Set HTMLfile = CreateObject("HTMLFILE")
        
        If objHTTP.ReadyState = 4 Then
            If objHTTP.Status = 404 Then
                MsgBox "Page not found"
                GoTo SafeExit:
            End If
            If objHTTP.Status = 200 Then
                HTMLfile.Body.innerHTML = objHTTP.responseText
                Set dds = HTMLfile.getElementsByTagName("dd")
                For Each dd In dds
                    If dd.ID = "search-results-item-0-publisher" Then
                        Range("L" & i) = dd.innerText
                    End If
                 Next
            End If
        End If
    Next
    Columns("H:H").NumberFormat = "0"
    ActiveSheet.Columns.AutoFit
    Columns("G:G").ColumnWidth = 30
    Columns("B:B").ColumnWidth = 30
    Columns("I:I").ColumnWidth = 30
SafeExit:
    Set objHTTP = Nothing
    Set HTMLfile = Nothing
End Sub
Any help would be much apreciated
 
Upvote 0

Forum statistics

Threads
1,223,893
Messages
6,175,249
Members
452,623
Latest member
Techenthusiast

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top