Web scraping not working

Pak Mariman

New Member
Joined
Jan 15, 2017
Messages
21
Trying my hand at web-scraping again. I want to grab a specific text from a website, and to do so the first part of my code dumps the website's response into a cell of the worksheet.

Here is the code:

Code:
Sub Test()
Dim URL As String
Dim HTTPObject, HTMLObject As Object

URL = "https://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&II=0&ND=3&adjacent=true&locale=en_EP&FT=D&date=20190709&CC=KR&NR=20190082024A&KC=A#"

Set HTTPObject = CreateObject("MSXML2.ServerXMLHTTP")
HTTPObject.Open "GET", URL, False
HTTPObject.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
HTTPObject.send

Set HTMLObject = CreateObject("htmlfile")
    With HTMLObject
        .Open
        .Write HTTPObject.responseText
        .Close
    End With

Cells(1, 1) = HTTPObject.responseText

In the returned text, the information, e.g. the text of the abstract, is not there.

Why? What am I doing wrong?
 

Excel Facts

Which came first: VisiCalc or Lotus 1-2-3?
Dan Bricklin and Bob Frankston debuted VisiCalc in 1979 as a Visible Calculator. Lotus 1-2-3 debuted in the early 1980's, from Mitch Kapor.
It looks like that website contains dynamic content. So, instead, try using the following macro which uses Internet Explorer.

References

The macro uses early binding, and so it requires that you set two library references (Visual Basic Editor >> Tools >> References):

1) Microsoft Internet Controls​
2) Microsoft HTML Object Library​

Notes

1) It assumes that the data is to be written on the active sheet.​
2) Internet Explorer is set to be visible. You can make it invisible by setting the Visible property to False.​
3) A timer is added so that the macro is aborted when 10 seconds has elapsed without finding the target table. You can change the number of seconds to wait as desired.​

Macro

VBA Code:
Option Explicit

Sub Get_Abstract()

    'Set a reference (VBE > Tools > References) to the following libraries:
    '   1) Microsoft Internet Controls
    '   2) Microsoft HTML Object Library
   
    Dim IE As SHDocVw.InternetExplorer
   
    Set IE = New SHDocVw.InternetExplorer
    With IE
        .Visible = True
        .navigate "https://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&II=0&ND=3&adjacent=true&locale=en_EP&FT=D&date=20190709&CC=KR&NR=20190082024A&KC=A#"
        Do While .Busy Or .readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
    End With
   
    Dim HTMLDoc As MSHTML.HTMLDocument
   
    Set HTMLDoc = New MSHTML.HTMLDocument
    Set HTMLDoc = IE.document
   
    Const SECS_TO_WAIT As Integer = 10 'change the number of seconds to wait as desired
   
    Dim HTMLTable As MSHTML.HTMLTable
    Dim startTime As Date
    Dim aborted As Boolean
   
    startTime = Timer
    aborted = False
    On Error Resume Next
    Do
        Set HTMLTable = HTMLDoc.querySelector(".tableType3")
        If Timer - startTime > SECS_TO_WAIT Then
            aborted = True
            GoTo exitHandler
        End If
    Loop While HTMLTable Is Nothing
    On Error GoTo 0
       
    Dim currentRow As Long
    Dim r As Long
    Dim c As Long
   
    'transfer table to worksheet
    currentRow = 2 'starting worksheet row
    With HTMLTable
        For r = 1 To .Rows.Length - 1
            For c = 0 To .Rows(r).Cells.Length - 1
                Cells(currentRow, c + 1).Value = .Rows(r).Cells(c).innerText
            Next c
            currentRow = currentRow + 1
        Next r
    End With
   
    'format table
    With Cells
        .WrapText = False
        .Columns.AutoFit
    End With
   
    'transfer abstract to worksheet
    Cells(currentRow + 1, "A").Value = HTMLDoc.querySelector(".printAbstract").innerText
   
exitHandler:
    If aborted Then
        MsgBox "Unable to find table!", vbExclamation
    End If

    Set HTMLTable = Nothing
    Set HTMLDoc = Nothing
    Set IE = Nothing
   
End Sub

Hope this helps!
 
Upvote 0

Forum statistics

Threads
1,224,816
Messages
6,181,139
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top