VBA scrapping of USPTO website

ScooterNorm

New Member
Joined
Feb 25, 2024
Messages
22
Office Version
  1. 2016
Platform
  1. Windows
Excel Version (Office 2016)

Excel Environment (desktop, Windows)

Knowledge Level - Intermediate level

I'm trying to scrape off the USPTO website. When I do a manual search the URL is: https://tsdr.uspto.gov and I enter a serial number (i.e. 79349658), press the status button and the search is executed and the results are returned.

The resulting URL of the search page is:

https://tsdr.uspto.gov/#caseNumber=...TION&caseType=DEFAULT&searchType=statusSearch

When I try to execute this with VBA, it returns the tsdr.uspto.gov web page, not the result of the search. Any idea what I'm doing wrong?

Here's the code,

Sub Test()
Dim ht As HTMLDocument
Dim IE As InternetExplorer

Set IE = New InternetExplorer
IE.Visible = True
IE.navigate ("Trademark Status & Document Retrieval" & _
"&caseSearchType=US_APPLICATION" & _
"&caseType=DEFAULT" & _
"&searchType=statusSearch")

Do Until IE.readyState = READYSTATE_COMPLETE And IE.Busy = False
DoEvents
Loop

Set ht = IE.document

'IE.Quit

End Sub
 
I copied and pasted your code into a regular module, and then I added a reference to Microsoft Internet Controls. And then I simply ran your code.
 
Upvote 0

Excel Facts

How to calculate loan payments in Excel?
Use the PMT function: =PMT(5%/12,60,-25000) is for a $25,000 loan, 5% annual interest, 60 month loan.
The 'x' that appears on the right hand side of the case number is there to allow the user to delete the case number by simply clicking on the 'x'. And whenI try your code, this is what I get...

View attachment 107487

Here's what it looks like when I run it with the VB script. Notice the URL is the search URL in the 1st photo. Then when I manually enter the case number, the small appears and when I press the Status button, the x goes away but nothing happens.
 

Attachments

  • IMG_3652 (2).jpg
    IMG_3652 (2).jpg
    139.3 KB · Views: 13
  • IMG_3651 (1).jpg
    IMG_3651 (1).jpg
    130.5 KB · Views: 14
Upvote 0
I copied and pasted your code into a regular module, and then I added a reference to Microsoft Internet Controls. And then I simply ran your code.
May I ask what version of Excel and IE you are running?
 
Upvote 0
I'm using Microsoft 365, and when I check About Internet Explorer, I only see that it says Windows 11, version 22H2. Also, my security setting under Internet Options is set to Medium High, in case that makes a difference.
 
Upvote 0
Here's an alternative method...

VBA Code:
Sub test()

    'Set a reference (VBE > Tools > References) to the following libraries:
    '   1) Microsoft XML, v6.0
    '   2) Microsoft HTML Object Library

    Dim xmlReq As MSXML2.ServerXMLHTTP60
    Dim htmlDoc As MSHTML.HTMLDocument
    Dim strURL As String
    Dim strResp As String
    
    Set xmlReq = New MSXML2.ServerXMLHTTP60
    
    Set htmlDoc = New MSHTML.HTMLDocument
    
    strURL = "https://tsdr.uspto.gov/statusview/sn79349658"
    
    With xmlReq
        .Open "GET", strURL, False
        .send
        If .Status <> 200 Then
            MsgBox "Error " & .Status & ":  " & .statusText
            Exit Sub
        End If
        strResp = .responseText
    End With
    
    htmlDoc.body.innerHTML = strResp
    
    'etc
    '
    '
    
    Set xmlReq = Nothing
    Set htmlDoc = Nothing

End Sub

Hope this helps!
 
Upvote 1
I'm using Microsoft 365, and when I check About Internet Explorer, I only see that it says Windows 11, version 22H2. Also, my security setting under Internet Options is set to Medium High, in case that makes a difference.
I'm using Microsoft Office Professional Plus 2016. And the VB is firing up IE. I wonder if it should fire up Microsoft Edge.

I have Microsoft Edge Version 121.0.2277.106 (Official build) (64-bit) on my system. I'll look in to how to fire off Edge instead of IE.
 
Upvote 0
Have you tried the alternative method in Post #15 ?

To automate Microsoft Edge, have a look at the following link...

 
Upvote 0
Here's an alternative method...

VBA Code:
Sub test()

    'Set a reference (VBE > Tools > References) to the following libraries:
    '   1) Microsoft XML, v6.0
    '   2) Microsoft HTML Object Library

    Dim xmlReq As MSXML2.ServerXMLHTTP60
    Dim htmlDoc As MSHTML.HTMLDocument
    Dim strURL As String
    Dim strResp As String
   
    Set xmlReq = New MSXML2.ServerXMLHTTP60
   
    Set htmlDoc = New MSHTML.HTMLDocument
   
    strURL = "https://tsdr.uspto.gov/statusview/sn79349658"
   
    With xmlReq
        .Open "GET", strURL, False
        .send
        If .Status <> 200 Then
            MsgBox "Error " & .Status & ":  " & .statusText
            Exit Sub
        End If
        strResp = .responseText
    End With
   
    htmlDoc.body.innerHTML = strResp
   
    'etc
    '
    '
   
    Set xmlReq = Nothing
    Set htmlDoc = Nothing

End Sub

Hope this helps!
Hey Dominic,
Yes, that got my page returned perfectly.
Just one more question, if you don't mind. So the tags that I want to pull look like this and I can't figure out to get the web address out of them.

VBA Code:
<div class="value">
<a href="https://www.nubiafilmworks.com/">https://www.nubiafilmworks.com/</a>
<a href="https://www.nubiafilmworks.com/about-us-1">https://www.nubiafilmworks.com/about-us-1</a>
</div>

How do I get the "Home | NUBIA FILMWORKS, LLC" out of it. I'm a bit of a beginner, so thanks again.
 
Upvote 0
Have you tried the alternative method in Post #15 ?

To automate Microsoft Edge, have a look at the following link...

Yes, it work and I marked it as solved, I just have one more question in Post #18 if you'd be so kind.
Thanks again
 
Upvote 0
Here's an alternative method...

VBA Code:
Sub test()

    'Set a reference (VBE > Tools > References) to the following libraries:
    '   1) Microsoft XML, v6.0
    '   2) Microsoft HTML Object Library

    Dim xmlReq As MSXML2.ServerXMLHTTP60
    Dim htmlDoc As MSHTML.HTMLDocument
    Dim strURL As String
    Dim strResp As String
   
    Set xmlReq = New MSXML2.ServerXMLHTTP60
   
    Set htmlDoc = New MSHTML.HTMLDocument
   
    strURL = "https://tsdr.uspto.gov/statusview/sn79349658"
   
    With xmlReq
        .Open "GET", strURL, False
        .send
        If .Status <> 200 Then
            MsgBox "Error " & .Status & ":  " & .statusText
            Exit Sub
        End If
        strResp = .responseText
    End With
   
    htmlDoc.body.innerHTML = strResp
   
    'etc
    '
    '
   
    Set xmlReq = Nothing
    Set htmlDoc = Nothing

End Sub

Hope this helps!
Hey Dominic,
I'm trying to convert to ServerXMLHTTP60 and am getting stuck here.

....
`htmlDoc.body.innerHTML = strResp
'Set elems1 = htmlDoc.body.getElementsByClassName("double table")
....

Also, I've seen some comments where ServerXMLHTTP60 doesn't work but ServerXMLHTTP does.
Your thoughts?

Thanks,
-Norm
 
Upvote 0

Forum statistics

Threads
1,224,816
Messages
6,181,141
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top