Webscraping a site with no element ids and returning data

fizzydrink

New Member
Joined
Jul 25, 2023
Messages
5
Office Version
  1. 2019
  2. 2016
Platform
  1. Windows
Hi,


I have seen this question googled a few times, but just can’t quite seem to get it.
I have been copying pasting and reading way too many websites. I can read and follow along vba, writing however is completely not me.
I am trying to scrape several hundred cage codes from the site:

https://cage.dla.mil/Search/

(You must be logged in and have it open first to work)

I want it to search my spreadsheet for my manufacturer number, in this case “Z06P7” from cell J1 in Sheet1
Then place that number in the search bar, submit, it and display the results.
I can do all this fine.

The part I have issue with is returning the result to Excel.
At the minimum, I am trying to get the company name to return to K1.
What I’d really like is to follow the “Details” link and then get the Company name and Phone number.
Then loop to the next row down and do again with “ZD484” etc until all (approx. 600) are complete.

I have also tried the site CAGE code lookup as this does the same thing, but having same sort of issues in returning the correct text.
I have tried element class/tags/id (‘a’. ‘tr’td’ but cant seem to draw the correct info

Any tips are appreciated.

My current code is:


Sub searchcage()

Dim ws As Worksheet
Dim targetrange As Range
Dim cell As Range
Dim copieddata As String
Dim pasteddata As String
Dim ie As Object
Dim webpage As HTMLDocument
Dim table_data As Object



Set ws = Worksheets("sheet1") ' replace with sheet name "supply support table"
Set targetrange = ws.Range("j2:j" & ws.Cells(ws.Rows.Count, "j").End(xlUp).Row) ' define search range
Set ie = CreateObject("internetexplorer.application") ' set internet explorer as ie

For Each cell In targetrange
copieddata = "Z06P7" 'copieddata = cell.Value

'open website

ie.navigate "https://cage.dla.mil/Search/" 'https://www.cage-codes.com”
ie.Visible = True

'wait for website to load
Do Until ie.readystate = 4
DoEvents
Loop

'enter copied data into search bar
ie.document.getElementById("SearchString").Value = copieddata '"SearchString" '"searchbox_id" for cage-codes
ie.document.forms(0).submit

'wait results to load
Do Until ie.readystate = 4
DoEvents
Loop

'copy the result from webpage
Set webpage = ie.document
Set table_data = webpage.getElementByTagName("Details") '("sortedby") Set table_data = webpage.getElementsByClassName("sortedby")

'paste result

ws.Range("K1").Value = table_data
'clean up

ie.Quit
Set ie = Nothing

Next

MsgBox "complete"

End Sub
 

Excel Facts

Why does 9 mean SUM in SUBTOTAL?
It is because Sum is the 9th alphabetically in Average, Count, CountA, Max, Min, Product, StDev.S, StDev.P, Sum, VAR.S, VAR.P.
I want it to search my spreadsheet for my manufacturer number, in this case “Z06P7” from cell J1 in Sheet1
Then place that number in the search bar, submit, it and display the results.
I can do all this fine.

The part I have issue with is returning the result to Excel.
At the minimum, I am trying to get the company name to return to K1.
What I’d really like is to follow the “Details” link and then get the Company name and Phone number.
Then loop to the next row down and do again with “ZD484” etc until all (approx. 600) are complete.

I have also tried the site CAGE code lookup as this does the same thing

For me, Z06P7 doesn't have a phone number and ZD484 doesn't exist; both cases are handled by this code. Testing just CAGE codes in J1:J2 and returning the results to the K and L cells.

VBA Code:
Option Explicit

#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If


Public Sub Get_All_Data()

    Dim CAGEcodeCell As Range
  
    For Each CAGEcodeCell In Worksheets("Sheet1").Range("J1:J2")
        Get_CAGE_code_data CAGEcodeCell
    Next

End Sub


Private Sub CDFU_Get_CAGE_code_data(CAGEcodeCell As Range)

    Dim HTMLdoc As HTMLDocument, HTMLdoc2 As HTMLDocument
    Dim dataTable As HTMLDivElement
    Dim lines As Variant, i As Long
  
    Set HTMLdoc2 = New HTMLDocument
    Set HTMLdoc = HTMLdoc2.createDocumentFromUrl("https://www.cage-codes.com/?q=" & CAGEcodeCell.Value, "")
    While HTMLdoc.readyState <> "complete": DoEvents: Sleep 100: Wend
  
    '<div class="data-table">
  
    CAGEcodeCell.Offset(, 1).Resize(, 2).ClearContents
    Set dataTable = HTMLdoc.querySelector("div.data-table")
    If Not dataTable Is Nothing Then
        lines = Split(dataTable.innerText, vbCrLf)
        For i = 0 To UBound(lines)
            Select Case lines(i)
                Case Is = "Company": CAGEcodeCell.Offset(, 1).Value = lines(i + 1)
                Case Is = "Phone":   CAGEcodeCell.Offset(, 2).Value = lines(i + 1)
            End Select
        Next
    Else
        CAGEcodeCell.Offset(, 1).Value = "Code not found"
    End If
  
End Sub
 
Upvote 0
Hi John,

Thanks for the reply. by looking at your code, it's apparent I have no idea what I am doing.
When running the above code, it does show "Code not found" if there is an error such as ZD484, but doesn't return anything if the result is found?

Is it possible to get the above to run on the https://cage.dla.mil/Search/ site, as I know that it will return a result for zd484.

Thanks

FD
 
Upvote 0
When running the above code, it does show "Code not found" if there is an error such as ZD484, but doesn't return anything if the result is found?
The same as a manual search on cage-codes.com, ZD484 isn't found and Z06P7 is found and returns results for me.

Is it possible to get the above to run on the https://cage.dla.mil/Search/ site, as I know that it will return a result for zd484.
I didn't try that site because you said you must be logged in, and I don't have an account. If the createDocumentFromUrl approach doesn't work for that site you would need to use IE or Edge automation.
 
Upvote 0
Thanks John,
The same as a manual search on cage-codes.com, ZD484 isn't found and Z06P7 is found and returns results for me
Strange, i just get a blank space.

I didn't try that site because you said you must be logged in
Sorry, incorrect term i guess, you have to have accepted the terms, there is no login as such.

But thanks for your efforts. I have played with your code and added it to some other code, and can I get it click the details now.

I think I can get the table sussed now. if I do, shall post here.

cheers
 
Upvote 0
Hi,

I have tried using a different site, as this gives me all the codes back.
Z10F3
Z14J6
ZD484

I have mashed some of your code again. I have used the tag header 1(h1) there is only one in the code..

This time, debugging the innertext to the immediate window shows the correct info, however exporting to the excel cell, it puts the below.

[object HTMLHeadingElement]

i feel it is something simple. but cant suss it

Cheers



my code is..


Option Explicit



Dim ie As Object
Dim elementcol As Object
Dim detailslink As Variant
Dim link As Object



Public Sub Get_All_Data()

Dim CAGEcodeCell As Range

For Each CAGEcodeCell In Worksheets("Sheet1").Range("J1:J3")
Get_CAGE_code_data CAGEcodeCell
Next

End Sub


Private Sub Get_CAGE_code_data(CAGEcodeCell As Range)

'open website
Set ie = CreateObject("internetexplorer.application")
ie.Navigate ("https://cage.report/NCAGE/" & CAGEcodeCell.Value & "")
ie.Visible = True

'wait for website to load
Do While ie.Busy Or ie.ReadyState <> 4
DoEvents
Loop


'find click details link

Set elementcol = ie.Document.getElementsByTagName("h1")
For Each detailslink In elementcol
Debug.Print detailslink.innerText

CAGEcodeCell.Offset(, 1).Value = detailslink
'End If
Next detailslink



End Sub
 
Upvote 0
All good, i think i have it!

Thanks for your assistance.

VBA Code:
Option Explicit


Dim copieddata As String
Dim ie As Object
Dim elementcol As Object
Dim dataTable As HTMLDivElement
Dim lines As Variant, i As Long
  Dim detailslink As Variant
  Dim link As Object



Public Sub Get_All_Data()

    Dim CAGEcodeCell As Range
 
    For Each CAGEcodeCell In Worksheets("Sheet1").Range("J1:J20")
        Get_CAGE_code_data CAGEcodeCell
    Next

End Sub


Private Sub Get_CAGE_code_data(CAGEcodeCell As Range)

   'open website
  Set ie = CreateObject("internetexplorer.application")
ie.Navigate ("https://cage.report/NCAGE/" & CAGEcodeCell.Value & "")
ie.Visible = False

'wait for website to load
Do While ie.Busy Or ie.ReadyState <> 4
DoEvents
Loop


'find name and link

Set elementcol = ie.Document.getElementsByTagName("h1")
For Each detailslink In elementcol
ActiveSheet.Hyperlinks.Add Anchor:=CAGEcodeCell.Offset(, 1), Address:="https://cage.report/NCAGE/" & CAGEcodeCell.Value & "", SubAddress:="", ScreenTip:="", TextToDisplay:=detailslink.innerText


'End If
Next detailslink

End Sub
 
Upvote 0
Solution

Forum statistics

Threads
1,223,882
Messages
6,175,164
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top