Error duplicating data when scraping from Web to Excel

randaubienghoc

New Member
Joined
Apr 25, 2020
Messages
19
Office Version
  1. 2016
Platform
  1. Windows
Hi guys,

I am looking for your kind help on my coding to pull data from Web to Excel.

Web to get data: https://eport.saigonnewport.com.vn/Pages/Common/Containers_new

  • Steps to get data:
Put "Cát Lái" to field "Khu vực giao nhận container" (choose seaport)

Put container number to field "Container"

Deselect "Chỉ vòng luân chuyển cuối" to show all rows in data table

Click Search to show data table - result for searching Container information

*****ISSUE: Data scraped from web to Excel for each row in Excel (respective to each container number found) seems to be the same with the previous result WHILE information for this container number can be blank. For example: Event time 2 "10/4/2020 3:07:00 PM" is repeated for container "TEMU3311320" while this container does not have Event time 2.

Hope you could give me any advices to resolve this duplicating issue. Attached Excel file for your references. Thanks. Link download Excel file example

VBA Code:
Sub PullDataFromWeb()
  Dim IE As Object, W As Excel.Worksheet
  Dim doc As HTMLDocument
  Dim lastRow As Integer, b As Boolean, tmp As String
  Dim lis, li
  Set W = ThisWorkbook.Sheets("Sheet1")
  Set IE = VBA.CreateObject("InternetExplorer.Application")
  IE.Visible = True   'hien cua so IE
  IE.navigate "https://eport.saigonnewport.com.vn/Pages/Common/Containers_new"
  Do While IE.Busy Or IE.readyState <> 4      'doi IE chay xong
    Application.Wait DateAdd("s", 1, Now)
  Loop
  Set doc = IE.document

  lastRow = W.Range("B" & W.UsedRange.Rows.Count + 2).End(xlUp).Row        'dong cuoi cung trong cot B container
  If lastRow < 2 Then GoTo Ends
  On Error Resume Next
  For intRow = 2 To lastRow     'tu dong toi dong
    b = False
    b = W.Range("I" & intRow).Value Like "[Yy]"
    If W.Range("B" & intRow).Value <> "" And Not b Then
      doc.getElementById("txtItemNo_I").Value = W.Range("B" & intRow).Value 'so cont
      doc.getElementById("cbSite_VI").Value = W.Range("A" & intRow).Value
      doc.getElementById("chkInYard_I").Checked = False
      doc.getElementById("ContentPlaceHolder2_btnSearch").Click 'click Search
      '----------------------------------------------
      Do While IE.Busy Or IE.readyState <> 4
        Application.Wait DateAdd("s", 1, Now)
      Loop
      '----------------------------------------------
      strFindContainer = doc.getElementById("ContentPlaceHolder2_lblNotice").innerText
      W.Range("H" & intRow) = strFindContainer
      If strFindContainer Like "T*m th*y * container*" Then
        strEventtime1 = doc.getElementById("grdContainer_DXDataRow0").Cells(0).innerText
        strEventtype1 = doc.getElementById("grdContainer_DXDataRow0").Cells(1).innerText
        strLocation1 = doc.getElementById("grdContainer_DXDataRow0").Cells(2).innerText
        strEventtime2 = doc.getElementById("grdContainer_DXDataRow1").Cells(0).innerText
        strEventtype2 = doc.getElementById("grdContainer_DXDataRow1").Cells(1).innerText
        W.Range("C" & intRow) _
          .Resize(, 5).Value = Array(strEventtime1, strEventtype1, strLocation1, _
                         strEventtime2, strEventtype2)
      End If
    End If
  Next
Ends:
  IE.Quit
  Set IE = Nothing    'Cleaning up
  Set objElement = Nothing
  Set objCollection = Nothing
  Application.StatusBar = ""
  Application.DisplayAlerts = True
End Sub
 

Excel Facts

Copy a format multiple times
Select a formatted range. Double-click the Format Painter (left side of Home tab). You can paste formatting multiple times. Esc to stop
I haven't downloaded the code, so I could be wrong, but it seems to me that from just look at the code that:

Missing container name
You're not getting the container name because your code doesn't appear to try and get it. You have variables for EventTime1, EventType1 and Location, and those appear to be sourced from the element on the website, but spread out across different cells EventType1 = cell(0), EventTime1 = cell(1), and so forth. You don't have a variable for the container name - for your reference, the container name information is located in the cell(4):
VBA Code:
strContainerName = doc.getElementById("grdContainer_DXDataRow0").Cells(4).innerText

You then need to rewrite this line to make sure the info is then written to the worksheet:

VBA Code:
W.Range("C" & intRow) _
          .Resize(, 5).Value = Array(strEventtime1, strEventtype1, strLocation1, _
                         strEventtime2, strEventtype2)

I don't know what other data is on the sheet, but I can see that Column H is populated by the notification text on how many containers were found. I would suggest commenting out the following line:
W.Range("H" & intRow) = strFindContainer
and instead put that information in strFiendContainer with the rest of the array:

VBA Code:
W.Range("C" & intRow) _
          .Resize(, 7).Value = Array(strContainerName, strEventtime1, strEventtype1, strLocation1, _
                         strEventtime2, strEventtype2, strFindContainer)

Duplicate Entries
Looking at your For ... Next loop, you don't seem to refresh the variables - meaning that you don't delete their values, so if they are not overwritten by the data obtained during the scrape, they will just continue to record the last thing that they scrape.

Does that sound like it might be right?
 
Upvote 0
I haven't downloaded the code, so I could be wrong, but it seems to me that from just look at the code that:

Missing container name
You're not getting the container name because your code doesn't appear to try and get it. You have variables for EventTime1, EventType1 and Location, and those appear to be sourced from the element on the website, but spread out across different cells EventType1 = cell(0), EventTime1 = cell(1), and so forth. You don't have a variable for the container name - for your reference, the container name information is located in the cell(4):
VBA Code:
strContainerName = doc.getElementById("grdContainer_DXDataRow0").Cells(4).innerText

You then need to rewrite this line to make sure the info is then written to the worksheet:

VBA Code:
W.Range("C" & intRow) _
          .Resize(, 5).Value = Array(strEventtime1, strEventtype1, strLocation1, _
                         strEventtime2, strEventtype2)

I don't know what other data is on the sheet, but I can see that Column H is populated by the notification text on how many containers were found. I would suggest commenting out the following line:
W.Range("H" & intRow) = strFindContainer
and instead put that information in strFiendContainer with the rest of the array:

VBA Code:
W.Range("C" & intRow) _
          .Resize(, 7).Value = Array(strContainerName, strEventtime1, strEventtype1, strLocation1, _
                         strEventtime2, strEventtype2, strFindContainer)

Duplicate Entries
Looking at your For ... Next loop, you don't seem to refresh the variables - meaning that you don't delete their values, so if they are not overwritten by the data obtained during the scrape, they will just continue to record the last thing that they scrape.

Does that sound like it might be right?
I updated the Excel file as below:
- For container name: I put container names in Excel and use these names to search.
- For duplicate Entries: Could you please show me how to refresh variables to avoid duplicate entries?
Thanks.
 
Upvote 0
Hi. .Give me a few minutes. I'll have a look now.
 
Upvote 0
Ok. so I think this will work. Try adding the following five lines into your code:

strEventtime1 = ""
strEventtype1 = ""
strLocation1 = ""
strEventtime2 = ""

strEventtype2 = ""

here:

VBA Code:
 W.Range("C" & intRow) _
          .Resize(, 5).Value = Array(strEventtime1, strEventtype1, strLocation1, _
                         strEventtime2, strEventtype2)
      End If
    End If
    strEventtime1 = ""
    strEventtype1 = ""
    strLocation1 = ""
    strEventtime2 = ""
    strEventtype2 = ""
        
  Next
Ends:
 
Upvote 0
Ok. so I think this will work. Try adding the following five lines into your code:

strEventtime1 = ""
strEventtype1 = ""
strLocation1 = ""
strEventtime2 = ""

strEventtype2 = ""

here:

VBA Code:
 W.Range("C" & intRow) _
          .Resize(, 5).Value = Array(strEventtime1, strEventtype1, strLocation1, _
                         strEventtime2, strEventtype2)
      End If
    End If
    strEventtime1 = ""
    strEventtype1 = ""
    strLocation1 = ""
    strEventtime2 = ""
    strEventtype2 = ""
       
  Next
Ends:
Oh my god. It works eventually. Thanks so much. I have never thought it turns out to be such an easy method ^_^
 
Upvote 0
You're very welcome. You script works very nicely. I hope it works out well.
 
Upvote 0

Forum statistics

Threads
1,225,487
Messages
6,185,274
Members
453,285
Latest member
Wullay

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top