somnath_it2006
Well-known Member
- Joined
- Apr 11, 2009
- Messages
- 574
Hi All,
From last few days I was working on website crawlers (Sorry I don't know the exact word).
By using some references I used IE control for extract data from website as per form submitting and etc..
And this is working very effectively in Data Extraction.
Here I am just sharing that methods...
References Needs to add
1. Microsoft Internet Control
2. Microsoft HTML Object Liabrary
Declarations
Navigate URL in IE
Extract All the hyperlinks from Webpage
Get HTML Source data of Webpage
Extract values from form control or assign values to Form control
Get all the form controls from webpage
Copy whole webpage on excel sheet
For more references please check this useful links-
http://www.tushar-mehta.com/publish_train/xl_vba_cases/vba_web_pages_services/index.htm
http://www.xtremevbtalk.com/showthread.php?t=79552
http://www.vbforums.com/showthread.php?t=406671
http://www.andreavb.com/forum/viewtopic_6988.html
http://forums.devshed.com/visual-ba...web-browser-control-tips-and-more-345993.html
http://www.mrexcel.com/forum/showthread.php?t=449840
http://www.mrexcel.com/forum/showthread.php?p=2240244#post2240244
If you know something more please add to this thread...
From last few days I was working on website crawlers (Sorry I don't know the exact word).
By using some references I used IE control for extract data from website as per form submitting and etc..
And this is working very effectively in Data Extraction.
Here I am just sharing that methods...
References Needs to add
1. Microsoft Internet Control
2. Microsoft HTML Object Liabrary
Declarations
Code:
Dim ie As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim HTMLlinks As HTMLAnchorElement
Navigate URL in IE
Code:
' Navigate the web
ie.Navigate "https://www.xyz.com"
ie.Visible = True
'Loop unitl ie page is fully loaded
Do Until ie.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
Do Until ie.Document.ReadyState = "complete"
DoEvents
Loop
Extract All the hyperlinks from Webpage
Code:
' Set document object
Set HTMLdoc = ie.Document
' Loop thru each link
For Each HTMLlinks In HTMLdoc.Links
' Hyperlink
MsgBox HTMLlinks.href
' For click on Hyperlink
HTMLlinks.Click
' Check all the propartiese of "HTMLlinks"
Next HTMLlinks
Get HTML Source data of Webpage
Code:
' Many times you were no able to see the HTML source data
' In this case you can extract using innerHTML object
' Text file created using Scripting Runtime refrence
Dim EHTML
' Set object
Set EHTML = ie.Document.all.Item
Dim FSO As FileSystemObject
Dim FSOFile As TextStream
Set FSO = New FileSystemObject
Set FSOFile = FSO.OpenTextFile("c:\WriteTest.txt", 2, True)
FSOFile.Write EHTML.innerHTML
FSOFile.Close
Extract values from form control or assign values to Form control
Code:
' Text box
ie.Document.all.Item("cmpnyID").Value = Range("A1").Value
' or
ie.Document.getElementById("cmpnyID").Value = Range("A1").Value
' Click on button
ie.Document.getElementById("Default").Click
' You can get this "Element Id" or "Item ID" from source code
Get all the form controls from webpage
Code:
'loop forms
For x = 0 To ie.Document.forms - 1
StrString = ie.Document.Text
'loop items in form
For i = 0 To ie.Document.forms(x).Length - 1
' name and type of the item
MsgBox ie.Document.forms(x)(i).Name & " " & ie.Document.forms(x)(i).Type
'if it is a submit button, click it
If ie.Document.forms(x)(i).Type = "submit" Then
ie.Document.forms(x)(i).Click
End If
Next i
Next x
Copy whole webpage on excel sheet
Code:
ie.ExecWB 17, 0
ie.ExecWB 12, 2
ActiveSheet.PasteSpecial Format:="HTML", link:=False, DisplayAsIcon:=False
Application.CutCopyMode = False
For more references please check this useful links-
http://www.tushar-mehta.com/publish_train/xl_vba_cases/vba_web_pages_services/index.htm
http://www.xtremevbtalk.com/showthread.php?t=79552
http://www.vbforums.com/showthread.php?t=406671
http://www.andreavb.com/forum/viewtopic_6988.html
http://forums.devshed.com/visual-ba...web-browser-control-tips-and-more-345993.html
http://www.mrexcel.com/forum/showthread.php?t=449840
http://www.mrexcel.com/forum/showthread.php?p=2240244#post2240244
If you know something more please add to this thread...
Last edited: