How to Edit lines in a .txt Web page when downloading to a .txt file

AMRICE01

New Member
Joined
Oct 23, 2019
Messages
3
I am using a data scrapping app to extract data from a .txt file that has been downloaded from a .txt web page. The .txt web page has lots of "&" which shows as "amp;". I would like to edit the web page during download using VBA to replace the "amp;" to "&" and shift the whole line to the left 3 spaces to adjust the text line to the right of the "&" to the original position.

Current partial VBA code is below:
...
CreateObject("Scripting.FileSystemObject").CreateTextFile XName, True
'
FF = FreeFile
Open XName For Output As #FF
With IE.document.body
DoEvents
On Error GoTo UUU
Print #FF, .OuterHtml & .innerHTML
End With
Close #FF
...

Any help would be greatly appreciated

Thanks
Al
 

Excel Facts

Copy formula down without changing references
If you have =SUM(F2:F49) in F50; type Alt+' in F51 to copy =SUM(F2:F49) to F51, leaving the formula in edit mode. Change SUM to COUNT.
First add the following function to your module. This function will decode HTML entities, such as "&amp", "&quot", "&lt", etc. You can add or remove entities as needed.

VBA Code:
Public Function HTMLEntititesDecode(p_strText As String) As String

    Dim strTemp As String
    strTemp = p_strText
    
    strTemp = Replace(strTemp, """, """")
    strTemp = Replace(strTemp, "&", "&")
    strTemp = Replace(strTemp, "'", "'")
    strTemp = Replace(strTemp, "&lt;", "<")
    strTemp = Replace(strTemp, "&gt;", ">")
    strTemp = Replace(strTemp, "&nbsp;", " ")
    strTemp = Replace(strTemp, "&iexcl;", "¡")
    strTemp = Replace(strTemp, "&cent;", "¢")
    strTemp = Replace(strTemp, "&pound;", "£")
    strTemp = Replace(strTemp, "&curren;", "?")
    strTemp = Replace(strTemp, "&yen;", "¥")
    strTemp = Replace(strTemp, "&brvbar;", "?")
    strTemp = Replace(strTemp, "&sect;", "§")
    strTemp = Replace(strTemp, "&uml;", "¨")
    strTemp = Replace(strTemp, "&copy;", "©")
    strTemp = Replace(strTemp, "&ordf;", "ª")
    strTemp = Replace(strTemp, "&laquo;", "«")
    strTemp = Replace(strTemp, "&not;", "¬")
    strTemp = Replace(strTemp, "*", "")
    strTemp = Replace(strTemp, "&reg;", "®")
    strTemp = Replace(strTemp, "&macr;", "¯")
    strTemp = Replace(strTemp, "&deg;", "°")
    strTemp = Replace(strTemp, "&plusmn;", "±")
    strTemp = Replace(strTemp, "&sup2;", "?")
    strTemp = Replace(strTemp, "&sup3;", "?")
    strTemp = Replace(strTemp, "&acute;", "´")
    strTemp = Replace(strTemp, "&micro;", "µ")
    strTemp = Replace(strTemp, "&para;", "¶")
    strTemp = Replace(strTemp, "&middot;", "·")
    strTemp = Replace(strTemp, "&cedil;", "¸")
    strTemp = Replace(strTemp, "&sup1;", "?")
    strTemp = Replace(strTemp, "&ordm;", "º")
    strTemp = Replace(strTemp, "&raquo;", "»")
    strTemp = Replace(strTemp, "&frac14;", "?")
    strTemp = Replace(strTemp, "&frac12;", "?")
    strTemp = Replace(strTemp, "&frac34;", "?")
    strTemp = Replace(strTemp, "&iquest;", "¿")
    strTemp = Replace(strTemp, "&Agrave;", "À")
    strTemp = Replace(strTemp, "&Aacute;", "Á")
    strTemp = Replace(strTemp, "&Acirc;", "Â")
    strTemp = Replace(strTemp, "&Atilde;", "Ã")
    strTemp = Replace(strTemp, "&Auml;", "Ä")
    strTemp = Replace(strTemp, "&Aring;", "Å")
    strTemp = Replace(strTemp, "&AElig;", "Æ")
    strTemp = Replace(strTemp, "&Ccedil;", "Ç")
    strTemp = Replace(strTemp, "&Egrave;", "È")
    strTemp = Replace(strTemp, "&Eacute;", "É")
    strTemp = Replace(strTemp, "&Ecirc;", "Ê")
    strTemp = Replace(strTemp, "&Euml;", "Ë")
    strTemp = Replace(strTemp, "&Igrave;", "Ì")
    strTemp = Replace(strTemp, "&Iacute;", "Í")
    strTemp = Replace(strTemp, "&Icirc;", "Î")
    strTemp = Replace(strTemp, "&Iuml;", "Ï")
    strTemp = Replace(strTemp, "&ETH;", "?")
    strTemp = Replace(strTemp, "&Ntilde;", "Ñ")
    strTemp = Replace(strTemp, "&Ograve;", "Ò")
    strTemp = Replace(strTemp, "&Oacute;", "Ó")
    strTemp = Replace(strTemp, "&Ocirc;", "Ô")
    strTemp = Replace(strTemp, "&Otilde;", "Õ")
    strTemp = Replace(strTemp, "&Ouml;", "Ö")
    strTemp = Replace(strTemp, "&times;", "?")
    strTemp = Replace(strTemp, "&Oslash;", "Ø")
    strTemp = Replace(strTemp, "&Ugrave;", "Ù")
    strTemp = Replace(strTemp, "&Uacute;", "Ú")
    strTemp = Replace(strTemp, "&Ucirc;", "Û")
    strTemp = Replace(strTemp, "&Uuml;", "Ü")
    strTemp = Replace(strTemp, "&Yacute;", "?")
    strTemp = Replace(strTemp, "&THORN;", "?")
    strTemp = Replace(strTemp, "&szlig;", "ß")
    strTemp = Replace(strTemp, "&agrave;", "à")
    strTemp = Replace(strTemp, "&aacute;", "á")
    strTemp = Replace(strTemp, "&acirc;", "â")
    strTemp = Replace(strTemp, "&atilde;", "ã")
    strTemp = Replace(strTemp, "&auml;", "ä")
    strTemp = Replace(strTemp, "&aring;", "å")
    strTemp = Replace(strTemp, "&aelig;", "æ")
    strTemp = Replace(strTemp, "&ccedil;", "ç")
    strTemp = Replace(strTemp, "&egrave;", "è")
    strTemp = Replace(strTemp, "&eacute;", "é")
    strTemp = Replace(strTemp, "&ecirc;", "ê")
    strTemp = Replace(strTemp, "&euml;", "ë")
    strTemp = Replace(strTemp, "&igrave;", "ì")
    strTemp = Replace(strTemp, "&iacute;", "í")
    strTemp = Replace(strTemp, "&icirc;", "î")
    strTemp = Replace(strTemp, "&iuml;", "ï")
    strTemp = Replace(strTemp, "&eth;", "?")
    strTemp = Replace(strTemp, "&ntilde;", "ñ")
    strTemp = Replace(strTemp, "&ograve;", "ò")
    strTemp = Replace(strTemp, "&oacute;", "ó")
    strTemp = Replace(strTemp, "&ocirc;", "ô")
    strTemp = Replace(strTemp, "&otilde;", "õ")
    strTemp = Replace(strTemp, "&ouml;", "ö")
    strTemp = Replace(strTemp, "&divide;", "÷")
    strTemp = Replace(strTemp, "&oslash;", "ø")
    strTemp = Replace(strTemp, "&ugrave;", "ù")
    strTemp = Replace(strTemp, "&uacute;", "ú")
    strTemp = Replace(strTemp, "&ucirc;", "û")
    strTemp = Replace(strTemp, "&uuml;", "ü")
    strTemp = Replace(strTemp, "&yacute;", "?")
    strTemp = Replace(strTemp, "&thorn;", "?")
    strTemp = Replace(strTemp, "&yuml;", "ÿ")
    strTemp = Replace(strTemp, "&OElig;", "Œ")
    strTemp = Replace(strTemp, "&oelig;", "œ")
    strTemp = Replace(strTemp, "&Scaron;", "?")
    strTemp = Replace(strTemp, "&scaron;", "?")
    strTemp = Replace(strTemp, "&Yuml;", "Ÿ")
    strTemp = Replace(strTemp, "&fnof;", "ƒ")
    strTemp = Replace(strTemp, "&circ;", "ˆ")
    strTemp = Replace(strTemp, "&tilde;", "˜")
    strTemp = Replace(strTemp, "&thinsp;", "")
    strTemp = Replace(strTemp, "&zwnj;", "")
    strTemp = Replace(strTemp, "&zwj;", "")
    strTemp = Replace(strTemp, "&lrm;", "")
    strTemp = Replace(strTemp, "&rlm;", "")
    strTemp = Replace(strTemp, "&ndash;", "–")
    strTemp = Replace(strTemp, "&mdash;", "—")
    strTemp = Replace(strTemp, "&lsquo;", "‘")
    strTemp = Replace(strTemp, "&rsquo;", "’")
    strTemp = Replace(strTemp, "&sbquo;", "‚")
    strTemp = Replace(strTemp, "&ldquo;", "“")
    strTemp = Replace(strTemp, "&rdquo;", "”")
    strTemp = Replace(strTemp, "&bdquo;", "„")
    strTemp = Replace(strTemp, "&dagger;", "†")
    strTemp = Replace(strTemp, "&Dagger;", "‡")
    strTemp = Replace(strTemp, "&bull;", "•")
    strTemp = Replace(strTemp, "&hellip;", "…")
    strTemp = Replace(strTemp, "&permil;", "‰")
    strTemp = Replace(strTemp, "&lsaquo;", "‹")
    strTemp = Replace(strTemp, "&rsaquo;", "›")
    strTemp = Replace(strTemp, "&euro;", "€")
    strTemp = Replace(strTemp, "&trade;", "™")
    
    HTMLEntititesDecode = strTemp
    
End Function

Then try the following code...

VBA Code:
    Dim strHTML As String
    With IE.document.body
        strHTML = .OuterHtml & .innerHTML
    End With
    
    Dim strDecodedHTML As String
    strDecodedHTML = HTMLEntititesDecode(strHTML)
    
    FF = FreeFile
    Open XName For Output As #FF
        Print #FF, strDecodedHTML
    Close #FF

Hope this helps!
 
Upvote 0

Forum statistics

Threads
1,223,893
Messages
6,175,244
Members
452,622
Latest member
Laura_PinksBTHFT

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top