converting HTML ASCII characters to readable characters

MaccyW

New Member
Joined
Jan 19, 2017
Messages
7
First post though I've lurked a while so apologies if this is a stupid question but after googling and reading plenty I'm still nowhere near a solution given I've a very basic level of using Excel.

I've downloaded a webpage into Excel and using a filter have got some partial HTML code after finding the part of the HTML I want. However within the cell showing the isolated text, it's showing the special character codes rather than the readable characters. Here is an example I've made up that should explain what I mean.

THIS IS A 12" RECORD FROM THE 80's

I need it to say THIS IS A 12" RECORD FROM THE 80's but the " and ' are showing as " and ' within the cell instead!

Is there any quick fix such as a function that allows you to use something like CHAR(34)? Obviously that is just a function for a value whereas this has &# before it and also within more text.

I was just going to create a macro to do a find/replace for certain special characters but surely there's an easier way than that?
 

Excel Facts

When they said...
When they said you are going to "Excel at life", they meant you "will be doing Excel your whole life".
Here is a function that will convert the string in your OP.

VBA Code:
Function DECODEHTML(s As String) As String
With CreateObject("VBScript.RegExp")
    .Global = True
    .Pattern = "\&\#\d{3}\;"
    Set matches = .Execute(s)
    For Each mtch In matches
        DECODEHTML = .Replace(s, Chr(Mid(mtch, 3, 3)))
    Next mtch
End With
End Function
 
Upvote 0
Here is a function that will convert the string in your OP.

VBA Code:
Function DECODEHTML(s As String) As String
With CreateObject("VBScript.RegExp")
    .Global = True
    .Pattern = "\&\#\d{3}\;"
    Set matches = .Execute(s)
    For Each mtch In matches
        DECODEHTML = .Replace(s, Chr(Mid(mtch, 3, 3)))
    Next mtch
End With
End Function
Thanks for that but you say it will work with the string in my OP? So that means it won't work with any other of the special characters here for example?


A bit more info is that I'm importing the HTML into Excel using a function. It splits it up into 100's of cells in 1 column, column A. I did it this way so I can search for a specific line of HTML without the cell character limit kicking in as I couldn't find any other way of doing it.

When I locate the cell that contains the data I need, I then do a search for part of the text within it and rip it out to a single cell on another worksheet. Sometimes this data has various special characters that replace the likes of % & ' ; ". So that's why I needed to convert the individual cell into a format that is easy to read. If the text was in Cell A1, I was hoping it was something as simple as =HTMLDECODE(A1) but obviously it's not.

I'm after something that converts something like this daft example.

THIS IS A 12" RECORD FROM THE 80's/!©®

into this

THIS IS A 12" RECORD FROM THE 80's/!©®


HTML-Decode.jpg


The macro method I've created using FIND/REPLACE works but it's obviously doing 100's of them searching for any possible special character just for that 1 cell just in case there's one or more special characters in it. I only want it to replace the special characters in the specified cell also, not elsewhere in the file where original or other code may be. Here's part of the macro I've created but it takes a while to go through 100's of find/replace as this is just 5 of them. I've even turned off the autocalculations within the macro to see if it speeds up but it doesn't. I hope that makes it easier to understand

VBA Code:
Sub SPECIAL_CHARACTERS()
Sheets("Sheet1").Select
    Range("A1").Select
    Cells.Replace What:=" ", Replacement:="", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
    Cells.Replace What:="!", Replacement:="!", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
    Cells.Replace What:=""", Replacement:="""", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
    Cells.Replace What:="#", Replacement:="#", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
    Cells.Replace What:="$", Replacement:="$", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
End Sub
 
Upvote 0
How about this one.

VBA Code:
Function DECODEHTML(s As String) As String
With CreateObject("VBScript.RegExp")
    .Global = True
    .Pattern = "\&\#\d{3}\;"
    Set matches = .Execute(s)
    For Each mtch In matches
        s = Replace(s, mtch, Chr(Mid(mtch, 3, 3)))
    Next mtch
End With
DECODEHTML = s
End Function
 
Upvote 0
Another option
VBA Code:
Function MaccyW(txt As String) As String
    Dim sp As Variant, i As Long
    sp = Split(txt, "&")
    For i = 0 To UBound(sp)
        If Left(sp(i), 1) = "#" Then
            sp(i) = Chr(Mid(sp(i), 2, InStr(1, sp(i), ";") - 2)) & Mid(sp(i), InStr(1, sp(i), ";") + 1)
        End If
    Next i
    MaccyW = Join(sp, "")
End Function
Used like
=MaccyW(A1)
 
Upvote 0
Solution
How about this one.

VBA Code:
Function DECODEHTML(s As String) As String
With CreateObject("VBScript.RegExp")
    .Global = True
    .Pattern = "\&\#\d{3}\;"
    Set matches = .Execute(s)
    For Each mtch In matches
        s = Replace(s, mtch, Chr(Mid(mtch, 3, 3)))
    Next mtch
End With
DECODEHTML = s
End Function
I'm sorry I'm at a loss now as I can't even get that to work in an example spreadsheet that I'm testing with given my knowledge of VBA. I don't even know which place to put it. I only know what I know through using other code I've found on the internet or creating macros to create the code and getting a basic understanding it in some way and then customising it to my needs.

I got this to work but it's doing the whole worksheet rather than just the one cell I need, cell D12 for example. I've tried modifying it to specify D12 (using row 12, column 4 etc) but just keep getting errors.

VBA Code:
Sub UnescapeCharacters()

    ' set this to match your case
    sheetname = "Sheet1"

    Dim sheet As Worksheet
    Set sheet = Me.Worksheets(sheetname)

    For Row = 1 To sheet.UsedRange.Rows.Count
        For Column = 1 To sheet.UsedRange.Columns.Count
            Dim cell As Range
            Set cell = sheet.Cells(Row, Column)

            ' define all your replacements here
            ReplaceCharacter cell, """, """"
            ReplaceCharacter cell, " ",""
            ReplaceCharacter cell, "!","!"
            ReplaceCharacter cell, ""","""
            ReplaceCharacter cell, "#","#"
            ReplaceCharacter cell, "$","$"
            ReplaceCharacter cell, "%","%"
            ReplaceCharacter cell, "&","&"
            ReplaceCharacter cell, "'","'"
            ReplaceCharacter cell, "(","("
            ReplaceCharacter cell, ")",")"

        Next Column
    Next Row

End Sub

Sub ReplaceCharacter(ByRef cell As Range, ByVal find As String, ByVal replacement As String)

    Dim result As String
    cell.Value = replace(cell.Text, find, replacement, 1, -1)

End Sub
 
Upvote 0
Another option
VBA Code:
Function MaccyW(txt As String) As String
    Dim sp As Variant, i As Long
    sp = Split(txt, "&")
    For i = 0 To UBound(sp)
        If Left(sp(i), 1) = "#" Then
            sp(i) = Chr(Mid(sp(i), 2, InStr(1, sp(i), ";") - 2)) & Mid(sp(i), InStr(1, sp(i), ";") + 1)
        End If
    Next i
    MaccyW = Join(sp, "")
End Function
Used like
=MaccyW(A1)
Thanks ever so much, I used that and got it to work and I can even understand how it works with regards creating the Function :cool:

How about this one.

VBA Code:
Function DECODEHTML(s As String) As String
With CreateObject("VBScript.RegExp")
    .Global = True
    .Pattern = "\&\#\d{3}\;"
    Set matches = .Execute(s)
    For Each mtch In matches
        s = Replace(s, mtch, Chr(Mid(mtch, 3, 3)))
    Next mtch
End With
DECODEHTML = s
End Function
I was missing putting =DECODEHTML(A1) as I didn't realise that I needed to put that in but @Fluff prompted me the baby step! Obviously now I see how it works with the function as I have really used them much in VBA, it's normally macros using the Sub stuff!
 
Upvote 0
Glad we could help & thanks for the feedback
 
Upvote 0
Glad we could help & thanks for the feedback
It's been something I've been wanting to do for a few years now. I just used to manually replace the odd one but now another file needs to check many of them so it needed automated. I'm really appreciate the help from you both plus I've obviously I've learned a bit more about the FUNCTION :cool:

I'm sure I'll be back but managing so far modifying and updating my files. There's probably many easier and faster ways to do what I want to do overall but it's probably better I just plod along myself and see how it goes :p
 
Upvote 0
That's the best way to go & how I started. :)
Fast code & clever code are (to some extent) pointless if you don't understand them. If you understand something, you have a better chance of modifying it in the future if needed.
 
Upvote 0

Forum statistics

Threads
1,223,723
Messages
6,174,123
Members
452,546
Latest member
Rafafa

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top