Translating UTF-8 to legible Latin text

Rhodie72

Well-known Member
Joined
Apr 18, 2016
Messages
633
Office Version
  1. 365
  2. 2021
  3. 2019
  4. 2016
  5. 2013
  6. 2010
  7. 2007
  8. 2003 or older
Platform
  1. Windows
  2. Mobile
I have data that is encoded in utf-8 that appears as webdings but cannot be read. It is english text for sample payslips. The copied data looks like this:

  
What it actually says is:

DIAMOND BUS LIMITED

What I would like to do is translate it into legible latin text using VBA as the Excel UTF-8 import does NOT correctly do the job at all. I know for a fact a solution to this will help thousands of people globally.
Whilst Excel can convert the orignal text into binary numbers, it is simply a mathematical case of replacing the UTF-8 characters with Latin charaters from standard code pages

My idea is to convert all the characters into binary

i BEGAN A TABLE FOR TRANSLATION BUT vba IS THE WAY FORWARD. Here's what I have done so far:

Excel Formula:
Sub Translate()
    For i = 1 To Len(ActiveCell.Value)
        answer = Application.WorksheetFunction.Unicode(Mid(ActiveCell.Value, i, 1))
        MsgBox answer
    Next i
End Sub

Putting the resultant info into a dynamic array and then converting the resultant into letters would be excellent.

Please help?

In Excel it looks like this:
1686751883765.png
 

Excel Facts

Whats the difference between CONCAT and CONCATENATE?
The newer CONCAT function can reference a range of cells. =CONCATENATE(A1,A2,A3,A4,A5) becomes =CONCAT(A1:A5)
Some questions come to mind.
  1. What is the format of the data file you are importing?
  2. How are you getting it into Excel?
  3. Are you sure it is UTF-8, and not UTF-16?
  4. Do you have a text editor like Notepad++ (free download) that will let you inspect the raw datafile to acertain the encoding?)
1686766377771.png
 
Upvote 0
The data is being copied and pasted into the spreadsheet from a pdf file. Iknow exactly what the encoding is as stated. If you are unsure you can copy the text and see for yourself using online tools. NOTEPAD++ doesnt translate it correctly either! IF had then it would have been much easier; but that is not the point. I have since created a translation sheet that shows the different characters butnow I need to create a routine that will replace the characters with appropriate text. I still think that creating an array would be the best way forward for this
 
Upvote 0
Was just playing with the below but could do with more examples of text, you can have a play yourself:
VBA Code:
Function ConvTxt(iTxt As String)
    Dim oTxt As String, ch As String, i As Long
    
    For i = 1 To Len(iTxt)
        ch = Mid(iTxt, i, 1)
        If ch = " " Then
            oTxt = oTxt & " "
        Else
            oTxt = oTxt & Chr(Application.Unicode(ch) - 61440)
        End If
    Next i
    ConvTxt = oTxt
End Function

Maybe you could upload some examples using XL2BB?
 
Upvote 0
Or a formula doing the same:
Book1
A
1  
2DIAMOND BUS LIMITED
Sheet1
Cell Formulas
RangeFormula
A2A2=LET( c,IFERROR(MID(A1, SEQUENCE(LEN(A1)), 1), ""), TEXTJOIN(,,IFERROR(CHAR(UNICODE(c)-61440)," ")))
 
Upvote 0
Perhaps something like this.
VBA Code:
Sub UniConvert()
    Dim CA As Variant, RRow As Long, I As Long
    Dim CellRange As Range
    Dim WS As Worksheet
    Dim S As String
    
    Set WS = ActiveSheet
    
    With WS
        Set CellRange = .Range("A1:A" & .Range("A" & .Rows.Count).End(xlUp).Row)
    End With
    
    CA = CellRange.Value
    For RRow = LBound(CA, 1) To UBound(CA, 1)
        S = ""
        For I = 1 To Len(CA(RRow, 1))
            S = S & Chr(Application.Unicode(Mid(CA(RRow, 1), I, 1)) And 255)
        Next I
        CA(RRow, 1) = S
    Next RRow
    CellRange.Value = CA
End Sub
 
Upvote 1
Was just playing with the below but could do with more examples of text, you can have a play yourself:
VBA Code:
Function ConvTxt(iTxt As String)
    Dim oTxt As String, ch As String, i As Long
   
    For i = 1 To Len(iTxt)
        ch = Mid(iTxt, i, 1)
        If ch = " " Then
            oTxt = oTxt & " "
        Else
            oTxt = oTxt & Chr(Application.Unicode(ch) - 61440)
        End If
    Next i
    ConvTxt = oTxt
End Function

Maybe you could upload some examples using XL2BB?
Oddly, when I paste the data to this website it becomes legible. Some of the information is sensitive, so really it is a code page translation that I need from UTF which has over 65k characters versus the 128+ of Western European language CODE PAGE. I'll give this a go and see what comes of it. Thanks.
 
Upvote 0
Was just playing with the below but could do with more examples of text, you can have a play yourself:
VBA Code:
Function ConvTxt(iTxt As String)
    Dim oTxt As String, ch As String, i As Long
   
    For i = 1 To Len(iTxt)
        ch = Mid(iTxt, i, 1)
        If ch = " " Then
            oTxt = oTxt & " "
        Else
            oTxt = oTxt & Chr(Application.Unicode(ch) - 61440)
        End If
    Next i
    ConvTxt = oTxt
End Function

Maybe you could upload some examples using XL2BB?
Yeah, tried that, nothing worked.
 
Upvote 0
Perhaps something like this.
VBA Code:
Sub UniConvert()
    Dim CA As Variant, RRow As Long, I As Long
    Dim CellRange As Range
    Dim WS As Worksheet
    Dim S As String
   
    Set WS = ActiveSheet
   
    With WS
        Set CellRange = .Range("A1:A" & .Range("A" & .Rows.Count).End(xlUp).Row)
    End With
   
    CA = CellRange.Value
    For RRow = LBound(CA, 1) To UBound(CA, 1)
        S = ""
        For I = 1 To Len(CA(RRow, 1))
            S = S & Chr(Application.Unicode(Mid(CA(RRow, 1), I, 1)) And 255)
        Next I
        CA(RRow, 1) = S
    Next RRow
    CellRange.Value = CA
End Sub
Perfect solutiion, well done and thank you! Now I just need to do minor changes for formatting and tables. Saved me hours.
 
Upvote 0
Perhaps something like this.
VBA Code:
Sub UniConvert()
    Dim CA As Variant, RRow As Long, I As Long
    Dim CellRange As Range
    Dim WS As Worksheet
    Dim S As String
   
    Set WS = ActiveSheet
   
    With WS
        Set CellRange = .Range("A1:A" & .Range("A" & .Rows.Count).End(xlUp).Row)
    End With
   
    CA = CellRange.Value
    For RRow = LBound(CA, 1) To UBound(CA, 1)
        S = ""
        For I = 1 To Len(CA(RRow, 1))
            S = S & Chr(Application.Unicode(Mid(CA(RRow, 1), I, 1)) And 255)
        Next I
        CA(RRow, 1) = S
    Next RRow
    CellRange.Value = CA
End Sub
Used your code and expanded it only to discover there is an error generated when there is only a single line of text on subsequent columns. I guess limiting the mber of rows was $actually$ the money shot. So how would you modify it to use the range A1:G33 only? I just smacked this in and ran it with modification to ranges to kill off the error problem.

VBA Code:
Sub Translate()
    Dim CA As Variant, RRow As Long, I As Long
    Dim CellRange As Range
    Dim WS As Worksheet
    Dim S As String
    
    Set WS = ActiveSheet
    
    With WS
        Set CellRange = .Range("A1:A" & .Range("A" & .Rows.Count).End(xlUp).Row)
    End With
    
    CA = CellRange.Value
    For RRow = LBound(CA, 1) To UBound(CA, 1)
        S = ""
        For I = 1 To Len(CA(RRow, 1))
            S = S & Chr(Application.Unicode(Mid(CA(RRow, 1), I, 1)) And 255)
        Next I
        CA(RRow, 1) = S
    Next RRow
    CellRange.Value = CA
        With WS
        Set CellRange = .Range("B1:B" & .Range("B" & .Rows.Count).End(xlUp).Row)
    End With
    
    CA = CellRange.Value
    For RRow = LBound(CA, 1) To UBound(CA, 1)
        S = ""
        For I = 1 To Len(CA(RRow, 1))
            S = S & Chr(Application.Unicode(Mid(CA(RRow, 1), I, 1)) And 255)
        Next I
        CA(RRow, 1) = S
    Next RRow
    CellRange.Value = CA
    With WS
        Set CellRange = .Range("C1:C" & .Range("C" & .Rows.Count).End(xlUp).Row)
    End With
    
    CA = CellRange.Value
    For RRow = LBound(CA, 1) To UBound(CA, 1)
        S = ""
        For I = 1 To Len(CA(RRow, 1))
            S = S & Chr(Application.Unicode(Mid(CA(RRow, 1), I, 1)) And 255)
        Next I
        CA(RRow, 1) = S
    Next RRow
    CellRange.Value = CA
    With WS
        Set CellRange = .Range("D1:D" & .Range("D" & .Rows.Count).End(xlUp).Row)
    End With
    
    CA = CellRange.Value
    For RRow = LBound(CA, 1) To UBound(CA, 1)
        S = ""
        For I = 1 To Len(CA(RRow, 1))
            S = S & Chr(Application.Unicode(Mid(CA(RRow, 1), I, 1)) And 255)
        Next I
        CA(RRow, 1) = S
    Next RRow
    CellRange.Value = CA
    With WS
        Set CellRange = .Range("E1:E" & .Range("E" & .Rows.Count).End(xlUp).Row)
    End With
    
    CA = CellRange.Value
    For RRow = LBound(CA, 1) To UBound(CA, 1)
        S = ""
        For I = 1 To Len(CA(RRow, 1))
            S = S & Chr(Application.Unicode(Mid(CA(RRow, 1), I, 1)) And 255)
        Next I
        CA(RRow, 1) = S
    Next RRow
    CellRange.Value = CA
    With WS
        Set CellRange = .Range("F1:F33" & .Range("F" & .Rows.Count).End(xlUp).Row)
    End With
    
    CA = CellRange.Value
    For RRow = LBound(CA, 1) To UBound(CA, 1)
        S = ""
        For I = 1 To Len(CA(RRow, 1))
            S = S & Chr(Application.Unicode(Mid(CA(RRow, 1), I, 1)) And 255)
        Next I
        CA(RRow, 1) = S
    Next RRow
    CellRange.Value = CA
    With WS
        Set CellRange = .Range("G1:G33" & .Range("G" & .Rows.Count).End(xlUp).Row)
    End With
    
    CA = CellRange.Value
    For RRow = LBound(CA, 1) To UBound(CA, 1)
        S = ""
        For I = 1 To Len(CA(RRow, 1))
            S = S & Chr(Application.Unicode(Mid(CA(RRow, 1), I, 1)) And 255)
        Next I
        CA(RRow, 1) = S
    Next RRow
    CellRange.Value = CA

End Sub
 
Upvote 0

Forum statistics

Threads
1,223,893
Messages
6,175,240
Members
452,621
Latest member
Laura_PinksBTHFT

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top