Function that shows the Unicode code of a character

Jubinell · Oct 15, 2009

Does such function exist in Excel?

pgc01 · Nov 9, 2009

Sorry Vladimir, I was out for some days.

I understand your point, and accept it. I have, however, a different opinion on this subject.

Having look on VBA help of AscW() function we can find that:
The AscW function returns the Unicode character code …

I can't argue with that. If Microsoft uses it, you certainly can use it too.

All I can say is that I don't agree with it. From what I see, vba strings are simply strings with 16bit characters. If your vba string is to be interpreted with the Unicode standard then it is a Unicode string. The same vba string could, however, be interpreted using any other standard, for ex., a 16bit standard for a far east language.

To me, to say that a string is a Unicode string just because its characters have 16 bits is like saying that a string is a ANSI string just because its characters have 8 bits.

ZVI · Nov 9, 2009

Agree that we've faced here with terminology contradiction.

But in my opinion, to solve such contradiction we can consider that Unicode symbol is unequivocally identified by its main properties:
1. Assignment of Unicode symbol
2. Shape(s) of Unicode symbol
3. Character, identified by the code (2 or more bytes)

In this case the Unicode character is just one of the properties of the Unicode symbol.
Using Unicode character property we should remember that in case of not all properties are taken into account then it is not the Unicode symbol - just its property.

Regards,
Vladimir

pgc01 · Nov 9, 2009

Wow, I see this seems to be one of those never ending discussions

. One of these days we'll have to put a stop to this.

Anyway not even your point 3 is respected.

You can do a simple test. Let's use the character that Jubinell posted, "&#55360;&#56738;". This is a Unicode string with 1 only Unicode character. This is one of the Unicode characters defined with a surrogate pair, in this case the pair D840 DDA2.

Now copy this character to the cell A1

We have now in cell A1 a Unicode string with 1 character.

Let's test it (using the immediate window).

Test 1 - the length

?Len(Range("A1").Value)
2

Now you say that vba strings are Unicode strings and, as you say in your point 3, Unicode characters may have 2 or more bytes. As you can see vba does not recognise it.

I say that vba strings are just sequences of 16bit codes. Since we have in this case 2 16bit codes we get the answer 2, it's OK

Test 2 - the code of the Unicode character

? Hex(AscW(Range("A1").Value))
D840

You say that vba strings are Unicode strings. Where is the rest of the code of this character? As you can see vba does not recognise it.

I say that vba strings are just sequences of 16bit codes. Since we have in this case 2 16bit codes we get the code of the first one, as expected since when you use Asc or AscW with a string with several characters you get the code of the first one. It's OK.

Test 3 - Let's try Mid() to get a character that does not exist(?)

? Hex(AscW(Mid(Range("A1").Value, 2, 1)))
DDA2

You say that vba strings are Unicode strings. How is it possible to use Mid to get the second Unicode character of a string that has only 1?

I say that vba strings are just sequences of 16bit codes. Since we have in this case 2 16bit codes we can use Mid to get the second 16bit code, and get it's value.

Test 4 - Let's try a test in the worksheet

In B1: =Len(A1) -> result: 2

Not even in the worksheet the string is recognised as being a Unicode string with 1 character.

My conclusion: a vba string is just a sequence of characters encoded with 16 bits.

ZVI · Nov 12, 2009

pgc01 said:
Wow, I see this seems to be one of those never ending discussions . One of these days we'll have to put a stop to this.
...
My conclusion: a VBA string is just a sequence of characters encoded with 16 bits.

Hi Pedro,
Ok, let it be as is

And thank you very much for the growing of my knowledge of Unicode standard.

Jubinell,
Hope that it is not too late - see more details below.

To insert Unicode symbols you can use Microsoft Global Input Method Editors (IMEs).
There is also IME for Office XP which I've successfully tested with Excel 2003 as well (but Excel 2002 XP is installed on my PC).

To insert into a cell the 1st 2-Bytes Unicode symbol 人 , mentioned in your post#11, use Excel menu:
Insert / Symbol / Font = Arial Unicode MS / Character code = 4EBA / from = Unicode(hex) / Insert button

For the 2nd 2-Bytes Unicode symbol 亻 do the same as above but with Character code = 4EBB

For the 3d 4-Bytes symbol &#55360;&#56738;:
1. Apply Arial Unicode MS font to the cell
2. Select 2 positions of the 3d symbol in the table, mentioned in your post#11
3. Copy & paste the selection into Excel cell - it will be shown as the single character in a cell.

There are additional user defined functions to convert Hex code into Unicode symbol and vice versa.
Two and four bytes Unicode symbols are supported.
You can pass to the Hex2Uni(HexCode) function the Hex code of the Unicode Lookup table (refer to your post#4).
Prefix "0x" of the Hex code is ignored by this function – you can copy Hex code from Unicode Lookup table with that prefix.

Rich (BB code):


Function Uni2Hex(Txt As String) As String
  Dim b() As Byte, i&, j&
  b() = Trim(Txt)
  j = UBound(b)
  For i = 0 To j Step 2
    If i < j Then Uni2Hex = Uni2Hex & Format(Hex(b(i + 1)), "00")
    Uni2Hex = Uni2Hex & Format(Hex(b(i)), "00")
  Next
End Function

Function Hex2Uni(HexCode As String) As String
  Dim b() As Byte, i&, j&, s$
  s = Replace(HexCode, " ", "")
  s = Replace(s, "0x", "")
  j = Len(s)
  If j <= 4 Then
    ReDim b(1 To 4)
    b() = ChrW("&H" & s)
  Else
    ReDim b(1 To 8)
    b() = ChrW("&H" & Mid$(s, 1, j - 4)) & ChrW("&H" & Mid$(s, j - 3))
  End If
  Hex2Uni = b()
End Function

Regards,
Vladimir

Jubinell · Nov 12, 2009

Thank you for the help Vlad. I've moved past this and ran into other hair-pulling problems, but I will go back and implement your methods when I have time.

Thank you to both of you for offering interesting insights to this problem.

Regarding Unicode...well I've always thought that it's a good thing to have a Universal standard. But the more I deal with it in the Japanese context, the more I sense that it's more like a cut-and-paste solution to a problem that requires more of a brand new design. Reading more about its developmenet history leads me to think that it's more of an attempt of the American business interests to hastily mend the cultural/linguistic gap between the West of the rest of the world, in order to sell more products of the former without regards to the cultural integrity of the latter. The ones who suffer the most are the countries with the more complex script but less powerful political standing to make sure the representation thereof is dignified enough by their standard.

That is, yes, off the topic. So please excuse me.

pgc01 · Nov 13, 2009

Hi Vladimir

Thank you. I'm not a specialist in Unicode and I learned a lot from our exchange. I did not know about the IME's, I'll check your links.

Jubinell

I've read other opinions expressing that Unicode has not an adequate design, as far as some languages that use ideographic characters are concerned. Let's hope that in time a future standard (for ex. a 32bit standard ) will emerge with a more structured and adequate design of the character map.

earthasa · May 22, 2012

I know this is an old post, but it's highlighted on Google, so I'd like to point out that Vladimir's version has an oversight: Format() can only format numbers in decimal format. So Hex2Uni will return incorrect results for bytes from A-F, as Format will see them as text instead of numbers and not pad them. Unicode 0A0A will come full circle as AA (=00AA), for example. Also, the use of Trim() in Uni2Hex prevents the function from returning the Unicode value for space.

Here are versions that will similarly handle surrogate pairs; in fact, I just have them convert the entire string to a hex sequence and vice-versa:

Code:

Function Uni2Hex(Txt As String) As String
Dim n As Long
    For n = 1 To Len(Txt)
        Uni2Hex = Uni2Hex & Right("000" & Hex(AscW(Mid(Txt, n, 1))), 4)
    Next n
End Function

Function Hex2Uni(ByVal Txt As String) As String
Dim n As Long, l As Long
    Txt = Replace(Replace(Replace(LCase(Txt), "0x", ""), "&h", ""), " ", "") 'optional clean-up
    l = WorksheetFunction.Ceiling(Len(Txt), 4)
    If l > 0 Then
        Txt = Right("000" & Txt, l)
        For n = 1 To l Step 4
            Hex2Uni = Hex2Uni & ChrW(CInt("&h" & Mid(Txt, n, 4)))
        Next n
    End If
End Function

Jubinell · May 23, 2012

Thanks for the adjustments. This helps!

Ikester · Oct 8, 2014

I've been having some regex difficulties WRT Unicode characters and while I haven't quite solved my problem (will post new thread if I can't solve it), this thread has been most informative. Thanks to all!

Just one quibble:

earthasa said:
I know this is an old post, but it's highlighted on Google, so I'd like to point out that Vladimir's version has an oversight: Format() can only format numbers in decimal format. So Hex2Uni will return incorrect results for bytes from A-F, as Format will see them as text instead of numbers and not pad them. Unicode 0A0A will come full circle as AA (=00AA), for example. Also, the use of Trim() in Uni2Hex prevents the function from returning the Unicode value for space.

Here are versions that will similarly handle surrogate pairs; in fact, I just have them convert the entire string to a hex sequence and vice-versa:

Rich (BB code):

Function Uni2Hex(Txt As String) As String Dim n As Long For n = 1 To Len(Txt) Uni2Hex = Uni2Hex & Right("000" & Hex(AscW(Mid(Txt, n, 1))), 4) Next n End Function Function Hex2Uni(ByVal Txt As String) As String Dim n As Long, l As Long Txt = Replace(Replace(Replace(LCase(Txt), "0x", ""), "&h", ""), " ", "") 'optional clean-up l = Application.WorksheetFunction.Ceiling(Len(Txt), 4) If l > 0 Then Txt = Right("000" & Txt, l) For n = 1 To l Step 4 Hex2Uni = Hex2Uni & ChrW(CInt("&h" & Mid(Txt, n, 4))) Next n End If End Function

I found the same thing WRT to the earlier version of a *Uni2Hex* function using a format(n,"00") structure and worked out a slight change to use the RIGHT() function however like this approach better (seems more straightforward). One problem with the Hex2Uni function above though; the Ceiling() function only works if preceded by "Application." (as marked in red above).

vsrawat · Nov 30, 2015

Happened to visit this thread found through Google.

Just dropping a line to thank you all for this wonderful thread and discussions.

I found all my answers to the questions I was puzzled with Unicode codes and characters in Excel 2010.

Just to add, since then MS has added functions in excel 2013, 2016
=UNICHAR(66) decimal
=UNICODE("B")

to do the same. However, as I am still on excel 2010, so I found your vba functions very useful.

Thanks to you all.
--
Rawat

Function that shows the Unicode code of a character

Jubinell

Board Regular

pgc01

MrExcel MVP

ZVI

MrExcel MVP

pgc01

MrExcel MVP

ZVI

MrExcel MVP

Jubinell

Board Regular

pgc01

MrExcel MVP

earthasa

New Member

Jubinell

Board Regular

Ikester

New Member

vsrawat

New Member

Similar threads

Share this page

Function that shows the Unicode code of a character

Board Regular

MrExcel MVP

MrExcel MVP

MrExcel MVP

MrExcel MVP

Board Regular

MrExcel MVP

New Member

Board Regular

New Member

New Member

Similar threads

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock