Function that shows the Unicode code of a character

I realize this is problematic since I cannot even see those characters which I've posted (the rare ones). I'm using Unicode (UTF-8) with Firefox 3.5.

I'll try something else. Please take a look at this link:
http://ja.wikipedia.org/wiki/人部

If you look on the right, at the top of the bushu table there are 3 characters in big, bold type. If you compare, the first 2 are the two I posted at the bottom of my previous message. The third and final character is the final character I'm having problems with. When I run CodeUni() on this character I get D840, but when I run CharUni on D840 I get some weird characters (it's like a box with a question mark inside), that I'm sure means error instead of an actual character.

Any ideas?
 
Upvote 0

Excel Facts

What did Pito Salas invent?
Pito Salas, working for Lotus, popularized what would become to be pivot tables. It was released as Lotus Improv in 1989.
Hi Jubinell

As you say this is a rare character and raises a special case in UNICODE. This is something that happens for some (many) ideographic characters.

As you know unicode is a 16 bit code character set, which gives you a maximum of 64K characters. Although this is enough for the most used characters it's not enough for all the characters that you may use, as languages that use ideographic characters may have tens of thousands of characters.

This is why in Unicode you have surrogate pairs. This is a pair of 16 bit codes that define one only character.

For these special cases these characters have the first (high) code in the range D800-DBFF and the second (low) in the range DC00-DFFF. This gives you an extra 1M characters.

When you use the excel string functions, excel considers the characters all as 16bit. This means that if you have the character you posted, ��, alone in A1, if you try Len(A1) you'll get 2, because this character takes 2 16bit characters.

Ex. consider in A1 the string "1��亻人��A" (sorry if the characters together make no sense, I don't know them)

Len(A1) = 8, although you only see 6 characters

Now use Mid(A1,Position,1) with position 1-8 to get the codes with CodeUni, the codes are:

0031, D840, DDA2, 4EBB, 4EBA, D852, DD14, 0041

This explains it, 2 of the characters are defined with surrogate pairs. This makes the code sequence clearer

0031, (D840, DDA2), 4EBB, 4EBA, (D852, DD14), 0041

Conclusion

When you have a character for which CodeUni gives you a code in the range D800-DBFF, you know this is a character that is encoded with a surrogate pair and you must read the next 16bit code to get the the lower part of the code. It's the 2 16bit codes together that gives you the whole character.

For the character "��" you posted in A1 use CodeUni(Mid(A1,1,1)) and CodeUni(Mid(A1,2,1)) to get the surrogate pair (D840, DDA2)

To display the charater use CharUNI("D840")&CharUni("DDA2").


Remark: I don't know which sets of these extra characters each windows/excel version supports.
 
Upvote 0
PGC,

As complicated as that might sound, I verified what you said step and step and got expected results. The stacked charuni functions also returned to corrected "surrogated" unicode character.

One question I have though. Looking at Charuni()&CharUni() didn't seem intuitive that the result would be a single displayable character. I thought it would be something like x & y, where x and y are some resulting unicode characters, and the final answer would just be xy, a double-character.

I know that you told me the character I was asking about is indeed two characters. But I guess it's just not clear to you how Excel figure out that x simply put next to y above would return that character, no further modifications required..
 
Upvote 0
I'm glad it worked.

Just think that a string in excel, for purposes of storing or manipulating, is a sequence of 16bit codes. When you manipulate the string and you access a character with one of the string functions, you are, in fact, accessing 1 of these codes at a time. When you concatenate the 2 results of the CharUni you are writing the sequence corresponding to the surrogate pair.

When the string is displayed, the process is smarter, excel doesn't just see a character as a 16 bit code, it uses the 16bit codes to build the Unicode string, taking into account the case of the surrogate pairs.

HTH
 
Last edited:
Upvote 0
hmm...I guess I'll leave the display logic for the computers to figure out.
Final question, the character (D840, DDA2) that we're talking about...well I am still not able to view it in Firefox, in both your post and mine. Any ideas why and how to fix it?

I'm viewing through Unicode (UTF-8) with Firefox 3.5 Switching to other Unicodes give me bizarre results.
 
Upvote 0
Just the short remark: there are different visual shapes of the same Unicode symbol for the different fonts. Type any symbol(s) into cell and firstly apply Arial font, after that apply Wingdings font to see the differences. Some fonts can be absent/aliasing in the operating system as well.
 
Upvote 0
Jubinell

Sorry, I don't usually use Firefox and never use ideographic characters. In IE they display OK. I just tried with Firefox and Chrome and they don't display. Since this is not related to excel maybe you can get answers in some Firefox forum?

Vladimir

Why do you say that Wingdings is a Unicode font?

The Unicode symbols are as defined by the Unicode Standard, developed by the members of the Unicode Consortium.
 
Upvote 0
Jubinell
...
Vladimir

Why do you say that Wingdings is a Unicode font?

The Unicode symbols are as defined by the Unicode Standard, developed by the members of the Unicode Consortium.

Hi PGC,

All string values in Excel cells or VBA variables are Unicode.

This code shows that each of 2 symbols of A1 value "Hi"” is 2 bytes length (4 bytes length total) Unicode symbol, independently of the font used.
Rich (BB code):
Sub Test()

  Dim b() As Byte

  With Range("A1")

    .Value = "Hi"
    
    ' Arial
    .Font.Name = "Arial"
    Debug.Print "Arial", "Chars = " & Len(.Value), "Bytes = " & LenB(.Value)
    
    ' Wingdings
    .Font.Name = "Wingdings"
    Debug.Print "Wingdings", "Chars = " & Len(.Value), "Bytes = " & LenB(.Value)
    
    ' Unicode bytes
    b = .Value
    For i = LBound(b) To UBound(b)
      Debug.Print i, "Int=" & b(i), "Hex=" & Hex(b(i)), "Chr=" & Chr(b(i))
    Next
    
  End With

End Sub

Regards,
Vladimir
 
Upvote 0
All string values in Excel cells or VBA variables are Unicode.

Hi Valdimir

I think that what you mean is that strings in Excel and vba have 16bit characters.

A Unicode font must display the Unicode characters. The Unicode standard is very specific in the character definition.

Here's an example taken from the Unicode standard.

The character U+0041 encodes the character: LATIN CAPITAL LETTER G

Some properties of this character:

G is an alphabetic character.
G is in the Latin script. G is an uppercase letter.
G is not used in hexadecimal expressions.
G collates after F in the English alphabet.
G was putatively invented by Spurius Carvilus Ruga ca. 300.
G commonly represents the velar voiced stop in orthographies.
G is not a punctuation character.
G denotes giga in the SI system of nomenclature.
G has no diacritic.
G is a base character.
G is not a combining character.

As you see, the standard tries to make clear what character is encoded at the code point U+0041. You do have some freedom when you design the font, and that's why Arial and Times New Roman are different, but to be a Unicode font they both must respect the characters definitions.

Wingdings surely does not respect the Unicode characters definitions.

I had to design a font myself once. I used a font editor and draw the characters I needed according to the specifications. The fact that there were many characters imposed ecoding with 16bits. This did not make my font Unicode, it was simply a font with characters encoded with 16bits. In fact, this was before Unicode even existed.

You can check the Unicode Consortium here:

http://unicode.org/
 
Upvote 0
Hi Valdimir
I think that what you mean is that strings in Excel and vba have 16bit characters.
A Unicode font must display the Unicode characters. The Unicode standard is very specific in the character definition.
...

Thank you Pedro,

Seems that Arial font does not meet all requirements of Unicode standard, but Arial Unicode MS font does ;)

Ok, I’m trying to be more specific.

Excel cell has the properties among which the Value and Font are.
If the value is string character then we can talk about character code.
Let’s assume that we put in A1 cell Unicode symbol WHITE SMILING FACE, its code is &H263A:

Code:
Range("A1").Font.Name = "Arial" 
Range("A1").Value = ChrW(&H263A)

Now Value of A1 cell is the string containing Unicode character in terms of VBA, regardless of the Font property by the way.
It’s because VBA help says: "The ChrW function returns a String containing the Unicode character".

Having look on VBA help of AscW() function we can find that:
The AscW function returns the Unicode character code

Therefore, we can read Unicode character code of A1 cell value in which the string containing Unicode character was previously placed.

Code:
Range("A1").Font.Name = "Arial"
Debug.Print "Arial", "AscW=" & AscW(Range("A1").Value)

Range("A1").Font.Name = "Wingdings"
Debug.Print "Wingdings", "AscW=" & AscW(Range("A1").Value)

As expected AscW=9786 Unicode character code is printed regardless of the Font of A1 cell.

My conclusion: there are relatively strong requirements to the shape of each Unicode symbols in Unicode standard.

But in VBA the Value and Font are the independent properties of the cell, and instead of Unicode symbol we can talk also about Unicode character and Unicode character code without taking in account the Font property.

So, your truth - my first remark was not correct: there are different visual shapes of the same Unicode symbol for the different fonts.
It should be replaced by: there are different visual shapes of the same Unicode character for the different fonts in Excel.

Regards,
Vladimir
 
Last edited:
Upvote 0

Forum statistics

Threads
1,224,824
Messages
6,181,187
Members
453,020
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top