how to find and identify unicode chars

Harvey

Well-known Member
Joined
Nov 18, 2004
Messages
953
Hello all,

I have a number of xls files, which I have to 'clean up' following some guidelines. This are the ones I am having trouble with. Can anybody point me in the right direction?

- There are Unicode characters (UTF-8) in the document. Not all of these characters display. Remove all characters that do not display the corresponding symbol (remove all 'squares').

There is lots of Unicode, but when I use ASC I get their ASCII value. How can I 'see' when a character is Unicode? Furthermore, how can I 'see' if it's unrecognized?

- Replace ASCII character '16' with Unicode character '63'.

How to do this?

Thanks for the help already,

Kind regards,

Martien de Jong
 
Last edited:

Excel Facts

Wildcard in VLOOKUP
Use =VLOOKUP("Apple*" to find apple, Apple, or applesauce
I need at least one reply to make sure that someone has actually read the topic.
Please help me someone
 
Upvote 0
I need at least one reply to make sure that someone has actually read the topic
You can see how many people have read it by looking at the "Views" Column on the Forum.

- Replace ASCII character '16' with Unicode character '63'.

=SUBSTITUTE(A1,CHAR(16),CHAR(63))

How can I 'see' when a character is Unicode?

In any blank column, Row 1 put..
=CHAR(ROW())
Fill down.
 
Upvote 0
Hi there,

Thanks for the reply, but...
none of your stuff is working.
I found the ChrW function for generating unicode.
Now, only this question stays open:

I need to find all unrecognized unicode characters in my sheet, and delete them.
 
Upvote 0
Hi Harvey

How many of these characters are there likely to be in this document? Are they restricted to a single worksheet, or spread across several? How many cells on each spreadsheet? I ask because I can think of a way of doing it but it will iterate thru every character in every cell containing data to identify any of these characters and remove them. This will potentially take an exceptionally long time. Alternatively, you may be able to use a regular expression, but I am unsure as to whether a regex will remove these characters (you could send me a small sample of your data to test if you would like - I will give you my email if you would like me to try this).
 
Upvote 0
Hello RichardSchollar,

Thank you for your offer.
I have just solved the problem by assuming there are only three unrecognized characters (the only ones I could find, that weren't recognized).
As for clarification, I will answer your questions.
I use ChrW in a Range.Replace now and it does the job.
The 'documents to edit' are about 200 xls files, each containing one sheet, with 10.000 rows, 20-50 columns. They are literally filled with gibberish, of which most unicode chars.
The range.replace now finished in about 3 seconds per document, which is still acceptable.
A colleague of mine said he would do it in Java, which would offer a much better performance, but he is on vacation now :'(
Anyways, thank you for the input guys. If I find the Java solution I will post it for anyone interested.
Cheers and see you next time :)

P.S. How would you check if a character is 'recognized', I couldnt come up with anything for that.
 
Last edited:
Upvote 0
I wouldn't check if it was recognized - I would test the character set of the character to see if it was 0 (standard ANSI) or anything else (ie something you don't want):

Code:
'assume A1 contains a string and you want to check it contains no funny characters
Dim s as String, b() as Byte, i As Long
s = Range("A1").Value
b=s
For i = 1 to UBound(b) Step 2
  If b(i)>0 Then
    'this character is not from the standard character set! Need to do something with it!
  End If
Next

Much slower than a straightforward Replace operation which you have gone for, however ;)
 
Last edited:
Upvote 0
That is just what I was looking for,
I am definitely going to use that later on!
Thanks for posting the code.
(y)
 
Upvote 0
By the way, it actually runs much faster through the sheet than I expected.
Thanks again.
 
Upvote 0

Forum statistics

Threads
1,218,270
Messages
6,141,471
Members
450,361
Latest member
Barnaby2024

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top