Extracting Japanese Characters from a string in a cell

Shinano

Board Regular
Joined
Dec 5, 2004
Messages
65
Office Version
  1. 365
Platform
  1. Windows
I have a sheet with several thousands cells which each contain a path to a file on a network drive. Some of the folder names and file names in these strings are written in Japanese characters.

I need to mark cells which which contains strings (network path & file name) in which there are Japanese characters. CHAR and CODE appears not to be of much use in this connection.

Anybody out there that has an idea about how I might be able to do something like this?

Thank you very much in advance.
 
Hi

try the following code (assumes that other than japanese characters you have the default (ie western) character set):

Code:
Sub IdentifyCharacters()
Dim r As Range, b() As Byte, i As Long, cell As Range, strText As String
Set r = Range("A1:A" & Cells(Rows.Count, "A").End(xlUp).Row)
For Each cell In r
    If Not IsEmpty(cell.Value) Then
        strText = cell.Value
        b = strText
        For i = LBound(b) + 1 To UBound(b) Step 2
            If b(i) > 0 Then
                cell.Interior.ColorIndex = 3
                Exit For
            End If
        Next
    End If
Next cell
End Sub

Currently operates on the A column - you may have to amend.
 
Upvote 0
Richard,

Yet a cool solution from you. Thanks a lot.

I am trying to understand your code, but again, I am still climbing the learning curve.

Which part is actually identifying the Japanese characters?

I should appreciate if you could elaborate a little on your solution.

Again, thanks a lot.
 
Upvote 0
How about
Code:
Sub test()
Dim r As Range
With CreateObject("VBScript.RegExp")
    .Pattern = "\w+"
    .Global = True
    For Each r In Range("a1", Range("a" & Rows.Count).End(xlUp))
        r.Value = .replace(r.Value, "")
    Next
End With
End Sub
 
Upvote 0
Highlight...
Code:
Sub test()
Dim r As Range
With CreateObject("VBScript.RegExp")
    .Pattern = "[^\w]"
    For Each r In Range("a1", Range("a" & Rows.Count).End(xlUp))
        If .test(r.Value) Then r.Interior.Color = vbRed
    Next
End With
End Sub
 
Upvote 0
Richard,

Yet a cool solution from you. Thanks a lot.

I am trying to understand your code, but again, I am still climbing the learning curve.

Which part is actually identifying the Japanese characters?

I should appreciate if you could elaborate a little on your solution.

Again, thanks a lot.

Each unicode character takes up 2 bytes of data - the first byte identifies the character within the character set (assume standard Western character set, and CHAR(65) for example returns a capital A). The second byte identifies the character set: 0 is the Western set (also known as ANSI or extended ASCII set).

Hence, to identify non-Western characters, we just need to identify characters which are from a non-zero character set. Strings in VBA are stored in unicode, so when we assign a string value from a range to a string, we know that we end up with a unicode string:

strText = cell.Value

If we then assign the string to a byte array, we end up with the byte values of each string character within the byte array (2 bytes for each character, as explained above):

b = strText

Then we just need to loop thru the byte array and see if any of the character sets are non zero - since we know it is the second byte character that contains character se information, we only need to examine every second character which is what the For... Next does:

For i = (LBound(b) + 1) To UBound(b) Step 2

which loops from the 2 byte to the end incrementing by 2 each time (so we only look at the character set identifier).

If we find a non-zero one, then the cell contains something other than standard ANSI characters, so we highlight the cell in red. Then proceed to the next cell.

Make sense?

Jindon's code uses a regular expression and is somwhat wider ranging than mine (it will also highlight characters such as "@" or "~" or "#" in the cell values) - you may find it more suited to your needs, or not, if you don't have just alphanumerics in your file/path names.
 
Upvote 0
Hi Jindon,

Thank you very much for the proposal. I appreciate it.

How about
Code:
Sub test()
Dim r As Range
With CreateObject("VBScript.RegExp")
    .Pattern = "\w+"
    .Global = True
    For Each r In Range("a1", Range("a" & Rows.Count).End(xlUp))
        r.Value = .replace(r.Value, "")
    Next
End With
End Sub

I tried it out, and whereas I see it come in handy for other solutions, it does not exactly do the job I need to be done, as it takes special characters in addition to the Japanese characters, which is a little too much.

However, thank you very much.

By the way, I forgot to add that your highlighter was a little too powerfull. It basically highlighted the entire column. Thanks, though.
 
Last edited:
Upvote 0
Easier way...
1) Conditional Formatting
Formula =Len(A1)<>LenB(A1)

2) vba
Code:
For Each r In ActiveSheet.UsedRange
    If Len(r.Value) <> LenB(StrConv(r.Value, vbFromUnicode)) Then
         r.Interior.Color = vbRed
    End If
Next
 
Upvote 0
Hi Richard,

I came across this forum thanks to google and hope you could tell me how to adapt your code below so that it removes Japanese characters instead of highlighting them.

Thanks

Dan

Code:
Sub IdentifyCharacters()
Dim r As Range, b() As Byte, i As Long, cell As Range, strText As String
Set r = Range("A1:A" & Cells(Rows.Count, "A").End(xlUp).Row)
For Each cell In r
    If Not IsEmpty(cell.Value) Then
        strText = cell.Value
        b = strText
        For i = LBound(b) + 1 To UBound(b) Step 2
            If b(i) > 0 Then
                cell.Interior.ColorIndex = 3
                Exit For
            End If
        Next
    End If
Next cell
End Sub
 
Upvote 0

Forum statistics

Threads
1,226,835
Messages
6,193,231
Members
453,781
Latest member
Buzby

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top