Extracting Japanese Characters from a string in a cell

Shinano · Jan 30, 2008

I have a sheet with several thousands cells which each contain a path to a file on a network drive. Some of the folder names and file names in these strings are written in Japanese characters.

I need to mark cells which which contains strings (network path & file name) in which there are Japanese characters. CHAR and CODE appears not to be of much use in this connection.

Anybody out there that has an idea about how I might be able to do something like this?

Thank you very much in advance.

Richard Schollar · Jan 30, 2008

Hi

try the following code (assumes that other than japanese characters you have the default (ie western) character set):

Code:

Sub IdentifyCharacters()
Dim r As Range, b() As Byte, i As Long, cell As Range, strText As String
Set r = Range("A1:A" & Cells(Rows.Count, "A").End(xlUp).Row)
For Each cell In r
    If Not IsEmpty(cell.Value) Then
        strText = cell.Value
        b = strText
        For i = LBound(b) + 1 To UBound(b) Step 2
            If b(i) > 0 Then
                cell.Interior.ColorIndex = 3
                Exit For
            End If
        Next
    End If
Next cell
End Sub

Currently operates on the A column - you may have to amend.

Shinano · Jan 31, 2008

Richard,

Yet a cool solution from you. Thanks a lot.

I am trying to understand your code, but again, I am still climbing the learning curve.

Which part is actually identifying the Japanese characters?

I should appreciate if you could elaborate a little on your solution.

Again, thanks a lot.

jindon · Jan 31, 2008

How about

Code:

Sub test()
Dim r As Range
With CreateObject("VBScript.RegExp")
    .Pattern = "\w+"
    .Global = True
    For Each r In Range("a1", Range("a" & Rows.Count).End(xlUp))
        r.Value = .replace(r.Value, "")
    Next
End With
End Sub

jindon · Jan 31, 2008

Highlight...

Code:

Sub test()
Dim r As Range
With CreateObject("VBScript.RegExp")
    .Pattern = "[^\w]"
    For Each r In Range("a1", Range("a" & Rows.Count).End(xlUp))
        If .test(r.Value) Then r.Interior.Color = vbRed
    Next
End With
End Sub

Richard Schollar · Jan 31, 2008

Shinano said:
Richard,

Yet a cool solution from you. Thanks a lot.

I am trying to understand your code, but again, I am still climbing the learning curve.

Which part is actually identifying the Japanese characters?

I should appreciate if you could elaborate a little on your solution.

Again, thanks a lot.

Each unicode character takes up 2 bytes of data - the first byte identifies the character within the character set (assume standard Western character set, and CHAR(65) for example returns a capital A). The second byte identifies the character set: 0 is the Western set (also known as ANSI or extended ASCII set).

Hence, to identify non-Western characters, we just need to identify characters which are from a non-zero character set. Strings in VBA are stored in unicode, so when we assign a string value from a range to a string, we know that we end up with a unicode string:

strText = cell.Value

If we then assign the string to a byte array, we end up with the byte values of each string character within the byte array (2 bytes for each character, as explained above):

b = strText

Then we just need to loop thru the byte array and see if any of the character sets are non zero - since we know it is the second byte character that contains character se information, we only need to examine every second character which is what the For... Next does:

For i = (LBound(b) + 1) To UBound(b) Step 2

which loops from the 2 byte to the end incrementing by 2 each time (so we only look at the character set identifier).

If we find a non-zero one, then the cell contains something other than standard ANSI characters, so we highlight the cell in red. Then proceed to the next cell.

Make sense?

Jindon's code uses a regular expression and is somwhat wider ranging than mine (it will also highlight characters such as "@" or "~" or "#" in the cell values) - you may find it more suited to your needs, or not, if you don't have just alphanumerics in your file/path names.

Shinano · Jan 31, 2008

Hi Jindon,

Thank you very much for the proposal. I appreciate it.

jindon said:

How about

Code:

Sub test()
Dim r As Range
With CreateObject("VBScript.RegExp")
    .Pattern = "\w+"
    .Global = True
    For Each r In Range("a1", Range("a" & Rows.Count).End(xlUp))
        r.Value = .replace(r.Value, "")
    Next
End With
End Sub

I tried it out, and whereas I see it come in handy for other solutions, it does not exactly do the job I need to be done, as it takes special characters in addition to the Japanese characters, which is a little too much.

However, thank you very much.

By the way, I forgot to add that your highlighter was a little too powerfull. It basically highlighted the entire column. Thanks, though.

Richard Schollar · Feb 1, 2008

Yeah it will 'cos of the "\" in the path name (and possibly the ":" too if you have mapped drive letters)...

jindon · Feb 2, 2008

Easier way...
1) Conditional Formatting
Formula =Len(A1)<>LenB(A1)

2) vba

Code:

For Each r In ActiveSheet.UsedRange
    If Len(r.Value) <> LenB(StrConv(r.Value, vbFromUnicode)) Then
         r.Interior.Color = vbRed
    End If
Next

Djbower1 · Oct 27, 2014

Hi Richard,

I came across this forum thanks to google and hope you could tell me how to adapt your code below so that it removes Japanese characters instead of highlighting them.

Thanks

Dan

Richard Schollar said:

Code:

Sub IdentifyCharacters()
Dim r As Range, b() As Byte, i As Long, cell As Range, strText As String
Set r = Range("A1:A" & Cells(Rows.Count, "A").End(xlUp).Row)
For Each cell In r
    If Not IsEmpty(cell.Value) Then
        strText = cell.Value
        b = strText
        For i = LBound(b) + 1 To UBound(b) Step 2
            If b(i) > 0 Then
                cell.Interior.ColorIndex = 3
                Exit For
            End If
        Next
    End If
Next cell
End Sub

Extracting Japanese Characters from a string in a cell

Shinano

Board Regular

Richard Schollar

MrExcel MVP

Shinano

Board Regular

jindon

MrExcel MVP

jindon

MrExcel MVP

Richard Schollar

MrExcel MVP

Shinano

Board Regular

Richard Schollar

MrExcel MVP

jindon

MrExcel MVP

Djbower1

New Member

Similar threads

Forum statistics

Share this page

Extracting Japanese Characters from a string in a cell

Board Regular

MrExcel MVP

Board Regular

MrExcel MVP

MrExcel MVP

MrExcel MVP

Board Regular

MrExcel MVP

MrExcel MVP

New Member

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock