I Have a function that usually use to remove special characters from Cells. This function allways worked fine, until I had a problem with the (ª) character - with Unicode [u00AA].
My function uses Regex pattern to exclude "Basic Latin" (\u0000-\u007F) and "Latin-1 Supplement" (\u00C0-\u00FF). This supplment includes the (ª - u00AA) char; but it's not working.
My Regex Pattern :
My Fuction :
As you can see below, Works with "[^\u0000-\u007F\u00C0-\u00FF\ª]" and not works with "[^\u0000-\u007F\u00C0-\u00FF\u00AA]". I'm Portuguese and we use both the chars (ª and º). Our Keyboards have the Key shown on the upload image. Can anyone help and tell why those chars included in Unicode List "Latin-1 Supplement", does not work, and wich is the best way to alter my regex.
See the excel outputs:
Can anyone help please?
My function uses Regex pattern to exclude "Basic Latin" (\u0000-\u007F) and "Latin-1 Supplement" (\u00C0-\u00FF). This supplment includes the (ª - u00AA) char; but it's not working.
My Regex Pattern :
VBA Code:
"[^\u0000-\u007F\u00C0-\u00FF]"
My Fuction :
VBA Code:
Function StripNonAsciiChars(ByVal InputString As String, Optional Exclude_Latin1_Supplement As Boolean = True) As String
'Orgin::
'https://stackoverflow.com/questions/61336753/regex-to-exclude-non-ascii-but-keep-nordic-characters
'http://www.unicode.org/charts/ -> Unicode 15.0 Character Code Charts
'Exclude_Latin1_Supplement - InputString = "Sem informação disponibilizada"
'\u0020-\u007F - means the characters run from index 32 till index 127 and \u00C0-\u00FF runs from 192 till 255.
'Unicode is a superset of ASCII. However, the ASCII characters are in what is called a "block" not a Unicode category
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.MultiLine = True
.IgnoreCase = True
Select Case Exclude_Latin1_Supplement
Case True
'Includes exception for Latin Characters like çÇ;ãÃ, etc... "Latin-1 Supplement" [u00C0-\u00FF]
'Added char (ª) - the code u00AA was not recognized. ("[^\u0000-\u007F\u00C0-\u00FF\u00AA]")
.Pattern = "[^\u0000-\u007F\u00C0-\u00FF\ª]"
Case False
'Excludes only "Basic Latin (ASCII) - \u0000-\u007F
.Pattern = "[^\u0000-\u007F]"
End Select
'StripNonAsciiChars = Application.WorksheetFunction.Trim(RegEx.Replace(InputString, " "))
StripNonAsciiChars = .Replace(InputString, vbNullString)
End With
End Function
As you can see below, Works with "[^\u0000-\u007F\u00C0-\u00FF\ª]" and not works with "[^\u0000-\u007F\u00C0-\u00FF\u00AA]". I'm Portuguese and we use both the chars (ª and º). Our Keyboards have the Key shown on the upload image. Can anyone help and tell why those chars included in Unicode List "Latin-1 Supplement", does not work, and wich is the best way to alter my regex.
See the excel outputs:
New Microsoft Excel Worksheet.xlsx | |||||
---|---|---|---|---|---|
B | C | D | |||
1 | Input | Result | Pattern | ||
2 | SÉRGIO CARVALHO DOMINGUES & Cª, LIMITADA | SÉRGIO CARVALHO DOMINGUES & Cª, LIMITADA | "[^\u0000-\u007F\u00C0-\u00FF\ª]" | ||
3 | SÉRGIO CARVALHO DOMINGUES & Cª, LIMITADA | SÉRGIO CARVALHO DOMINGUES & C, LIMITADA | "[^\u0000-\u007F\u00C0-\u00FF]" | ||
Sheet2 |
Can anyone help please?