CLEAN() not dependable to remove non-printable characters?

Status
Not open for further replies.

yky

Well-known Member
Joined
Jun 7, 2011
Messages
1,881
Office Version
  1. 2010
Platform
  1. Windows
I copied this date "05/10/2020" from this thread. When trying to manipulate it, I found it behaving weirdly as its length is 11 and RIGHT(A1,4) returns "020" when it should return four characters. Furthermore, LEFT(A1,10), though looks on screen exactly like A1, is not the same as A1 as =LEFT(A1,10)=A1 returns FALSE. That leads me to suspect the string contains a non-printable character at the end. Yet, CLEAN() doesn't remove the non-printable character as LEN(CLEAN(TRIM(A1))) still returns 11. The function CLEAN() is supposed to remove non-printable characters and I have used it to do so. Now, it doesn't seem to be dependable. It seems from now on I'll need to double-check the result whenever I use CLEAN().

test1.xlsx
ABCDEFGHIJKLMN
105/10/2020020#VALUE!"0""5""/""1""0""/""2""0""2""0"""
2202011
3/202011
Sheet2
Cell Formulas
RangeFormula
B1B1=RIGHT(A1,4)
C1C1=RIGHT(A1)*1
D1:N1D1="""" & MID($A$1,COLUMN()-3,1) & """"
B2B2=RIGHT(A1,5)
C2C2=LEN(A1)
B3B3=RIGHT(A1,6)
C3C3=LEN(CLEAN(TRIM(A1)))
 

Excel Facts

Copy PDF to Excel
Select data in PDF. Paste to Microsoft Word. Copy from Word and paste to Excel.
Here is a UDF (user defined function) that I developed which will clean and trim the text passed into it. The trim operation is identical to Excel's worksheet TRIM function; however, the clean is slightly different. It cleans some additional non-printing characters that Excel's CLEAN function does not handle. Those additional characters are delineated here...

Remove spaces and nonprinting characters from text - Support - Office.com

I also included an optional argument to convert non-breaking spaces (ASCII 160) to real spaces (ASCII 32). Because non-breaking spaces are such a problem when copying text from the web, I defaulted this optional argument to True (meaning non-breaking space will be converted into true spaces and then handled, along with existing spaces, by the trim operation).

Code:
Function CleanTrim(ByVal S As String, Optional ConvertNonBreakingSpace As Boolean = True) As String
  Dim X As Long, CodesToClean As Variant
  CodesToClean = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, _
                       21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 129, 141, 143, 144, 157)
  If ConvertNonBreakingSpace Then S = Replace(S, Chr(160), " ")
  For X = LBound(CodesToClean) To UBound(CodesToClean)
    If InStr(S, Chr(CodesToClean(X))) Then S = Replace(S, Chr(CodesToClean(X)), "")
  Next
  CleanTrim = WorksheetFunction.Trim(S)
End Function

For those reading this thread who are new to UDFs, they are easy to install and use. To install it, simply press ALT+F11 to go into the VB editor and, once there, click Insert/Module on its menu bar, then copy/paste the above code into the code window that just opened up. That's it.... you are done. You can now use CleanTrim just like it was a built-in Excel function. For example,

=CleanTrim(A1)
 
Upvote 1
Hmm, I also copied from the original thread and = CODE(RIGHT(DATE,1)) reveals that the last character is ascii 63, which is a question mark and a printable character.
One has great eyes if you spot that in the text as shown below
Book1
ABCDEFGH
1roduct codeLotExp. DateStock (kg)Last CharCodeLast Char
21004-140000633176805/10/202015,62663?
31004-140000633205520/11/20208163?
41004-140000633203510/12/202027063?
51005-140000313159015/09/20207,55463?
Sheet1
Cell Formulas
RangeFormula
F2:F5F2=CODE(RIGHT(C2,1))
G2:G5G2=CHAR(F2)


EDIT:
- Might this "hidden" character come from the copy from the website?
- And if I read the UDF correctly, this is not going to solve this particular issue?
 
Upvote 0
It's a unicode character 8203, which (I think) gets inserted when people paste their data to the board without using the XL2BB add-in.
I tend to use
VBA Code:
ActiveSheet.UsedRange.Replace What:=ChrW(8203), Replacement:="", LookAt:=xlPart
to get rid of them.
 
  • Like
Reactions: yky
Upvote 0
I just concluded the same via Power Query
1592499316796.png


EDIT: and even the clean/trim combo in PQ does not handle it well.
1592499738543.png
 
Last edited:
  • Like
Reactions: yky
Upvote 0
Here is a UDF (user defined function) that I developed which will clean and trim the text passed into it. The trim operation is identical to Excel's worksheet TRIM function; however, the clean is slightly different. It cleans some additional non-printing characters that Excel's CLEAN function does not handle. Those additional characters are delineated here...

Remove spaces and nonprinting characters from text - Support - Office.com

I also included an optional argument to convert non-breaking spaces (ASCII 160) to real spaces (ASCII 32). Because non-breaking spaces are such a problem when copying text from the web, I defaulted this optional argument to True (meaning non-breaking space will be converted into true spaces and then handled, along with existing spaces, by the trim operation).

Code:
Function CleanTrim(ByVal S As String, Optional ConvertNonBreakingSpace As Boolean = True) As String
  Dim X As Long, CodesToClean As Variant
  CodesToClean = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, _
                       21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 129, 141, 143, 144, 157)
  If ConvertNonBreakingSpace Then S = Replace(S, Chr(160), " ")
  For X = LBound(CodesToClean) To UBound(CodesToClean)
    If InStr(S, Chr(CodesToClean(X))) Then S = Replace(S, Chr(CodesToClean(X)), "")
  Next
  CleanTrim = WorksheetFunction.Trim(S)
End Function

For those reading this thread who are new to UDFs, they are easy to install and use. To install it, simply press ALT+F11 to go into the VB editor and, once there, click Insert/Module on its menu bar, then copy/paste the above code into the code window that just opened up. That's it.... you are done. You can now use CleanTrim just like it was a built-in Excel function. For example,

=CleanTrim(A1)
Hi, Rick, thanks for the function. However, LEN(CLEANTRIM(A1)) still gives 11, not 10.
 
Upvote 0
It's a unicode character 8203, which (I think) gets inserted when people paste their data to the board without using the XL2BB add-in.
I tend to use
VBA Code:
ActiveSheet.UsedRange.Replace What:=ChrW(8203), Replacement:="", LookAt:=xlPart
to get rid of them.
Hello, Fluff, thanks for the code. However, I got a very strange result. After running the code, LEN() now reports a length of 5. 5??? How is that possible?

Book1
ABC
15/10/20205
Sheet1
Cell Formulas
RangeFormula
C1C1=LEN(A1)
 
Upvote 0
LEN() now reports a length of 5. 5??? How is that possible?

5 is correct. Remember a date is only a number formatted to look like a date.
Len is looking at the number which in my case as I am in dd/mm/yyyy format is 44109.

As for the issue with Clean, if you look at the help file then there is the below excerpt (obviously as you have found out the list on non-printing characters is not complete).
Important:
The CLEAN function was designed to remove the first 32 nonprinting characters in the 7-bit ASCII code (values 0 through 31) from text. In the Unicode character set, there are additional nonprinting characters (values 127, 129, 141, 143, 144, and 157). By itself, the CLEAN function does not remove these additional nonprinting characters. For an example of how to remove these additional nonprinting characters from text, see Top ten ways to clean your data.
 
Last edited:
  • Like
Reactions: yky
Upvote 0
Solution
5 is correct. Remember a date is only a number formatted to look like a date.

Len is looking at the number which in my case as I am in dd/mm/yyyy format is 44109.

As for the issue with Clean, if you look at the help file then there is the below excerpt (obviously as you have found out the list on non-printing characters is not complete).
Thank you. That explains why CLEAN() doesn't remove the non-printable character.

The date was originally in TEXT format. How come Fluff's code converted it to numeric date? By the way, the code seems to be a great way to convert TEXT date to serial date.

In this thread, I asked why LEN(1,000,000) returns 8, not 7 and DanteAmor kindly pointed out that the LEN() function returns the length of the variable type, not the length of the variable itself. Why, then, in this case, LEN() returns the length of the variable itself?
 
Upvote 0
How come Fluff's code converted it to numeric date?
Not sure about the workings in the background but it is quite common for a Replace to trigger Excel to do another assessment of what it believes is the best fit for the data (pretty similar to what Text to Columns does).

. Why, then, in this case, LEN() returns the length of the variable itself?
Because in the thread DanteAmor commented on you were using the VBA version of LEN, here you are using the formula version.
They behave differently (a bit like the VBA version of TRIM behaves differently to the formula version).
 
  • Like
Reactions: yky
Upvote 0
Status
Not open for further replies.

Forum statistics

Threads
1,225,738
Messages
6,186,736
Members
453,369
Latest member
juliewar

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top