Check text string for invalid characters

AllyGraham

New Member
Joined
Oct 6, 2014
Messages
18
I have been working to check text strings in cells for invalid characters to use in a CSV data file, and I have got so far, but I want to see if I can make it any better or simpler.

If I have text in cell A1 is "abc123#", I wish to both state in a cell in the same row (say C1) the check result of "Invalid characters [#]", such that A1 is checked against a named range that contains a list of so called valid characters, and lists what the specific characters are that are invalid.
I found a formula that worked to state that there are invalid characters:
Excel Formula:
=IF(LEN(A1)=0,"",REPT("Invalid characters"&SUBSTITUTE((" ["&MassReplace(A1,ValidList,ValidList2)&"]")," []",""),
SUMPRODUCT(0+(ISNUMBER(MATCH(MID(A1,ROW(INDEX(A:A,1):INDEX(D:D,LEN(A1))),1),ValidList,0))))<>LEN(A1))&"")
Which works well for what I need, and through taking a UDF I found, I can get the substitutions to work without being nested:
VBA Code:
Function MassReplace(InputRng As Range, FindRng As Range, ReplaceRng As Range) As Variant()
  Dim arRes() As Variant 'array to store the results
  Dim arSearchReplace(), sTmp As String 'array where to store the find/replace pairs, temporary string
  Dim iFindCurRow, cntFindRows As Long 'index of the current row of the SearchReplace array, count of rows
  Dim iInputCurRow, iInputCurCol, cntInputRows, cntInputCols As Long 'index of the current row in the source range, index of the current column in the source range, count of rows, count of columns
 
  cntInputRows = InputRng.Rows.Count
  cntInputCols = InputRng.Columns.Count
  cntFindRows = FindRng.Rows.Count
 
  ReDim arRes(1 To cntInputRows, 1 To cntInputCols)
  ReDim arSearchReplace(1 To cntFindRows, 1 To 2) 'preparing the array of find/replace pairs
 
  For iFindCurRow = 1 To cntFindRows
    arSearchReplace(iFindCurRow, 1) = FindRng.Cells(iFindCurRow, 1).Value
    arSearchReplace(iFindCurRow, 2) = ReplaceRng.Cells(iFindCurRow, 1).Value
  Next
 
  'Searching and replacing in the source range
  For iInputCurRow = 1 To cntInputRows
    For iInputCurCol = 1 To cntInputCols
      sTmp = InputRng.Cells(iInputCurRow, iInputCurCol).Value
      'Replacing all find/replace pairs in each cell
      For iFindCurRow = 1 To cntFindRows
        sTmp = Replace(sTmp, arSearchReplace(iFindCurRow, 1), arSearchReplace(iFindCurRow, 2))
      Next
      arRes(iInputCurRow, iInputCurCol) = sTmp
    Next
  Next
 
  MassReplace = arRes
End Function
(source for UDF: Excel: find and replace multiple values at once - Ablebits.com)

But this looks to be volatile, and I do not really need the second column as each substitution of a character becomes "".

I feel that I can make this simpler, and I also feel I had a method in my previous job that used a non-nested substitution formula without UDF to substitute a large array down to "" again to highlight or count for invalid characters, but I do not have access to this workbook any longer.
Can anyone suggest a simple non-volatile non-nested scalable method to flag that the cell contains invalid characters and list each invalid character?
 

Excel Facts

How to show all formulas in Excel?
Press Ctrl+` to show all formulas. Press it again to toggle back to numbers. The grave accent is often under the tilde on US keyboards.
Since you are already using VBA it makes more sense to do the whole solution in a UDF instead of just one piece of it.

I am going to suggest a VBA solution that uses regexp but I can't figure out what is the set of invalid characters or what your MassReplace function does. I am not going to try to reverse engineer your existing VBA code. What ranges to ValidList and ValidList2 refer to and what do they mean?
 
Upvote 0
Since you are already using VBA it makes more sense to do the whole solution in a UDF instead of just one piece of it.

I am going to suggest a VBA solution that uses regexp but I can't figure out what is the set of invalid characters or what your MassReplace function does. I am not going to try to reverse engineer your existing VBA code. What ranges to ValidList and ValidList2 refer to and what do they mean?
Thank you.

ValidList refers to data in A2:A43 and ValidData2 was an afterthought to give the blank substitutions, and is B2:B43.
Book1
AB
1Valid email characters
20
31
42
53
64
75
86
97
108
119
12a
13b
14c
15d
16e
17f
18g
19h
20i
21j
22k
23l
24m
25n
26o
27p
28q
29r
30s
31t
32u
33v
34w
35x
36y
37z
38+
39-
40_
41~
42@
43.
Sheet1


But I hope that if I can just crack a simple substitute text with an array to blank without nesting, that is the main bit that has got me stuck.
 
Upvote 0
I had hoped there was a simple way to use SUBSITUTE in an array method to replace any character that is in the named list with a blank (""), as if after this is complete, I can do everything I want to display text that says there are invalid characters and what ones they are (first part is if it is not blank after substitution, and second part is the result after substitution), without nesting.
 
Upvote 0
No ranges containing the valid characters are needed.

This replaces all your VBA code and your long formula.

Here is the VBA code.
VBA Code:
Public Function CheckChars(S As String) As String

   Dim RE As Object
   Dim MatchList As Object
   Dim Match As Variant
   Dim Msg As String
  
   Set RE = CreateObject("vbscript.regexp")
  
   RE.Pattern = "[^\da-z+-_~@.]"
   RE.IgnoreCase = False
   RE.Global = True
  
   Set MatchList = RE.Execute(S)
 
   For Each Match In MatchList
      Msg = Msg & Match
   Next Match
  
   If Msg = "" Then
      CheckChars = "No invalid characters"
   Else
      CheckChars = "Invalid characters: " & Msg
   End If
   
End Function


Here is how to use it:
$scratch.xlsm
AB
1asdf!@#$%Invalid characters: !#$%
2asdfzxcv1234No invalid characters
3abc123#Invalid characters: #
Sheet3
Cell Formulas
RangeFormula
B1:B3B1=checkchars(A1)
 
Upvote 0
Solution
Thank you Jeff, works just how I was wanting. I have been able to further adapt it for other data fields also. Many thanks.
 
Upvote 0
... works just how I was wanting.
⚡ Are you sure? Test it with the cell value: a[?\]:;<=>^b

I think the pattern really should be
VBA Code:
RE.Pattern = "[^\da-z+\-_~@\.]"

However, you might also consider this more compact function

VBA Code:
Function Invalid(s As String) As String
  Dim RX As Object
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = "[\da-zA-Z+\-_~@\.]"
  Invalid = "Invalid characters: " & RX.Replace(s, "")
End Function

AllyGraham.xlsm
AB
1asdf!@#$%+Invalid characters: !#$%
2asdfzxcv1234Invalid characters:
3abc123#Invalid characters: #
4a[?\]:;<=>^bInvalid characters: [?\]:;<=>^
Sheet1
Cell Formulas
RangeFormula
B1:B4B1=Invalid(A1)
 
Upvote 0
I think the pattern really should be
VBA Code:
RE.Pattern = "[^\da-z+\-_~@\.]"
You are correct that the hyphen should be escaped, although if the hyphen is moved to be the last character within the brackets then it will not have to be escaped.

You do not have to escape a "." in brackets. It will be interpreted as a dot character, not as a "match any character" symbol.
 
Upvote 0
You do not have to escape a "." in brackets. It will be interpreted as a dot character, not as a "match any character" symbol.
:oops: Yes, I was checking escaping and not escaping a few characters and forgot to remove that one. No harm with it there though.
 
Upvote 0

Forum statistics

Threads
1,225,738
Messages
6,186,728
Members
453,368
Latest member
positivemind

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top