How to remove non-alphanumeric characters from a text string using only an array formula

txjeff

New Member
Joined
Mar 10, 2020
Messages
2
Office Version
  1. 2019
Platform
  1. Windows
I spent considerable time unsuccessfully looking for solution (formula-based) to remove non-alphanumeric characters. Here is a solution that will hopefully help others with a similar problem.

  • Removes characters not in the embedded valid character string ("0123456789abcdefghijklmnopqrstuvwxyz" in this example formula).
  • Handles any string regardless of length
  • Uses ROW(INDIRECT("1:"&LEN(B2))) to generate an index to wa
  • Uses the functions MID, LEN, IFERROR, FIND, ROW, INDIRECT, and LEN
  • Replaces characters not in the valid character string ""
Assuming B2 has the string to be filtered, use the following ARRAY FORMULA (you must enter by holding CTRL-SHIFT-ENTER):
Rich (BB code):
=TEXTJOIN("",1,MID(B2,ROW(INDIRECT("1:"&LEN(B2)))*IFERROR(FIND(MID(B2,ROW(INDIRECT("1:"&LEN(B2))),1),"0123456789abcdefghijklmnopqrstuvwxyz")>0,LEN(B2)+1),1))

Explanation:
Assuming B2 contains "Hello! This is the test string", the formula will yield "ello This is the test string"

You may add upper case to the validation string to preserve upper case or simple use LOWER of B2 if you prefer.
This utilizes Excel Array features. The sub-formula,
Rich (BB code):
ROW(INDIRECT("1:"&LEN(B2)))
creates the array {1, 2, 3...} of indices for each character in B2. It simply gives an index value for
Rich (BB code):
MID
so we can evaluate each character in the string against our valid characters via
Rich (BB code):
FIND
. As FIND returns the index (a positive value means it found the character in the validation string), that is passed back into a results array for each character in the B2 string. Values greater than zero (see >0 following the FIND function). This returns an array of logical TRUE (1) values when a character is found. For characters not found, the IFERROR returns a multiplicative value guaranteed to be larger than the string in B2. The result in the MID function indexes beyond the end of the string in B2. In this case, it always returns NULL or "".

Enjoy
 

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.
There was a typo in my original post. "Hello! This is the test string", the formula will yield "ello his is the test string"
 
Upvote 0
This is a cool formula, especially if you must use formulas. But if you have the ability to use Power Query or VBA I would go with one of those options instead. Especially if you have a lot of data and performance is an issue.

I ran the VBA below on ~100k rows and it took 0.84 seconds to run. Power Query took about a second.

I don't have TextJoin on this computer so I can't speed test the formula option, but I would be surprised if the performance of a volatile array formula like that could keep up.


Power Query:
Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    noAlpha = Table.TransformColumns(Source,{"Raw",each Text.Remove(_,{"0".."9","a".."z","A".."Z"})})
in
    noAlpha


VBA: Data in A1:A100,000
VBA Code:
Sub Main()
Application.ScreenUpdating = False

With Range("A1:A" & Range("A" & Rows.Count).End(xlUp).Row).Offset(, 1)
    .FormulaR1C1 = "=""'""&noRegex(RC[-1])"
    .Value = .Value
End With

Application.ScreenUpdating = True
End Sub

Function noRegex(s As String)
Dim tmp As String
    For i = 1 To Len(s)
        tmp = Mid(s, i, 1)
        If Not (tmp Like "[A-Z]" Or tmp Like "[a-z]" Or tmp Like "[0-9]") Then noRegex = noRegex & tmp
    Next i
    If noRegex = 0 Then noRegex = vbNullString
End Function
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,178
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top