formula for excel to extract words

nn992

New Member
Joined
Jul 28, 2016
Messages
47
Hello everyone,

I have quite interesting/complex problem in excel.

I have a one field in excel which is composed of numbers, letters and spaces in between. The problem is that there is no rule for repeating or duplicating sequence of numbers/letters.

What I need to do is to extract only 5 letter words. Also, if the same word is repeating, I would like to exclude it (eg. take only one occurence)

Any ideas?

to show you how it looks like:

current text in my field: N0453F360 MEROS4B MEROS UN853 NETUP/N0450F380 UN853 MOLUS UN871 DITON T163 ZUE T125 ROMIR UN851 TEDGO/N0451F370 UN851 LOHRE UN746

desired output: MEROS NETUP MOLUS DITON ROMIR LOHRE

Could this be solved with formula or macro?

Thanks in advance
 

Excel Facts

Copy a format multiple times
Select a formatted range. Double-click the Format Painter (left side of Home tab). You can paste formatting multiple times. Esc to stop
You're going to need VBA. I implemented as a UDF:

Code:
Public Function GetSequencesOf5(inputString As String) As String

Dim uniqueStrings As Object
Dim testStrings() As String
Dim thisString As Long

' Output is currently blank
GetSequencesOf5 = ""

' Use a dictionary to keep track of unique values
Set uniqueStrings = CreateObject("Scripting.Dictionary")

' Add a space in case the last word is 5 characters long
testStrings = Split(inputString & " ", " ")

' Look at all strings
For thisString = 0 To UBound(testStrings)
    ' Is this 5 letters?
    If HowManyLetters(testStrings(thisString)) = 5 Then
        ' Have we already found this one?
        If Not uniqueStrings.Exists(testStrings(thisString)) Then
            ' No - add it to the dictionary and to the output
            uniqueStrings.Add testStrings(thisString), testStrings(thisString)
            GetSequencesOf5 = GetSequencesOf5 & IIf(GetSequencesOf5 = "", "", " ") & testStrings(thisString)
        End If
    End If
Next thisString

End Function
Private Function HowManyLetters(inputString As String) As Long

Dim thisChar As Long

' Look at all characters in the string
For thisChar = 1 To Len(inputString)
    ' Check the ASCII code
    Select Case AscW(UCase$(Mid$(inputString, thisChar, 1)))
        Case 65 To 90 ' A to Z
            HowManyLetters = HowManyLetters + 1
    End Select
Next thisChar

End Function


Book1
AB
1N0453F360 MEROS4B MEROS UN853 NETUP/N0450F380 UN853 MOLUS UN871 DITON T163 ZUE T125 ROMIR UN851 TEDGO/N0451F370 UN851 LOHRE UN746MEROS MOLUS DITON ROMIR LOHRE
Sheet1
Cell Formulas
RangeFormula
B1=GetSequencesOf5(A1)


WBD
 
Upvote 0
WOW!!! super fast and super accurate! thanks a bunch for this!

Just one small comment:

when you see the example above, there is one more word - NETUP, and TEGDO as well.He did not included that one... after netup is "/"
is it possible to include these words as well?
 
Upvote 0
Wherever there is 5 consecutive letters not separated I want it to be extracted:
1abcde123 - extract abcde
?><>abcde/// - extract abcde
etc.
 
Upvote 0
That's a shame. It's a different problem to the original statement and requires re-coding.

Code:
Public Function GetSequencesOf5(inputString As String) As String

Dim uniqueStrings As Object
Dim currentString As String
Dim thisChar As Long

' Output is currently blank
GetSequencesOf5 = ""

' Use a dictionary to keep track of unique values
Set uniqueStrings = CreateObject("Scripting.Dictionary")

' Add a space in case the last word is 5 characters long
inputString = inputString & " "

' Current string is empty
currentString = ""

' Look at all characters
For thisChar = 1 To Len(inputString)
    Select Case AscW(UCase$(Mid$(inputString, thisChar, 1)))
        Case 65 To 90
            currentString = currentString & Mid$(inputString, thisChar, 1)
        Case Else
            If Len(currentString) = 5 Then
                ' Have we already found this one?
                If Not uniqueStrings.Exists(currentString) Then
                    ' No - add it to the dictionary and to the output
                    uniqueStrings.Add currentString, currentString
                    GetSequencesOf5 = GetSequencesOf5 & IIf(GetSequencesOf5 = "", "", " ") & currentString
                End If
            End If
            
            ' Reset current string
            currentString = ""
    End Select
Next thisChar

End Function


Book1
AB
1N0453F360 MEROS4B MEROS UN853 NETUP/N0450F380 UN853 MOLUS UN871 DITON T163 ZUE T125 ROMIR UN851 TEDGO/N0451F370 UN851 LOHRE UN746MEROS NETUP MOLUS DITON ROMIR TEDGO LOHRE
Sheet1
Cell Formulas
RangeFormula
B1=GetSequencesOf5(A1)


WBD
 
Upvote 0

Forum statistics

Threads
1,223,903
Messages
6,175,289
Members
452,631
Latest member
a_potato

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top