Stripping text in Excel

nevolex

New Member
Joined
Oct 10, 2019
Messages
6
Hi everyone,

Is there a chance somebody can please help with a formula which will basically remove all the text from this text block (example) pasted in excel:

In literary theory, a text is any object that can be "read", whether this object is a work of literature, TK8881122 a street sign, an arrangement of buildings on a city block, or styles of clothing.
It is a coherent set of signs that transmits some kind of informative message.[1]
This set of signs is considered in terms of the informative message's content, rather than in terms of its physical form or the medium in which it is represented.
Within the field of literary criticism, "text" also refers to the original information content of a NM12345 particular piece of writing; that is, the "text" of a work is that primal symbolic arrangement of letters as originally composed, apart from later alterations, deterioration, commentary, translations, paratext, etc.

What I am hopping to get is all the text to be remove leaving just:
TK8881122
NM12345

Thanks a lot


 
You Might like this option as well
Code:
Sub test()
    Dim a, m As Variant
    Dim lr, t, i As Long
    lr = Cells(Rows.Count, 1).End(xlUp).Row
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "[A-Za-z]+.?\d+."
        For t = 1 To lr
            If .test(Cells(t, 1)) Then
                Set m = .Execute(Cells(t, 1))
                ReDim a(1 To m.Count)
                For i = 0 To m.Count - 1
                    a(i + 1) = Trim(m(i))
                Next
                Cells(t, 3) = Join(a, Chr(10))
            End If
        Next
    End With
End Sub
 
Upvote 0

Excel Facts

Quick Sum
Select a range of cells. The total appears in bottom right of Excel screen. Right-click total to add Max, Min, Count, Average.
Thank you for THE great support everyone.


I will provide some possible examples like:
TK8881122,
NM12345,
Yn22445672,
DFd-47483732,
DBR#31221,
AD/132345,
TDVG-342432,
QA_122123,
CDSDF#0212121,
po#312312,
SDB\1232133,
AWW;1232123
 
Upvote 0
Hi
As I can see the last code of mine is working Ok with your possibles
unless you are not happy with the "," at the end of each??
 
Upvote 0
As I can see the last code of mine is working Ok with your possibles
unless you are not happy with the "," at the end of each??
It may not be possible with the OP's data but based on the written description in post 7, shouldn't

abc R5 G; H6 a X### return R5 G; H6 and X###

and it is unclear (to me at least) whether
abc123WE3 should return nothing or both abc123 and WE3 or something else but to me your code returning abc123W doesn't seem to fit the request.


I will provide some possible examples like:
TK8881122,
NM12345,
Yn22445672,
DFd-47483732,
DBR#31221,
AD/132345,
TDVG-342432,
QA_122123,
CDSDF#0212121,
po#312312,
SDB\1232133,
AWW;1232123
Although possible examples help, it is not definitive.
You haven't given any examples or explained if anything is NOT allowed to follow a group of letters.
Hence my questions in post 10, & Rick's in post 9.
Perhaps also some comments on my examples above as well?
 
Upvote 0
Hi Peter, the verbal rule is "any amount of upper or lower case letters followed by 1 or more digits"


Thank you

It may not be possible with the OP's data but based on the written description in post 7, shouldn't

abc R5 G; H6 a X### return R5 G; H6 and X###

and it is unclear (to me at least) whether
abc123WE3 should return nothing or both abc123 and WE3 or something else but to me your code returning abc123W doesn't seem to fit the request.



???!!!
 
Upvote 0
Hi Peter, the verbal rule is "any amount of upper or lower case letters followed by 1 or more digits"


Thank you
???!!!
If that was the rule, how would we get example as posted in post 12?

You quoted the "verbal rule" from post 4. In post 7 it was modified to
.. please modify the code for "any amount of upper or lower case letters followed by 1 or more digits or symbols like - _ / ; \ # etc"
 
Upvote 0
Thanks a lot for the code Peter, I just wanted to ask if you can please modify the code for "any amount of upper or lower case letters followed by 1 or more digits or symbols like - _ / ; \ # etc"

Also for each number found and filtered it will be shown in individual cell, however is there a chance all those numbers can be shown in one cell at once with space between them? Only if that is possible. Thank you very much
Then Post #11 code results are:

[TABLE="width: 828"]
<tbody>[TR]
[TD="align: left"]TK8881122, whatever text NM12345, agin so an do Yn22445672, CCCCCCCCC DFd-47483732, hkjhkjhj DBR#31221, xxxxxxxxxxxxxxxxxxxxx AD/132345,
TDVG-342432, zf zdf gffgj QA_122123, hgfjgj CDSDF#0212121,
po#312312,
SDB\1232133,
AWW;1232123[/TD]
[TD][/TD]
[TD="align: left"]TK8881122,
NM12345,
Yn22445672,
DFd-47483732,
DBR#31221,
AD/132345,
TDVG-342432,
QA_122123,
hgfjgj 123456
CDSDF#0212121,
po#312312,
SDB\1232133,
AWW;1232123[/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
Hi, Mohadin,

with Peter's formula and your code I only get this output in cells:

#NAME ?


Also: abc123WE3 is not a valid number, it will not be used in the text :)

You haven't given any examples or explained if anything is NOT allowed to follow a group of letters.

Everything is allowed to follow the group of letters it can be some characters like: #-_/\ as listed in the numbers below.

Peters formula works fine, I just forgot to mention in the first post that a number can be not just letters and numbers it can also contain some characters in between like MHD_213123

These are the numbers which can be found in the text:
TK8881122,
NM12345,
Yn22445672,
DFd-47483732,
DBR#31221,
AD/132345,
TDVG-342432,
QA_122123,
CDSDF#0212121,
po#312312,
SDB\1232133,
AWW;1232123

I re did the text:
In literary theory, a text is any object that can be "read", whether this object is a work of literature, TK8881122 a street sign, an arrangement of buildings Yn22445672, on a city block, or styles of clothing DFd-47483732.
It is a DBR#31221, coherent set of signs that AD/132345, transmits some kind of informative message.[1]
This set of signs is considered in terms of the informative message's content, rather than in terms TDVG-342432 of its QA_122123 physical CDSDF#0212121 form or the medium po#312312 in which it is represented AWW;1232123.
Within the field of literary criticism, "text" SDB\1232133 also refers to the original information content of a NM12345 particular piece of writing; that is, the "text" of a work is that primal symbolic arrangement of letters as originally composed, apart from later alterations, deterioration, commentary, translations, paratext, etc.


Thanks all

Then Post #11 code results are:

[TABLE="width: 828"]
<tbody>[TR]
[TD="align: left"]TK8881122, whatever text NM12345, agin so an do Yn22445672, CCCCCCCCC DFd-47483732, hkjhkjhj DBR#31221, xxxxxxxxxxxxxxxxxxxxx AD/132345,
TDVG-342432, zf zdf gffgj QA_122123, hgfjgj CDSDF#0212121,
po#312312,
SDB\1232133,
AWW;1232123[/TD]
[TD][/TD]
[TD="align: left"]TK8881122,
NM12345,
Yn22445672,
DFd-47483732,
DBR#31221,
AD/132345,
TDVG-342432,
QA_122123,
hgfjgj 123456
CDSDF#0212121,
po#312312,
SDB\1232133,
AWW;1232123[/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
Everything is allowed to follow the group of letters ..
Can it be a space? That is, from
"one two 34 three"
should "two 34" be extracted as it fits the rule if everything is allowed after the group of letters?
For the moment I have assumed that a space is not included in the list of allowed characters after the group of letters. Code can be adjusted if it is allowed.

I have interpreted the requirement now as
"One or more letters followed by one or more digits and there may or may not be a single other character between the letters and numbers but that character cannot be a space"

Further, in post 7 you asked if all the items could be returned with a space between but all your examples seem to now have a comma and a new line for each item, The function below does the comma & new line. If do want a space instead then just remove all the blue text from the code.

Rich (BB code):
Function GetParts(s As String) As String
  With CreateObject("VBScript.RegExp")
    .Global = True
    .Pattern = "([A-Za-z]+[^ \d]?\d+)"
    If .test(s) Then GetParts = Join(Filter(Split(.Replace(s, "zzzqqq$1qqqzzz"), "qqq"), "zzz", False), "," & vbLf)
  End With
End Function

A few samples to show the results of my function. Please provide clarification if these are not what you want or if the function does not give the expected results for any other data samples.

Excel Workbook
AB
1In literary theory, a text is any object that can be "read", whether this object is a work of literature, TK8881122 a street sign, an arrangement of buildings on a city block, or styles of clothing.It is a coherent set of signs that transmits some kind of informative message.[1]This set of signs is considered in terms of the informative message's content, rather than in terms of its physical form or the medium in which it is represented.Within the field of literary criticism, "text" also refers to the original information content of a NM12345 particular piece of writing; that is, the "text" of a work is that primal symbolic arrangement of letters as originally composed, apart from later alterations, deterioration, commentary, translations, paratext, etc.TK8881122,NM12345
2Nothing here meets the pattern
3
4text HGFTY76509876 more text F0, and more G876567404323.HGFTY76509876,F0,G876567404323
5one two 34 three
6In literary theory, a text is any object that can be "read", whether this object is a work of literature, TK8881122 a street sign, an arrangement of buildings Yn22445672, on a city block, or styles of clothing DFd-47483732.It is a DBR#31221, coherent set of signs that AD/132345, transmits some kind of informative message.[1]This set of signs is considered in terms of the informative message's content, rather than in terms TDVG-342432 of its QA_122123 physical CDSDF#0212121 form or the medium po#312312 in which it is represented AWW;1232123.Within the field of literary criticism, "text" SDB\1232133 also refers to the original information content of a NM12345 particular piece of writing; that is, the "text" of a work is that primal symbolic arrangement of letters as originally composed, apart from later alterations, deterioration, commentary, translations, paratext, etc.TK8881122,Yn22445672,DFd-47483732,DBR#31221,AD/132345,TDVG-342432,QA_122123,CDSDF#0212121,po#312312,AWW;1232123,SDB\1232133,NM12345
Sheet1
 
Upvote 0

Forum statistics

Threads
1,223,888
Messages
6,175,219
Members
452,620
Latest member
dsubash

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top