Extract Text from Middle of Alphanumeric String Up to a Number

croiss

New Member
Joined
Oct 7, 2014
Messages
8
Hi guys, I'm having trouble figuring out how to complete this data cleanup:

I have a list of items that all follow the same format:

123456789012 Abc company 5 10-04-2013 bcdefgh company y p

The items were copied and pasted from a PDF file, hence the messiness, and the misalignment of the columns... They all pasted over into one column, and I was able to start extracting text.
Some notes about the data:

1. They all start with a 12-character code that is followed by a space before the company title.
2. The company titles are all different lengths, which is why delimiting would not work.
3. Each company name is followed by a number, and then a date, and then another description...

My question is, how can I pull the "ABC company" part out the string without pulling the rest of the string? In other words, I want to pull the text UP TO a number. So in this case, "ABC COMPANY" up to 5.

So far, I have been played around with mid, len, find, search... But haven't been able to figure it out. I've tried also basing it off the =left(a2,min(find...) but couldn't get it to work within the MID function.

Thank you!
 
this should work if max is 2 digits

Code:
=TRIM(MID(A1,14,SEARCH("?? ??-??-????",A1)-14))
It fails with this one...

555666777888* La-Di-Dahs 3 11-20-2013* BLAH BLAH P

but that is so contrived that I would be willing to overlook it (not sure how anyone else feels though). Other than that type of construction (where the date pattern exists in text before the date), your function seems very stable.
 
Upvote 0

Excel Facts

Copy PDF to Excel
Select data in PDF. Paste to Microsoft Word. Copy from Word and paste to Excel.
There is one I could think of... when the name contains more than one number

e.g.

555666777888 9 to 5 Diners 2 11-20-2013 BLAH BLAH P

btw, 9 to 5 Diners is a local fast food restaurant's name :)

Excellent... this formula works with everything I could think of to throw at it... well done.
 
Last edited:
Upvote 0
Not if the company name contains more than one numeric

e.g.

555666777888 3M in 6W Corp. 2 11-20-2013 BLAH BLAH P
555666777888 9 to 5 Diners 2 11-20-2013 BLAH BLAH P
Good catch (I cannot believe I did not test for that one).
 
Upvote 0
Thank you all! Your responses have been much appreciated and worked great. ... Now, how can I easily extract the numbers after and the date?
 
Upvote 0
This formula (slight modification of VBA Geek’s formula in post #9) seems to treat correctly each sample mentioned so far (should be confirmed by Ctrl-Shift-Enter):


=TRIM(MID(A1,14,MATCH(TRUE,ISNUMBER(DATEVALUE(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),10))),0)-16))
 
Upvote 0
This formula (slight modification of VBA Geek’s formula in post #9) seems to treat correctly each sample mentioned so far (should be confirmed by Ctrl-Shift-Enter):

=TRIM(MID(A1,14,MATCH(TRUE,ISNUMBER(DATEVALUE(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),10))),0)-16))
Very good, but you need to change the red highlighted 16 to 17, then I would agree with you (with 16, the first digit of a two-digit number following the company name is retrieved... 17 cuts it out and still makes it work with single digit numbers following the company name).
 
Upvote 0
Working with VBA Geek's formula from Message #9, this variation of it also seems to work with all the variations presented so far (half the function calls of István's latest formula and it is also non-Volatile, although I think István's formula can be made non-Volatile using a properly contructed INDEX function call in place of the INDIRECT one)...

=TRIM(MID(A1,14,SUMPRODUCT(IFERROR(SEARCH({" ? ??-??-????"," ?? ??-??-????"},A1),0))-14))
 
Last edited:
Upvote 0
Working with VBA Geek's formula from Message #9, this variation of it also seems to work with all the variations presented so far (half the function calls of István's latest formula and it is also non-Volatile, although I think István's formula can be made non-Volatile using a properly contructed INDEX function call in place of the INDIRECT one)...

=TRIM(MID(A1,14,SUMPRODUCT(IFERROR(SEARCH({" ? ??-??-????"," ?? ??-??-????"},A1),0))-14))
We can actually shorten this formula by six characters by reworking the constant array like so...

=TRIM(MID(A1,14,SUMPRODUCT(IFERROR(SEARCH(" "&{"?","??"}&" ??-??-????",A1),0))-14))
 
Upvote 0
They all work well for the company list... Thank you all so much! Now, how can I modify it to pick up the digit after for the next column, and then the date? The bold and underlined portions of the examples:


555666777888 3M in 6W Corp. 2 11-20-2013 BLAH BLAH P
555666777888 9 to 5 Diners 2 11-20-2013 BLAH BLAH P
 
Upvote 0

Forum statistics

Threads
1,223,888
Messages
6,175,219
Members
452,620
Latest member
dsubash

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top