Extract a 5 Digit Number From Inconsistent Text Strings

jeffgibson55

New Member
Joined
Aug 22, 2011
Messages
17
I know how to extract a number from a text string using:

LOOKUP(99^99,--("0"&MID(J1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},J1&"0123456789")),ROW($1:$10000))))

But in my current data set I have thousands of inconsistent text strings some of which contain other numbers that I don't care about - with the above formula it just grabs the first number it finds left to right. The number I need to grab is always a 5 digit number. Is there any formula I can use that would only grab out a 5 digit number ignoring all other numbers? Here's an example text string: "9/14/2010.NTL.TeleBroc.55129T_V1_N" where 55129 is the desired extract value but like I said the other text strings don't necessarily follow this example's format/delimiters.

Thanks!
Jeff
 
Sorry, just noticed that the last part of the post did not print OK. Should be:

Remark:
As you may know vba has an old regex engine that does not recognize lookbehinds.
With a more modern version of regex you could use the simpler

Code:
.Pattern = "(?<!\d)(\d{5})(?!\d)"
 
Upvote 0

Excel Facts

Spell Check in Excel
Press F7 to start spell check in Excel. Be careful, by default, Excel does not check Capitalized Werds (whoops)
So you don't need a \d after the carrot? Is the engine associating the \d from the second group with the carrot from the first group?
 
Upvote 0
So you don't need a \d after the carrot? Is the engine associating the \d from the second group with the carrot from the first group?

Sorry, I don't understand.

(?:^|\D)(\d{5})(?!\d)

The pattern has 3 parts

Part 1:

(?:^|\D)

means: not preceded by a digit

Part 2:

(\d{5})

5 consecutive digits

Part 3:

(?!\d)

not followed by a digit.

Reading the whole pattern:

(?:^|\D)(\d{5})(?!\d)

5 consecutive digits not preceded and not followed by a digit.

does this clarify?
 
Last edited:
Upvote 0
I understand the last two groups of the pattern. In the non capture group you use \D but is that not a non digit? Yet you refer to this as a digit. Wouldn't you want to use \d instead? That's why I don't understand how the engine knows not to precede the 5 digits with a digit. I'm new to regex so I'm missing something very obvious apparently.
 
Last edited:
Upvote 0
I understand the last two groups of the pattern.

OK. Let's just talk about the first group.

(?:^|\D)

means either at the beginning of the string OR a non-digit.

If you add the second group

(?:^|\D)(\d{5})

you now have 5 consecutive digits that are

Either at the beginning of the string

OR are preceded by a non-digit

This means that the 5 consecutive digits are not preceded by a digit.

Is this clearer?

Ex.

12345abc

we get 12345, because it's at the beginning of the string and is not followed by a digit

ab34567c

we get 34567 because these 5 consecutive digits are preceded by a non-digit and not followed by a digit

ab456789c

we don't have a match. there are no 5 consecutive digits that are not preceded or followed by a digit.

HTH
 
Upvote 0
Here is another one

=-LOOKUP(1,-MID(SUBSTITUTE(" "&A1," ",""),ROW(INDIRECT("1:"&LEN(" "&A1)-5)),5))


Book1
AB
110/12/2010.IBN.NTL.WebcastEBrochure.54695LVWCR_V254695
210/13/2010.IBN.NTL.WebcastEBrochure.54713LVWCR_V254713
34/20/2010.IBN.NTL.WebcastEBrochure.52511LVWCR_V152511
41/19/2010.IBN.NTL.WebcastEBrochure.52509LVWCR_V152509
557639ER_Drop1-110808-133557639
657818ER_Drop1-110803-064657818
Sheet1
 
Upvote 0
Hi AlKey

Good formula, it will work for most of the strings, will however fail sometimes.

Try, for ex. with these strings:

abc123 45X
abc 1 2 3 4 5 X
abc+2346efg
abc123e4X
abc12e-4X
abc12-12d


None of them has 5 consecutive digits.
 
Upvote 0
I suppose, any formula can fail if one would try enough:)
It is the "e" that causes the problem.

=-LOOKUP(1,-LEFT(SUBSTITUTE(SUBSTITUTE(REPLACE(A1,1,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A1&1/17))-1,""),"e"," "),"-"," "),{1,2,3,4,5}))


Book1
AB
1abc123 45X123
2abc 1 2 3 4 5 X1
3abc+2346efg2346
4abc123e4X123
5abc12e-4X12
6abc12-12d12
Sheet1
 
Upvote 0

Forum statistics

Threads
1,223,898
Messages
6,175,274
Members
452,628
Latest member
dd2

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top