Extract a 5 Digit Number From Inconsistent Text Strings

jeffgibson55

New Member
Joined
Aug 22, 2011
Messages
17
I know how to extract a number from a text string using:

LOOKUP(99^99,--("0"&MID(J1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},J1&"0123456789")),ROW($1:$10000))))

But in my current data set I have thousands of inconsistent text strings some of which contain other numbers that I don't care about - with the above formula it just grabs the first number it finds left to right. The number I need to grab is always a 5 digit number. Is there any formula I can use that would only grab out a 5 digit number ignoring all other numbers? Here's an example text string: "9/14/2010.NTL.TeleBroc.55129T_V1_N" where 55129 is the desired extract value but like I said the other text strings don't necessarily follow this example's format/delimiters.

Thanks!
Jeff
 

Excel Facts

Move date out one month or year
Use =EDATE(A2,1) for one month later. Use EDATE(A2,12) for one year later.
Hi

Try:

=ExtractCode(A1)

using this udf:

Code:
Function ExtractCode(s As String) As String

With CreateObject("VBScript.RegExp")
    .Pattern = ".*? (\d{2,7}-\d\d-\d(?=(,| |$)))|.*"
    .Global = True
    ExtractCode = Trim(.Replace(s, " $1"))
End With
End Function
 
Upvote 0
Hi

Try:

=ExtractCode(A1)

using this udf:

Code:
Function ExtractCode(s As String) As String

With CreateObject("VBScript.RegExp")
    .Pattern = ".*? (\d{2,7}-\d\d-\d(?=(,| |$)))|.*"
    .Global = True
    ExtractCode = Trim(.Replace(s, " $1"))
End With
End Function

My excel does not extract the substring to be extracted if it is in front position.
 
Upvote 0
Hi István

You are right, but madhav said

.. extract only digit strings between space and (space or comma)..

and so I expected a space before the target, if the target starts at the first character it will not be matched.

But I think you are right, better to also consider the case where a code starts at character 1:

Code:
    .Pattern = "(?:^|.*? )(\d{2,7}-\d\d-\d(?=(,| |$)))|.*"
 
Last edited:
Upvote 0
For those who might be interested, here is how to write the UDF without using Regular Expressions...
Code:
Function ExtractCode(S As String) As String
  Dim X As Long, Parts() As String
  Parts = Split(Replace(S, ",", " "))
  For X = 0 To UBound(Parts)
    If Parts(X) Like "*[!0-9-]*" Then Parts(X) = ""
  Next
  ExtractCode = Application.Trim(Join(Parts))
End Function
 
Last edited:
Upvote 0
... Not true for the OP's last substring in post #30.

You are right. Should maybe be something like

..extract only digit strings between space and (space or comma) or at the beginning or at the end of the string...

Rick

I tried with

ab --5-- cdefgh 1-2-3-4, 111-222

and got

--5-- 1-2-3-4 111-222

None of these codes respects the specifications.
 
Upvote 0
Rick

I tried with

ab --5-- cdefgh 1-2-3-4, 111-222

and got

--5-- 1-2-3-4 111-222

None of these codes respects the specifications.
My code failed because I did not read the specifications fully; however, this revised non-RegExp UDF should work correctly (assuming I interpreted the specifications correctly)...
Code:
[table="width: 500"]
[tr]
	[td]Function ExtractCode(S As String) As String
  Dim X As Long, V As String, Parts() As String
  Parts = Split(Application.Trim(Replace(S, ",", " ")))
  For X = 0 To UBound(Parts)
    V = Parts(X)
    If Len(V) < 7 Or Len(V) > 12 Or V Like "*[!0-9-]*" Or Not V Like String(Application.Max(0, Len(V) - 5), "#") & "-##-#" Then Parts(X) = ""
  Next
  ExtractCode = Application.Trim(Join(Parts))
End Function[/td]
[/tr]
[/table]
 
Upvote 0
Glad to see great minds helping me...

probably you guys are close but still the codes have not given what is expected..

Please use my example to try.. (red color below is just for ease of understanding)

32-12-3 1-anfkjdn-2,3-nngnkfngnfd ngfndnfg 454532-34-5 mngfdsngf 7674-46-9, nfkjndsgkfdnm,nkjfdf 768-54-1 nfngfdngfn 96-78-1, 154222-98-3

I would like extract only the text-independent number strings 32-12-3 454532-34-5 7674-46-9 768-54-1 96-78-1 154222-98-3.
Please ignore all the strings where text and digits are mixed (example: 1-anfkjdn-2,3-nngnkfngnfd)..
extract only digit strings with patterns: #######-##-# or ######-##-# or #####-##-# or ####-##-# or ###-##-# or ###-##-#
Please note last "-##-#" is fixed only digit in the 1st set is varying from 2 to 7.
Thank you for helping me out..
 
Upvote 0

Forum statistics

Threads
1,224,824
Messages
6,181,186
Members
453,020
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top