Extracting text within varying length phrases

Stu2407

New Member
Joined
Oct 3, 2023
Messages
4
Office Version
  1. 365
Platform
  1. Windows
I have a massive file (53,000+ rows) where I need to extract a “Record ID#” value that is contained within text. The attached screenshot shows the various ways the Record ID# value shows up in the Comments

End goal is column B in attached screenshot.

I can NOT use macros as I need to show/teach this using formulas or power query to others who need duplicate this in various ways to deal with similar data cleanup situations.
  • The only common delimiter is “2023/Archive/” everything else that follows ranges in length.
  • The 10 character Record ID# always starts with a capital “D” followed by 9 digits OR a capital “D” followed by another capital letter then 8 digits.
  • Most of the cells contain only 1 Record ID# in the Comments field.
  • Some cells (A8 & A15) where Record ID# values are a range, the delimiter between the Record ID# range is Space/Dash/Space.
I tried the following but it failed--
  • Find & Replace “2023/Archive/” to reduce clutter
  • Then Text to Column’ed which gave me upto 5 columns of data for each row depending on length of comment.
  • The Record ID# value could be in Column 1,2,3,4 or 5. I’d sort each column & delete other “junk” values… very time consuming.
  • Then I didn’t know how to “bring over” that Record ID# value in either Column 1,2,3,4 or 5 into a new Record ID# column.
  • The other issue with Text to Column’ing are Comments value (see A8-A15) where the Record ID# is within a phrase and/or the Record ID# is listed as a range instead of a single value.
I also tried to using SEARCH & RIGHT with “D” as trigger… nested formula was kicking my butt. Thought was I’d search for “D” in each cell and then give me 23 characters to the right. 23 refers to the characters when Record ID# is a range (see A8 & A15) & not the commonly occurring 9 digit value. Thought was I get a column with Record ID# followed by junk characters OR the 2nd half of a Record ID# range value. Then figure out how to delete the junk characters

I couldn’t think of how to Power Query to solve this either. I essentially did the same as above- split values by “/”. But still had the issues mentioned above to proceed with further cleanup to isolate the Record ID#.

Thanks in advance for any help you can provide.
 

Attachments

  • Screenshot-Text Extract .png
    Screenshot-Text Extract .png
    161.7 KB · Views: 35

Excel Facts

Repeat Last Command
Pressing F4 adds dollar signs when editing a formula. When not editing, F4 repeats last command.
Please provide some sample data and expected results.

MrExcel has a tool called “XL2BB” that lets you post samples of your data, allowing us to copy/paste it to our Excel spreadsheets to work with the same data you are. Instructions on using this tool can be found here: XL2BB Add-in

Note that this board also has a "Test Here” forum. This is a place where you can test using this tool (or any other posting techniques that you want to test) before trying to use those tools in your actual posts.

Thanks,

Doug
 
Upvote 0
Welcome to the MrExcel board!

Certainly sample data with XL2BB would help in the future, however see if this helps. Not sure how robust this will be - depends on the (in)consistency of your data.

Stu2407.xlsm
AB
1DataResult
22023/Archive/Archive/Area-01/Group-01/D123456789D123456789
32023/Archive/Area-01/Grou -02/D023456790D023456790
42023/Archive/D123456791D123456791
52023/Archive/Area-04/D123456792D123456792
62023/Archive/DR23456793DR23456793
72023/Archive/Area-014/D923456794D923456794
82023/Archive/Area-031/D123456795 - D187156795D123456795 - D187156795
92023/Archive/New ref# L26199A-01 (old ref# DM23456796)DM23456796
102023/Archive/Area-01/Group-01/D123456797D123456797
112023/Archive/Topic A/ This is a sample comment for D333456796D333456796
122023/Archive/Topic B/Topic C/Variation for DQ33456325 sample commentDQ33456325
132023/Archive/Topic D/ Sample data comment for old record id# D221100987 referencedD221100987
142023/Archive/Samp1e comment for Mike DeWitterman with no record id# referenced 
152023/Archive/Another sample comment for D923555680 - D923555713 contained within a large string of textD923555680 - D923555713
Sheet1
Cell Formulas
RangeFormula
B2:B15B2=LET(id,"D?11111111",c,MID(A2,SEQUENCE(LEN(A2)),1),s,CONCAT(IF(ISNUMBER(c+0),1,c)),p,SEARCH(id,s),IFERROR(MID(A2,p,IF(ISNUMBER(SEARCH(id,s,p+9)),23,10)),""))
 
Upvote 0
Hi Peter,

In your formula, I like the idea of converting all the numbers "1" so pattern matching is easier.

Doug
 
Upvote 0
Welcome to the MrExcel board!

Certainly sample data with XL2BB would help in the future, however see if this helps. Not sure how robust this will be - depends on the (in)consistency of your data.

Stu2407.xlsm
AB
1DataResult
22023/Archive/Archive/Area-01/Group-01/D123456789D123456789
32023/Archive/Area-01/Grou -02/D023456790D023456790
42023/Archive/D123456791D123456791
52023/Archive/Area-04/D123456792D123456792
62023/Archive/DR23456793DR23456793
72023/Archive/Area-014/D923456794D923456794
82023/Archive/Area-031/D123456795 - D187156795D123456795 - D187156795
92023/Archive/New ref# L26199A-01 (old ref# DM23456796)DM23456796
102023/Archive/Area-01/Group-01/D123456797D123456797
112023/Archive/Topic A/ This is a sample comment for D333456796D333456796
122023/Archive/Topic B/Topic C/Variation for DQ33456325 sample commentDQ33456325
132023/Archive/Topic D/ Sample data comment for old record id# D221100987 referencedD221100987
142023/Archive/Samp1e comment for Mike DeWitterman with no record id# referenced 
152023/Archive/Another sample comment for D923555680 - D923555713 contained within a large string of textD923555680 - D923555713
Sheet1
Cell Formulas
RangeFormula
B2:B15B2=LET(id,"D?11111111",c,MID(A2,SEQUENCE(LEN(A2)),1),s,CONCAT(IF(ISNUMBER(c+0),1,c)),p,SEARCH(id,s),IFERROR(MID(A2,p,IF(ISNUMBER(SEARCH(id,s,p+9)),23,10)),""))
Thanks Doug. I will install the add-on & get back to you. I'm sure I'll have some follow up how to questions.
 
Upvote 0
Hi Peter,

Could you give me the formula for just extracting the single value of the Record ID# & exclude the range values? I’ve never used some the functions in your formula so I’m trying to break it down to follow along… kind of getting lost in the nested functions. Formula for extracting the single value of the Record ID may be easier to break down & learn.
 
Upvote 0
I had already done a partial breakdown of Peter's LET formula...I added a few columns to try to make it clearer.
Book_2023-10-03.xlsm
ABCDEFGHI
1DataResultidcs_partAspfinal_partAfinal
22023/Archive/Archive/Area-01/Group-01/D123456789/abcD123456789D?111111112TRUE1111/Archive/Archive/Area-11/Group-11/D111111111/abc399D123456789
30TRUE
42TRUE
53TRUE
6/FALSE
7AFALSE
8rFALSE
9cFALSE
10hFALSE
11iFALSE
12vFALSE
13eFALSE
14/FALSE
15AFALSE
16rFALSE
17cFALSE
18hFALSE
19iFALSE
20vFALSE
21eFALSE
22/FALSE
23AFALSE
24rFALSE
25eFALSE
26aFALSE
27-FALSE
280TRUE
291TRUE
30/FALSE
31GFALSE
32rFALSE
33oFALSE
34uFALSE
35pFALSE
36-FALSE
370TRUE
381TRUE
39/FALSE
40DFALSE
411TRUE
422TRUE
433TRUE
444TRUE
455TRUE
466TRUE
477TRUE
488TRUE
499TRUE
50/FALSE
51aFALSE
52bFALSE
53cFALSE
Sheet5
Cell Formulas
RangeFormula
B2B2=LET(id,"D?11111111",c,MID(A2,SEQUENCE(LEN(A2)),1),s,CONCAT(IF(ISNUMBER(c+0),1,c)),p,SEARCH(id,s),IFERROR(MID(A2,p,IF(ISNUMBER(SEARCH(id,s,p+9)),23,10)),""))
D2:D53D2=MID(A2,SEQUENCE(LEN(A2)),1)
E2:E53E2=ISNUMBER(D2#+0)
F2F2=CONCAT(IF(ISNUMBER(D2#*1),1,D2#))
G2G2=SEARCH("D?11111111",F2,1)
H2H2=IF(ISNUMBER(SEARCH("D?11111111",F2,G2+9)),23,9)
I2I2=IFERROR(MID(A2,G2,IF(ISNUMBER(SEARCH(C2,F2,G2+9)),23,10)),"")
Dynamic array formulas.

Hope that helps,

Doug
 
Upvote 0
Could you give me the formula for just extracting the single value of the Record ID# & exclude the range values?
Not certain what you mean. This extracts the first id (if there is one).
If you want to actually exclude the rows with an id range, then the formula may actually be more complicated than the original post #3 formula & therefore defeat the idea of simplifying to break down & understand.

Stu2407.xlsm
AB
1DataResult
22023/Archive/Archive/Area-01/Group-01/D123456789D123456789
32023/Archive/Area-01/Grou -02/D023456790D023456790
42023/Archive/D123456791D123456791
52023/Archive/Area-04/D123456792D123456792
62023/Archive/DR23456793DR23456793
72023/Archive/Area-014/D923456794D923456794
82023/Archive/Area-031/D123456795 - D187156795D123456795
92023/Archive/New ref# L26199A-01 (old ref# DM23456796)DM23456796
102023/Archive/Area-01/Group-01/D123456797D123456797
112023/Archive/Topic A/ This is a sample comment for D333456796D333456796
122023/Archive/Topic B/Topic C/Variation for DQ33456325 sample commentDQ33456325
132023/Archive/Topic D/ Sample data comment for old record id# D221100987 referencedD221100987
142023/Archive/Samp1e comment for Mike DeWitterman with no record id# referenced 
152023/Archive/Another sample comment for D923555680 - D923555713 contained within a large string of textD923555680
Sheet3
Cell Formulas
RangeFormula
B2:B15B2=LET(c,MID(A2,SEQUENCE(LEN(A2)),1),IFERROR(MID(A2,SEARCH("D?11111111",CONCAT(IF(ISNUMBER(c+0),1,c))),10),""))
 
Upvote 0

Forum statistics

Threads
1,224,820
Messages
6,181,161
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top