Formula for extracting multiple occurrences of text substring in a cell

EmLearnsMicrosoft

New Member
Joined
Jun 28, 2024
Messages
11
Office Version
  1. 365
Platform
  1. MacOS
Hi! I am trying to write a formula that will allow me to extract multiple strings of numbers from one cell to another, while filtering out all other irrelevant text from the source cell. For example (see attached photo), I want to be able to extract multiple 8-digit numbers, separated into the output column by "; ", that each occur after the text "ABCD: " within a cell. I have tried multiple complex nesting functions, based off of suggestions on other similar issues, but cannot seem to tailor the functions to fit my exact needs. Any help would be appreciated!!! (I use Excel for Mac, Version 16.84 (24041420))

Book2.xlsx
AB
1Data Column 1ABCD values extracted
2Apple 2021 (ABCD: 12345678)12345678
3Boater 2021 (ABCD: 23456789), Crayon 2017 (ABCD: 34567890), Dorito 2010 (ABCD: 45678901), Developer Website: Grannies Cookies23456789; 34567890; 45678901
4Moose: Study immunity and effectiveness of Sample A (Apple 2021)
5Moose: evaluate immunogenicity (Elbows 2021 (ABCD: 56789012)) Moose: evaluate infectivity, safety, immunogenicity and protective efficacy (Freefall 2021 (PMID: 67890123))56789012; 67890123
Sheet1
 

Attachments

  • Screenshot 2024-06-28 at 3.43.59 PM.png
    Screenshot 2024-06-28 at 3.43.59 PM.png
    80.3 KB · Views: 18

Excel Facts

Workdays for a market open Mon, Wed, Friday?
Yes! Use "0101011" for the weekend argument in NETWORKDAYS.INTL or WORKDAY.INTL. The 7 digits start on Monday. 1 means it is a weekend.
Try
Book1
AB
1Data Column 1
2Apple 2021 (ABCD: 12345678)12345678
3Boater 2021 (ABCD: 23456789), Crayon 2017 (ABCD: 34567890), Dorito 2010 (ABCD: 45678901), Developer Website: Grannies Cookies23456789; 34567890; 45678901
4Moose: Study immunity and effectiveness of Sample A (Apple 2021) 
5Moose: evaluate immunogenicity (Elbows 2021 (ABCD: 56789012)) Moose: evaluate infectivity, safety, immunogenicity and protective efficacy (Freefall 2021 (PMID: 67890123))56789012; 67890123
Sheet7
Cell Formulas
RangeFormula
B2:B5B2=LET(t,TEXTSPLIT(A2,,{" ",")"}),TEXTJOIN("; ",,IF(ISNUMBER(FIND(":",DROP(t,-1)))*ISNUMBER(--DROP(t,1)),DROP(t,1),"")))
 
Upvote 0
THANK YOU SO MUCH! As a quick follow-up, would there be a way to 1. apply this formula to an entire spreadsheet with relevant data across many columns and rows, and have each unique value be extracted into a single cell? e.g., first occurrence in the sheet appear in cell AD2, the second occurrence in AD3, etc.?
 
Upvote 0
Do you mean like this? If so note that it has a limit of ~32,000 characters limits. Not sure how much data you're talking about. For large data set, I'd go with VBA as it's probably more efficient.
Book1
ABCD
1Data Column 1
2Apple 2021 (ABCD: 12345678)Moose: Study immunity and effectiveness of Sample A (Apple 2021)12345678
3Boater 2021 (ABCD: 23456789), Crayon 2017 (ABCD: 34567890), Dorito 2010 (ABCD: 45678901), Developer Website: Grannies CookiesMoose: evaluate immunogenicity (Elbows 2021 (ABCD: 56789012)) Moose: evaluate infectivity, safety, immunogenicity and protective efficacy (Freefall 2021 (PMID: 67890123))23456789
434567890
545678901
656789012
767890123
Sheet7
Cell Formulas
RangeFormula
D2:D7D2=UNIQUE( TEXTSPLIT(TEXTJOIN(";",,MAP(A2:B3,LAMBDA(m, LET(t,TEXTSPLIT(m,,{" ",")"}),TEXTJOIN(";",,IF(ISNUMBER(FIND(":",DROP(t,-1)))*ISNUMBER(--DROP(t,1)),DROP(t,1),"")))))),,";"))
Dynamic array formulas.
 
Last edited:
Upvote 0
Do you mean like this? If so note that it has a limit of ~32,000 characters limits. Not sure how much data you're talking about. For large data set, I'd go with VBA as it's probably more efficient.
Book1
ABCD
1Data Column 1
2Apple 2021 (ABCD: 12345678)Moose: Study immunity and effectiveness of Sample A (Apple 2021)12345678
3Boater 2021 (ABCD: 23456789), Crayon 2017 (ABCD: 34567890), Dorito 2010 (ABCD: 45678901), Developer Website: Grannies CookiesMoose: evaluate immunogenicity (Elbows 2021 (ABCD: 56789012)) Moose: evaluate infectivity, safety, immunogenicity and protective efficacy (Freefall 2021 (PMID: 67890123))23456789
434567890
545678901
656789012
767890123
Sheet7
Cell Formulas
RangeFormula
D2:D7D2=UNIQUE( TEXTSPLIT(TEXTJOIN(";",,MAP(A2:B3,LAMBDA(m, LET(t,TEXTSPLIT(m,,{" ",")"}),TEXTJOIN(";",,IF(ISNUMBER(FIND(":",DROP(t,-1)))*ISNUMBER(--DROP(t,1)),DROP(t,1),"")))))),,";"))
Dynamic array formulas.
Thank you for this! This is exactly the type of output that I wanted. There is one *small* issue with the formula somewhere though, specifically with the filtering criteria I believe. The output on my actual dataset gave me 553/558 rows of the data that I wanted. The other 5 rows extracted data that I am not interested in, that must've met the filtering or extraction criteria provided in the formula.

All of the data that I want to be extracted are 8-digit integers that specifically follow "ABCD: ". For example, "(ABCD: 12345678)", as shown in my original post. With this formula, the following integers were also extracted: 01, 2023, 3, 2, and 2015. I am unsure of what the source formatting for those integers were (it is a large dataset and merely searching for those terms comes up with way too many instances to try and locate the actual source).

Is there a way to further specify the search/filtering criteria to include "ABCD: ", but still only extract the 8-digit integer that follows, as the only time that "ABCD: " appears in the dataset is right before the data of interest?

Thanks again!!!
 
Upvote 0
Do you mean like this? If so note that it has a limit of ~32,000 characters limits. Not sure how much data you're talking about. For large data set, I'd go with VBA as it's probably more efficient.
Book1
ABCD
1Data Column 1
2Apple 2021 (ABCD: 12345678)Moose: Study immunity and effectiveness of Sample A (Apple 2021)12345678
3Boater 2021 (ABCD: 23456789), Crayon 2017 (ABCD: 34567890), Dorito 2010 (ABCD: 45678901), Developer Website: Grannies CookiesMoose: evaluate immunogenicity (Elbows 2021 (ABCD: 56789012)) Moose: evaluate infectivity, safety, immunogenicity and protective efficacy (Freefall 2021 (PMID: 67890123))23456789
434567890
545678901
656789012
767890123
Sheet7
Cell Formulas
RangeFormula
D2:D7D2=UNIQUE( TEXTSPLIT(TEXTJOIN(";",,MAP(A2:B3,LAMBDA(m, LET(t,TEXTSPLIT(m,,{" ",")"}),TEXTJOIN(";",,IF(ISNUMBER(FIND(":",DROP(t,-1)))*ISNUMBER(--DROP(t,1)),DROP(t,1),"")))))),,";"))
Dynamic array formulas.
Additional question, is the ~32,000 character limit for the data range or for the output?
 
Upvote 0
Is there a way to further specify the search/filtering criteria to include "ABCD: ", but still only extract the 8-digit integer that follows, as the only time that "ABCD: " appears in the dataset is right before the data of interest?
There must been some similar instances, but this will check if the numeric string is 8 characters long.
Excel Formula:
=UNIQUE(
   TEXTSPLIT(TEXTJOIN(";",,MAP(A2:B3,LAMBDA(m,
      LET(t,TEXTSPLIT(m,,{" ",")"},1),d,DROP(t,1),TEXTJOIN(";",,IF(ISNUMBER(FIND(":",DROP(t,-1)))*ISNUMBER(--d)*(LEN(d)=8),d,"")))))),,";"))


Additional question, is the ~32,000 character limit for the data range or for the output?
Data range.
 
Upvote 1
Solution
Try this option as well. This doesn't rely on TEXTJOIN so it won't have a limit.
Excel Formula:
=UNIQUE(TOCOL(
      DROP(
           REDUCE("",TOCOL(A2:B3,1),LAMBDA(a,m,VSTACK(a,LET(t,TEXTSPLIT(m,,{" ",")"},1),d,DROP(t,1),IFS(ISNUMBER(FIND(":",DROP(t,-1)))*ISNUMBER(--d)*(LEN(d)=8),d))))),
      1),
2))
 
Upvote 1
Using new regex formulas:
Excel Formula:
=TEXTJOIN("; ";TRUE;IFERROR(REGEXEXTRACT(A2;"(?<=ABCD:\s)[0-9]+";1);""))
 
Upvote 0
Using new regex formulas:
Excel Formula:
=TEXTJOIN("; ";TRUE;IFERROR(REGEXEXTRACT(A2;"(?<=ABCD:\s)[0-9]+";1);""))
Wow! This looks so much easier! Unfortunately it seems as though I do not yet have access to these functions (I updated the software today and tried).
 
Upvote 0

Forum statistics

Threads
1,220,965
Messages
6,157,119
Members
451,398
Latest member
rjsteward

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top