Extracting numbers from text string and including units (mm)

alpha_pinene

New Member
Joined
Jan 27, 2022
Messages
9
Office Version
  1. 2016
Platform
  1. Windows
Hi all!
I am working on a project in which I am pulling data from pathology reports regarding skin cancer. The text reports are quite lengthy and contain many numbers (i.e., dates, cancer stage, and the number I am interested in, which is called Breslow thickness). I have managed to use formulas to condense the text into a small segment surrounding my word of interest ("Breslow") with 12 characters before the word Breslow and 75 characters after the word. ($S$1 corresponds to a cell with the word "Breslow.")
1643324166113.png


The string of 1's is because I could only figure out how to get characters to the LEFT of "Breslow" to show up by adding more numbers in that set of parentheses (which there is probably a better way and I would love ot hear it if so).

Then, I was able to get the number to the front of another column using the below formula:

1643324305836.png


But herein lies my problem. Sometimes, my number of interest (which typically ranges from 0.1-4 mm) isn't the first number in the string of text, and I will capture a date or a time (like 9/17 or 4:00:00AM), which is problematic, because I need to average all of the Breslow thicknesses and I'm getting a significant number of cells that don't capture the right number.

I thought if I could develop a formula that would search for a number PLUS the units (like 0.4mm or 0.4 mm, though there are variations in which cells do and don't have spaces which could complicate things), then I could be able to avoid getting the dates/times.

I am at my wit's end and would really appreciate any advice that can be offered.

Thank you so much!

alpha_pinene
 

Attachments

  • 1643324145841.png
    1643324145841.png
    7.3 KB · Views: 14

Excel Facts

Using Function Arguments with nested formulas
If writing INDEX in Func. Arguments, type MATCH(. Use the mouse to click inside MATCH in the formula bar. Dialog switches to MATCH.
Hi,

Rather than showing formulas that may not be working, which doesn't really tell us much, you should show at least a handful of samples of the Text Strings, and explain exactly what you want extracted from those.
Show samples preferably using XL2BB, or at least in table format, so helpers can copy them for testing.
Probably a good idea to show the Original Text string along with what you're been able to "shorten"
XL2BB instructions in my signature.
 
Upvote 0
Hi,

Rather than showing formulas that may not be working, which doesn't really tell us much, you should show at least a handful of samples of the Text Strings, and explain exactly what you want extracted from those.
Show samples preferably using XL2BB, or at least in table format, so helpers can copy them for testing.
Probably a good idea to show the Original Text string along with what you're been able to "shorten"
XL2BB instructions in my signature.
I am very thankful for your message and the advice to use XL2BB. I believe I have done it, but please let me know if it doesn't work correctly. I used 3 separate cases - one in which the desired value shows up as 0.4 mm, one as 0.4mm, and one as (0.4mm) to demonstrate the hitches I have encountered, and I left the date (today's date of 1/27/2022) in front of "Breslow thickness" because I have encountered the problem where my number function picks up dates instead of my desired value. Also, this text is entirely fabricated to emulate what some of the reports look like and does not contain any actual patient data.

Book1
ABC
1Full text examplesShortened text examplesWhat I would like to have extracted:
2Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4 mm, ulceration not identified, mitotic figures <1/mm2, regression absent. 1/27/2022 Breslow thickness of 0.4 mm, ulceration not identified, mitotic figures <1/mm2, regression absent. 0.4
3Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4mm, ulceration present, mitotic figures 12/mm2, regression absent. 0.4 mm, ulceration not identified, mitotic figures <1/mm2, regression absent. 0.4
4Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness: (0.4mm), ulceration present, mitotic figures 12/mm2, regression absent. Breslow thickness: (0.4mm), ulceration present, mitotic figures 12/mm2, regression absent. 0.4
Sheet1
 
Upvote 0
Thanks for uploading using XL2BB, I thought you wanted the mm included.

Based on you 3 samples:

Book3.xlsx
ABC
1Full text examplesResultw/o mm
2Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4 mm, ulceration not identified, mitotic figures <1/mm2, regression absent. 0.4 mm0.4
3Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4mm, ulceration present, mitotic figures 12/mm2, regression absent. 0.4mm0.4
4Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness: (0.4mm), ulceration present, mitotic figures 12/mm2, regression absent. 0.4mm0.4
Sheet964
Cell Formulas
RangeFormula
B2:B4B2=TRIM(LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(A2,SEARCH("Breslow thickness",A2)+18,LEN(A2)),"of",""),"(",""),")",""),",",REPT(" ",99),1),99))
C2:C4C2=LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(A2,SEARCH("Breslow thickness",A2)+18,LEN(A2)),"of",""),"(",""),")",""),"mm,",REPT(" ",99),1),99)+0


EDIT: Added solution for without "mm", solution With "mm" result is Text and cannot be calculated, solution Without "mm" is converted to Real Number and Can be calculated.
 
Last edited:
Upvote 0
I cannot thank you enough for helping me with this - I think it is absolutely perfect as long as the phrasing "Breslow thickness" is used, but when I applied to my sheet, I discovered there are a few scenarios in which the report is written as "Breslow depth" or just "Breslow." I tried just changing the phrasing which did work, so probably is still workable. However, when I tried the edited formula, I only get a "#VALUE!" result, and I have made sure that I changed the reference cells appropriately.

I am attaching another XL2BB to see if it is helpful in figuring out how to extract just the number even when phrasing is different? I totally understand if this is not possible and again I am so grateful for the time you are spending helping me out.

Book3
ABC
1Text columnOutputDesired Result
2maximum tumor (Breslow) thickness in millimeters: 0.7 millimetersin millimeters: 0.7 millimeters0.7 mm
3Breslow of 2.1mm along bulbar conjunctivalong bulbar conjunctiva2.1 mm
40.75 mm in Breslow thickness, located on the backaed on0.75 mm
50.85 mm in Breslow depth located on the left footed on the left foot0.85 mm
Sheet1

Thanks for uploading using XL2BB, I thought you wanted the mm included.

Based on you 3 samples:

Book3.xlsx
ABC
1Full text examplesResultw/o mm
2Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4 mm, ulceration not identified, mitotic figures <1/mm2, regression absent. 0.4 mm0.4
3Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4mm, ulceration present, mitotic figures 12/mm2, regression absent. 0.4mm0.4
4Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness: (0.4mm), ulceration present, mitotic figures 12/mm2, regression absent. 0.4mm0.4
Sheet964
Cell Formulas
RangeFormula
B2:B4B2=TRIM(LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(A2,SEARCH("Breslow thickness",A2)+18,LEN(A2)),"of",""),"(",""),")",""),",",REPT(" ",99),1),99))
C2:C4C2=LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(A2,SEARCH("Breslow thickness",A2)+18,LEN(A2)),"of",""),"(",""),")",""),"mm,",REPT(" ",99),1),99)+0


EDIT: Added solution for without "mm", solution With "mm" result is Text and cannot be calculated, solution Without "mm" is converted to Real Number and Can be calculated.I canno

Thanks for uploading using XL2BB, I thought you wanted the mm included.

Based on you 3 samples:

Book3.xlsx
ABC
1Full text examplesResultw/o mm
2Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4 mm, ulceration not identified, mitotic figures <1/mm2, regression absent. 0.4 mm0.4
3Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4mm, ulceration present, mitotic figures 12/mm2, regression absent. 0.4mm0.4
4Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness: (0.4mm), ulceration present, mitotic figures 12/mm2, regression absent. 0.4mm0.4
Sheet964
Cell Formulas
RangeFormula
B2:B4B2=TRIM(LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(A2,SEARCH("Breslow thickness",A2)+18,LEN(A2)),"of",""),"(",""),")",""),",",REPT(" ",99),1),99))
C2:C4C2=LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(A2,SEARCH("Breslow thickness",A2)+18,LEN(A2)),"of",""),"(",""),")",""),"mm,",REPT(" ",99),1),99)+0


EDIT: Added solution for without "mm", solution With "mm" result is Text and cannot be calculated, solution Without "mm" is converted to Real Number and Can be calculated.
 
Upvote 0
Are your new samples above in Column A the Entire Text string?, and not partial?
 
Upvote 0
Assuming your answer to my question above is Yes, try this updated formula.

Please note, I'm not giving you a version that includes "mm", since that would result in Text, and you said you need to do calculations with the result, if you want to see "mm" in the result cell but keep the result as Real Numbers, you can Custom Format the cell like 0.00 "mm" as in my sample cell B8

Book3.xlsx
ABC
1Full text examplesw/o mm
2Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4 mm, ulceration not identified, mitotic figures <1/mm2, regression absent. 0.4
3Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4mm, ulceration present, mitotic figures 12/mm2, regression absent. 0.4
4Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness: (0.4mm), ulceration present, mitotic figures 12/mm2, regression absent. 0.4
5maximum tumor (Breslow) thickness in millimeters: 0.7 millimeters0.7
6Breslow of 2.1mm along bulbar conjunctiva2.1
70.75 mm in Breslow thickness, located on the back0.75
80.85 mm in Breslow depth located on the left foot0.85 mm< Cell Custom formatted 0.00 "mm"
Sheet964
Cell Formulas
RangeFormula
B2:B8B2=IFERROR(LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(A2,SEARCH("Breslow thickness",A2)+18,LEN(A2)),"of",""),"(",""),")",""),"mm,",REPT(" ",99),1),99)+0,LEFT(SUBSTITUTE(MID(A2,MIN(SEARCH({0,1,2,3,4}+{0;5},A2&1/17)),LEN(A2)),"m",REPT(" ",99),1),99)+0)
 
Upvote 0
Assuming your answer to my question above is Yes, try this updated formula.

Please note, I'm not giving you a version that includes "mm", since that would result in Text, and you said you need to do calculations with the result, if you want to see "mm" in the result cell but keep the result as Real Numbers, you can Custom Format the cell like 0.00 "mm" as in my sample cell B8

Book3.xlsx
ABC
1Full text examplesw/o mm
2Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4 mm, ulceration not identified, mitotic figures <1/mm2, regression absent. 0.4
3Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4mm, ulceration present, mitotic figures 12/mm2, regression absent. 0.4
4Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness: (0.4mm), ulceration present, mitotic figures 12/mm2, regression absent. 0.4
5maximum tumor (Breslow) thickness in millimeters: 0.7 millimeters0.7
6Breslow of 2.1mm along bulbar conjunctiva2.1
70.75 mm in Breslow thickness, located on the back0.75
80.85 mm in Breslow depth located on the left foot0.85 mm< Cell Custom formatted 0.00 "mm"
Sheet964
Cell Formulas
RangeFormula
B2:B8B2=IFERROR(LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(A2,SEARCH("Breslow thickness",A2)+18,LEN(A2)),"of",""),"(",""),")",""),"mm,",REPT(" ",99),1),99)+0,LEFT(SUBSTITUTE(MID(A2,MIN(SEARCH({0,1,2,3,4}+{0;5},A2&1/17)),LEN(A2)),"m",REPT(" ",99),1),99)+0)
You are a true genius and I'm so grateful for your time and assistance. This works perfectly. I will mark as solved and I wish you the best.
 
Upvote 0
Alternative formula:
Book1
AB
1Full text examples
2Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4 mm, ulceration not identified, mitotic figures <1/mm2, regression absent. 0.4
3Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4mm, ulceration present, mitotic figures 12/mm2, regression absent. 0.4
4Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness: (0.4mm), ulceration present, mitotic figures 12/mm2, regression absent. 0.4
5maximum tumor (Breslow) thickness in millimeters: 0.7 millimeters0.7
6Breslow of 2.1mm along bulbar conjunctiva2.1
70.75 mm in Breslow thickness, located on the back0.75
8 4 mm in depth located on the left foot4
Sheet1
Cell Formulas
RangeFormula
B2:B8B2=AGGREGATE(14,6,MID(SUBSTITUTE(A2,"milli","mm"),IFERROR(IFERROR(IFERROR(IFERROR(IFERROR(SEARCH("?.?mm",SUBSTITUTE(A2,"milli","mm")),SEARCH("?.?mm",SUBSTITUTE(A2,"milli","mm"))),SEARCH("?.??mm",SUBSTITUTE(A2,"milli","mm"))),SEARCH("?.???mm",SUBSTITUTE(A2,"milli","mm"))),SEARCH("? mm",SUBSTITUTE(A2,"milli","mm"))),SEARCH("?mm",SUBSTITUTE(A2,"milli","mm"))),{1,2,3,4})+0,1)
 
Upvote 0
Solution
You are a true genius and I'm so grateful for your time and assistance. This works perfectly. I will mark as solved and I wish you the best.

You're welcome, thanks for the feedback, my father was a cancer (not skin) victim.

And also to answer one of your questions in OP .....-LEN(111111111111) can just be -12
 
Upvote 0

Forum statistics

Threads
1,224,817
Messages
6,181,144
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top