Extracting numbers from text string and including units (mm)

alpha_pinene

New Member
Joined
Jan 27, 2022
Messages
9
Office Version
  1. 2016
Platform
  1. Windows
Hi all!
I am working on a project in which I am pulling data from pathology reports regarding skin cancer. The text reports are quite lengthy and contain many numbers (i.e., dates, cancer stage, and the number I am interested in, which is called Breslow thickness). I have managed to use formulas to condense the text into a small segment surrounding my word of interest ("Breslow") with 12 characters before the word Breslow and 75 characters after the word. ($S$1 corresponds to a cell with the word "Breslow.")
1643324166113.png


The string of 1's is because I could only figure out how to get characters to the LEFT of "Breslow" to show up by adding more numbers in that set of parentheses (which there is probably a better way and I would love ot hear it if so).

Then, I was able to get the number to the front of another column using the below formula:

1643324305836.png


But herein lies my problem. Sometimes, my number of interest (which typically ranges from 0.1-4 mm) isn't the first number in the string of text, and I will capture a date or a time (like 9/17 or 4:00:00AM), which is problematic, because I need to average all of the Breslow thicknesses and I'm getting a significant number of cells that don't capture the right number.

I thought if I could develop a formula that would search for a number PLUS the units (like 0.4mm or 0.4 mm, though there are variations in which cells do and don't have spaces which could complicate things), then I could be able to avoid getting the dates/times.

I am at my wit's end and would really appreciate any advice that can be offered.

Thank you so much!

alpha_pinene
 

Attachments

  • 1643324145841.png
    1643324145841.png
    7.3 KB · Views: 14
Alternative formula:
Book1
AB
1Full text examples
2Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4 mm, ulceration not identified, mitotic figures <1/mm2, regression absent. 0.4
3Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness of 0.4mm, ulceration present, mitotic figures 12/mm2, regression absent. 0.4
4Pathology report: Date: 01/27/2022, Provider: x physician, biopsy type: shave, body location: right upper extremity, diagnosis: melanoma. 1/27/2022 Breslow thickness: (0.4mm), ulceration present, mitotic figures 12/mm2, regression absent. 0.4
5maximum tumor (Breslow) thickness in millimeters: 0.7 millimeters0.7
6Breslow of 2.1mm along bulbar conjunctiva2.1
70.75 mm in Breslow thickness, located on the back0.75
8 4 mm in depth located on the left foot4
Sheet1
Cell Formulas
RangeFormula
B2:B8B2=AGGREGATE(14,6,MID(SUBSTITUTE(A2,"milli","mm"),IFERROR(IFERROR(IFERROR(IFERROR(IFERROR(SEARCH("?.?mm",SUBSTITUTE(A2,"milli","mm")),SEARCH("?.?mm",SUBSTITUTE(A2,"milli","mm"))),SEARCH("?.??mm",SUBSTITUTE(A2,"milli","mm"))),SEARCH("?.???mm",SUBSTITUTE(A2,"milli","mm"))),SEARCH("? mm",SUBSTITUTE(A2,"milli","mm"))),SEARCH("?mm",SUBSTITUTE(A2,"milli","mm"))),{1,2,3,4})+0,1)
After going through all of my data, it actually seems that the previous solution ended up missing a few values (could not figure out why) - however, this solution caught every single number that I needed. I am so grateful to both of the posters for helping me solve this, you've saved me tons of time! I am indebted.
 
Upvote 0

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.

Forum statistics

Threads
1,224,820
Messages
6,181,160
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top