Series of questions about pulling data from text (converting to yes/no, or present/absent, identified/not identified)

alpha_pinene

New Member
Joined
Jan 27, 2022
Messages
9
Office Version
  1. 2016
Platform
  1. Windows
Hi all!

I am so grateful for the help I received with my last question regarding pulling numbers from text, truly saved the day. I am interested in extracting more text-based data regarding tumor characteristics from pathology reports, and relevant metrics would include ulceration, regression, and anatomic level (something called Clark level). I am going to include an XL2BB file with some examples (the file does not contain any actual patient information or protected health information - examples generated to show the different ways the data is entered in these reports).

It might be a lofty goal, but I would like to find some way to pull yes/no for ulceration and regression, and the actual Clark level (often entered as a roman numeral). Ulceration and regression are coded in a frustratingly diverse number of ways, such as present, absent, identified, not identified, yes, no. If getting each of these words is the only workaround rather than converting to a simple yes/no column, that is fine and I can sort through them later (hopefully makes sense when seeing my XL2BB file).

Many thanks in advance if this is possible. I am in awe of the excel prowess and genius of the users on this forum.

Best,

alpha_pinene

Book4
ABCDE
1What text says:What I need (ulceration):Alternatively, and better, would be a yes/no for ulceration:Would also like to adapt formula to check for regression yes/noWould also like to adapt to collect Clark level (roman numeral or alphanumeric okay)
2Regression: not identified, Ulceration: not identified, Anatomic (Clark) level: INot identifiedNoNoI
3Regression: identified, Ulceration: present, Anatomic (Clark) level: IIIPresentYesYesIII
4Regression: none, Ulceration: absent, Anatomic (Clark) level: VAbsentNoNoV
5Regression; identified Ulceration; yes Anatomic (Clark) level: at least IYesYesYes≥I
6Regression; no Ulceration; no Anatomic (Clark) level: at least INoNoNo≥I
7Regression present, ulceration not identified, clark level cannot be determinedNot identifiedNoYesCannot be determined (or whatever placeholder for this)
Sheet1
 

Excel Facts

Shade all formula cells
To shade all formula cells: Home, Find & Select, Formulas to select all formulas. Then apply a light fill color.
I came up with the following code that works based on your sample data. It separately finds the ulceration, regression, and clark data that matches what you have in the right 3 columns. I began by putting the data in arrays, thinking that they would be used to put the data somewhere else, but I ended up just printing them out at the end of each For Loop. So, they're overkill now, but if you wanted to change anything later, they might be handy.
VBA Code:
Sub GetData()
    Dim c As Range
    Dim aArray As Variant
    Dim bArray As Variant
    Dim iCnt As Integer
    Dim i As Integer
    Dim s As String
    Dim ulceration() As String
    Dim regression() As String
    Dim clark() As String
    
    Dim uYes As Variant
    Dim rYes As Variant
    
    iCnt = -1
    uYes = Split("present,yes", ",")
    rYes = Split("present,identified", ",")
    
    For Each c In Range("A2", Cells(Rows.Count, 1).End(xlUp))
        s = ""
        iCnt = iCnt + 1
        ReDim Preserve ulceration(iCnt)
        ReDim Preserve regression(iCnt)
        ReDim Preserve clark(iCnt)
        
        'Get Clark level
        aArray = Split(LCase(c.Value), "clark")
        If InStr(1, aArray(1), "at least") > 0 Then
            s = ChrW(&H2265)
            bArray = Split(UCase(aArray(1)), " ")
            s = s & bArray(UBound(bArray))
        ElseIf InStr(1, aArray(1), "cannot") > 0 Then
            s = "Cannot be determined"
        Else
            bArray = Split(UCase(aArray(1)), " ")
            s = bArray(UBound(bArray))
        End If
        clark(iCnt) = s
        
        'Get ulceration
        aArray = Split(LCase(aArray(0)), "ulceration")
        s = "No"
        For i = 0 To UBound(uYes)
            If InStr(1, aArray(1), uYes(i)) > 0 Then
                s = "Yes"
                Exit For
            End If
        Next
        ulceration(iCnt) = s
        
        'Get regression
        aArray = Split(LCase(aArray(0)), "regression")
        s = "No"
        For i = 0 To UBound(uYes)
            If InStr(1, aArray(1), rYes(i)) > 0 And InStr(1, aArray(1), "not") < 1 Then
                s = "Yes"
                Exit For
            End If
        Next
        regression(iCnt) = s
        
        c.Offset(0, 1).Value = ulceration(iCnt)
        c.Offset(0, 2).Value = regression(iCnt)
        c.Offset(0, 3).Value = clark(iCnt)
    Next
End Sub
 
Upvote 0
Thank you so much for your hard work on this - I keep getting an error that says "Cant execute code in break mode." I am very new to VBA, any advice on how to work around this?
I came up with the following code that works based on your sample data. It separately finds the ulceration, regression, and clark data that matches what you have in the right 3 columns. I began by putting the data in arrays, thinking that they would be used to put the data somewhere else, but I ended up just printing them out at the end of each For Loop. So, they're overkill now, but if you wanted to change anything later, they might be handy.
VBA Code:
Sub GetData()
    Dim c As Range
    Dim aArray As Variant
    Dim bArray As Variant
    Dim iCnt As Integer
    Dim i As Integer
    Dim s As String
    Dim ulceration() As String
    Dim regression() As String
    Dim clark() As String
   
    Dim uYes As Variant
    Dim rYes As Variant
   
    iCnt = -1
    uYes = Split("present,yes", ",")
    rYes = Split("present,identified", ",")
   
    For Each c In Range("A2", Cells(Rows.Count, 1).End(xlUp))
        s = ""
        iCnt = iCnt + 1
        ReDim Preserve ulceration(iCnt)
        ReDim Preserve regression(iCnt)
        ReDim Preserve clark(iCnt)
       
        'Get Clark level
        aArray = Split(LCase(c.Value), "clark")
        If InStr(1, aArray(1), "at least") > 0 Then
            s = ChrW(&H2265)
            bArray = Split(UCase(aArray(1)), " ")
            s = s & bArray(UBound(bArray))
        ElseIf InStr(1, aArray(1), "cannot") > 0 Then
            s = "Cannot be determined"
        Else
            bArray = Split(UCase(aArray(1)), " ")
            s = bArray(UBound(bArray))
        End If
        clark(iCnt) = s
       
        'Get ulceration
        aArray = Split(LCase(aArray(0)), "ulceration")
        s = "No"
        For i = 0 To UBound(uYes)
            If InStr(1, aArray(1), uYes(i)) > 0 Then
                s = "Yes"
                Exit For
            End If
        Next
        ulceration(iCnt) = s
       
        'Get regression
        aArray = Split(LCase(aArray(0)), "regression")
        s = "No"
        For i = 0 To UBound(uYes)
            If InStr(1, aArray(1), rYes(i)) > 0 And InStr(1, aArray(1), "not") < 1 Then
                s = "Yes"
                Exit For
            End If
        Next
        regression(iCnt) = s
       
        c.Offset(0, 1).Value = ulceration(iCnt)
        c.Offset(0, 2).Value = regression(iCnt)
        c.Offset(0, 3).Value = clark(iCnt)
    Next
End Sub
 
Upvote 0
I came up with the following code that works based on your sample data. It separately finds the ulceration, regression, and clark data that matches what you have in the right 3 columns. I began by putting the data in arrays, thinking that they would be used to put the data somewhere else, but I ended up just printing them out at the end of each For Loop. So, they're overkill now, but if you wanted to change anything later, they might be handy.
VBA Code:
Sub GetData()
    Dim c As Range
    Dim aArray As Variant
    Dim bArray As Variant
    Dim iCnt As Integer
    Dim i As Integer
    Dim s As String
    Dim ulceration() As String
    Dim regression() As String
    Dim clark() As String
   
    Dim uYes As Variant
    Dim rYes As Variant
   
    iCnt = -1
    uYes = Split("present,yes", ",")
    rYes = Split("present,identified", ",")
   
    For Each c In Range("A2", Cells(Rows.Count, 1).End(xlUp))
        s = ""
        iCnt = iCnt + 1
        ReDim Preserve ulceration(iCnt)
        ReDim Preserve regression(iCnt)
        ReDim Preserve clark(iCnt)
       
        'Get Clark level
        aArray = Split(LCase(c.Value), "clark")
        If InStr(1, aArray(1), "at least") > 0 Then
            s = ChrW(&H2265)
            bArray = Split(UCase(aArray(1)), " ")
            s = s & bArray(UBound(bArray))
        ElseIf InStr(1, aArray(1), "cannot") > 0 Then
            s = "Cannot be determined"
        Else
            bArray = Split(UCase(aArray(1)), " ")
            s = bArray(UBound(bArray))
        End If
        clark(iCnt) = s
       
        'Get ulceration
        aArray = Split(LCase(aArray(0)), "ulceration")
        s = "No"
        For i = 0 To UBound(uYes)
            If InStr(1, aArray(1), uYes(i)) > 0 Then
                s = "Yes"
                Exit For
            End If
        Next
        ulceration(iCnt) = s
       
        'Get regression
        aArray = Split(LCase(aArray(0)), "regression")
        s = "No"
        For i = 0 To UBound(uYes)
            If InStr(1, aArray(1), rYes(i)) > 0 And InStr(1, aArray(1), "not") < 1 Then
                s = "Yes"
                Exit For
            End If
        Next
        regression(iCnt) = s
       
        c.Offset(0, 1).Value = ulceration(iCnt)
        c.Offset(0, 2).Value = regression(iCnt)
        c.Offset(0, 3).Value = clark(iCnt)
    Next
End Sub
Actually I think I solved the break mode error, but now I'm seeing run-time error '13' type mismatch, and when I click debug, this is highlighted:

1644068359570.png
 
Upvote 0
I came up with the following code that works based on your sample data. It separately finds the ulceration, regression, and clark data that matches what you have in the right 3 columns. I began by putting the data in arrays, thinking that they would be used to put the data somewhere else, but I ended up just printing them out at the end of each For Loop. So, they're overkill now, but if you wanted to change anything later, they might be handy.
VBA Code:
Sub GetData()
    Dim c As Range
    Dim aArray As Variant
    Dim bArray As Variant
    Dim iCnt As Integer
    Dim i As Integer
    Dim s As String
    Dim ulceration() As String
    Dim regression() As String
    Dim clark() As String
   
    Dim uYes As Variant
    Dim rYes As Variant
   
    iCnt = -1
    uYes = Split("present,yes", ",")
    rYes = Split("present,identified", ",")
   
    For Each c In Range("A2", Cells(Rows.Count, 1).End(xlUp))
        s = ""
        iCnt = iCnt + 1
        ReDim Preserve ulceration(iCnt)
        ReDim Preserve regression(iCnt)
        ReDim Preserve clark(iCnt)
       
        'Get Clark level
        aArray = Split(LCase(c.Value), "clark")
        If InStr(1, aArray(1), "at least") > 0 Then
            s = ChrW(&H2265)
            bArray = Split(UCase(aArray(1)), " ")
            s = s & bArray(UBound(bArray))
        ElseIf InStr(1, aArray(1), "cannot") > 0 Then
            s = "Cannot be determined"
        Else
            bArray = Split(UCase(aArray(1)), " ")
            s = bArray(UBound(bArray))
        End If
        clark(iCnt) = s
       
        'Get ulceration
        aArray = Split(LCase(aArray(0)), "ulceration")
        s = "No"
        For i = 0 To UBound(uYes)
            If InStr(1, aArray(1), uYes(i)) > 0 Then
                s = "Yes"
                Exit For
            End If
        Next
        ulceration(iCnt) = s
       
        'Get regression
        aArray = Split(LCase(aArray(0)), "regression")
        s = "No"
        For i = 0 To UBound(uYes)
            If InStr(1, aArray(1), rYes(i)) > 0 And InStr(1, aArray(1), "not") < 1 Then
                s = "Yes"
                Exit For
            End If
        Next
        regression(iCnt) = s
       
        c.Offset(0, 1).Value = ulceration(iCnt)
        c.Offset(0, 2).Value = regression(iCnt)
        c.Offset(0, 3).Value = clark(iCnt)
    Next
End Sub
I came up with the following code that works based on your sample data. It separately finds the ulceration, regression, and clark data that matches what you have in the right 3 columns. I began by putting the data in arrays, thinking that they would be used to put the data somewhere else, but I ended up just printing them out at the end of each For Loop. So, they're overkill now, but if you wanted to change anything later, they might be handy.
VBA Code:
Sub GetData()
    Dim c As Range
    Dim aArray As Variant
    Dim bArray As Variant
    Dim iCnt As Integer
    Dim i As Integer
    Dim s As String
    Dim ulceration() As String
    Dim regression() As String
    Dim clark() As String
   
    Dim uYes As Variant
    Dim rYes As Variant
   
    iCnt = -1
    uYes = Split("present,yes", ",")
    rYes = Split("present,identified", ",")
   
    For Each c In Range("A2", Cells(Rows.Count, 1).End(xlUp))
        s = ""
        iCnt = iCnt + 1
        ReDim Preserve ulceration(iCnt)
        ReDim Preserve regression(iCnt)
        ReDim Preserve clark(iCnt)
       
        'Get Clark level
        aArray = Split(LCase(c.Value), "clark")
        If InStr(1, aArray(1), "at least") > 0 Then
            s = ChrW(&H2265)
            bArray = Split(UCase(aArray(1)), " ")
            s = s & bArray(UBound(bArray))
        ElseIf InStr(1, aArray(1), "cannot") > 0 Then
            s = "Cannot be determined"
        Else
            bArray = Split(UCase(aArray(1)), " ")
            s = bArray(UBound(bArray))
        End If
        clark(iCnt) = s
       
        'Get ulceration
        aArray = Split(LCase(aArray(0)), "ulceration")
        s = "No"
        For i = 0 To UBound(uYes)
            If InStr(1, aArray(1), uYes(i)) > 0 Then
                s = "Yes"
                Exit For
            End If
        Next
        ulceration(iCnt) = s
       
        'Get regression
        aArray = Split(LCase(aArray(0)), "regression")
        s = "No"
        For i = 0 To UBound(uYes)
            If InStr(1, aArray(1), rYes(i)) > 0 And InStr(1, aArray(1), "not") < 1 Then
                s = "Yes"
                Exit For
            End If
        Next
        regression(iCnt) = s
       
        c.Offset(0, 1).Value = ulceration(iCnt)
        c.Offset(0, 2).Value = regression(iCnt)
        c.Offset(0, 3).Value = clark(iCnt)
    Next
End Sub
Apologies for my string of messages. It seems to be working perfectly with ulceration and regression, but Clark level keep causing it to freeze (and the results are odd, like "R," "RET," and "AN." I think I could live without Clark level as a variable, if I could just get ulceration and regression. I tried to just delete all the terms in the code related to clark but then the code no longer works. Again, sorry for messaging 3 times in a row, but if there is a way to modify to no longer search for clark, I would be so grateful!
 
Upvote 0
If you are using new data that isn't the sample you provided above (which should be the intent of this, right?), can you provide it so I can test? XbBB would be great.
 
Upvote 0

Forum statistics

Threads
1,224,828
Messages
6,181,209
Members
453,023
Latest member
alabaz

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top