text extract macro

zroadrider

New Member
Joined
Aug 12, 2011
Messages
14
I have a string like

[hyd][Aes][mCes][Ges][Te][hyd]...

that I am trying to remove the text from the brackets and have just the text in a row of cells so A1=hyd, A2=Aes, A3=mCes and so on.

Right now I use the text to column function with [ as the delimiter followed by a few other search and logical functions to obtain the desired results , but there should be a simple VBA macro to accomplish this.

thanks
 
It did not work as I expected, but it led me to use these
C1=mCes or Te
C5=len(A1)
C6=IF(MID(C1,C5,1)="s",LEFT(C1,(C5-1)),RIGHT(C1,C5))

returns mCe or Te which is great.

Thanks
 
Upvote 0

Excel Facts

Last used cell?
Press Ctrl+End to move to what Excel thinks is the last used cell.
Select A1:A2 (cells to be parsed) and apply the macro:
Excel Workbook
ABCDEFGH
1[Hydrogen][Aes][Ces][Ges][Tes][Ad][mCes][Uk][Hydrogen]AeCeGeTeAdmCeUk
2[Aes][Ces][Ges][Tes][Ad][mCes][Uk]AeCeGeTeAdmCeUk
Sheet


The code:
Rich (BB code):

' Select columns with cells to be parsed and apply macro
Sub ParseString()
  Dim a, c As Long, i As Long, ub As Long, wd As String, x
  For Each x In Intersect(Selection.Columns(1), Selection.Parent.UsedRange)
    ' Process only string cells in selection
    If VarType(x) = vbString Then
      ' Copy value to the byte array
      a = Split(Replace(Mid$(Trim$(x), 2), "]", ""), "[")
      ' Calc size of the zero-based a() assay
      ub = UBound(a)
      ' Compare the first word with the last one
      If StrComp(a(0), a(ub)) = 0 Then
        ' If the first word is the same as the last one then delete both
        a(0) = x
        a(ub) = ""
        c = 0 ' <-- destination column offset
      Else
        c = 1 ' <-- destination column offset
        ub = ub + 1
        ReDim Preserve a(ub)
      End If
      ' Delete right "s" char in each word
      For i = 0 To ub
        wd = a(i)
        If Right$(wd, 1) = "s" Then a(i) = Left$(wd, Len(wd) - 1)
      Next
      ' Write the result
      x.Offset(, c).Resize(, ub + 1).Value = a
    End If
  Next
End Sub
 
Last edited:
Upvote 0
See whether this does it:
Code:
Sub blah()
If Selection.Columns.Count > 1 Then Exit Sub
For Each cll In Selection.Cells
    xx = cll.Value
    If Left(xx, 1) = "[" Then xx = Mid(xx, 2, Len(xx))
    If Right(xx, 1) = "]" Then xx = Left(xx, Len(xx) - 1)
    zz = Split(xx, "][")
    ReDim zzz(1 To UBound(zz) - 1)
    For i = 1 To UBound(zzz)
        zzz(i) = zz(i)
        If Right(zzz(i), 1) = "s" Then zzz(i) = Left(zzz(i), Len(zzz(i)) - 1)
    Next i
    cll.Offset(, 1).Resize(, UBound(zzz)) = zzz
    'cll.Resize(, UBound(zzz)) = zzz ' this line overwrites original cell.
Next cll
End Sub
 
Last edited:
Upvote 0
The modified version.
Select the column to be parsed ("A:A") and apply the macro:
Excel Workbook
ABCDEFGHI
1ImputOutput
2[Hydrogen][Aes][Ces][Ges][Tes][Ad][mCes][Uk][Hydrogen]HydrogenAeCeGeTeAdmCeUk
3[Aes][Ces][Ges][Tes][Ad][mCes][Uk]AeCeGeTeAdmCeUk
Sheet


The code:
Rich (BB code):

' Select columns with cells to be parsed and apply macro
Sub ParseString()
  Dim a, i As Long, ub As Long, wd As String, x
  For Each x In Intersect(Selection.Columns(1), Selection.Parent.UsedRange)
    ' Process only string cells in selection
    If VarType(x) = vbString Then
      ' Parse only text with leftmost "[" char
      If Left(x, 1) = "[" Then
        ' Copy value to the byte array
        a = Split(Replace(Mid$(Trim$(x), 2), "]", ""), "[")
        ' Calc size ob zero-based a() assay
        ub = UBound(a)
        ' Compare the first word with the last one
        If StrComp(a(0), a(ub)) = 0 Then
          ' Delete the last word if it's the same as the 1st on
          a(ub) = ""
        Else
          ub = ub + 1
          ReDim Preserve a(ub)
        End If
        ' Delete right "s" char in each word
        For i = 0 To ub
          wd = a(i)
          If Right$(wd, 1) = "s" Then a(i) = Left$(wd, Len(wd) - 1)
        Next
        ' Write the result
        x.Offset(, 1).Resize(, ub + 1).Value = a
      End If
    End If
  Next
End Sub
 
Upvote 0
does the job! thanks
I'm wondering which one "does the job". Your description of what you wanted was not entirely clear. In your first message, you repeated the [hyd] at both ends and showed that you wanted it returned to you. In a later message, you repeated [Hydrogen] at both ends and showed you wanted it excluded. What determine this... the length of the repeated item or was the first message a typo? Also, is the first item always repeated at the end of the text, or could you have situations where it is not and you wanted all items returned? Just for my own curiosity, could you clarify which of these data structures you actually have?
 
Last edited:
Upvote 0
<TABLE style="PADDING-RIGHT: 2pt; PADDING-LEFT: 2pt; FONT-SIZE: 10pt; FONT-FAMILY: Arial,Arial; BACKGROUND-COLOR: #ffffff" cellSpacing=0 cellPadding=0 border=1><COLGROUP><COL style="WIDTH: 30px"><COL style="WIDTH: 401px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"></COLGROUP><TBODY><TR style="FONT-SIZE: 8pt; BACKGROUND-COLOR: #cacaca; TEXT-ALIGN: center"><TD> </TD><TD>A</TD><TD>B</TD><TD>C</TD><TD>D</TD><TD>E</TD><TD>F</TD><TD>G</TD><TD>H</TD><TD>I</TD></TR><TR style="HEIGHT: 17px"><TD style="FONT-SIZE: 8pt; BACKGROUND-COLOR: #cacaca; TEXT-ALIGN: center">1</TD><TD style="FONT-WEIGHT: bold; BACKGROUND-COLOR: #ccffff; TEXT-ALIGN: center">Imput</TD><TD style="FONT-WEIGHT: bold; BACKGROUND-COLOR: #ccffff; TEXT-ALIGN: center" colSpan=8>Output</TD></TR><TR style="HEIGHT: 17px"><TD style="FONT-SIZE: 8pt; BACKGROUND-COLOR: #cacaca; TEXT-ALIGN: center">2</TD><TD>[Hydrogen][Aes][Ces][Ges][Tes][Ad][mCes][Uk][Hydrogen]</TD><TD>Hydrogen</TD><TD>Ae</TD><TD>Ce</TD><TD>Ge</TD><TD>Te</TD><TD>Ad</TD><TD>mCe</TD><TD>Uk</TD></TR><TR style="HEIGHT: 17px"><TD style="FONT-SIZE: 8pt; BACKGROUND-COLOR: #cacaca; TEXT-ALIGN: center">3</TD><TD>[Aes][Ces][Ges][Tes][Ad][mCes][Uk]</TD><TD>Ae</TD><TD>Ce</TD><TD>Ge</TD><TD>Te</TD><TD>Ad</TD><TD>mCe</TD><TD>Uk</TD><TD> </TD></TR></TBODY></TABLE>


They both work. I dony actually care about the Hydrogen as long as all of the text in the [] were seperated into individual cells.
This way I can count how many of each base unit Ae..Ce.. are in all of the strings.
Each base unit is then renamed (such as Ae=A, mCe=c).... and recombined to give single letter string like ACGTacu for the above sequence. I then export this as a text file which can be read by a DNA synthesizer.
 
Upvote 0
They both work. I dony actually care about the Hydrogen as long as all of the text in the [] were seperated into individual cells.

Okay, here is another macro you can consider for use then...

Code:
Sub SplitSymbols()
  Dim X As Long, LastRow As Long, CellText As String, Elements() As String
  Const StartRow As Long = 1
  LastRow = Cells(Rows.Count, "A").End(xlUp).Row
  For X = StartRow To LastRow
    CellText = Cells(X, "A").Value
    Elements = Split(Mid(Replace(CellText, "s]", "]"), 2, Len(CellText) - 2), "][")
    Cells(X, "B").Resize(, UBound(Elements)).Value = Elements
  Next
End Sub

This way I can count how many of each base unit Ae..Ce.. are in all of the strings.
Each base unit is then renamed (such as Ae=A, mCe=c).... and recombined to give single letter string like ACGTacu for the above sequence. I then export this as a text file which can be read by a DNA synthesizer.
If you tell us all the possible components (Ae, mCe, etc.) and their single letter abbreivations, then we can modify our codes to automatically exclude things like [Hydrogen] and [hyd] from the outputted cell values. Also, if you tell us exactly how your final letter string (ACGTacu) looks when there are repeated components, we can build that output string for you in code... and even output it to a file for you (if you tell us whether a new file is created each time or an old file appended to instead).
 
Upvote 0
Well I didn't want to ask for too much. I have over 60 different components with new ones being added and each one may be assigned a different single letter. So far

Sequence Filename
1 RG426088-3, 57412142186814124385 c:\RG426088-3
 
Upvote 0
Woops, took to long to edit
I have over 60 different components with new ones being added and each one may be assigned a different single letter. So far the excel program will extract the bases from the string using the above macro, count each base, assign a designated single letter to each base, and write the new single letter sequence. For the last step I am trying to export the sequence as a text file. Ideally I would be able to select the desired cells (such as A1:A2) and have them exported with the filename (B1:B2) in the ajacent cell.

A B

Sequence Filename
1 ACGTacgt c:\1234.txt
2 acgtACGTu c:\abcd.txt
 
Last edited:
Upvote 0

Forum statistics

Threads
1,225,149
Messages
6,183,193
Members
453,151
Latest member
Lizamaison

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top