text extract macro

zroadrider · Aug 12, 2011

I have a string like

[hyd][Aes][mCes][Ges][Te][hyd]...

that I am trying to remove the text from the brackets and have just the text in a row of cells so A1=hyd, A2=Aes, A3=mCes and so on.

Right now I use the text to column function with [ as the delimiter followed by a few other search and logical functions to obtain the desired results , but there should be a simple VBA macro to accomplish this.

thanks

zroadrider · Aug 13, 2011

It did not work as I expected, but it led me to use these
C1=mCes or Te
C5=len(A1)
C6=IF(MID(C1,C5,1)="s",LEFT(C1,(C5-1)),RIGHT(C1,C5))

returns mCe or Te which is great.

Thanks

ZVI · Aug 13, 2011

Select A1:A2 (cells to be parsed) and apply the macro:

Excel Workbook

A

B

C

D

E

F

G

H

1

[Hydrogen][Aes][Ces][Ges][Tes][Ad][mCes][Uk][Hydrogen]

Ae

Ce

Ge

Te

Ad

mCe

Uk

2

[Aes][Ces][Ges][Tes][Ad][mCes][Uk]

Ae

Ce

Ge

Te

Ad

mCe

Uk

Sheet

The code:

Rich (BB code):


' Select columns with cells to be parsed and apply macro
Sub ParseString()
  Dim a, c As Long, i As Long, ub As Long, wd As String, x
  For Each x In Intersect(Selection.Columns(1), Selection.Parent.UsedRange)
    ' Process only string cells in selection
    If VarType(x) = vbString Then
      ' Copy value to the byte array
      a = Split(Replace(Mid$(Trim$(x), 2), "]", ""), "[")
      ' Calc size of the zero-based a() assay
      ub = UBound(a)
      ' Compare the first word with the last one
      If StrComp(a(0), a(ub)) = 0 Then
        ' If the first word is the same as the last one then delete both
        a(0) = x
        a(ub) = ""
        c = 0 ' <-- destination column offset
      Else
        c = 1 ' <-- destination column offset
        ub = ub + 1
        ReDim Preserve a(ub)
      End If
      ' Delete right "s" char in each word
      For i = 0 To ub
        wd = a(i)
        If Right$(wd, 1) = "s" Then a(i) = Left$(wd, Len(wd) - 1)
      Next
      ' Write the result
      x.Offset(, c).Resize(, ub + 1).Value = a
    End If
  Next
End Sub

p45cal · Aug 13, 2011

See whether this does it:

Code:

Sub blah()
If Selection.Columns.Count > 1 Then Exit Sub
For Each cll In Selection.Cells
    xx = cll.Value
    If Left(xx, 1) = "[" Then xx = Mid(xx, 2, Len(xx))
    If Right(xx, 1) = "]" Then xx = Left(xx, Len(xx) - 1)
    zz = Split(xx, "][")
    ReDim zzz(1 To UBound(zz) - 1)
    For i = 1 To UBound(zzz)
        zzz(i) = zz(i)
        If Right(zzz(i), 1) = "s" Then zzz(i) = Left(zzz(i), Len(zzz(i)) - 1)
    Next i
    cll.Offset(, 1).Resize(, UBound(zzz)) = zzz
    'cll.Resize(, UBound(zzz)) = zzz ' this line overwrites original cell.
Next cll
End Sub

ZVI · Aug 13, 2011

The modified version.
Select the column to be parsed ("A:A") and apply the macro:

Excel Workbook

A

B

C

D

E

F

G

H

I

1

Imput

Output

2

[Hydrogen][Aes][Ces][Ges][Tes][Ad][mCes][Uk][Hydrogen]

Hydrogen

Ae

Ce

Ge

Te

Ad

mCe

Uk

3

[Aes][Ces][Ges][Tes][Ad][mCes][Uk]

Ae

Ce

Ge

Te

Ad

mCe

Uk

Sheet

The code:

Rich (BB code):


' Select columns with cells to be parsed and apply macro
Sub ParseString()
  Dim a, i As Long, ub As Long, wd As String, x
  For Each x In Intersect(Selection.Columns(1), Selection.Parent.UsedRange)
    ' Process only string cells in selection
    If VarType(x) = vbString Then
      ' Parse only text with leftmost "[" char
      If Left(x, 1) = "[" Then
        ' Copy value to the byte array
        a = Split(Replace(Mid$(Trim$(x), 2), "]", ""), "[")
        ' Calc size ob zero-based a() assay
        ub = UBound(a)
        ' Compare the first word with the last one
        If StrComp(a(0), a(ub)) = 0 Then
          ' Delete the last word if it's the same as the 1st on
          a(ub) = ""
        Else
          ub = ub + 1
          ReDim Preserve a(ub)
        End If
        ' Delete right "s" char in each word
        For i = 0 To ub
          wd = a(i)
          If Right$(wd, 1) = "s" Then a(i) = Left$(wd, Len(wd) - 1)
        Next
        ' Write the result
        x.Offset(, 1).Resize(, ub + 1).Value = a
      End If
    End If
  Next
End Sub

zroadrider · Aug 16, 2011

does the job! thanks

Rick Rothstein · Aug 16, 2011

zroadrider said:
does the job! thanks

I'm wondering which one "does the job". Your description of what you wanted was not entirely clear. In your first message, you repeated the [hyd] at both ends and showed that you wanted it returned to you. In a later message, you repeated [Hydrogen] at both ends and showed you wanted it excluded. What determine this... the length of the repeated item or was the first message a typo? Also, is the first item always repeated at the end of the text, or could you have situations where it is not and you wanted all items returned? Just for my own curiosity, could you clarify which of these data structures you actually have?

zroadrider · Aug 16, 2011

<TABLE style="PADDING-RIGHT: 2pt; PADDING-LEFT: 2pt; FONT-SIZE: 10pt; FONT-FAMILY: Arial,Arial; BACKGROUND-COLOR: #ffffff" cellSpacing=0 cellPadding=0 border=1><COLGROUP><COL style="WIDTH: 30px"><COL style="WIDTH: 401px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"><COL style="WIDTH: 64px"></COLGROUP><TBODY><TR style="FONT-SIZE: 8pt; BACKGROUND-COLOR: #cacaca; TEXT-ALIGN: center"><TD> </TD><TD>A</TD><TD>B</TD><TD>C</TD><TD>D</TD><TD>E</TD><TD>F</TD><TD>G</TD><TD>H</TD><TD>I</TD></TR><TR style="HEIGHT: 17px"><TD style="FONT-SIZE: 8pt; BACKGROUND-COLOR: #cacaca; TEXT-ALIGN: center">1</TD><TD style="FONT-WEIGHT: bold; BACKGROUND-COLOR: #ccffff; TEXT-ALIGN: center">Imput</TD><TD style="FONT-WEIGHT: bold; BACKGROUND-COLOR: #ccffff; TEXT-ALIGN: center" colSpan=8>Output</TD></TR><TR style="HEIGHT: 17px"><TD style="FONT-SIZE: 8pt; BACKGROUND-COLOR: #cacaca; TEXT-ALIGN: center">2</TD><TD>[Hydrogen][Aes][Ces][Ges][Tes][Ad][mCes][Uk][Hydrogen]</TD><TD>Hydrogen</TD><TD>Ae</TD><TD>Ce</TD><TD>Ge</TD><TD>Te</TD><TD>Ad</TD><TD>mCe</TD><TD>Uk</TD></TR><TR style="HEIGHT: 17px"><TD style="FONT-SIZE: 8pt; BACKGROUND-COLOR: #cacaca; TEXT-ALIGN: center">3</TD><TD>[Aes][Ces][Ges][Tes][Ad][mCes][Uk]</TD><TD>Ae</TD><TD>Ce</TD><TD>Ge</TD><TD>Te</TD><TD>Ad</TD><TD>mCe</TD><TD>Uk</TD><TD> </TD></TR></TBODY></TABLE>

They both work. I dony actually care about the Hydrogen as long as all of the text in the [] were seperated into individual cells.
This way I can count how many of each base unit Ae..Ce.. are in all of the strings.
Each base unit is then renamed (such as Ae=A, mCe=c).... and recombined to give single letter string like ACGTacu for the above sequence. I then export this as a text file which can be read by a DNA synthesizer.

Rick Rothstein · Aug 16, 2011

zroadrider said:
They both work. I dony actually care about the Hydrogen as long as all of the text in the [] were seperated into individual cells.

Okay, here is another macro you can consider for use then...

Code:

Sub SplitSymbols()
  Dim X As Long, LastRow As Long, CellText As String, Elements() As String
  Const StartRow As Long = 1
  LastRow = Cells(Rows.Count, "A").End(xlUp).Row
  For X = StartRow To LastRow
    CellText = Cells(X, "A").Value
    Elements = Split(Mid(Replace(CellText, "s]", "]"), 2, Len(CellText) - 2), "][")
    Cells(X, "B").Resize(, UBound(Elements)).Value = Elements
  Next
End Sub

zroadrider said:
This way I can count how many of each base unit Ae..Ce.. are in all of the strings.
Each base unit is then renamed (such as Ae=A, mCe=c).... and recombined to give single letter string like ACGTacu for the above sequence. I then export this as a text file which can be read by a DNA synthesizer.

If you tell us all the possible components (Ae, mCe, etc.) and their single letter abbreivations, then we can modify our codes to automatically exclude things like [Hydrogen] and [hyd] from the outputted cell values. Also, if you tell us exactly how your final letter string (ACGTacu) looks when there are repeated components, we can build that output string for you in code... and even output it to a file for you (if you tell us whether a new file is created each time or an old file appended to instead).

zroadrider · Aug 16, 2011

Well I didn't want to ask for too much. I have over 60 different components with new ones being added and each one may be assigned a different single letter. So far

Sequence Filename
1 RG426088-3, 57412142186814124385 c:\RG426088-3

zroadrider · Aug 16, 2011

Woops, took to long to edit
I have over 60 different components with new ones being added and each one may be assigned a different single letter. So far the excel program will extract the bases from the string using the above macro, count each base, assign a designated single letter to each base, and write the new single letter sequence. For the last step I am trying to export the sequence as a text file. Ideally I would be able to select the desired cells (such as A1:A2) and have them exported with the filename (B1:B2) in the ajacent cell.

A B

Sequence Filename
1 ACGTacgt c:\1234.txt
2 acgtACGTu c:\abcd.txt

text extract macro

zroadrider

New Member

zroadrider

New Member

ZVI

MrExcel MVP

p45cal

Well-known Member

ZVI

MrExcel MVP

zroadrider

New Member

Rick Rothstein

MrExcel MVP

zroadrider

New Member

Rick Rothstein

MrExcel MVP

zroadrider

New Member

zroadrider

New Member

Similar threads

Share this page

text extract macro

New Member

New Member

MrExcel MVP

Well-known Member

MrExcel MVP

New Member

MrExcel MVP

New Member

MrExcel MVP

New Member

New Member

Similar threads

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock