Strap in, this is a weird one. I'm making a calculator of sorts and am having trouble with one of the formulas. Specifically the 'reverse sequence output' in C2.
To get you on board with what I'm trying to do, I'll explain a little bit. I want to be able to post a a sequence in A2 in that type of text format. In B2 I need to make some substitutions to make time calculations down the line easier, but I have all that figured out and they aren't included below. I need to make those substitutions because for example 'A' takes the same time to synth as 'G', but 'rA' takes a different time than 'A' and 'G'. The presence of an asterisk next to one of those means it will take even more time.
To complicate things the most, the sequences are made from right to left and four sequences can go at the same time, so I need to match up the right most DNA or rna for each sequence, moving leftward. Everything I have pasted below is an attempt to make some substitutions, and then turn the sequence around and delimit it. Everything is working perfectly, except for some reason, the 'reverse sequence output' formula breaks down if you extent the sequences to an arbitrary amount it seems, generlly higher like in the 70s to 100s. If you'd like to see how, copy and paste the input in A2 to multiply it about ten times.
There is something wrong with =TEXTJOIN(",",,TRIM(MID(SUBSTITUTE($B2,",",REPT(" ",LEN(B2))),SEQUENCE(,LEN(B2)-LEN(SUBSTITUTE(B2,",","")),LEN(B2)*(LEN(B2)-LEN(SUBSTITUTE(B2,",",""))),-LEN(B2)),LEN(B2)))) that makes it struggle with longer sequences. For the record, the longest I will need to go will be 120 rna and/or DNAs in a row.
You'll notice there is a difference between B2 and B3:B5. I was experimenting with different substitution styles. I do not care if it works with or without commas. Also note that the formulas in column D are extended out 119 more columns to accommodate all the delimits, they could not be fit here.
Any help is appreciated, or if you see an entirely better way to do what I'm doing please let me know.
To get you on board with what I'm trying to do, I'll explain a little bit. I want to be able to post a a sequence in A2 in that type of text format. In B2 I need to make some substitutions to make time calculations down the line easier, but I have all that figured out and they aren't included below. I need to make those substitutions because for example 'A' takes the same time to synth as 'G', but 'rA' takes a different time than 'A' and 'G'. The presence of an asterisk next to one of those means it will take even more time.
To complicate things the most, the sequences are made from right to left and four sequences can go at the same time, so I need to match up the right most DNA or rna for each sequence, moving leftward. Everything I have pasted below is an attempt to make some substitutions, and then turn the sequence around and delimit it. Everything is working perfectly, except for some reason, the 'reverse sequence output' formula breaks down if you extent the sequences to an arbitrary amount it seems, generlly higher like in the 70s to 100s. If you'd like to see how, copy and paste the input in A2 to multiply it about ten times.
There is something wrong with =TEXTJOIN(",",,TRIM(MID(SUBSTITUTE($B2,",",REPT(" ",LEN(B2))),SEQUENCE(,LEN(B2)-LEN(SUBSTITUTE(B2,",","")),LEN(B2)*(LEN(B2)-LEN(SUBSTITUTE(B2,",",""))),-LEN(B2)),LEN(B2)))) that makes it struggle with longer sequences. For the record, the longest I will need to go will be 120 rna and/or DNAs in a row.
You'll notice there is a difference between B2 and B3:B5. I was experimenting with different substitution styles. I do not care if it works with or without commas. Also note that the formulas in column D are extended out 119 more columns to accommodate all the delimits, they could not be fit here.
Any help is appreciated, or if you see an entirely better way to do what I'm doing please let me know.
Accurate Synthesis Calculator WIP.xlsx | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | |||
1 | Step 1 - Input Sequences | Sequence Output | Reversed sequence output | Reverse Delimited Sequence | |||||||||||||
2 | rA*rGrCAG*C*TrACrC | ,rna*,rna,rna,DNA,DNA*,DNA*,DNA,rna,DNA,rna | rna,DNA,rna,DNA,DNA*,DNA*,DNA,rna,rna,rna* | rna | DNA | rna | DNA | DNA* | DNA* | DNA | rna | rna | rna* | ||||
3 | AG*CT* | DNA DNA* DNA DNA* | DNA* DNA DNA* DNA | DNA* | DNA | DNA* | DNA | ||||||||||
4 | rArGrCrU | rna rna rna rna | rna rna rna rna | rna | rna | rna | rna | ||||||||||
5 | rA*rG*rC*rU*AGCT | rna* rna* rna* rna* DNA DNA DNA DNA | DNA DNA DNA DNA rna* rna* rna* rna* | DNA | DNA | DNA | DNA | rna* | rna* | rna* | rna* | ||||||
Normal, Thio, thRNA, thOMe |
Cell Formulas | ||
---|---|---|
Range | Formula | |
B2 | B2 | =TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"rA",",rna"),"rG",",rna"),"rC",",rna"),"rU",",rna"),"A",",DNA"),"G",",DNA"),"C",",DNA"),"T",",DNA"),"",)) |
C2 | C2 | =TEXTJOIN(",",,TRIM(MID(SUBSTITUTE($B2,",",REPT(" ",LEN(B2))),SEQUENCE(,LEN(B2)-LEN(SUBSTITUTE(B2,",","")),LEN(B2)*(LEN(B2)-LEN(SUBSTITUTE(B2,",",""))),-LEN(B2)),LEN(B2)))) |
D2:O2 | D2 | =IFERROR(MID(TRIM(SUBSTITUTE($C$2,","," ")),(SUMPRODUCT(LEN($C$2:C2)*1)-LEN($C$2)+1),FIND(" ",TRIM(SUBSTITUTE($C$2,","," "))&" ",SUMPRODUCT(LEN($C$2:C2)*1)-LEN($C$2)+2)-(SUMPRODUCT(LEN($C$2:C2)*1)-LEN($C$2)+1)),"") |
B3:B5 | B3 | =TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A3,"rA"," rna"),"rG"," rna"),"rC"," rna"),"rU"," rna"),"A"," DNA"),"G"," DNA"),"C"," DNA"),"T"," DNA"),"",)) |
C3:C5 | C3 | =TEXTJOIN(" ",,TRIM(MID(SUBSTITUTE(B3," ",REPT(" ",99)),((LEN(B3)-LEN(SUBSTITUTE(B3," ",""))+1)-ROW($XFD$1:INDEX(XFC:XFC,LEN(B3)-LEN(SUBSTITUTE(B3," ",""))+1)))*99+1,99))) |
D3:O3 | D3 | =IFERROR(MID(TRIM(SUBSTITUTE($C$3,","," ")),(SUMPRODUCT(LEN($C$3:C3)*1)-LEN($C$3)+1),FIND(" ",TRIM(SUBSTITUTE($C$3,","," "))&" ",SUMPRODUCT(LEN($C$3:C3)*1)-LEN($C$3)+2)-(SUMPRODUCT(LEN($C$3:C3)*1)-LEN($C$3)+1)),"") |
D4:O4 | D4 | =IFERROR(MID(TRIM(SUBSTITUTE($C$4,","," ")),(SUMPRODUCT(LEN($C$4:C4)*1)-LEN($C$4)+1),FIND(" ",TRIM(SUBSTITUTE($C$4,","," "))&" ",SUMPRODUCT(LEN($C$4:C4)*1)-LEN($C$4)+2)-(SUMPRODUCT(LEN($C$4:C4)*1)-LEN($C$4)+1)),"") |
D5:O5 | D5 | =IFERROR(MID(TRIM(SUBSTITUTE($C$5,","," ")),(SUMPRODUCT(LEN($C$5:C5)*1)-LEN($C$5)+1),FIND(" ",TRIM(SUBSTITUTE($C$5,","," "))&" ",SUMPRODUCT(LEN($C$5:C5)*1)-LEN($C$5)+2)-(SUMPRODUCT(LEN($C$5:C5)*1)-LEN($C$5)+1)),"") |