Hello folks,
I have to calculate a frequency transition matrix in Excel. This is a Hidden Markov Model (HMM) problem.
I have the following inputs:
1. Multiple DNA sequences
ATGGGCTCCTCCGCCATCACCGTGAGCTTCCTCCTCTTTCTGGCATTTCAGCTCCCAGGGCAAACAGGAGCAAATCCCGTGTATGGCTCTGTGTCCAATG CAGACCTGATGGATTTCAAGTAAAAG ATGGTGAGAAAATGGGCCCTGCTCCTGCCCATGCTGCTCTGCGGCCTGACTGGTCCCGCACACCTCTTCCAGCCAAGCCTGGTGCTGGAGATGGCCCAGG TCCTCTTGGATAACTACTGCTTCCCAGAGAACCTGATGGGGATGCAGGGAGCCATCGAGCAGGCCATCAAAAGTCAGGAGATTCTGTCTATCTCAGACCC TCAGACTCTGGCCCATGTGTTGACAGCTGGGGTGCAGAGCTCCTTGAATGACCCTCGCCTGGTCATCTCCTATGAGCCCAGCACCCTCGAGGCCCCTCCG CGAGCTCCAGCAGTCACGAACCTCACACTAGAGGAAATCATCGCAGGGCTGCAGGA ATGCCCGACGCCACACTGCCCGCCTGCTTCCTCAGCCTGCTGGCCTTCACCTCTGCTTGCTACTTCCAGAACTGCCCAAGGGGCGGCAAGAGGGCCATGT CCGACCTGGAGCTGAGACAGTAGCAG
(Each sequence begins on a new line and begins with ATG).
2. State transition matrix
State A T G C End
0 1 -1 -1 -1 -1
1 -1 2 -1 -1 -1
2 -1 -1 3 -1 -1
3 4 7 6 5 -1
4 8 11 10 9 -1
5 8 11 10 9 -1
6 8 11 10 9 -1
7 12 11 13 9 -1
8 14 17 16 15 -1
9 14 17 16 15 -1
10 14 17 16 15 -1
11 14 17 16 15 -1
12 18 17 19 15 -1
13 18 17 16 15 -1
14 4 7 6 5 20
15 4 7 6 5 20
16 4 7 6 5 20
17 4 7 6 5 20
18 -1 -1 -1 -1 20
19 -1 -1 -1 -1 20
20 -1 -1 -1 -1 -1
The output that I need to get is a frequency transition matrix.
For example:
ATGGGCTCCT
A->T is state 1->2 ==> count:1
T->G is state 2->3 ==> count:1
G->G is state 3->6 ==> count:1
G->G is state 6->10==> count:1
G->C is state 10->15==>count:1
I have to compute this for each sequence.
State 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0
1 1
2 1
3 1
4
5
6 1
7
8
9
10 1
11
12
13
14
15
16
17
18
19
20
Does anybody have an idea on how to proceed? I know how to do it manually but I have a large number of sequences and each sequence is extremely long!
Please help!
I have to calculate a frequency transition matrix in Excel. This is a Hidden Markov Model (HMM) problem.
I have the following inputs:
1. Multiple DNA sequences
ATGGGCTCCTCCGCCATCACCGTGAGCTTCCTCCTCTTTCTGGCATTTCAGCTCCCAGGGCAAACAGGAGCAAATCCCGTGTATGGCTCTGTGTCCAATG CAGACCTGATGGATTTCAAGTAAAAG ATGGTGAGAAAATGGGCCCTGCTCCTGCCCATGCTGCTCTGCGGCCTGACTGGTCCCGCACACCTCTTCCAGCCAAGCCTGGTGCTGGAGATGGCCCAGG TCCTCTTGGATAACTACTGCTTCCCAGAGAACCTGATGGGGATGCAGGGAGCCATCGAGCAGGCCATCAAAAGTCAGGAGATTCTGTCTATCTCAGACCC TCAGACTCTGGCCCATGTGTTGACAGCTGGGGTGCAGAGCTCCTTGAATGACCCTCGCCTGGTCATCTCCTATGAGCCCAGCACCCTCGAGGCCCCTCCG CGAGCTCCAGCAGTCACGAACCTCACACTAGAGGAAATCATCGCAGGGCTGCAGGA ATGCCCGACGCCACACTGCCCGCCTGCTTCCTCAGCCTGCTGGCCTTCACCTCTGCTTGCTACTTCCAGAACTGCCCAAGGGGCGGCAAGAGGGCCATGT CCGACCTGGAGCTGAGACAGTAGCAG
(Each sequence begins on a new line and begins with ATG).
2. State transition matrix
State A T G C End
0 1 -1 -1 -1 -1
1 -1 2 -1 -1 -1
2 -1 -1 3 -1 -1
3 4 7 6 5 -1
4 8 11 10 9 -1
5 8 11 10 9 -1
6 8 11 10 9 -1
7 12 11 13 9 -1
8 14 17 16 15 -1
9 14 17 16 15 -1
10 14 17 16 15 -1
11 14 17 16 15 -1
12 18 17 19 15 -1
13 18 17 16 15 -1
14 4 7 6 5 20
15 4 7 6 5 20
16 4 7 6 5 20
17 4 7 6 5 20
18 -1 -1 -1 -1 20
19 -1 -1 -1 -1 20
20 -1 -1 -1 -1 -1
The output that I need to get is a frequency transition matrix.
For example:
ATGGGCTCCT
A->T is state 1->2 ==> count:1
T->G is state 2->3 ==> count:1
G->G is state 3->6 ==> count:1
G->G is state 6->10==> count:1
G->C is state 10->15==>count:1
I have to compute this for each sequence.
State 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0
1 1
2 1
3 1
4
5
6 1
7
8
9
10 1
11
12
13
14
15
16
17
18
19
20
Does anybody have an idea on how to proceed? I know how to do it manually but I have a large number of sequences and each sequence is extremely long!
Please help!