ihavequestions256
New Member
- Joined
- Sep 13, 2017
- Messages
- 7
Hello!
I tried a prior thread which alas did not fully work out (more on this below), but I am still struggling so here goes.
I have this large dataset for analysis that has several columns (subject, age, gender, protein, peptide, intensity). I've just made a minor representative subset below. I'm trying to sort/filter the data such that only the proteins with more than one level in the peptide column are shown. Below for example if I set the limit as 2 peptides or above, then protein ABC would be excluded as there's only one peptide listed (though I would like to be able to freely choose my threshold).
[TABLE="class: cms_table, width: 500"]
<tbody>[TR]
[TD]Protein[/TD]
[TD]Peptide[/TD]
[TD]Intensity[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ STRPZW[/TD]
[TD]24[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ SQERT[/TD]
[TD]24.3[/TD]
[/TR]
[TR]
[TD]ABC[/TD]
[TD]ABC DEZFEF[/TD]
[TD]25[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS FLEMP[/TD]
[TD]23[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS BESOE[/TD]
[TD]23.5[/TD]
[/TR]
</tbody>[/TABLE]
I'm able to make a pivot table which shows me how many unique peptides there are, but I'm lost from that point. The dataset is so large there are literally 1000 proteins that have only one peptide, and I'm sure there's a smarter way to do this than for me to deselect each of these in the data "sort" tool. In my prior thread someone was very helpful and suggested I use the "countif" function to count the number of rows for each protein and then sort based off that.
I had initially thought it worked, but then realized when I tried to sort after using that conditional function, excel just freezes on me (~300,000 rows x 6 columns of data to sort with a conditional function). Excel automatically jumped to using 4 processors to no avail.
Any extra suggestions would be greatly appreciated!
I tried a prior thread which alas did not fully work out (more on this below), but I am still struggling so here goes.
I have this large dataset for analysis that has several columns (subject, age, gender, protein, peptide, intensity). I've just made a minor representative subset below. I'm trying to sort/filter the data such that only the proteins with more than one level in the peptide column are shown. Below for example if I set the limit as 2 peptides or above, then protein ABC would be excluded as there's only one peptide listed (though I would like to be able to freely choose my threshold).
[TABLE="class: cms_table, width: 500"]
<tbody>[TR]
[TD]Protein[/TD]
[TD]Peptide[/TD]
[TD]Intensity[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ STRPZW[/TD]
[TD]24[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ SQERT[/TD]
[TD]24.3[/TD]
[/TR]
[TR]
[TD]ABC[/TD]
[TD]ABC DEZFEF[/TD]
[TD]25[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS FLEMP[/TD]
[TD]23[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS BESOE[/TD]
[TD]23.5[/TD]
[/TR]
</tbody>[/TABLE]
I'm able to make a pivot table which shows me how many unique peptides there are, but I'm lost from that point. The dataset is so large there are literally 1000 proteins that have only one peptide, and I'm sure there's a smarter way to do this than for me to deselect each of these in the data "sort" tool. In my prior thread someone was very helpful and suggested I use the "countif" function to count the number of rows for each protein and then sort based off that.
I had initially thought it worked, but then realized when I tried to sort after using that conditional function, excel just freezes on me (~300,000 rows x 6 columns of data to sort with a conditional function). Excel automatically jumped to using 4 processors to no avail.
Any extra suggestions would be greatly appreciated!