Advanced filtering inquiry - filtering represented data by levels in another column.

ihavequestions256 · Sep 13, 2017

Hello!

I've been bumbling around on Google trying to sort this out myself, but I'm struggling, so here goes.

I have a large dataset for analysis with several columns (subject, age, gender, protein, peptide, intensity); a small representative subset is below. I'm trying to filter the data so that only proteins with more than one distinct peptide in the peptide column are shown. For example, if I set the limit at 2 or more peptides, protein ABC below would be excluded because only one peptide is listed for it (though I would like to be able to choose the threshold freely).

[TABLE="width: 500"]
<tbody>[TR]
[TD]Protein[/TD]
[TD]Peptide[/TD]
[TD]Intensity[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ STRPZW[/TD]
[TD]24[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ SQERT[/TD]
[TD]24.3[/TD]
[/TR]
[TR]
[TD]ABC[/TD]
[TD]ABC DEZFEF[/TD]
[TD]25[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS FLEMP[/TD]
[TD]23[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS BESOE[/TD]
[TD]23.5[/TD]
[/TR]
</tbody>[/TABLE]

I can build a pivot table that shows how many unique peptides each protein has, but I'm lost from there. The dataset is so large that literally 1,000 proteins have only one peptide, and I'm sure there's a smarter way than deselecting each of them by hand in the data "sort" tool. Any help would be greatly appreciated!
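The pivot-table idea can be taken one step further: once each protein's number of distinct peptides is known, keeping only proteins at or above a threshold is a single lookup per row. Below is a minimal Python sketch of that logic (a swapped-in illustration, not an Excel feature), assuming the rows are available as (protein, peptide, intensity) tuples; the function names `distinct_peptide_counts` and `filter_by_peptide_count` are hypothetical.

```python
from collections import defaultdict

def distinct_peptide_counts(rows):
    """Map each protein to the number of distinct peptides listed for it."""
    peptides = defaultdict(set)
    for protein, peptide, _intensity in rows:
        peptides[protein].add(peptide)  # a set ignores repeated peptides
    return {protein: len(seen) for protein, seen in peptides.items()}

def filter_by_peptide_count(rows, threshold):
    """Keep rows whose protein has at least `threshold` distinct peptides."""
    counts = distinct_peptide_counts(rows)
    return [row for row in rows if counts[row[0]] >= threshold]

# Sample rows copied from the table in the question.
rows = [
    ("XYZ", "XYZ STRPZW", 24),
    ("XYZ", "XYZ SQERT", 24.3),
    ("ABC", "ABC DEZFEF", 25),
    ("QRS", "QRS FLEMP", 23),
    ("QRS", "QRS BESOE", 23.5),
]

kept = filter_by_peptide_count(rows, threshold=2)
print([protein for protein, _, _ in kept])  # ['XYZ', 'XYZ', 'QRS', 'QRS']
```

Because peptides are collected into a set, a peptide that appears on several rows (for example once per subject) is counted only once, which matches the "more than one level in the peptide column" wording.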

Gerald Higgins · Sep 13, 2017

Hi, welcome to the board.

What I would do is use a helper column (which can be hidden if required) to count the number of rows for each protein, perhaps like this:

=COUNTIF(A$2:A$6,A2)

This assumes the proteins are in column A, with a header in row 1 and the first instance of XYZ in cell A2. On the full dataset, extend the range down to the last row of data.

Copy this formula all the way down the column, which should give you something like this:

[TABLE="class: cms_table, width: 500"]
<tbody>[TR]
[TD]Protein[/TD]
[TD]Peptide[/TD]
[TD]Intensity[/TD]
[TD]Count[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ STRPZW[/TD]
[TD]24[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ SQERT[/TD]
[TD]24.3[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]ABC[/TD]
[TD]ABC DEZFEF[/TD]
[TD]25[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS FLEMP[/TD]
[TD]23[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS BESOE[/TD]
[TD]23.5[/TD]
[TD]2[/TD]
[/TR]
</tbody>[/TABLE]
Then it's a simple matter to use Data > Filter to filter out anything with a count below 2, or whatever threshold you decide.
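The helper-column approach translates directly into code: count how many rows each protein occupies (what COUNTIF returns for that row), attach the count to each row, then keep rows whose count meets the threshold. Below is a minimal Python sketch, assuming the same (protein, peptide, intensity) layout as the sample table; `add_count_column` and `apply_filter` are hypothetical names. Like COUNTIF, it counts rows, not distinct peptides.

```python
from collections import Counter

def add_count_column(rows):
    """Append a Count value to each row, like filling the COUNTIF helper column down."""
    # Counter tallies every protein in a single pass over the data.
    counts = Counter(protein for protein, _peptide, _intensity in rows)
    return [(*row, counts[row[0]]) for row in rows]

def apply_filter(rows_with_count, threshold):
    """Mimic Data > Filter: keep rows whose Count is at least `threshold`."""
    return [row for row in rows_with_count if row[-1] >= threshold]

rows = [
    ("XYZ", "XYZ STRPZW", 24),
    ("XYZ", "XYZ SQERT", 24.3),
    ("ABC", "ABC DEZFEF", 25),
    ("QRS", "QRS FLEMP", 23),
    ("QRS", "QRS BESOE", 23.5),
]

with_count = add_count_column(rows)
for row in apply_filter(with_count, threshold=2):
    print(row)
```

If the same peptide can appear on several rows for one protein (for example once per subject), this row count will be larger than the number of distinct peptides; counting distinct (protein, peptide) pairs instead avoids that.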

ihavequestions256 · Sep 13, 2017


It's nice to join such a helpful community. I am very grateful for the assistance. This has worked so well.

ihavequestions256 · Sep 13, 2017

So I'm having some difficulty applying it to the whole dataset. Is there a function that uses less processing? The dataset is quite large (~300,000 rows x 6 columns), and filtering with a conditional function basically crashes Excel every time I try it on the full data. Granted, I've only left it in the "Not Responding" state for about 15 minutes, but even with all 4 processors going it is having a very hard time. Thanks again.
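One likely reason for the slowdown: each filled-down COUNTIF scans the whole protein range, so 300,000 rows mean roughly 300,000 x 300,000, or about 90 billion, cell comparisons, whereas tallying the proteins once takes a single pass. If Excel stays too slow, one alternative (not something proposed in this thread) is to save the sheet as CSV and filter it with a short script. Below is a minimal Python sketch, assuming a CSV with a header row containing columns named `Protein` and `Peptide`; the function name `filter_csv` and those column names are assumptions.

```python
import csv
import io
from collections import defaultdict

def filter_csv(source, dest, threshold, protein_col="Protein", peptide_col="Peptide"):
    """Copy rows whose protein has at least `threshold` distinct peptides.

    Makes two linear passes over `source` (one to count, one to write),
    so `source` must be seekable: a file opened for reading, or io.StringIO.
    Returns the number of data rows written.
    """
    # Pass 1: collect the distinct peptides seen for each protein.
    peptides = defaultdict(set)
    for row in csv.DictReader(source):
        peptides[row[protein_col]].add(row[peptide_col])

    # Pass 2: rewind and copy only the qualifying rows, header included.
    source.seek(0)
    reader = csv.DictReader(source)
    writer = csv.DictWriter(dest, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if len(peptides[row[protein_col]]) >= threshold:
            writer.writerow(row)
            kept += 1
    return kept

# Demo on the sample table from the question, held in memory.
sample = io.StringIO(
    "Protein,Peptide,Intensity\n"
    "XYZ,XYZ STRPZW,24\n"
    "XYZ,XYZ SQERT,24.3\n"
    "ABC,ABC DEZFEF,25\n"
    "QRS,QRS FLEMP,23\n"
    "QRS,QRS BESOE,23.5\n"
)
out = io.StringIO()
print(filter_csv(sample, out, threshold=2))  # prints 4
```

For real files on disk, open both the input and output with `newline=""`, as the Python `csv` module documentation recommends; the threshold stays a plain argument, so it can be changed freely.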
