Advanced filtering inquiry - filtering represented data by levels in another column.

ihavequestions256

New Member
Joined
Sep 13, 2017
Messages
7
Hello!

I've been bumbling around on google trying to sort this out myself, but I am struggling so here goes.

I have this large dataset for analysis that has several columns (subject, age, gender, protein, peptide, intensity). I've just made a minor representative subset below. I'm trying to sort/filter the data such that only the proteins with more than one level in the peptide column are shown. Below for example if I set the limit as 2 peptides or above, then protein ABC would be excluded as there's only one peptide listed (though I would like to be able to freely choose my threshold).

[TABLE="width: 500"]
<tbody>[TR]
[TD]Protein[/TD]
[TD]Peptide[/TD]
[TD]Intensity[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ STRPZW[/TD]
[TD]24[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ SQERT[/TD]
[TD]24.3[/TD]
[/TR]
[TR]
[TD]ABC[/TD]
[TD]ABC DEZFEF[/TD]
[TD]25[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS FLEMP[/TD]
[TD]23[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS BESOE[/TD]
[TD]23.5[/TD]
[/TR]
</tbody>[/TABLE]

I'm able to make a pivot table which shows me how many unique peptides there are, but I'm lost from that point. The dataset is so large there are literally 1000 proteins that have only one peptide, and I'm sure there's a smarter way to do this than for me to deselect each of these in the data "sort" tool. Any help would be greatly appreciated!
 

Excel Facts

Do you hate GETPIVOTDATA?
Prevent GETPIVOTDATA. Select inside a PivotTable. In the Analyze tab of the ribbon, open the dropown next to Options and turn it off
Hi, welcome to the board.

What I would do is use a helper column (which can be hidden if required) to count the number of values for each protein, perhaps like this

=countif(A$2:A$5,A2)

This assumes the Proteins are in column A, with the first instance of XYZ in cell A2.

Copy this formula all the way down the column, which should give you data something like this . . .


[TABLE="class: cms_table, width: 500"]
<tbody>[TR]
[TD]Protein[/TD]
[TD]Peptide[/TD]
[TD]Intensity.............. Count[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ STRPZW[/TD]
[TD]24....................... 2[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ SQERT[/TD]
[TD]24.3.................... 2 [/TD]
[/TR]
[TR]
[TD]ABC[/TD]
[TD]ABC DEZFEF[/TD]
[TD]25....................... 1[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS FLEMP[/TD]
[TD]23....................... 2[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS BESOE[/TD]
[TD]23.5..................... 2[/TD]
[/TR]
</tbody>[/TABLE]
Then it's a simple matter to use Data Filter, to filter out anything with a count less than 2, or whatever threshold you decide.
 
Last edited:
Upvote 0
Hi, welcome to the board.

What I would do is use a helper column (which can be hidden if required) to count the number of values for each protein, perhaps like this

=countif(A$2:A$5,A2)

This assumes the Proteins are in column A, with the first instance of XYZ in cell A2.

Copy this formula all the way down the column, which should give you data something like this . . .


[TABLE="class: cms_table, width: 500"]
<tbody>[TR]
[TD]Protein[/TD]
[TD]Peptide[/TD]
[TD]Intensity.............. Count[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ STRPZW[/TD]
[TD]24....................... 2[/TD]
[/TR]
[TR]
[TD]XYZ[/TD]
[TD]XYZ SQERT[/TD]
[TD]24.3.................... 2[/TD]
[/TR]
[TR]
[TD]ABC[/TD]
[TD]ABC DEZFEF[/TD]
[TD]25....................... 1[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS FLEMP[/TD]
[TD]23....................... 2[/TD]
[/TR]
[TR]
[TD]QRS[/TD]
[TD]QRS BESOE[/TD]
[TD]23.5..................... 2[/TD]
[/TR]
</tbody>[/TABLE]
Then it's a simple matter to use Data Filter, to filter out anything with a count less than 2, or whatever threshold you decide.

It's nice to join such a helpful community. I am very grateful for the assistance. This has worked so well.
 
Upvote 0
So I'm having a bit of difficulty with applying it completely...is there a function that might use less processing? The dataset I'm using is quite large (~300,000 x 6) and trying to filter with a conditional function is basically crashing excel everytime when I try it on the full dataset. Granted I've only given it ~15 minutes for it to be in the "not responding" state, but even with all 4 processors going it is having a very hard time. Thanks again.
 
Upvote 0

Forum statistics

Threads
1,224,766
Messages
6,180,846
Members
453,001
Latest member
coulombevin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top