How to fetch the highest or successive highest values?

rajamdade (New Member; joined Jun 14, 2014; 35 messages; Office Version: 2016; Platform: Windows)
Hello everyone,

Please help with a formula that, for each query ID (column F), searches for the highest % Gene Identity (column D) and fetches that value (column I) together with the matching gene (column H) and accession (column G). However, each query ID should be matched to a highest % Gene Identity only once.

For example, once query ID 27504.m000612 is matched with Raf29, which has the overall highest % Gene Identity (70.5%), it will not be searched again, even though its % Gene Identity for Raf21 (69.9%) is higher than that of 27471.m000401 (Raf21: 68.9%). Query ID 27471.m000401 can then be allotted the next-highest value (the 2nd highest in this case), which is Raf21 (68.9%).

However, if the highest (or next-highest) % Gene Identity is exactly the same for two query IDs, it should be flagged as a conflict.

I have 300 similar query IDs with such data. I would appreciate it if you could provide formulas that fetch the values for columns G, H and I from the query IDs given in column F. Thanks in advance.

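Mrexcel query.xlsx

| | A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Query ID | Accession | Gene | % Gene Identity | | Query ID | Accession | Gene | % Gene Identity |
| 2 | 27471.m000401 | AT4G35780 | Raf29 | 69.1 | | 27471.m000401 | AT2G17700 | Raf21 | 68.9 |
| 3 | 27471.m000401 | AT2G17700 | Raf21 | 68.9 | | 27504.m000612 | AT4G35780 | Raf29 | 70.5 |
| 4 | 27471.m000401 | AT4G38470 | Raf30 | 66.3 | | 27504.m000613 | AT4G38470 | Raf30 | Conflict |
| 5 | 27504.m000612 | AT4G35780 | Raf29 | 70.5 | | 27894.m000774 | AT4G38470 | Raf30 | Conflict |
| 6 | 27504.m000612 | AT2G17700 | Raf21 | 69.9 | | | | | |
| 7 | 27504.m000612 | AT3G06630 | Raf8 | 68.4 | | | | | |
| 8 | 27504.m000612 | AT4G38470 | Raf30 | 67.3 | | | | | |
| 9 | 27504.m000613 | AT4G35780 | Raf29 | 70 | | | | | |
| 10 | 27504.m000613 | AT2G17700 | Raf21 | 67.4 | | | | | |
| 11 | 27504.m000613 | AT4G38470 | Raf30 | 66.2 | | | | | |
| 12 | 27894.m000774 | AT4G38470 | Raf30 | 66.2 | | | | | |
| 13 | 27894.m000774 | AT4G14780 | Raf26 | 64.8 | | | | | |
| 14 | 27894.m000774 | AT4G35780 | Raf29 | 63.7 | | | | | |

Sheet1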
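
For readers who want to check the expected output outside Excel, here is a minimal Python sketch of one reading of the rule described above: repeatedly take the highest remaining % Gene Identity, skip query IDs that already have a gene, skip genes that were already taken at a different score, and mark a gene as "Conflict" when two query IDs land on it with exactly the same score. The names `ROWS` and `assign_genes` are made up for this illustration, and the exact-equality test on the scores is an assumption that suits this sample data.

```python
from collections import Counter

# Sample rows from columns A:D: (query_id, accession, gene, identity)
ROWS = [
    ("27471.m000401", "AT4G35780", "Raf29", 69.1),
    ("27471.m000401", "AT2G17700", "Raf21", 68.9),
    ("27471.m000401", "AT4G38470", "Raf30", 66.3),
    ("27504.m000612", "AT4G35780", "Raf29", 70.5),
    ("27504.m000612", "AT2G17700", "Raf21", 69.9),
    ("27504.m000612", "AT3G06630", "Raf8", 68.4),
    ("27504.m000612", "AT4G38470", "Raf30", 67.3),
    ("27504.m000613", "AT4G35780", "Raf29", 70.0),
    ("27504.m000613", "AT2G17700", "Raf21", 67.4),
    ("27504.m000613", "AT4G38470", "Raf30", 66.2),
    ("27894.m000774", "AT4G38470", "Raf30", 66.2),
    ("27894.m000774", "AT4G14780", "Raf26", 64.8),
    ("27894.m000774", "AT4G35780", "Raf29", 63.7),
]


def assign_genes(rows):
    """Greedy pass: repeatedly pick the highest remaining identity."""
    taken = {}        # gene -> identity at which it was first assigned
    assigned = set()  # query IDs that already have a gene
    picks = []
    while True:
        # A row is eligible if its query ID is still unassigned and its gene
        # is either free or already taken at exactly the same identity (a tie).
        eligible = [
            r for r in rows
            if r[0] not in assigned and taken.get(r[2], r[3]) == r[3]
        ]
        if not eligible:
            break
        best = max(eligible, key=lambda r: r[3])  # earliest row wins on equal scores
        assigned.add(best[0])
        taken[best[2]] = best[3]
        picks.append(best)
    # A gene picked for more than one query ID (same score) is a conflict.
    gene_counts = Counter(r[2] for r in picks)
    return [
        (query, acc, gene, "Conflict" if gene_counts[gene] > 1 else ident)
        for query, acc, gene, ident in picks
    ]


for row in assign_genes(ROWS):
    print(row)
```

The rows come out in descending order of score rather than in the order of the query IDs in column F, so the order can differ from the expected table even when the assignments are the same.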
 

Maybe something like this. I used Conditional Formatting to show the conflicts.

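Book2

| | A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Query ID | Accession | Gene | % Gene Identity | | Query ID | Accession | Gene | % Gene Identity |
| 2 | 27471.m000401 | AT4G35780 | Raf29 | 69.1 | | 27504.m000612 | AT4G35780 | Raf29 | 70.5 |
| 3 | 27471.m000401 | AT2G17700 | Raf21 | 68.9 | | 27471.m000401 | AT2G17700 | Raf21 | 68.9 |
| 4 | 27471.m000401 | AT4G38470 | Raf30 | 66.3 | | 27504.m000613 | AT4G38470 | Raf30 | 66.2 |
| 5 | 27504.m000612 | AT4G35780 | Raf29 | 70.5 | | 27894.m000774 | AT4G38470 | Raf30 | 66.2 |
| 6 | 27504.m000612 | AT2G17700 | Raf21 | 69.9 | | | | | |
| 7 | 27504.m000612 | AT3G06630 | Raf8 | 68.4 | | | | | |
| 8 | 27504.m000612 | AT4G38470 | Raf30 | 67.3 | | | | | |
| 9 | 27504.m000613 | AT4G35780 | Raf29 | 70 | | | | | |
| 10 | 27504.m000613 | AT2G17700 | Raf21 | 67.4 | | | | | |
| 11 | 27504.m000613 | AT4G38470 | Raf30 | 66.2 | | | | | |
| 12 | 27894.m000774 | AT4G38470 | Raf30 | 66.2 | | | | | |
| 13 | 27894.m000774 | AT4G14780 | Raf26 | 64.8 | | | | | |
| 14 | 27894.m000774 | AT4G35780 | Raf29 | 63.7 | | | | | |

Sheet15

Cell Formulas

| Range | Formula |
|---|---|
| F2:H5 | F2: =INDEX(A:A,AGGREGATE(15,6,ROW($A$2:$A$14)/($D$2:$D$14=$I2),COUNTIF($I$2:$I2,$I2))) |
| I2:I5 | I2: =AGGREGATE(14,6,D$2:D$14/(COUNTIFS($F$1:$F1,$A$2:$A$14)=0)/(COUNTIFS($H$1:$H1,$C$2:$C$14,$I$1:$I1,"<>"&$D$2:$D$14)=0),1) |

Cells with Conditional Formatting

| Cell | Condition | Cell Format | Stop If True |
|---|---|---|---|
| I:I | Cell Value duplicates | text | NO |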
 
Thank you @Eric W for your answer. However, the output is somewhat different from what I expected and showed. Please excuse me; let me explain it this way.

For example, query ID 27504.m000612 matches Raf29 with the overall highest % Gene Identity (70.5%). We don't want Raf29 to be assigned again to 27471.m000401 (which has a lower % Gene Identity than 27504.m000612), so it will be matched to the next-highest % Gene Identity, which is Raf21 (69.9%). This excludes Raf29 in order to avoid duplication.
 
Except for order, my results are the same as your example. I had assumed that the Raf21 (69.9%) entry was excluded because you wanted at most 1 gene per query. Is this not so?
 
Yes, I was able to resolve the issue. I have now added formulas for columns G and H as well; kindly let me know whether they are fetching the correct values (they appear to be).

TEST.xlsx

| | A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Query ID | Accession | Gene | % Gene Identity | | Query ID | Accession | Gene | % Gene Identity |
| 2 | 27471.m000401 | AT4G35780 | Raf29 | 69.1 | | 27504.m000612 | AT4G35780 | Raf29 | 70.5 |
| 3 | 27471.m000401 | AT2G17700 | Raf21 | 68.9 | | 27471.m000401 | AT2G17700 | Raf21 | 68.9 |
| 4 | 27471.m000401 | AT4G38470 | Raf30 | 66.3 | | 27504.m000613 | AT4G38470 | Raf30 | 66.2 |
| 5 | 27504.m000612 | AT4G35780 | Raf29 | 70.5 | | 27894.m000774 | AT4G38470 | Raf30 | 66.2 |
| 6 | 27504.m000612 | AT2G17700 | Raf21 | 69.9 | | | | | |
| 7 | 27504.m000612 | AT3G06630 | Raf8 | 68.4 | | | | | |
| 8 | 27504.m000612 | AT4G38470 | Raf30 | 67.3 | | | | | |
| 9 | 27504.m000613 | AT4G35780 | Raf29 | 70 | | | | | |
| 10 | 27504.m000613 | AT2G17700 | Raf21 | 67.4 | | | | | |
| 11 | 27504.m000613 | AT4G38470 | Raf30 | 66.2 | | | | | |
| 12 | 27894.m000774 | AT4G38470 | Raf30 | 66.2 | | | | | |
| 13 | 27894.m000774 | AT4G14780 | Raf26 | 64.8 | | | | | |
| 14 | 27894.m000774 | AT4G35780 | Raf29 | 63.7 | | | | | |

Sheet1

Cell Formulas

| Range | Formula |
|---|---|
| F2:F5 | F2: =INDEX(A:A,AGGREGATE(15,6,ROW($A$2:$A$14)/($D$2:$D$14=$I2),COUNTIF($I$2:$I2,$I2))) |
| G2:G5 | G2: =INDEX($B$2:$B$14,MATCH(F2&I2, $A$2:$A$14&$D$2:$D$14,0)) |
| H2:H5 | H2: =INDEX($C$2:$C$14,MATCH(2,1/($A$2:$A$14=F2)/($D$2:$D$14=I2))) |
| I2:I5 | I2: =AGGREGATE(14,6,D$2:D$14/(COUNTIFS($F$1:$F1,$A$2:$A$14)=0)/(COUNTIFS($H$1:$H1,$C$2:$C$14,$I$1:$I1,"<>"&$D$2:$D$14)=0),1) |

Press CTRL+SHIFT+ENTER to enter array formulas.

Named Ranges

| Name | Refers To | Cells |
|---|---|---|
| _FilterDatabase | =Sheet1!$A$1:$D$26 | F2:F5 |
 
My original F2 formula was designed to work in columns G and H as well. I'd recommend sticking with that formula if the values in G and H can change. But in looking at your sample table, B11 = B12 and C11 = C12, so if that is consistent then your formulas should work as well.
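
To make the comparison concrete, here is a rough Python analogue of the two lookup styles (not the Excel formulas themselves). `by_position` imitates the INDEX/AGGREGATE approach: for the k-th occurrence of a score in column I, it takes the k-th data row with that score. `by_key` imitates the INDEX/MATCH approach: it takes the first data row where both the query ID and the score match. The names `DATA`, `RESULT`, `by_position` and `by_key` are invented for this sketch.

```python
# Sample rows from columns A:D: (query_id, accession, gene, identity)
DATA = [
    ("27471.m000401", "AT4G35780", "Raf29", 69.1),
    ("27471.m000401", "AT2G17700", "Raf21", 68.9),
    ("27471.m000401", "AT4G38470", "Raf30", 66.3),
    ("27504.m000612", "AT4G35780", "Raf29", 70.5),
    ("27504.m000612", "AT2G17700", "Raf21", 69.9),
    ("27504.m000612", "AT3G06630", "Raf8", 68.4),
    ("27504.m000612", "AT4G38470", "Raf30", 67.3),
    ("27504.m000613", "AT4G35780", "Raf29", 70.0),
    ("27504.m000613", "AT2G17700", "Raf21", 67.4),
    ("27504.m000613", "AT4G38470", "Raf30", 66.2),
    ("27894.m000774", "AT4G38470", "Raf30", 66.2),
    ("27894.m000774", "AT4G14780", "Raf26", 64.8),
    ("27894.m000774", "AT4G35780", "Raf29", 63.7),
]

# Columns F and I of the result table: (query_id, identity)
RESULT = [
    ("27504.m000612", 70.5),
    ("27471.m000401", 68.9),
    ("27504.m000613", 66.2),
    ("27894.m000774", 66.2),
]


def by_position(data, result):
    """For the k-th repeat of a score, return the k-th data row with that score."""
    out, seen = [], {}
    for _, ident in result:
        k = seen[ident] = seen.get(ident, 0) + 1  # running count, like COUNTIF
        matches = [row for row in data if row[3] == ident]
        out.append(matches[k - 1])
    return out


def by_key(data, result):
    """Return the first data row whose query ID and score both match."""
    return [
        next(row for row in data if row[0] == query and row[3] == ident)
        for query, ident in result
    ]


print(by_position(DATA, RESULT) == by_key(DATA, RESULT))
```

On this sample both lookups return the same four rows, which is consistent with the remark that either set of formulas should work here.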

In any case, I'm glad it works for you! Let us know if you have any other questions.
 