Filter large amount of data by multiple constraints

Dennisss11

New Member
Joined
Nov 25, 2022
Messages
9
Office Version
  1. 365
Platform
  1. MacOS
Hi all,

Im working on an excel problem that I can't seem to figure out. Hopefully someone knows if what I want is possible... :)

I am trying to filter option data. Every row has unique data in other columns, however for simplicity I have hid other columns that are not relevant for the filtering. Every entry has 3 important characteristics:
1. Date (only the date matters, time is all the same)
2. Put or Call
3. Delta (for put: -1 to 0, for call 0 to +1)

We are trying to backtest a strategy that uses a combination of a put and a call on the same date. The specific put and call for each date should be filtered closest to a specific value (cell G3 and U3).

Since the data is not perfect, sometimes there is only a put available on a specific day, or only a call. If a match on a specific available date can't be found, it should not be included in the filtered data.

Below is the mini-sheet. It does not contain all the data. Columns A, G and U go down up to around 450k rows. My thought is to use a formula in row AE that indicates which rows should stay. After that filtering should be easy with the basic filter option in excel.

If anyone knows how, I would be very thankful!!

All Sell 1DTE.xlsx
AGUAEAFAGAHAIAJ
1
2PutCall
3Preferred Delta (closest to)-0.20.2
4
5
6Data NOWPreferred filtered data
7
8
9DatePut/CallDeltaDatePut/CallDeltaSet
102015-07-23 15:45:00.000C0.999942015-07-23 15:45:00.000P-0.22818Match
112015-07-23 15:45:00.000P02015-07-23 15:45:00.000C0.23363
122015-07-23 15:45:00.000C0.999942015-01-29 15:45:00.000P-0.01811Match
132015-07-23 15:45:00.000P02015-01-29 15:45:00.000C0.1816
142015-07-23 15:45:00.000C0.99994
152015-07-23 15:45:00.000P0
162015-07-23 15:45:00.000C0.99994
172015-07-23 15:45:00.000P0
182015-07-23 15:45:00.000C0.99994
192015-07-23 15:45:00.000P0
202015-07-23 15:45:00.000C0.99994
212015-07-23 15:45:00.000P0
222015-07-23 15:45:00.000C0.99994
232015-07-23 15:45:00.000P0
242015-07-23 15:45:00.000C0.99994
252015-07-23 15:45:00.000P0
262015-07-23 15:45:00.000C0.99994
272015-07-23 15:45:00.000P0
282015-07-23 15:45:00.000C0.99994
292015-07-23 15:45:00.000P0
302015-07-23 15:45:00.000C0.99994
312015-07-23 15:45:00.000P0
322015-07-23 15:45:00.000C0.99994
332015-07-23 15:45:00.000P0
342015-07-23 15:45:00.000C0.99994
352015-07-23 15:45:00.000P0
362015-07-23 15:45:00.000C0.99994
372015-07-23 15:45:00.000P0
382015-07-23 15:45:00.000C0.99994
392015-07-23 15:45:00.000P0
402015-07-23 15:45:00.000C0.99994
412015-07-23 15:45:00.000P0
422015-07-23 15:45:00.000C0.99994
432015-07-23 15:45:00.000P0
442015-07-23 15:45:00.000C0.99994
452015-07-23 15:45:00.000P0
462015-07-23 15:45:00.000C0.99994
472015-07-23 15:45:00.000P0
482015-07-23 15:45:00.000C0.99994
492015-07-23 15:45:00.000P0
502015-07-23 15:45:00.000C0.99994
512015-07-23 15:45:00.000P0
522015-07-23 15:45:00.000C0.99994
532015-07-23 15:45:00.000P0
542015-07-23 15:45:00.000C0.99994
552015-07-23 15:45:00.000P0
562015-07-23 15:45:00.000C0.99994
572015-07-23 15:45:00.000P0
582015-07-23 15:45:00.000C0.99994
592015-07-23 15:45:00.000P0
602015-07-23 15:45:00.000C0.99994
612015-07-23 15:45:00.000P0
622015-07-23 15:45:00.000C0.99994
632015-07-23 15:45:00.000P0
642015-07-23 15:45:00.000C0.99994
652015-07-23 15:45:00.000P0
662015-07-23 15:45:00.000C0.99994
672015-07-23 15:45:00.000P0
682015-07-23 15:45:00.000C0.99994
692015-07-23 15:45:00.000P0
702015-07-23 15:45:00.000C0.99994
712015-07-23 15:45:00.000P0
722015-07-23 15:45:00.000C0.99994
732015-07-23 15:45:00.000P0
742015-07-23 15:45:00.000C0.99994
752015-07-23 15:45:00.000P0
762015-07-23 15:45:00.000C0.99994
772015-07-23 15:45:00.000P0
782015-07-23 15:45:00.000C0.99994
792015-07-23 15:45:00.000P0
All Sell 1DTE
 

Excel Facts

Wildcard in VLOOKUP
Use =VLOOKUP("Apple*" to find apple, Apple, or applesauce
I may be a little dense here, but I cannot see how your preferred data example is reflective of your data in columns A, G and U. Please clarify.
 
Upvote 0
I may be a little dense here, but I cannot see how your preferred data example is reflective of your data in columns A, G and U. Please clarify.
No worries! Thanks for taking the time. Unfortunately I can't make a larger mini sheet as it is limited to 3000 cells. However, I have made a screenshot that hopefully clarifies.

The only thing I did now is hid some rows. As you scroll down in the data, you will notice the values that will go in the "Preferred filtered data". I highlighted the four results. That is, 2 results per day.

So as you see, from row 10 to 282, it is all 23-07-2015. That is one day. For that one day, I need the Put entry closest to -0.2 and the Call entry closest to 0.2 (cells G3 and U3).

The formula should:
1. Check lines of the same date
2. Find the entry of that date with "P" (column G) and closest to -0.2 (column U)
3. Find the entry of that date with "C" (column G) and closest to 0.2 (column U
4. If the formula found an entry for both P and C, it should return an indication that these 2 rows can "stay". All other rows from that date are not needed anymore.
5. If it does that for every date, it should give me the "Preferred filtered data" as shown in column AG to AJ.

Hope this clarifies. Thanks in advance!

Screenshot 2024-01-21 at 23.25.37.png
 
Upvote 0
Unfortunately I can't make a larger mini sheet as it is limited to 3000 cells.
What you could do is
  • Make a copy of the worksheet & and delete all but a couple of the currently hidden columns.
  • Delete many of the rows for each day so there is still just a few per day but still enough to demonstrate what you want.
  • Include a few more days and ensure there is an example in relation to your point 4 above and explain that again.
  • Post that and the desired results with XL2BB
 
Upvote 0
What you could do is
  • Make a copy of the worksheet & and delete all but a couple of the currently hidden columns.
  • Delete many of the rows for each day so there is still just a few per day but still enough to demonstrate what you want.
  • Include a few more days and ensure there is an example in relation to your point 4 above and explain that again.
  • Post that and the desired results with XL2BB
Hi Peter,

Thanks for taking the time! I have made a mini-sheet per your suggestion. Had not thought about that. Please see the sheet below.

Hopefully this makes it easier to understand what I am looking for. I have highlighted two dates where the data is inconclusive. For example, 07/25/2015 only has a Put for that date and 07/27/2015 only has a Call available for that date. Since I need a put and a call, that date should not be labeled "Keep". For dates where there is at least one call and one put available, it should find the ones closest to the values in AO6 and AP6, for puts and calls respectively.

To be clear: if I would have a formula that generates the "Keep" suggestion in column AQ, it would solve the problem!

Thanks in advance !


All Sell 1DTE.xlsx
AMANAOAPAQARASATAU
4
5PutCall
6Preferred Delta (closest to) -0.20.2
7
8Raw data, "checked" Filtered by "Keep" (Column AQ)
9DatePut/CallDeltaCheckDatePut/CallDelta
107/23/15C0.447/23/15C0.21
117/23/15P-0.447/23/15P-0.21
127/23/15C0.337/24/15C0.19
137/23/15P-0.337/24/15P-0.19
147/23/15C0.21Keep7/26/15C0.4
157/23/15P-0.21Keep7/26/15P-0.4
167/23/15C0.11
177/23/15P-0.11
187/24/15C0.23
197/24/15P-0.23
207/24/15C0.19Keep
217/24/15P-0.19Keep
227/24/15C0.99994
237/25/15P-0.21
247/26/15C0.4Keep
257/26/15P-0.4Keep
267/27/15C0.19
27
28
29
Sheet1
 
Last edited:
Upvote 0
Thanks for the simplified data - much easier to follow. :)

Try this. One further thing to consider though.
Is there an issue if there are duplicate "closest" values? For example if AP12 was 0.21 or 0.19 then AQ12 and AQ14 would both be "Keep" for "C" on 23 July

Dennisss11.xlsm
ANAOAPAQ
4
5PutCall
6Preferred Delta (closest to) -0.20.2
7
8Raw data, "checked"
9DatePut/CallDeltaCheck
1023/07/2015C0.44 
1123/07/2015P-0.44 
1223/07/2015C0.33 
1323/07/2015P-0.33 
1423/07/2015C0.21Keep
1523/07/2015P-0.21Keep
1623/07/2015C0.11 
1723/07/2015P-0.11 
1824/07/2015C0.23 
1924/07/2015P-0.23 
2024/07/2015C0.19Keep
2124/07/2015P-0.19Keep
2224/07/2015C0.99994 
2325/07/2015P-0.21 
2426/07/2015C0.4Keep
2526/07/2015P-0.4Keep
2627/07/2015C0.19 
Sheet2
Cell Formulas
RangeFormula
AQ10:AQ26AQ10=IF(COUNTIFS(AN$10:AN$26,AN10,AO$10:AO$26,IF(AO10="C","P","C")),IF(ROUND(ABS(AP10-IF(AO10="C",AP$6,AO$6)),9)=AGGREGATE(15,6,ROUND(ABS(AP$10:AP$26-IF(AO10="C",AP$6,AO$6)),9)/((AN$10:AN$26=AN10)*(AO$10:AO$26=AO10)),1),"Keep",""),"")
 
Upvote 0
Solution
Thanks for the simplified data - much easier to follow. :)

Try this. One further thing to consider though.
Is there an issue if there are duplicate "closest" values? For example if AP12 was 0.21 or 0.19 then AQ12 and AQ14 would both be "Keep" for "C" on 23 July

Dennisss11.xlsm
ANAOAPAQ
4
5PutCall
6Preferred Delta (closest to) -0.20.2
7
8Raw data, "checked"
9DatePut/CallDeltaCheck
1023/07/2015C0.44 
1123/07/2015P-0.44 
1223/07/2015C0.33 
1323/07/2015P-0.33 
1423/07/2015C0.21Keep
1523/07/2015P-0.21Keep
1623/07/2015C0.11 
1723/07/2015P-0.11 
1824/07/2015C0.23 
1924/07/2015P-0.23 
2024/07/2015C0.19Keep
2124/07/2015P-0.19Keep
2224/07/2015C0.99994 
2325/07/2015P-0.21 
2426/07/2015C0.4Keep
2526/07/2015P-0.4Keep
2627/07/2015C0.19 
Sheet2
Cell Formulas
RangeFormula
AQ10:AQ26AQ10=IF(COUNTIFS(AN$10:AN$26,AN10,AO$10:AO$26,IF(AO10="C","P","C")),IF(ROUND(ABS(AP10-IF(AO10="C",AP$6,AO$6)),9)=AGGREGATE(15,6,ROUND(ABS(AP$10:AP$26-IF(AO10="C",AP$6,AO$6)),9)/((AN$10:AN$26=AN10)*(AO$10:AO$26=AO10)),1),"Keep",""),"")

Hi Peter,

Thank you very much. It works perfectly. Luckily, in the data there are no duplicates that are close to the stated values. So no issues there. I am trying to understand what you did in the formula, but I am afraid I never will ... haha.
 
Upvote 0
Thank you very much. It works perfectly.
Good news and 'you are welcome' :)

Luckily, in the data there are no duplicates that are close to the stated values. So no issues there.
More good news!

I am trying to understand what you did in the formula,
I'll give it a bit of a go

=IF(COUNTIFS(AN$10:AN$26,AN10,AO$10:AO$26,IF(AO10="C","P","C")),IF(ROUND(ABS(AP10-IF(AO10="C",AP$6,AO$6)),9)=AGGREGATE(15,6,ROUND(ABS(AP$10:AP$26-IF(AO10="C",AP$6,AO$6)),9)/((AN$10:AN$26=AN10)*(AO$10:AO$26=AO10)),1),"Keep",""),"")

Green part checks that if the row is a "P", then a "C" for the same date also exists, & vice-versa.
Purple part gets the absolute (= positive) difference between the Delta value on the formula row and the relevant value from row 6. (The round part is because sometimes the calculations in Excel are a tiny bit out & rounding helps make comparisons later more reliable)
Red part calculates the smallest absolute (=positive) difference for all Delta values that have the same date as the formula row and the same C/P value as the formula row. Same issue re rounding
If the purple part and red part are equal then "Keep" otherwise ""
 
Upvote 0

Forum statistics

Threads
1,223,164
Messages
6,170,444
Members
452,326
Latest member
johnshaji

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top