Complex Formula Algorithm (Variable size array created from index/match/adress returns)

XionicFire

New Member
Joined
Jan 22, 2016
Messages
49
Office Version
  1. 365
Platform
  1. Windows
Hi, im not sure how to explain the situation so bear with me and thank you for looking,


Im trying to create something extremely complex, ive searched over the internet cant find anything similar, so i decided to make it in parts, dissasemble the formula into 6 different parts, then find each part individually on the internet, for the most part i had good success until i got to the point where i need to create a variable sized and location array, ill explain more in formula format:


Ok so basically goes something like this:


Im trying to create a data parser to parse an xml dump file in order to process all the data, search for specific variables in the table, then within the results of that table, search for specific strings, and then return the values of those strings on another page, effectively delivering a clean "output" so to speak without all the extra useless XML code so that the info can then be fed into a spreadsheet for other uses.


The original data in the XML table looks something like this:


ID lvar Data
65464 type B002344
65464 flags 2131898
65464 brand Airlink101
65464 model AR504
65464 tr69 Yes
235422 type C002311
235422 flags 33123144
235422 brand BKTR
235422 model BK20031


The ID area is basically the unique unit or product identifier, and the lvars are the products different features, data is the relevant data from those features.


So essentially what want it to do is:


First search the entire file on column A for Similar ID Numers, from that somehow parse and identify them into one array (say it found 10 entries contanining the same ID number in range $:R37:R48), the lvars are not always the same, some have more and some have less so a match+10 from the first id "hit" wouldnt work, essentially it needs to generate an array that varies in size from product id to id, (and if possible that can fetch info even if its not sequential (random order), IE $R37:$R40 - $R44:$R7), once thats done (and this is the part ive already got working and nailed down with the following formula,=INDEX($D:$H,MATCH($C15,!$H:$H,0),MATCH(G$12,$D$12:$H$12,0)) it needs to find the data assigned to every lvar string and parse it in a different format in another worksheet.


This part as i said i already got working fine, the problem im getting is with the original parsing of ids, ive tried several methods to no avail, the most promising method ive found so far (but sadly is vulnerable to corruption if the data is not sorted correctly(randomly ordered), which i can fix with a quick pre-sorting before pasting the info in the worksheet) is to use basically the same formula i use to parse the lvars but instead of telling it to search the whole array i tell it to search from the first match found of the ids to the last using these 2 formulas:


This finds me the first cell where the ID was seen: =VLOOKUP(E12,!$D:$D,1,0)
And this finds me the last cell: =INDEX($D:$D,MATCH(E12,$D:$D,0),1)


From that i could theoretically construct a "variable sized and location array" however every method ive tried to extract the exact adress location from those formulas has failed.


If i try: =ADDRESS(MATCH(E16,D12:D34,0),1,1) I get something like $A$5 even tho its blatantly obvious the data is somwhere around $R50 and nowhere near $A$5, the match adress return seems to be relative to the location inside the array and not the entire worksheet.

Trying to select the entire worksheet as a search area only returns a N/A Error so thats also out of the question :/


So basically theres my two problems, if anyone has a good idea how to do this, or simply how to get a simple correct worksheet relative adress return for the information im trying to retrieve i would be most apreciated, with the latter i should be able to devise a solution for the variable array!


Thanks for the responses in advance!
 
Oh im sorry i did not understand what you meant, i thought you meant something entirely different, if i understand you correctly:

In your terms D would be:

210
DAX
DDSR31TT21

Text & real numbers

G would be:

DDSR31TT21
JAD
blank
NAD
210
'200

Text, Real Numbers & Text Numbers

H would be:

DDSR31TT21
JAD
blank
NAD
210
'200

Text, Real Numbers & Text Numbers

Is that the info you required?

(sorry again for the noobishness, stil llearning the forum terms)


Thanks. It's rather Excelese, not a forum language.

What is the name of the sheet housing the data?
 
Upvote 0

Excel Facts

Square and cube roots
The =SQRT(25) is a square root. For a cube root, use =125^(1/3). For a fourth root, use =625^(1/4).
Ah, i really need to learn the lingo haha

'Stage 1 - Raw Data Dump'

Would be the name of the page housing the data.
 
Upvote 0
Ah, i really need to learn the lingo haha

'Stage 1 - Raw Data Dump'

Would be the name of the page housing the data.

Define LTrow using the following recipe:
Activate Formulas | Name Manager.
Activate the New tab.
Enter the following in the Name box:

LTrow

Enter the following in the Refers to box.

=MATCH(REPT("z",255),'Stage 1 - Raw Data Dump'!$G:$G)

Click OK.

Define NLrow as referring to:

=MATCH(9.99999999999999E+307,'Stage 1 - Raw Data Dump'!$G:$G)

Define Lrow as referring to:

=MAX(IF(ISNUMBER(LTrow),LTrow,0),IF(ISNUMBER(LNrow),LNrow,0))

Now define Drange the range in D (or a more convenient name) as referring to:

='Stage 1 - Raw Data Dump'!$D$56:INDEX('Stage 1 - Raw Data Dump'!$D:$D,Lrow)

Define Grange the range in G (or a more convenient name) as referring to:

='Stage 1 - Raw Data Dump'!$G$56:INDEX('Stage 1 - Raw Data Dump'!$G:$G,Lrow)

Define Hrange the range in H (or a more convenient name) as referring to:

='Stage 1 - Raw Data Dump'!$H$56:INDEX('Stage 1 - Raw Data Dump'!$H:$H,Lrow)

Now invoke whichever is the most appropriate:

=IFERROR(INDEX(Grange,1/(1/MATCH(D$12&$C14,INDEX(Drange&Hrange,0),0))),"")

=IFERROR(INDEX(Grange,MATCH(D$12&$C14,INDEX(Drange&Hrange,0),0)),"")

The foregoing is fully dynamic, that is, the formulas adjust automatically to modifications to the data area.
 
Upvote 0
Got it! let me try them out, they seem preety good. ill test and let you know how it works

ATM i ran into a snag where having 17800 individual ids seached over 678000 records causes my computer to just give up, im using one of the servers at work to process the sheet but its still taking way too long, i might have to cut it down to 5000id chunks at a time.

Ill test this formula and see if theres improvement on the speed side as well, im guessing there will be.

Thanks for the assistance!
 
Upvote 0
Got it! let me try them out, they seem preety good. ill test and let you know how it works

ATM i ran into a snag where having 17800 individual ids seached over 678000 records causes my computer to just give up, im using one of the servers at work to process the sheet but its still taking way too long, i might have to cut it down to 5000id chunks at a time.

Ill test this formula and see if theres improvement on the speed side as well, im guessing there will be.

Thanks for the assistance!

We might want to eliminate the IFERROR call for more speed. In which cell are you entering the formula with IFERROR?
 
Upvote 0
well to be honest its only 157 x 16,780 but those cells reference a database of 648,750 x 390
 
Upvote 0
well to be honest its only 157 x 16,780 but those cells reference a database of 648,750 x 390

In I56 enter and copy down:

=D56&H56

Now define Irange as referring to:

='Stage 1 - Raw Data Dump'!$I$56:INDEX('Stage 1 - Raw Data Dump'!$I:$I,Lrow)

Now invoke simply:

Either...

=IFNA(INDEX(Grange,1/(1/MATCH(D$12&$C14,Irange,0))),"")

Or...

=IFNA(INDEX(Grange,MATCH(D$12&$C14,Irange,0)),"")

whichever the most appropriate.

This modification of the set up sacrifices memory against speed.




 
Last edited:
Upvote 0
In I56 enter and copy down:

=D56&H56

Now define Irange as referring to:

='Stage 1 - Raw Data Dump'!$I$56:INDEX('Stage 1 - Raw Data Dump'!$I:$I,Lrow)

Now invoke simply:

Either...

=IFNA(INDEX(Grange,1/(1/MATCH(D$12&$C14,Irange,0))),"")

Or...

=IFNA(INDEX(Grange,MATCH(D$12&$C14,Irange,0)),"")

whichever the most appropriate.

This modification of the set up sacrifices memory against speed.







Hi Aladin,

Ive been fiddling around with the formulas testing them out, i have 2 questions:

1, you say to choose which one of the two formulas is more apropriate, but im not sure what the difference is between them, what does the 1/(1/ do vs not having that string in the formula?

2, you say the new setup sacrifices memory against speed, what does that mean? it will take longer to load the formula but less to process? or just means the paging file will now be a monster on the machine running it but nothing else? we got around 32gb on the main server, it "should" be enough memory.

or does it not have anything to do with it?

sorry for the asking so many questions I just want to make sure i dont break something in the sheet. its happened with sheets this size that you can get to a point where you do something, save and now the sheet becomes unuseable forever as excel can no longer open a document that size, effectively "bricking" the excel sheet and forcing me to restore a backup.
 
Upvote 0
Hi Aladin,

Ive been fiddling around with the formulas testing them out, i have 2 questions:

1, you say to choose which one of the two formulas is more apropriate, but im not sure what the difference is between them, what does the 1/(1/ do vs not having that string in the formula?

The formulas to choose from should have been:

(a)

=IFERROR(1/(1/INDEX(Grange,MATCH(D$12&$C14,irange,0))),"")

(b)

=IFNA(INDEX(Grange,MATCH(D$12&$C14,irange,0)),"")

I think (b) is clear: It just traps #N/A errors.

However, (a) picks out #N/A errors as well as zero value or blank value results.

Non-zero result case:

1/(1/(INDEX(...)) [ Let's suppose INDEX(...) = 4 ]

>>

1/(1/4)

>>

1/0.25

>> 4

Zero result case:

1/(1/(INDEX(...)) [ Let's suppose INDEX(...) = 0 ]

>>

1/(1/0)

>>

1/#DIV/0!

>>

#DIV/0!

The outer IFERROR triggered by this error returns a blank.

2, you say the new setup sacrifices memory against speed, what does that mean? it will take longer to load the formula but less to process? or just means the paging file will now be a monster on the machine running it but nothing else? we got around 32gb on the main server, it "should" be enough memory.

or does it not have anything to do with it?

An additional range with (simple) formulas consumes memory, but they allow us to avoid invoking formulas that requires more time to calculate. So, we are spending some cell memory in order to improve our processing speed.

sorry for the asking so many questions I just want to make sure i dont break something in the sheet. its happened with sheets this size that you can get to a point where you do something, save and now the sheet becomes unuseable forever as excel can no longer open a document that size, effectively "bricking" the excel sheet and forcing me to restore a backup.

Testing can be exasperating indeed.
 
Upvote 0

Forum statistics

Threads
1,223,903
Messages
6,175,289
Members
452,631
Latest member
a_potato

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top