martingaleh
I have a few general questions on setting up a model.
I normally believe in modularity in programming. So I'd make a module for the connection, one for each table, and then I'd aggregate all the tables into one big table for pivoting purposes. However, I've learned that in Power Query, if you do it that way, you pull the data once for each query, so it gets pulled three or four times before the modelling even begins. Is this actually true? Do people aggregate all of their staging and ETL steps into one big script?
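To make the question concrete, here is roughly the modular split I mean; the query names, file path, and table name below are made up, and as far as I understand, each query that loads to the Data Model is evaluated on its own, so the staging query could still be re-run once per query that references it.

Code:
// "SourceAR" -- connection/staging query, set to connection only (names and path are placeholders)
let
    Source = Excel.Workbook(File.Contents("C:\Data\ar company 1.xlsx"), null, true),
    ArData = Source{[Item="ARData", Kind="Table"]}[Data]
in
    ArData

// "AR Clean" -- a separate query that just references the staging query instead of re-declaring the source
let
    Source   = SourceAR,
    KeepRows = Table.SelectRows(Source, each [Amount] <> null)
in
    KeepRows

My worry is that SourceAR gets evaluated again for every downstream query that references it when the model refreshes.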
Further, I usually make one date dimension table for my custom calendar. In other BI tools, you can connect that same dimension multiple times, each with a different alias, so for instance I'd have invoice date and due date keys in the fact table connected to the date dimension twice under different aliases. You can do this in Access too, for example. Is that how people do it in Power Pivot? Or is it better to just join the date dimension into the fact table twice, once for the due date and once for the invoice date?
Page 61 of Analyzing Data with Microsoft Power BI suggests activating different relationships when necessary. This doesn't work when you need to make a waterfall (i.e. a table with invoice date on the left and due date on top that shows when you get money for the invoices you issue).
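If the answer is to have two physical copies of the date table, I assume it would look something like this in Power Query, referencing one base calendar query twice so the calendar logic is only maintained in one place (all names below are made up):

Code:
// "Calendar_Base" -- one query that builds the custom calendar, loaded as connection only
let
    StartDate = #date(2015, 1, 1),
    DayCount  = 3653,                                  // roughly ten years of dates
    Dates     = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    AsTable   = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed     = Table.TransformColumnTypes(AsTable, {{"Date", type date}})
in
    Typed

// "Invoice Date" -- first copy, related to the fact table's invoice date key
let
    Source = Calendar_Base
in
    Source

// "Due Date" -- second copy, related to the fact table's due date key
let
    Source = Calendar_Base
in
    Source

As far as I can tell, with two separate tables the invoice date can sit on the rows and the due date on the columns at the same time, which is the waterfall layout I'm after, but it also doubles the calendar in the model.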
Finally, I had been building my models in separate Excel workbooks, like one for AR and one for the forecast, and then when I need to compare them, I'd make yet another workbook that connects to those workbooks and brings the data down. I've learned you can't connect to the Data Model directly; you have to materialize the denormalized table on a worksheet (which is amazing to me) and then connect to that. Doing so means that if I have a model like this:
Big model.xls
---> staging table 1 ---> staging table 2 in ar.xls
        ---> ar company 1.xls
        union
        ---> ar company 2.xls
---> staging table 1 ---> staging table 2 in cash.xls
        ---> cash company 1.xls
        union
        ---> cash company 2.xls
then I would have to hand-update every single workbook going up the chain to get my big model up to date. Worse, if I miss one update, I'll never know. There must be a more intelligent setup where the different stages of the data can live in different workbooks and everything updates when the report is run. Is there?
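The only alternative I can see is to have the big model reach back to the underlying source workbooks itself rather than going through each intermediate Data Model, so a single refresh pulls current data. Something like this in Big model.xls, assuming the staging steps can be repeated there or the intermediate tables exist as ordinary Excel tables (paths and table names are made up):

Code:
// In Big model.xls: read the AR tables straight from the company workbooks
let
    Ar1      = Excel.Workbook(File.Contents("C:\Data\ar company 1.xlsx"), null, true),
    Ar2      = Excel.Workbook(File.Contents("C:\Data\ar company 2.xlsx"), null, true),
    Ar1Table = Ar1{[Item="ARData", Kind="Table"]}[Data],
    Ar2Table = Ar2{[Item="ARData", Kind="Table"]}[Data],
    Combined = Table.Combine({Ar1Table, Ar2Table})     // the "union" step
in
    Combined

Is that what people actually do, or is there a way to keep the stages in separate workbooks and still have one refresh walk the whole chain?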