martingaleh
I have a few general questions on setting up a model.
I normally believe in modularity in programming. So I'd make a module for the connection, one for each table, and then I'd aggregate all the tables into one big table for pivoting purposes. However, I've learned that in Power Query, if you do it that way, you pull the data once for each query, so it gets pulled three or four times before the modelling even begins. Is this actually true? Do people aggregate all of their staging and ETL steps into one big script?
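To make the question concrete, here is roughly the modular split I mean; the query names, file path, and table name below are made up, and as far as I understand, each query that loads to the Data Model is evaluated on its own, so the staging query could still be re-run once per query that references it.

Code:
// "SourceAR" -- connection/staging query, set to connection only (names and path are placeholders)
let
    Source = Excel.Workbook(File.Contents("C:\Data\ar company 1.xlsx"), null, true),
    ArData = Source{[Item="ARData", Kind="Table"]}[Data]
in
    ArData

// "AR Clean" -- a separate query that just references the staging query instead of re-declaring the source
let
    Source   = SourceAR,
    KeepRows = Table.SelectRows(Source, each [Amount] <> null)
in
    KeepRows

My worry is that SourceAR gets evaluated again for every downstream query that references it when the model refreshes.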
Further, I usually make one date dimension table for my custom calendar. In other BI tools, you can connect that same dimension multiple times, each with a different alias, so for instance I'd have invoice date and due date keys in the fact table connected to the date dimension twice under different aliases. You can do this in Access too, for example. Is that how people do it in Power Pivot? Or is it better to just join the date dimension into the fact table twice, once for the due date and once for the invoice date?
Page 61 of Analyzing Data with Microsoft Power BI suggests activating different relationships when necessary. This doesn't work when you need to make a waterfall (i.e. a table with invoice date on the left and due date on top that shows when you get money for the invoices you issue).
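If the answer is to have two physical copies of the date table, I assume it would look something like this in Power Query, referencing one base calendar query twice so the calendar logic is only maintained in one place (all names below are made up):

Code:
// "Calendar_Base" -- one query that builds the custom calendar, loaded as connection only
let
    StartDate = #date(2015, 1, 1),
    DayCount  = 3653,                                  // roughly ten years of dates
    Dates     = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    AsTable   = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed     = Table.TransformColumnTypes(AsTable, {{"Date", type date}})
in
    Typed

// "Invoice Date" -- first copy, related to the fact table's invoice date key
let
    Source = Calendar_Base
in
    Source

// "Due Date" -- second copy, related to the fact table's due date key
let
    Source = Calendar_Base
in
    Source

As far as I can tell, with two separate tables the invoice date can sit on the rows and the due date on the columns at the same time, which is the waterfall layout I'm after, but it also doubles the calendar in the model.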
Finally, I had been building my models in separate Excel workbooks, like one for AR and one for the forecast, and then when I need to compare them, I'd make yet another workbook that connects to those workbooks and brings the data down. I've learned you can't connect to the Data Model directly; you have to materialize the denormalized table on a worksheet (which is amazing to me) and then connect to that. Doing so means that if I have a model like this:
Big model.xls
---> staging table 1 ---> staging table 2 in ar.xls
        ---> ar company 1.xls
        union
        ---> ar company 2.xls
---> staging table 1 ---> staging table 2 in cash.xls
        ---> cash company 1.xls
        union
        ---> cash company 2.xls
then I would have to hand-update every single workbook going up the chain to get my big model up to date. Worse, if I miss one update, I'll never know. There must be a more intelligent setup where the different stages of the data can live in different workbooks and everything updates when the report is run. Is there?
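The only alternative I can see is to have the big model reach back to the underlying source workbooks itself rather than going through each intermediate Data Model, so a single refresh pulls current data. Something like this in Big model.xls, assuming the staging steps can be repeated there or the intermediate tables exist as ordinary Excel tables (paths and table names are made up):

Code:
// In Big model.xls: read the AR tables straight from the company workbooks
let
    Ar1      = Excel.Workbook(File.Contents("C:\Data\ar company 1.xlsx"), null, true),
    Ar2      = Excel.Workbook(File.Contents("C:\Data\ar company 2.xlsx"), null, true),
    Ar1Table = Ar1{[Item="ARData", Kind="Table"]}[Data],
    Ar2Table = Ar2{[Item="ARData", Kind="Table"]}[Data],
    Combined = Table.Combine({Ar1Table, Ar2Table})     // the "union" step
in
    Combined

Is that what people actually do, or is there a way to keep the stages in separate workbooks and still have one refresh walk the whole chain?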