I have been given a large data set of 8 million rows listing transaction lines for a group of shops. It's a single SQL query containing all the data, and I am wondering if I would be better off splitting it into two related tables. It depends on how PP deals with the data and which approach is more efficient.
The current single table has the following sort of columns. Some of the columns vary with each line of the transaction, and some are repeated for every line of that transaction. A single transaction ID (invoice) can have up to 20 lines.
[TABLE="width: 500"]
<tbody>[TR]
[TD]TransID[/TD]
[TD]ProdID[/TD]
[TD]Qty[/TD]
[TD]Price[/TD]
[TD]Discount[/TD]
[TD]Tax[/TD]
[TD]Location[/TD]
[TD]StaffID[/TD]
[TD]TillID[/TD]
[TD]Date[/TD]
[TD]Time[/TD]
[/TR]
[TR]
[TD]1234[/TD]
[TD]55[/TD]
[TD]1[/TD]
[TD]5.50[/TD]
[TD]0.50[/TD]
[TD]0.37[/TD]
[TD]London[/TD]
[TD]123[/TD]
[TD]456[/TD]
[TD]21/9/15[/TD]
[TD]13:45[/TD]
[/TR]
[TR]
[TD]1234[/TD]
[TD]79[/TD]
[TD]2[/TD]
[TD]3.50[/TD]
[TD]0.35[/TD]
[TD]0.25[/TD]
[TD]London[/TD]
[TD]123[/TD]
[TD]456[/TD]
[TD]21/9/15[/TD]
[TD]13:45[/TD]
[/TR]
</tbody>[/TABLE]
ProdID, StaffID, and TillID are all related to other tables.
This could be split into two tables related by TransID:
[TABLE="width: 500"]
<tbody>[TR]
[TD]TransID[/TD]
[TD]ProdID[/TD]
[TD]Qty[/TD]
[TD]Price[/TD]
[TD]Discount[/TD]
[TD]Tax[/TD]
[/TR]
[TR]
[TD]1234[/TD]
[TD]55[/TD]
[TD]1[/TD]
[TD]5.50[/TD]
[TD]0.50[/TD]
[TD]0.37[/TD]
[/TR]
[TR]
[TD]1234[/TD]
[TD]79[/TD]
[TD]2[/TD]
[TD]3.50[/TD]
[TD]0.35[/TD]
[TD]0.25[/TD]
[/TR]
</tbody>[/TABLE]
and
[TABLE="width: 500"]
<tbody>[TR]
[TD]TransID[/TD]
[TD]Location[/TD]
[TD]StaffID[/TD]
[TD]TillID[/TD]
[TD]Date[/TD]
[TD]Time[/TD]
[/TR]
[TR]
[TD]1234[/TD]
[TD]London[/TD]
[TD]123[/TD]
[TD]456[/TD]
[TD]21/9/15[/TD]
[TD]13:45[/TD]
[/TR]
</tbody>[/TABLE]
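In SQL terms, the split above could be sketched roughly like this (a minimal illustration using SQLite via Python; the table name `Sales` is assumed, and `Date`/`Time` are renamed `SaleDate`/`SaleTime` here just to avoid keyword clashes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The existing flat table: one row per transaction line, with the
# invoice-level columns repeated on every line.
cur.execute("""CREATE TABLE Sales (
    TransID INTEGER, ProdID INTEGER, Qty INTEGER,
    Price REAL, Discount REAL, Tax REAL,
    Location TEXT, StaffID INTEGER, TillID INTEGER,
    SaleDate TEXT, SaleTime TEXT)""")
cur.executemany("INSERT INTO Sales VALUES (?,?,?,?,?,?,?,?,?,?,?)", [
    (1234, 55, 1, 5.50, 0.50, 0.37, "London", 123, 456, "21/9/15", "13:45"),
    (1234, 79, 2, 3.50, 0.35, 0.25, "London", 123, 456, "21/9/15", "13:45"),
])

# Line-level table: only the columns that vary per line, keyed by TransID.
cur.execute("""CREATE TABLE SalesLines AS
    SELECT TransID, ProdID, Qty, Price, Discount, Tax FROM Sales""")

# Invoice-level table: one row per TransID, holding the repeated columns.
cur.execute("""CREATE TABLE Invoices AS
    SELECT DISTINCT TransID, Location, StaffID, TillID, SaleDate, SaleTime
    FROM Sales""")

print(cur.execute("SELECT COUNT(*) FROM SalesLines").fetchone()[0])  # 2
print(cur.execute("SELECT COUNT(*) FROM Invoices").fetchone()[0])    # 1
```

The `SELECT DISTINCT` collapses the repeated invoice-level values down to one row per transaction, which is where the storage saving comes from.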
Currently there are 8 million rows of, let's say, 12 columns: 6 variable per line and 6 fixed per invoice. So this would become 8 million transaction-line rows of 6 columns and about 1 million invoice rows of 6 columns.
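A rough back-of-envelope count of stored values under that split (counting each cell as one value; actual storage will differ, since a columnar engine compresses repeated values):

```python
rows, cols = 8_000_000, 12

# Flat table: every invoice-level value repeated on every line.
flat_cells = rows * cols                  # 96,000,000 values

# Split: 6 columns per line row plus 6 columns per invoice row.
line_cells = rows * 6                     # 48,000,000 values
invoice_cells = 1_000_000 * 6             #  6,000,000 values
split_cells = line_cells + invoice_cells  # 54,000,000 values

print(split_cells / flat_cells)  # 0.5625, i.e. roughly a 44% reduction
```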
So my question is whether this is the right thing to do, as it seems to create less data to store by avoiding all the duplication. However, I get the impression PP may store data in a way that makes this duplication not actually a problem. If anyone can advise how far it is advantageous to denormalise for clarity versus the inefficiency of duplication, I'd be grateful.
Thanks
Mike