Advantages of denormalising in Power Pivot???

masplin

Active Member
Joined
May 10, 2010
Messages
413
I have been given a large data set of 8mio rows that lists of transaction lines for a group of shops. Its a single sql query containing all the data and i am wondering if i am better of splitting it into 2 tables that are related. Depends how PP deals with the data and which is more efficient.

The current single table has the following sort of columns. Some of the columns vary with the lines of the transaction and some are repeated for every line of that transaction. A single transaction ID (invoice) can have up to 20 lines.

[TABLE="width: 500"]
<tbody>[TR]
[TD]TransID[/TD]
[TD]ProdID[/TD]
[TD]Qty[/TD]
[TD]Price[/TD]
[TD]Discount[/TD]
[TD]TaX[/TD]
[TD]Location[/TD]
[TD]StaffID[/TD]
[TD]TillID[/TD]
[TD]Date[/TD]
[TD]Time[/TD]
[/TR]
[TR]
[TD]1234[/TD]
[TD]55[/TD]
[TD]1[/TD]
[TD]5.50[/TD]
[TD]0.50[/TD]
[TD]0.37[/TD]
[TD]London[/TD]
[TD]123[/TD]
[TD]456[/TD]
[TD]21/9/15[/TD]
[TD]13:45[/TD]
[/TR]
[TR]
[TD]1234[/TD]
[TD]79[/TD]
[TD]2[/TD]
[TD]3.50[/TD]
[TD]0.35[/TD]
[TD]0.25[/TD]
[TD]London[/TD]
[TD]123[/TD]
[TD]456[/TD]
[TD]21/9/15[/TD]
[TD]13:45[/TD]
[/TR]
</tbody>[/TABLE]

ProdID, StaffID,TillID are all related to other tables

This could be split into 2 table related by the TransID

[TABLE="width: 500"]
<tbody>[TR]
[TD]TransID[/TD]
[TD]ProdID[/TD]
[TD]Qty[/TD]
[TD]Price[/TD]
[TD]Discount[/TD]
[TD]TaX[/TD]
[/TR]
[TR]
[TD]1234[/TD]
[TD]55[/TD]
[TD]1[/TD]
[TD]5.50[/TD]
[TD]0.50[/TD]
[TD]0.37[/TD]
[/TR]
[TR]
[TD]1234[/TD]
[TD]79[/TD]
[TD]2[/TD]
[TD]3.50[/TD]
[TD]0.35[/TD]
[TD]0.25[/TD]
[/TR]
</tbody>[/TABLE]

and

[TABLE="width: 500"]
<tbody>[TR]
[TD]TransID[/TD]
[TD]Location[/TD]
[TD]StaffID[/TD]
[TD]TillID[/TD]
[TD]Date[/TD]
[TD]Time[/TD]
[/TR]
[TR]
[TD]1234[/TD]
[TD]London[/TD]
[TD]123[/TD]
[TD]456[/TD]
[TD]21/9/15[/TD]
[TD]13:45[/TD]
[/TR]
</tbody>[/TABLE]



Currently there are 8 million rows of lets say 12 column, 6 variable and 6 fixed per invoice. So this would become 8mio transaction rows of 6 column and about 1 mio invoice rows of 6 columns.

so my question is whether this is the right thing to do as seems to create less data to be stored as avoids all the duplication. However i get the impression PP may store data in a way that this isn't actually a problem. if anyone can advise as to how far it is advantageous to denormalise for clarity vs inefficiency due to duplication I'd be grateful

Thanks
Mike
 

Excel Facts

Who is Mr Spreadsheet?
Author John Walkenbach was Mr Spreadsheet until his retirement in June 2019.
This 1 is super hard to answer. You are probably going to end up wanting to test this one. I have done it both ways (separate invoice detail/header, and single invoice table)... just depending on perf testing and the distributions of my data.

My gut though? Leave it as 1 table. Power Pivot tends to prefer snowflake, and big "double hop lookups" can cause perf issues (eg, detail -> header -> calendar).

And while generally I get excited about removing columns (more columns is generall worse for perf than more rows)... your "extra columns" are likely going to compress very well, since there aren't that many { staff, tills, dates, times, locations }. Few distinct values = good compression.


My gut? Leave it as 1 table. I find that power pivot generally performs better in a star schema. I think
 
Upvote 0
That's what I was wondering, but had also read that multiple columns, particularly as I will have double the calculated columns, can be even worse. Of course very difficult to actually test as you have to build up both models in duplicate to see when one starts to slow down!!! Since I've started with a big table I'll stick with that since doesn't seem clear there is an advantage

Thanks a lot
 
Upvote 0

Similar threads

Forum statistics

Threads
1,224,126
Messages
6,176,519
Members
452,733
Latest member
Gao87

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top