Aggregation performance

Koxeida · Jan 11, 2021

Hi all,

I have multiple csv files that adds up to about 30gb of data and I'm building a data model on Power BI. Basically appended all the fact files and cleaned up the dimension tables on power query and the loading time is alright. The end result is that I have a BI file that is around 2gb in size which is extremely huge to share around.

So I consider splitting up the data into multiple aggregated tables but I'm encountering a big performance issue when it comes to aggregating the data itself. I've tried using "Group by" function on Power Query to consider only the higher level of hierarchy from different dimension tables, but the data query would not seem to stop (after letting it run for a couple of hours).

Just for more context
My fact table contains 3 primary keys (1 geographic key, 1 product key and 1 time key) with 3 value columns.
In an attempt to aggregate product data at a higher hierarchy, I'm merging the fact table with product dimension table and expanded for 3 new columns.
Using Group by, I've selected the 2 other keys and 3 new columns and do "sum" for the 3 value columns.
The Query refresh performance is extremely slow (it seems to re-run the same dataset multiple times)

Is there another way to aggregate data? My goal is to have multiple aggregated data so that the BI files are kept at reasonable sizes

Aggregation performance

Koxeida

Board Regular

Similar threads

Forum statistics

Share this page

Aggregation performance

Koxeida

Board Regular

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock