Power Query slows down after merging queries or tables

bobby_smith

Hi All,

I'm experiencing a problem where Power Query slowed down after I merged two tables. Since the merge, each time I add a step, it appears to be recalculating or refreshing something in the background. The refresh stops when it reaches 120MB, which takes about 5 to 10 minutes. So basically each time I add a new step I have to wait 5 to 10 minutes before I can move forward.

One table has 700,000 rows x 30 columns and the other table has 150,000 rows x 30 columns.

It took me almost an entire day to get 3/4 through what is needed for the end result.

The file was working fine up until I did the merge.

I've come across solutions suggesting that creating a primary key may be helpful but I've not seen any instructions on how to do this. I have a unique ID field that I can assign as primary key if needed.

I would really like some assistance with this as it is taking me forever to finish my query.

Thanks.
 
Just to make sure I understand as the file is really slow.

For prior year

Prior Year Query code


Code:
let
Source = Folder.Files("C:\Users\..........................."),
Key = Table.AddKey(#"Added Custom", {"Location"}, true),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from Prior Year", each #"Transform File from Prior Year"([Content])),

For Current year



Code:
let
Source = Folder.Files("C:\Users\................"),
Key = Table.AddKey(#"Delete some disposed items", {"Common System Number"}, true),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from Current Year", each #"Transform File from Current Year"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),

Exactly like this followed by the remaining coding?
Also, what's the logic on selecting which field to be the Key? The Common System number is a unique value in both queries. It seems I'm using the system number in one and the location in another.

I'm just trying to get a better understanding of your thought process.

Also, I don't understand this part: "if it doesn't work faster try Key for: #"Merged Queries1"". Where would I put this?

Thanks
 
First, refresh the thread and re-read the first line of post #10.

Code:
let
 Key = Table.AddKey(#"Added Custom", {"Location"}, true),
 Source = Folder.Files("C:\Users\..........................."),
 #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),

Code:
let
 Key = Table.AddKey(#"Delete some disposed items", {"Common System Number"}, true),
 Source = Folder.Files("C:\Users\................"),
 #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),

Also I'm not understanding this part "if it doesn't work faster try Key for: #"Merged Queries1"" where would I put this?

In Prior you have two merges, so if the first key doesn't work well, try a Key for the next merge.

Replace the previous Key step with:
Code:
 Key = Table.AddKey(#"Expanded Location_Table", {"Common System Number"}, true),

Edit:
You need to time the refresh for each of the 4 possibilities, for example a single key in one query only, or one key in each query.
By the way, your code looks like a mish-mash to me and is not optimized, sorry :)
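
For reference, here is a minimal sketch of how a Key step is wired into a query. The table and step names (Prior, Current, Merged, Expanded) and the sample rows are made up for illustration; only the "Common System Number" column name comes from this thread. The point it tries to show is that the merge step has to refer to Key. M evaluates steps lazily, so a step that no later step refers to is never evaluated.

Code:
let
    // Hypothetical sample data standing in for the real queries
    Prior = #table(
        {"Common System Number", "Location"},
        {{"A1", "North"}, {"A2", "South"}}
    ),
    Current = #table(
        {"Common System Number", "Cost"},
        {{"A1", 100}, {"A2", 250}}
    ),
    // Declare the ID column as the primary key of the previous step
    Key = Table.AddKey(Prior, {"Common System Number"}, true),
    // The merge refers to Key (not to Prior), so the keyed table is what gets joined
    Merged = Table.NestedJoin(
        Key, {"Common System Number"},
        Current, {"Common System Number"},
        "Current", JoinKind.LeftOuter
    ),
    Expanded = Table.ExpandTableColumn(Merged, "Current", {"Cost"}, {"Current.Cost"})
in
    Expanded

Whether the key helps more in one query, the other, or both is still something to time on the real data.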
 
Thanks. I'll try this. I did not write the code by hand; I mostly used the point-and-click features. I'm still learning Power Query and I'm not advanced enough to write full code in the M language.

I'm truly thankful for your assistance thus far.
 
You are welcome.

You can also try:

Code:
in
 Table.Buffer(#"Merged Queries1")

Code:
in
 Table.Buffer(#"Expanded Location_Table")

But sometimes it makes the query slower rather than faster, so you need to test it.

By the way, on 1,000,000 rows Table.Buffer changed the refresh time from 15 seconds to 2.5 minutes, so be careful.
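
A minimal sketch of the Table.Buffer idea, using made-up sample tables (Big, Lookup) rather than the real queries. Note that this buffers the smaller lookup table before the merge, which is a variation on buffering the final step as shown above and not the same thing. Whether either placement helps has to be timed on the real data.

Code:
let
    // Hypothetical sample data
    Big = #table({"ID", "Value"}, {{1, "a"}, {2, "b"}, {3, "c"}}),
    Lookup = #table({"ID", "Location"}, {{1, "North"}, {2, "South"}}),
    // Load the small lookup table into memory once, before the join
    BufferedLookup = Table.Buffer(Lookup),
    Merged = Table.NestedJoin(Big, {"ID"}, BufferedLookup, {"ID"}, "Lookup", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Lookup", {"Location"}, {"Location"})
in
    Expanded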
 
The key appears to work, but I'll not fully know until tomorrow.

Can you help me understand the primary key please. When I researched it, the syntax was
Code:
Table.AddKey(table as table,  columns as list,  isPrimary as logical) as table

The table is the name of the table and columns is the column with the key.

The code you gave me (e.g. for the current year) appears to use the table name #"Delete some disposed items". Can you create any table name and use it?
Also, when you created the key for the Prior year, you used the table #"Added Custom" and then you used Location as the column with the primary key.
What's the logic for selecting which column should be the primary key? Should that column contain unique values?

Lastly, could I have used the column "Common System Number" as the primary key for both queries?

Thank you for your patience in responding to my questions as I'm truly trying to understand what is being done so I can get better at power query.

Thanks
 
"Should that column contain unique values?"
That is the best situation. The merge reads each row, so the more duplicates there are, the more time it takes.
You can also use Remove Duplicates on this column instead of the Key, but in practice on my files Key + Remove Duplicates works faster
(you need to know what you are doing :) )
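
A minimal sketch of the Key + Remove Duplicates combination, assuming a made-up lookup table. Table.Distinct with a column list is what the Remove Duplicates button generates. The table name and sample rows are hypothetical.

Code:
let
    // Hypothetical lookup table with one duplicated ID
    Lookup = #table(
        {"Common System Number", "Location"},
        {{"A1", "North"}, {"A1", "North"}, {"A2", "South"}}
    ),
    // Keep one row per Common System Number
    Deduplicated = Table.Distinct(Lookup, {"Common System Number"}),
    // Then declare that column as the primary key
    Key = Table.AddKey(Deduplicated, {"Common System Number"}, true)
in
    Key

Keep in mind that Remove Duplicates discards rows, so it only makes sense when the duplicated rows are not needed in the result.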

"Could I have used the column "Common System Number" as the primary key for both queries?"
As I said, you have 4 possibilities; this is just the 4th option :)

Prior
Code:
Key = Table.AddKey(#"Expanded Location_Table", {"Common System Number"}, true),
Current
Code:
Key = Table.AddKey(#"Delete some disposed items", {"Common System Number"}, true),

There is no golden rule; you must test it yourself.

Edit:
I forgot to add a simple example:

Code:
// Table1
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Type = Table.TransformColumnTypes(Source,{{"City", type text}, {"Name", type text}, {"Date", type date}})
in
    Type

Code:
// Table2
let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    Type = Table.TransformColumnTypes(Source,{{"City", type text}, {"Name", type text}, {"Date", type date}})
in
    Type

Code:
// Merge1
let
    Key = Table.AddKey(Table1, {"City"}, true),
    // Join from Key rather than Table1; otherwise the Key step is never evaluated
    Source = Table.NestedJoin(Key,{"City"},Table2,{"City"},"Table2",JoinKind.Inner),
    Expand = Table.ExpandTableColumn(Source, "Table2", {"City", "Name", "Date"}, {"Table2.City", "Table2.Name", "Table2.Date"})
in
    Expand
 
You can also try an inner join (JoinKind.Inner) instead of a left outer join (JoinKind.LeftOuter).

But as I said: test it yourself.
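
One thing to check before swapping the join kind: an inner join drops rows that have no match, so the result can differ from a left outer join. A small sketch with made-up tables (TableA and TableB are hypothetical names) that compares the row counts:

Code:
let
    // Hypothetical sample data: "Lima" has no match in TableB
    TableA = #table({"City", "Name"}, {{"Oslo", "Ann"}, {"Rome", "Bob"}, {"Lima", "Cy"}}),
    TableB = #table({"City", "Date"}, {{"Oslo", #date(2020, 1, 1)}, {"Rome", #date(2020, 2, 1)}}),
    LeftOuter = Table.NestedJoin(TableA, {"City"}, TableB, {"City"}, "TableB", JoinKind.LeftOuter),
    Inner = Table.NestedJoin(TableA, {"City"}, TableB, {"City"}, "TableB", JoinKind.Inner),
    // The left outer join keeps all 3 rows of TableA; the inner join keeps only the 2 matched rows
    Counts = [LeftOuterRows = Table.RowCount(LeftOuter), InnerRows = Table.RowCount(Inner)]
in
    Counts

If both counts are the same on the real data, the two join kinds return the same rows and only the timing is left to compare.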
 
Can you help me understand the primary key please. When I researched it, the syntax was
Code:
Table.AddKey(table as table,  columns as list,  isPrimary as logical) as table

What's the logic for selecting which column should be the primary key? Should that column contain unique values?

Definition - What does Primary Key mean?

A primary key is a special relational database table column (or combination of columns) designated to uniquely identify all table records.

A primary key’s main features are:
  • It must contain a unique value for each row of data.
  • It cannot contain null values.
A primary key is either an existing table column or a column that is specifically generated by the database according to a defined sequence.

The primary key concept is critical to an efficient relational database. Without the primary key and closely related foreign key concepts, relational databases would not work.

Almost all individuals deal with primary keys frequently but unknowingly in everyday life. For example, students are routinely assigned unique identification (ID) numbers, and all U.S. citizens have government-assigned and uniquely identifiable Social Security numbers.

For example, a database must hold all of the data stored by a commercial bank. Two of the database tables include the CUSTOMER_MASTER, which stores basic and static customer data (name, date of birth, address, Social Security number, etc.) and the ACCOUNTS_MASTER, which stores various bank account data (account creation date, account type, withdrawal limits or corresponding account information, etc.).

To uniquely identify customers, a column or combination of columns is selected to guarantee that two customers never have the same unique value. Thus, certain columns are immediately eliminated, e.g., surname and date of birth. A good primary key candidate is the column that is designated to hold Social Security numbers. However, some account holders may not have Social Security numbers, so this column’s candidacy is eliminated. The next logical option is to use a combination of columns, such as adding the surname to the date of birth to the email address, resulting in a long and cumbersome primary key.

The best option is to create a separate primary key in a new column named CUSTOMER_ID. Then, the database automatically generates a unique number each time a customer is added, guaranteeing unique identification. As this key is created, the column is designated as the primary key within the SQL script that creates the table, and all null values are automatically rejected.

The account number associated with each CUSTOMER_ID allows for the secure handling of customer queries and also demonstrates why primary keys offer the fastest method of data searching within tables. For example, a customer may be asked to provide his surname when conducting a bank query. A common surname (such as Smith) query is likely to return multiple results. When querying data, utilizing the primary key uniqueness feature guarantees one result.
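
To check whether a column meets the two features listed above (unique values, no nulls) before using it as a key, something like the sketch below could be used. The table, its rows and the step names are made up for illustration; only the "Common System Number" column name comes from this thread.

Code:
let
    // Hypothetical sample data
    Sample = #table(
        {"Common System Number", "Location"},
        {{"A1", "North"}, {"A2", "South"}, {"A3", "North"}}
    ),
    KeyValues = Table.Column(Sample, "Common System Number"),
    // true when no value in the candidate key column repeats
    IsUnique = List.IsDistinct(KeyValues),
    // true when the candidate key column contains no nulls
    HasNoNulls = List.NonNullCount(KeyValues) = Table.RowCount(Sample),
    Result = [Unique = IsUnique, NoNulls = HasNoNulls]
in
    Result

For this sample both fields come back true. Location would fail the uniqueness check because "North" appears twice.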
 