bobsan42
Well-known Member
- Joined
- Jul 14, 2010
- Messages
- 2,114
- Office Version
- 365
- 2019
- 2016
- 2013
- Platform
- Windows
Dear All,
Especially those of you more familiar with M code, databases and data analysis: I admit that Power Query and M code are quite a versatile tool, but I am puzzled by their low performance. So I wonder - am I doing something wrong, or is it just the way it is? Is there anything I can do about it?
Some time ago I built an Excel file to enter data, show some data extracts & summaries using Power Query tables, and alter the data when necessary. The basic purpose is keeping track of invoices, payments and accounting. I used VBA, UserForms and M code. Everything is in one file, including the data. The file itself is not too large (10 sheets and about 1 MB per 1,000 records).
Everything has been working with no major bugs for more than half a year now. However, since the very beginning I have been unhappy with the file's performance when it comes to refreshing the Power Query tables and actually presenting the necessary data to the user. Because some VBA procedures run on the output (formatting and arranging the data) after it is updated, I do not run the queries in the background (as a side effect this also lets me easily measure how long a refresh takes).
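For illustration, this is roughly the synchronous refresh-and-time pattern I mean (the sheet and table names below are simplified placeholders, not my actual ones):

```vba
Sub TimedRefresh()
    Dim t As Double
    t = Timer

    ' "Invoices" on Sheet1 stands in for one of the Power Query output
    ' tables. BackgroundQuery:=False forces a foreground refresh, so
    ' Timer captures the whole operation and the formatting VBA that
    ' runs afterwards is guaranteed to see the new data.
    Sheet1.ListObjects("Invoices").QueryTable.Refresh BackgroundQuery:=False

    Debug.Print "Refresh took " & Format$((Timer - t) * 1000, "0") & " ms"
End Sub
```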
Since the first production build of the file I have managed to learn a bit more about M code, so naturally I tried to shorten, simplify and optimise my queries. Some improvement, but nothing major. Then I changed the source of some queries (where possible) - using the output table of a previous query instead of going through the full data again. That gained some time, but it was still far from fast and smooth. Used from a shared folder, the time a refresh takes becomes close to annoying. So despite my efforts it kept on feeling clumsy and slow.
Eventually, last week I decided to try a combination of an ACCDB back end containing all the data and a front end containing the interface (forms, CustomUI, VBA code) and the output tables.
Using ADODB and SQL made the process many times faster.
Using a persistent connection to the ACCDB made the overall process of updating and formatting the data at least 10 times faster than Power Query: a single table updates through ADODB and SQL in approx. 6 ms, while the same takes more than 300 ms with Power Query.
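For comparison, the ADODB side looks roughly like this (the provider string is the standard ACE one, but the path and table name are simplified placeholders):

```vba
' Module-level connection, opened once and kept alive between refreshes
Private mConn As Object ' late-bound ADODB.Connection

Sub RefreshFromAccdb()
    Dim rs As Object ' ADODB.Recordset
    Dim t As Double

    ' Opening the connection once and reusing it avoids paying the
    ' connect cost on every single refresh
    If mConn Is Nothing Then
        Set mConn = CreateObject("ADODB.Connection")
        mConn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
                   "Data Source=C:\Data\Backend.accdb" ' placeholder path
    End If

    t = Timer
    Set rs = mConn.Execute("SELECT * FROM Invoices") ' placeholder table

    ' Dump the result straight below the headers of the output table
    Sheet1.Range("A2").CopyFromRecordset rs
    rs.Close

    Debug.Print "ADODB refresh took " & Format$((Timer - t) * 1000, "0") & " ms"
End Sub
```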
The other slightly annoying bit is that the PQ refresh time is not very consistent from one run to the next.
So when you need to update several tables per sheet it makes quite a difference. Not to mention the initial refresh on file opening, which may take 15 seconds or more.
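For reference, when I say "update several tables per sheet", I mean something along these lines - a sketch with placeholder logic, refreshing each query table on a sheet synchronously rather than doing a workbook-wide RefreshAll:

```vba
Sub RefreshSheetTables(ByVal ws As Worksheet)
    Dim lo As ListObject

    ' Refresh only the query-backed tables on the given sheet.
    ' Tables loaded from Power Query normally report xlSrcQuery.
    For Each lo In ws.ListObjects
        If lo.SourceType = xlSrcQuery Then
            lo.QueryTable.Refresh BackgroundQuery:=False
        End If
    Next lo
End Sub
```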
So any advice is welcome.
Is there anything I can do, or is it what it is - sacrificing a bit of speed for the sake of a serious set of functionalities? I have found PQ can do a great many unexpected and nice things (e.g. extracting data from PDFs and making sense of them), but a large non-native data set (e.g. a 500-page PDF) totally killed it.