# Excel Power Query: Reducing File Size by Deleting Tables?



## SuperNerd (Sep 22, 2020)

I have a fairly large file (276MB, 13+ Queries, lots of data). I'm trying to increase speed/efficiency. My data is broken down as far as it can go, with 90% of it as either a 1 or 0 in the fields. Excel is starting to crash and corrupt the file when I try to do anything- and I haven't even started doing the complex pivot tables yet!

Here's what I did:

1) Loaded a large flat file from CSV into Excel.
2) Split that up into multiple sheets by grouping common data that made sense to be in a table/query together.
3) Did minor manipulations, copied/pasted as values only, and deleted the original formulas as well as any unnecessary data/columns/tables. Turned each of those into separate tables.
4) Data --> From Table.
5) From Power Query: Close & Load as Connection Only, and added to the Data Model.
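For reference, the import steps above generate M behind the scenes roughly like this (the file path and the "Index" column here are placeholders, not from the thread; the Connection Only / Data Model choices live in the Close & Load dialog, not in the M itself):

```powerquery
// Rough M equivalent of loading the CSV and typing a column.
// "C:\Data\flat_file.csv" and "Index" are assumed names.
let
    Source = Csv.Document(File.Contents("C:\Data\flat_file.csv"), [Delimiter = ",", Encoding = 65001]),
    PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    ChangedTypes = Table.TransformColumnTypes(PromotedHeaders, {{"Index", Int64.Type}})
in
    ChangedTypes
```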

Result is 13 Queries, all loaded into the data model and connection confirmed. I then created relationships and saved.

Now, it's my understanding that at this point, since all my data is loaded into queries as connections only and saved in the data model, I can delete the original tables in the Excel worksheet, and that would significantly reduce my file size/resources. If I need the data, I can just create a pivot table of it and get it all back; no reason to keep it all on separate sheets... right?

But when I delete the original tables/sheets, the queries/connections break and I no longer have access to the data...

What am I missing? What can I do to compress this down as much as I can?
Thank you!


----------



## sandy666 (Sep 22, 2020)

in short:
don't load everything into the Data Model (the Data Model nearly doubles file size)
join the queries that need to be joined (instead of creating relationships)
you can create a normal pivot directly from queries
and NO, you can't delete the source csv
then load *only* the queries you need into the worksheet
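A minimal sketch of the query-to-query join described above, assuming two existing queries `Sales` and `Products` that share a `ProductID` column (all names hypothetical; this is the same M the Merge Queries dialog would generate):

```powerquery
// Merge two queries on a shared key instead of a Data Model
// relationship, then expand only the columns actually needed.
let
    Merged = Table.NestedJoin(Sales, {"ProductID"}, Products, {"ProductID"}, "Products", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Products", {"Category"}, {"Category"})
in
    Expanded
```

Only this merged result needs to be loaded to a worksheet or pivoted; the source queries can stay as connections.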


----------



## SuperNerd (Sep 22, 2020)

First off- thanks for the quick reply. This sort of begs the question then- what's the point of the data model, or when would I use it?

My end goal is going to require combining multiple pivot tables, running calculations between them, and then doing a pivot on those results... so I'm certain that if it's bogging down this heavily NOW, before I do any of that, it'll never make it through the first pivot! LOL...


----------



## sandy666 (Sep 22, 2020)

SuperNerd said:


> This sort of begs the question then- what's the point of the data model, or when would I use it?


You'll need to find the answer yourself; I don't use the Data Model too often (only if I *really* need it) and I'm happy  
most calculations you can do by joining queries, then, if needed, joining the joined queries again
I don't know what you're doing there, so I can't say what and how you can do it, but for sure, if you use proper, optimised M it will be faster than PQ and Power Pivot together
the first thing you need is PLANNING your work, not Click&Go
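"Joining joined queries again" is just another merge step chained onto the first. A sketch, with hypothetical queries `QueryA`, `QueryB`, `QueryC` all keyed on an `Index` column and hypothetical value columns:

```powerquery
// Chain of merges: (QueryA + QueryB) + QueryC, all on "Index".
// Query and column names here are assumptions for illustration.
let
    AB = Table.NestedJoin(QueryA, {"Index"}, QueryB, {"Index"}, "B", JoinKind.Inner),
    ABExpanded = Table.ExpandTableColumn(AB, "B", {"Value"}, {"B.Value"}),
    ABC = Table.NestedJoin(ABExpanded, {"Index"}, QueryC, {"Index"}, "C", JoinKind.Inner),
    Result = Table.ExpandTableColumn(ABC, "C", {"Total"}, {"C.Total"})
in
    Result
```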


----------



## sandy666 (Sep 22, 2020)

here you'll find everything you need for Power Query:

> **Power Query documentation - Power Query** (docs.microsoft.com)
> Power Query is used to filter, combine, and mash-up data from one or more supported data sources.

----------



## SuperNerd (Sep 23, 2020)

UPDATE:
- I exported each table to a CSV.
- Re-imported via Data --> From Text/CSV into PQ.
- Close & Load as Connection Only, and loaded to the Data Model.

So there's no data on any sheets inside excel- just connections and data model loaded.
File size went from 299MB to only 99MB as of now.


----------



## sandy666 (Sep 23, 2020)

you're stubborn about the Data Model, so I can't help
have a nice day


----------



## SuperNerd (Sep 23, 2020)

haha- I'm just trying to learn as I go. Every video I watch and every blog/site I read says that in order to increase efficiency and reduce file size/RAM usage, you need to load into the data model. I have one table that has nearly 2 million rows, so I couldn't load it into a sheet if I wanted to (I'd have to break it up, or use the DM anyway).

I DO appreciate all your input and help- here, and on other threads. I see you help a TON in other threads I come across and read- so I give you full respect! You certainly know your stuff! So thank you!


----------



## sandy666 (Sep 23, 2020)

who said you MUST use the Data Model for a 2-million-row table?!
just load it into PQ, that's all
or better:
1. in a new workbook, load it into PQ only, save, and check the file size
2. in a new workbook, load it into the Data Model, save, and check the file size
with small data it really doesn't matter, but if you use e.g. 12 million or more rows and a hundred columns, you'll see the difference
if you think you must use the Data Model (because someone said so), it's just a lack of knowledge about Power Query


----------



## SuperNerd (Sep 23, 2020)

ahhhh- I see. Loaded ONLY the connection; file size = 17 KB. LOL.

So when would you want/need the data model? I thought I needed it to create relationships between each data connection, as well as to handle complex queries. For example, I have 12 different connections/CSV files now, and each has the same column of numbers (I called it 'index', just so every row can link to the other tables/rows/dates easily). Don't you need the data model to handle this faster/more efficiently?

ie- the end result will be a query that combines multiple pivot tables, then queries those results: 1.5mil x 1.5mil combinations... which is MASSIVE. Not sure Excel can handle that without the data model?
Thank you!


----------






## SuperNerd (Sep 23, 2020)

To add to this:
The data model loads the data into memory, whereas with connection-only, Excel would need to access each CSV file for every query. For complex, large queries, the I/O and system resources could drastically impact efficiency, couldn't it?


----------



## sandy666 (Sep 23, 2020)

SuperNerd said:


> So when would you want/need the data model?


Practice



SuperNerd said:


> The data model loads the data into memory, whereas with connection-only, Excel would need to access each CSV file for every query. For complex, large queries, the I/O and system resources could drastically impact efficiency, couldn't it?


it depends on your code (M)
and on how your model is organized

one additional point: the Data Model is LIMITED to 2 GB only
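On the I/O concern, one common M-side mitigation (a general technique, not something stated in this thread) is `Table.Buffer`, which pins a table in memory for the duration of a single refresh so that downstream steps or multiple references don't re-read the source file. Note it does not persist data between refreshes. A sketch with an assumed file path:

```powerquery
// Buffer the CSV once per evaluation so later steps that
// reference this table don't trigger repeated file reads.
// "C:\Data\big_table.csv" is a placeholder path.
let
    Source = Csv.Document(File.Contents("C:\Data\big_table.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source),
    Buffered = Table.Buffer(Promoted)
in
    Buffered
```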


----------

