Combining multiple GB of Excel-data from various folders with similar structure

Toalle

New Member
Joined
Sep 28, 2021
Messages
1
Office Version
  1. 365
Platform
  1. Windows
Hello all! :)

I am trying to combine hundreds of Excel-files, and each one of them has approximately 100 columns and 10k-100k rows. So the total amount of rows is probably tens of millions.

The folder/file structure is like this (download, password: mrexcel):

So basically there is a folder for every country, and under every country there is a subfolder for a project. That subfolder then contains standardized folder structure, and further in the folder structure there is a folder called 'Data'. That data-folder contains Excel files which all have identical formatting, and the data is presented in a table which always has the same name, in this case, xdata.

From the data files, I do not need all the columns, only about 10 or so. Same applies to every data file. I would like to place filters in Power BI reporting view, which allows me to select a country and a project, and BI would then load let's say for example 3 latest data files from the folder in question, matching my filter selections.

Any tips on how to approach this without choking BI from all that data? Would be greatly appreciated :) Thank you!
 

Excel Facts

Enter current date or time
Ctrl+: enters current time. Ctrl+; enters current date. Use Ctrl+: Ctrl+; Enter for current date & time.
I don't know that you'd choke PBI with that much. PBI can handle 2 billion or so rows. Sure, it'd be better if it were all stored in a database already, but you work with what you've got.

I've done similar transformations, albeit maybe not quite that large. If you're particularly concerned with the size of the dataset, create a set for each country or groups of countries and duplicate the report on different pages for each set.

Otherwise, before combining the files in PBI, filter the folder location to include "Data" and the file names to include "xdata." After PQ creates the helper queries to combine them, you may need to go back to a remove other columns step and make sure country identifying columns are retained to be used as filter criteria later (unless that is included in the files themselves).

It will take a while to process.

Unfortunately, I don't know of any way to not Select * all the columns out of excel files when loading them up, which is why it'd be better if they were in a db to be pulled out with SQL.

Hope that helps. Good luck.
 
Upvote 0

Forum statistics

Threads
1,223,697
Messages
6,173,895
Members
452,536
Latest member
Chiz511

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top