How do I append a number of workbooks in a folder where the column names are different and the data is not in tables

martingaleh

Board Regular
Joined
Jul 18, 2011
Messages
83
There are a number of youtube videos where you can append the data of many workbooks together if they're in tables or the column names are the same or the column positions are the same etc. Here are the challenges I've faced:
1. the Worksheets are all different names, however there is only one worksheet in the workbook --solved
2. the data is in a worksheet, not a table
3. there are any number of columns.

My solution so far includes using this Transform Process From Binary function, which I amended:

Power Query:
= (#"Sample File Parameter1" as binary) => let
        Source = Excel.Workbook(#"Sample File Parameter1", null, true),
        Sheet1_Sheet = Source{[Item=Source{0}[Item],Kind="Sheet"]}[Data],  <--select first sheet
        #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true])
    in
        #"Promoted Headers"
Then in the aggregating sheet I have this:
Power Query:
let
    Source = Folder.Files("C:\Users\kobe.bryant\Downloads\June\process"),
    #"Added Custom" = Table.AddColumn(Source, "Tables", each Excel.Workbook([Content])),
    #"Filtered Hidden Files1" = Table.SelectRows(#"Added Custom", each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from process", each #"Transform File from process"([Content])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Transform File from process"}),
       #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from process", Table.ColumnNames(#"Transform File from process"(#"Sample File")))
in
    #"Expanded Table Column1"

There are a couple of issues here. the Transform Process From Binary function is supposed to promote headers from all of the sheets, but it doesn't, it only promotes headers from the first sheet and keeps "Column1" etc as the names of all the other sheets if they don't coincide with a name that already exists on the sample sheet.

Then when we use Table.ExpandTableColumn the the aggregating sheet:
Table.ColumnNames(#"Transform File from process"(#"Sample File")) I only get the names of the column in that sample file. If other downstream files don't have these column names and instead has different columns with different names, I'll lose those columns.


What I want is:

1623960777259.png
 

Excel Facts

How to fill five years of quarters?
Type 1Q-2023 in a cell. Grab the fill handle and drag down or right. After 4Q-2023, Excel will jump to 1Q-2024. Dash can be any character.
What you should probably do is import each type of data source separately and do the required transformations to make them identical, then append those modified queries.

If I enter the first two tables separately and then append as new query, i.e. (let Source = Table.Combine({#"Table 1", #"Table 2"} in Source)), I get exactly the result of your last screenshot.

To answer your questions, #2 shouldn't be that big of an issue, but you'll need to do extra work to promote headers, etc etc as required to get the data how you need. #3 might be an issue depending on the sheer number and variability of the columns you're dealing with. Is there any way you can better standardize your inputs?
 
Upvote 0
There is a complicated solution that requires importing twice. The first import pulls all the worksheets in with headers column1 etc. on down. Then I have to make a unique list of all the values in the second row (the actual row header). Then I do the normal import worksheets using the code I have above. Then I change the line below:
Power Query:
       #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from process", Table.ColumnNames(#"Transform File from process"(#"Sample File")))

To
Power Query:
       #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from process", Listfromabove)
where Listfromabove is the list I formulated on the first import. Too bad there isn't anything better. A VBA solution seems like a better solution
 
Upvote 0

Forum statistics

Threads
1,223,677
Messages
6,173,791
Members
452,534
Latest member
autodiscreet

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top