Merging multiple Excel files in a folder into columns, not rows

SteynBS

Board Regular
Joined
Jun 27, 2022
Messages
111
Office Version
  1. 365
Platform
  1. Windows
Hi Guys,

I am using Power Query to merge multiple Excel files in 1 folder, but instead of throwing the data below the other in rows, I need it in columns.

So I have a master file that I will use as a seperate source called Master. The layout looks like this
1686072229511.png


I will then have multiple other excel files in a folder they will have different names, but they will all have the same layout:

1686072304778.png


What I need th achieve is all the files in the folder needs to merge with the Master file using the "item" as the join. The final product will look like this:

1686072526511.png


Currently what I am doing is using each file as a source and merging it to the Master file using the "item' as the join. Is there anyway I can achieve this in a faster way?
 

Excel Facts

Excel Joke
Why can't spreadsheets drive cars? They crash too often!
I don't know how faster this could be, but I know you will only need to refresh the table when you add/edit files (as long as the structures of the files are the same).

The following code requires:
  • Sheet1 in all workbook files (Find and replace the string "Sheet1" in two places in the code if it is a different sheet name).
  • Item column in the master data file. Other columns don't matter.
  • Item, Region, and Sales columns in the other data files.
  • Master.xlxs files is not in the same folder with the other files (this could be handled if it has to be).
File and folder paths should be changed as needed in the code below.

Power Query:
let
    Master = Table.PromoteHeaders(
                Excel.Workbook(
                    File.Contents("C:\SomeFolder\Master.xlsx"), null, true
                ){[Item="Sheet1",Kind="Sheet"]}[Data]
            ),

    Source = Folder.Files("C:\SomeFolder\Data"),
    Files = Table.AddColumn(Source, "Data", 
                    each Excel.Workbook([Content], true, true){[Item="Sheet1",Kind="Sheet"]}[Data]
    ),
    Tables = Table.SelectColumns(Files,{"Data"}),

    WithMaster = List.Combine({{Master}, Tables[Data]}),

    Accumulate = List.Accumulate(
        WithMaster,
        [t = null, i = 0],
        (state, current) => [
            t = if state[i] = 0 
                then current 
                else
                    let 
                        FileName = "File" & Text.From(state[i]),
                        Nest =  Table.NestedJoin(state[t], {"Item"}, current, {"Item"}, FileName, JoinKind.LeftOuter),
                        Result = Table.ExpandTableColumn(Nest, FileName, {"Sales", "Region"}, {"Sales-" & Text.From(state[i]), "Region-" & Text.From(state[i])})
                    in
                        Result,
            i = state[i] + 1
        ]
    ),
    Result = Accumulate[t]
in
    Result

Result:
ItemDescriptionDepartmentSales-1Region-1Sales-2Region-2Sales-3Region-3
ZzzZzz1Zzz2YYYZZZDDD
AaaAaa1Aaa2EEEAAAEEE
BbbBbb1Bbb2FFFBBBCCC
 
Upvote 0
I don't know how faster this could be, but I know you will only need to refresh the table when you add/edit files (as long as the structures of the files are the same).

The following code requires:
  • Sheet1 in all workbook files (Find and replace the string "Sheet1" in two places in the code if it is a different sheet name).
  • Item column in the master data file. Other columns don't matter.
  • Item, Region, and Sales columns in the other data files.
  • Master.xlxs files is not in the same folder with the other files (this could be handled if it has to be).
File and folder paths should be changed as needed in the code below.

Power Query:
let
    Master = Table.PromoteHeaders(
                Excel.Workbook(
                    File.Contents("C:\SomeFolder\Master.xlsx"), null, true
                ){[Item="Sheet1",Kind="Sheet"]}[Data]
            ),

    Source = Folder.Files("C:\SomeFolder\Data"),
    Files = Table.AddColumn(Source, "Data",
                    each Excel.Workbook([Content], true, true){[Item="Sheet1",Kind="Sheet"]}[Data]
    ),
    Tables = Table.SelectColumns(Files,{"Data"}),

    WithMaster = List.Combine({{Master}, Tables[Data]}),

    Accumulate = List.Accumulate(
        WithMaster,
        [t = null, i = 0],
        (state, current) => [
            t = if state[i] = 0
                then current
                else
                    let
                        FileName = "File" & Text.From(state[i]),
                        Nest =  Table.NestedJoin(state[t], {"Item"}, current, {"Item"}, FileName, JoinKind.LeftOuter),
                        Result = Table.ExpandTableColumn(Nest, FileName, {"Sales", "Region"}, {"Sales-" & Text.From(state[i]), "Region-" & Text.From(state[i])})
                    in
                        Result,
            i = state[i] + 1
        ]
    ),
    Result = Accumulate[t]
in
    Result

Result:
ItemDescriptionDepartmentSales-1Region-1Sales-2Region-2Sales-3Region-3
ZzzZzz1Zzz2YYYZZZDDD
AaaAaa1Aaa2EEEAAAEEE
BbbBbb1Bbb2FFFBBBCCC
Good day,

Sir, you are a genius, thank you. It works perfectly.

Only one more question, please. So the source files headers "sales" and "region" is there any way this code can use the header allocated to each file in the source folder? e.g

One file will have "Sales Store 1" and "Region Store 1" and the other file will have "Sales Store 2" and Region Store 2" and so on. Basically, make the header names dynamic. The Master file headers will always be the same though.
 
Upvote 0
Only one more question, please. So the source files headers "sales" and "region" is there any way this code can use the header allocated to each file in the source folder? e.g
Yes. The revised code below. Note that Item column is still required in this version.

Power Query:
let
    Master = Table.PromoteHeaders(
                Excel.Workbook(
                    File.Contents("C:\SomeFolder\Master.xlsx"), null, true
                ){[Item="Sheet1",Kind="Sheet"]}[Data]
            ),

    Source = Folder.Files("C:\SomeFolder\Data"),
    Files = Table.AddColumn(Source, "Data",
                    each Excel.Workbook([Content], true, true){[Item="Sheet1",Kind="Sheet"]}[Data]
    ),
    Tables = Table.SelectColumns(Files,{"Data"}),

    WithMaster = List.Combine({{Master}, Tables[Data]}),

    Accumulate = List.Accumulate(
        WithMaster,
        [t = null, i = 0],
        (state, current) => [
            t = if state[i] = 0
                then current
                else
                    let
                        FileName = "File" & Text.From(state[i]),
                        Nest =  Table.NestedJoin(state[t], {"Item"}, current, {"Item"}, FileName, JoinKind.LeftOuter),
                        Columns = Table.ColumnNames(current),
                        Result = Table.ExpandTableColumn(Nest, FileName, List.LastN(Columns, List.Count(Columns) - 1))
                    in
                        Result,
            i = state[i] + 1
        ]
    ),
    Result = Accumulate[t]
in
    Result

You'll notice that I modified the let expression inside the List.Accumulator() function. The way I changed it is assuming all column names are unique. However, if for some reason, there are duplicate names then the new query will fail. In that case, the following could be also used. Basically, we create OldColumns and NewColumns lists dynamically. The NewColumns list is generated by adding the unique suffix to the OldColumns list values.


Power Query:
                    let
                        FileName = "File" & Text.From(state[i]),
                        Nest =  Table.NestedJoin(state[t], {"Item"}, current, {"Item"}, FileName, JoinKind.LeftOuter),
                        Columns = Table.ColumnNames(current),
                        OldColumns = List.LastN(Columns, List.Count(Columns) - 1),
                        NewColumns = List.Transform(OldColumns, each _ & "-" & FileName),
                        Result = Table.ExpandTableColumn(Nest, FileName, OldColumns, NewColumns)
                    in
                        Result,

Note: The new versions work with any number of columns after the Item column in the data files.
 
Upvote 0
Yes. The revised code below. Note that Item column is still required in this version.

Power Query:
let
    Master = Table.PromoteHeaders(
                Excel.Workbook(
                    File.Contents("C:\SomeFolder\Master.xlsx"), null, true
                ){[Item="Sheet1",Kind="Sheet"]}[Data]
            ),

    Source = Folder.Files("C:\SomeFolder\Data"),
    Files = Table.AddColumn(Source, "Data",
                    each Excel.Workbook([Content], true, true){[Item="Sheet1",Kind="Sheet"]}[Data]
    ),
    Tables = Table.SelectColumns(Files,{"Data"}),

    WithMaster = List.Combine({{Master}, Tables[Data]}),

    Accumulate = List.Accumulate(
        WithMaster,
        [t = null, i = 0],
        (state, current) => [
            t = if state[i] = 0
                then current
                else
                    let
                        FileName = "File" & Text.From(state[i]),
                        Nest =  Table.NestedJoin(state[t], {"Item"}, current, {"Item"}, FileName, JoinKind.LeftOuter),
                        Columns = Table.ColumnNames(current),
                        Result = Table.ExpandTableColumn(Nest, FileName, List.LastN(Columns, List.Count(Columns) - 1))
                    in
                        Result,
            i = state[i] + 1
        ]
    ),
    Result = Accumulate[t]
in
    Result

You'll notice that I modified the let expression inside the List.Accumulator() function. The way I changed it is assuming all column names are unique. However, if for some reason, there are duplicate names then the new query will fail. In that case, the following could be also used. Basically, we create OldColumns and NewColumns lists dynamically. The NewColumns list is generated by adding the unique suffix to the OldColumns list values.


Power Query:
                    let
                        FileName = "File" & Text.From(state[i]),
                        Nest =  Table.NestedJoin(state[t], {"Item"}, current, {"Item"}, FileName, JoinKind.LeftOuter),
                        Columns = Table.ColumnNames(current),
                        OldColumns = List.LastN(Columns, List.Count(Columns) - 1),
                        NewColumns = List.Transform(OldColumns, each _ & "-" & FileName),
                        Result = Table.ExpandTableColumn(Nest, FileName, OldColumns, NewColumns)
                    in
                        Result,

Note: The new versions work with any number of columns after the Item column in the data files.
Thank you, this is new territory for me so I am making this up as I go along. Some issues I hope you can help me with
I amended the formula a bit.

1) The master file headers aren't in the 1st row. it's in the second. So I added a step to promote the 1st row to header, don't know if this can be resolved by removing one row first and then start with Table.PromoteHeaders?
1686135981948.png

2) The source files that is in the folder. The headers are in the 4th row, but I need the header to concatenate the first three rows into the 4th row then use the 4th row as header. Dont know if this is even possible?
1686135446704.png

3)Column C in the source files have all the same header, I used the second part of your code from OldColumn to NewColumns, but it does not add the file name. It just says "file 1 and file 2"
1686135587315.png

This is the file names
1686135606818.png

4) Last point, it seems that some lines are duplicated. I can fix it by removing duplicates, but it seems that the duplicated line has its columns moved up. The division should only have one letter in it, not numbers, that number in division column is the vendor number. So as a fix i can just filter out the numbers, but can the code be amended to avoid this? I have double checked the source files does not have the numbers in the division, the source files does not have a column "division"
1686135836018.png


Sorry for all the questions. I believe when this is done and fixed I will have what I need to make it work.

Thank you for all you assistance Sir.
let
Master = Table.PromoteHeaders(
Excel.Workbook(
File.Contents("C:\Basjan\Reports\Plano vs Listing\Master\Plano vs Listing Master.xlsx"), null, true
){[Item="Sheet1",Kind="Sheet"]}[Data]
),
#"Promoted Headers" = Table.PromoteHeaders(Master, [PromoteAllScalars=true]),

Source = Folder.Files("C:\Basjan\Reports\Plano vs Listing\Abi"),
Files = Table.AddColumn(Source, "Data",
each Excel.Workbook([Content], true, true){[Item="Sheet1",Kind="Sheet"]}[Data]
),
Tables = Table.SelectColumns(Files,{"Data"}),

WithMaster = List.Combine({{#"Promoted Headers"}, Tables[Data]}),

Accumulate = List.Accumulate(
WithMaster,
[t = null, i = 0],
(state, current) => [
t = if state = 0
then current
else
let
FileName = "File" & Text.From(state),
Nest = Table.NestedJoin(state[t], {"6 Article"}, current, {"6 Article"}, FileName, JoinKind.LeftOuter),
Columns = Table.ColumnNames(current),
OldColumns = List.LastN(Columns, List.Count(Columns) - 2),
NewColumns = List.Transform(OldColumns, each _ & "-" & FileName),
Result = Table.ExpandTableColumn(Nest, FileName, OldColumns, NewColumns)
in
Result,
i = state + 1
]
),
Result = Accumulate[t]
in
Result
 

Attachments

  • 1686135420172.png
    1686135420172.png
    11 KB · Views: 6
  • 1686135760444.png
    1686135760444.png
    13.5 KB · Views: 5
Upvote 0
  1. Is the first row of the master file all blank?
  2. I meant a unique identifier for each file, not actually the file name, so I used File-1, File-2, etc. If you certainly have unique column names, actually I believe so by looking at your file image, then no need to add anything.
  3. I didn't understand #4.
 
Last edited:
Upvote 0
Hi,
1) Yes the 1st row in the Master file is all blank
2)The Unique Identifier is the Article, but each source file has a common column, is there a way we can add the file name instead of "file1 and file2"
1686138716663.png

3) So the Unique Identifier which is the Article is duplicated is Highlighted red. Row 39 is what the value for each columns should be, row 40 is the duplicate with wrong data in it.
* the value in row40 in column A is the vendor details. This issue is not a train smash, I will be able to rectify it by removing the numbers in Column A
1686138808090.png
 
Upvote 0
The first 3 rows are removed and the file names are included in the column headers in the following code.
You don't need to change anything for the master file since the first empty row will be ignored in the following code.
I understand you'll handle #3 since it is data related.

Power Query:
let
    Master = Excel.Workbook(File.Contents(
                    "C:\SomeFolder\Master.xlsx"), 
                    true, 
                    true
            ){[Item="Sheet1",Kind="Sheet"]}[Data],

    Source = Folder.Files("C:\SomeFolder\Data"),
    Files = Table.AddColumn(Source, "Data", 
                    each Table.PromoteHeaders(Table.RemoveFirstN(Excel.Workbook([Content], false, true){[Item="Sheet1",Kind="Sheet"]}[Data], 3))
    ),
    Tables = Table.SelectColumns(Files,{"Data", "Name"}),

    WithMaster = List.Combine({{Master}, Tables[Data]}),

    Accumulate = List.Accumulate(
        WithMaster,
        [t = null, i = 0],
        (state, current) => [
            t = if state[i] = 0 
                then current 
                else
                    let 
                        FileName = Tables[Name]{state[i] - 1},
                        Nest =  Table.NestedJoin(state[t], {"Item"}, current, {"Item"}, FileName, JoinKind.LeftOuter),
                        Columns = Table.ColumnNames(current),
                        OldColumns = List.LastN(Columns, List.Count(Columns) - 1),
                        NewColumns = List.Transform(OldColumns, each _ & "-" & FileName),
                        Result = Table.ExpandTableColumn(Nest, FileName, OldColumns, NewColumns)
                    in
                        Result,
            i = state[i] + 1
        ]
    ),
    Result = Accumulate[t]
in
    Result
 
Upvote 0
Solution
Thank
The first 3 rows are removed and the file names are included in the column headers in the following code.
You don't need to change anything for the master file since the first empty row will be ignored in the following code.
I understand you'll handle #3 since it is data related.

Power Query:
let
    Master = Excel.Workbook(File.Contents(
                    "C:\SomeFolder\Master.xlsx"),
                    true,
                    true
            ){[Item="Sheet1",Kind="Sheet"]}[Data],

    Source = Folder.Files("C:\SomeFolder\Data"),
    Files = Table.AddColumn(Source, "Data",
                    each Table.PromoteHeaders(Table.RemoveFirstN(Excel.Workbook([Content], false, true){[Item="Sheet1",Kind="Sheet"]}[Data], 3))
    ),
    Tables = Table.SelectColumns(Files,{"Data", "Name"}),

    WithMaster = List.Combine({{Master}, Tables[Data]}),

    Accumulate = List.Accumulate(
        WithMaster,
        [t = null, i = 0],
        (state, current) => [
            t = if state[i] = 0
                then current
                else
                    let
                        FileName = Tables[Name]{state[i] - 1},
                        Nest =  Table.NestedJoin(state[t], {"Item"}, current, {"Item"}, FileName, JoinKind.LeftOuter),
                        Columns = Table.ColumnNames(current),
                        OldColumns = List.LastN(Columns, List.Count(Columns) - 1),
                        NewColumns = List.Transform(OldColumns, each _ & "-" & FileName),
                        Result = Table.ExpandTableColumn(Nest, FileName, OldColumns, NewColumns)
                    in
                        Result,
            i = state[i] + 1
        ]
    ),
    Result = Accumulate[t]
in
    Result
you sir, I can work with this. Thank you very very much for your kind assistance
 
Upvote 0

Forum statistics

Threads
1,225,307
Messages
6,184,184
Members
453,220
Latest member
flyingdutchman_

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top