Challenging Append Scenario - Power Query

SeveralTradesLater

New Member
Joined
May 30, 2023
Messages
14
Office Version
  1. 2019
Platform
  1. Windows
Hi everyone. Happy July 4th to those who celebrate. Hope you smart folks can assist. Let's see who can beat a large language learning model at giving step by step instructions.

Here is the scenario,

I have a folder, let's call this folder "ALL". The folder "ALL" contains more folders North, West, East and South. Folders North, West, East and South each contain various Company Workbooks. Each workbook has a company name, but for our purposes let's assume they are named Company 001 thru Company 020 all divided randomly within Folders North, West, East and South.

However, within each North, West, East and South folder I am only focused on the workbooks that contain "(Reviewed)" in the file name. Example: "Company 003 (Reviewed)" , "Company 007 "Reviewed".

Each workbook contains multiple sheets of irrelevant data. There are only two sheets with each workbook I am concerned about, let's assume they are named sheets "A" and "B". In most cases the workbooks contain both sheets "A" and "B", in very few cases they contain either sheet "A" or sheet "B".

Both sheets "A" and "B" contain the same headers. However, those headers consist of five rows of useless labels. All data I am trying to append exists below those five rows.

Normally, I would have to copy and paste columns A, B and C of sheets "A" and "B" into a new worksheet and combine all of them thru Company 020.

Example Company 001 - Reviewed
Company 001 - ReviewedSheet AApple $1.00$3.00
Company 001 - ReviewedSheet A Orange$5.00$8.00
Company 001 - ReviewedSheet AKiwi$3.50$2.00
Company 001 - ReviewedSheet BBanana $1.20$4.40
Company 001 - ReviewedSheet BCoffee$5.00$7.00


Example of final result I want to produce and load to another sheet in Power Query.
Company 001 - ReviewedSheet AApple $1.00$3.00
Company 001 - ReviewedSheet A Orange$5.00$8.00
Company 001 - ReviewedSheet AKiwi$3.50$2.00
Company 001 - ReviewedSheet BBanana $1.20$4.40
Company 001 - ReviewedSheet BCoffee$5.00$7.00
Company 005 - ReviewedSheet ACandy$1.00$3.00
Company 005 - ReviewedSheet A Orange$5.00$8.00
Company 005 - ReviewedSheet APens$3.50$2.00
Company 005 - ReviewedSheet BPencils$1.20$4.40
Company 005 - ReviewedSheet BPaper$5.00$7.00
Company 007 - ReviewedSheet BApple $1.00$3.00
Company 007 - ReviewedSheet BOrange$5.00$8.00
Company 009 - ReviewedSheet AComic Books$3.50$2.00
Company 009 - ReviewedSheet ABanana $1.20$4.40
Company 009 - ReviewedSheet ACoffee$5.00$7.00
Company 015 - ReviewedSheet BSoda$1.00$3.00
Company 015- ReviewedSheet BClothing$5.00$8.00
Company 017 - ReviewedSheet APlants$3.50$2.00
Company 019 - ReviewedSheet ABanana $1.20$4.40
Company 020 - ReviewedSheet BVitamins$5.00$7.00
 

Excel Facts

Excel Joke
Why can't spreadsheets drive cars? They crash too often!
This is really simple, and although the main M Code is here, you'll want to do it yourself rather than copy and paste. There's also one line that you have to manually change.
So, you start with Data -> Get Data -> From File -> From Folder, and point to your ALL folder. You'll be presented with a table of every file. Using the UI, make changes as needed. Here's the main query that delivers the final table:

Power Query:
let
    Source = Folder.Files("C:\Users\USERNAME\Desktop\ALL"),

    // The following 2 lines make sure you're getting only Excel files
    LowercasedExtension = Table.TransformColumns(Source,{{"Extension", Text.Lower, type text}}),
    FilteredOnlyXlsxRows = Table.SelectRows(LowercasedExtension, each [Extension] = ".xlsx"),

    // Next, only files with "Reviewed" in the file Name
    FilteredOnlyFilesWithReviewed = Table.SelectRows(FilteredOnlyXlsxRows, each Text.Contains([Name], "Reviewed")),

    // Next line removes the .xlsx extension to clean up the name
    RemoveXlsxFromFilename = Table.ReplaceValue(FilteredOnlyFilesWithReviewed,".xlsx","",Replacer.ReplaceText,{"Name"}),

    // This allows you to retain the Region folder name
    RemovePathToRegion = Table.ReplaceValue(RemoveXlsxFromFilename,"C:\Users\jdell\Desktop\ALL\","",Replacer.ReplaceText,{"Folder Path"}),
    RemoveTrailingBackslash = Table.ReplaceValue(RemovePathToRegion,"\","",Replacer.ReplaceText,{"Folder Path"}),

    // The next 3 lines of code are generated AUTOMATICALLY when you go to expand the Content column
    FilteredHiddenFiles1 = Table.SelectRows(RemoveTrailingBackslash, each [Attributes]?[Hidden]? <> true),
    InvokeCustomFunction1 = Table.AddColumn(FilteredHiddenFiles1, "Transform File", each #"Transform File"([Content])),
// NOTE: This code generated will only keep the "Transform File" column. ADD "Folder Path", "Name", MANUALLY
    RemovedOtherColumns1 = Table.SelectColumns(InvokeCustomFunction1, {"Folder Path", "Name", "Transform File"}),

    RenamedFolderPathToRegion = Table.RenameColumns(RemovedOtherColumns1,{{"Folder Path", "Region"}}),
    ExpandedTableColumn1 = Table.ExpandTableColumn(RenamedFolderPathToRegion, "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),

    // For whatever reason, this has to be added manually here, not in the Transform Sample File query
    ChangedType = Table.TransformColumnTypes(ExpandedTableColumn1,{{"Region", type text}, {"Name", type text}, {"Column1", type text}, {"Column2", type number}, {"Column3", Int64.Type}})
in
    ChangedType
Note the comments in the code above. Also, I renamed all the steps removing spaces to make the code clearer and avoid that awful #"" nonsense!

This is the resulting table:
Book1
ABCDE
1RegionNameColumn1Column2Column3
2EastCompany 001 ReviewedApple13
3EastCompany 001 ReviewedOrange58
4EastCompany 001 ReviewedKiwi3.52
5EastCompany 005 ReviewedCandy13
6EastCompany 005 ReviewedOrange58
7EastCompany 005 ReviewedPens3.52
8NorthCompany 001 ReviewedApple13
9NorthCompany 001 ReviewedOrange58
10NorthCompany 001 ReviewedKiwi3.52
11NorthCompany 005 ReviewedCandy13
12NorthCompany 005 ReviewedOrange58
13NorthCompany 005 ReviewedPens3.52
14SouthCompany 001 ReviewedApple13
15SouthCompany 001 ReviewedOrange58
16SouthCompany 001 ReviewedKiwi3.52
17SouthCompany 005 ReviewedCandy13
18SouthCompany 005 ReviewedOrange58
19SouthCompany 005 ReviewedPens3.52
20WestCompany 001 ReviewedApple13
21WestCompany 001 ReviewedOrange58
22WestCompany 001 ReviewedKiwi3.52
23WestCompany 005 ReviewedCandy13
24WestCompany 005 ReviewedOrange58
25WestCompany 005 ReviewedPens3.52
Sheet1


When you expand the Content column, Power Query will create a folder with sub folder and 4 queries. That will look like this:
1688492579854.png

That is why you need to do it manually. All of that is not really needed, but that is how the UI handles it.

If you need to do manipulations to all the files, do it in the Transform Sample File query. You can rename the columns in either place.

I only used 2 files because it was such a pain to pull in the data. I suggest you use XL2BB when posting data in the future.
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,177
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top