Expert level: merge multiple Forms in folders to one table

BVL

New Member
Joined
Dec 13, 2019
Messages
8
Office Version
  1. 2016
Platform
  1. Windows
Hi all,
Our change management system creates Excel forms (="ECO" engineering change order) to log all changes to drawings. These forms are kept (and updated) in a separate folder (Folder_year>Folder_Month>Folder_ChangeID>Excel Form, example: 2020/03/ECO123456)
I would like to create an overview of all (or some if easier) data cells in that Form so I can keep track of ECO's that are running late or look into the past to do an analysis of what our current throughput time is so we can improve that.
I've added my example and "how it should look like" in the Wetransfer link (not sure how to upload .zip files in this thread). And a simple jpg with the yellow data cells that I want to translate to a table as well as a jpg what is currently my output.

My number one challenge: how to translate a form to a table?

I'm able to load all files through "PowerQuery>from Folder", but then I get a scattered, unusable table were my headers are in the rows and the table contains a lot of empty cells.

My level of competence: Novice in SQL, created 2 Power querys with this forums help and no experience at all in Macros (but eager to learn)

Example files: MergeFormsToTable.zip

Hopefully somebody is up for the challenge :)

(oh and change the source system is not an option. This is off-the-shelf software and I'd like to keep it that way. They have a plugin to do this, but it contains other functionalities we don't need and is way too expensive to do this)
 

Attachments

  • 2020-11-14_16-32-35.jpg
    2020-11-14_16-32-35.jpg
    89 KB · Views: 52
  • 2020-11-14_16-35-05.jpg
    2020-11-14_16-35-05.jpg
    162.2 KB · Views: 44
  • 2020-11-14_16-47-25.jpg
    2020-11-14_16-47-25.jpg
    132.7 KB · Views: 38

Excel Facts

Move date out one month or year
Use =EDATE(A2,1) for one month later. Use EDATE(A2,12) for one year later.
I've had a quick look at your request and the data provided. It should be possible to use PQ to untangle the data but it's not a trivial exercise. One thing I noticed is that the form doesn't look like it is completely static; it looks like some of the form fields can have multiple entries.
If I was to tackle this I would split the form up using the various data patterns. For example I would first extract the ECO number "sub-form" data and process that and then move on to the other various sub-forms. Finally I would re-combine the data.
You are will have to become familiar with a number of techniques to re-organise the data. For example; adding an index column; transposing the data; promoting and demoting headers; Filtering; merging and splitting columns; duplicating and referencing queries; pivoting and unpivotting; merging queries.
Peter
 
Upvote 0
Thanks for your reply Peter. I'll look into the terms your using and see if I can figure something out.
Just to be sure, you would do everything within PQ, right?
 
Upvote 0
I've had a quick look at your request and the data provided. It should be possible to use PQ to untangle the data but it's not a trivial exercise. One thing I noticed is that the form doesn't look like it is completely static; it looks like some of the form fields can have multiple entries.
If I was to tackle this I would split the form up using the various data patterns. For example I would first extract the ECO number "sub-form" data and process that and then move on to the other various sub-forms. Finally I would re-combine the data.
You are will have to become familiar with a number of techniques to re-organise the data. For example; adding an index column; transposing the data; promoting and demoting headers; Filtering; merging and splitting columns; duplicating and referencing queries; pivoting and unpivotting; merging queries.
Peter
Peter, I've just noticed that the source files are all quite well "named". I get a full list of all the cells I need in the "name manager". Isn't that something I could use? Feels like a good shortcut, but I have no clue how I could use it with Power Query (probably still need PQ to combine all the files?)
 
Upvote 0
I'm fairly certain you could do this with PQ but it won't be easy and I don't know how performant the result will be if you are processing multiple forms.
This is an example of splitting tables but on much more regular structured data:
Look for other blogs by Ken Puls and Miguel Escobar. If you want to learn PQ personally I found "M is for Data Monkey" was how my eyes were opened; unfortunately it's a bit outdated now.
 
Upvote 0
Peter, I've just noticed that the source files are all quite well "named". I get a full list of all the cells I need in the "name manager". Isn't that something I could use? Feels like a good shortcut, but I have no clue how I could use it with Power Query (probably still need PQ to combine all the files?)
That's a game changer! Let me think about that. I've never worked with extracting data from multiple named ranges from files in a folder. Perhaps someone else with practical experience can jump in?
 
Upvote 0
The following code is not pretty but it sort of works. I've filtered the file names to only include Flow1 in the name because as far as I can see this data is not in a named range.
3 Queries
First a query to load the data from the folder.
Name: "2020"
Power Query:
let
    Source = Folder.Files("[I]Type your folder path name here[/I]"),
    #"Filtered Rows1" = Table.SelectRows(Source, each Text.StartsWith([Name], "ECO")),
    #"Filtered Rows" = Table.SelectRows(#"Filtered Rows1", each ([Name] <> "Example.xlsx") and ([Extension] = ".xlsx")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content", "Name"})
in
    #"Removed Other Columns"

Next a function query called "BinaryToTable":
Power Query:
let
    Source = (List as binary) => Excel.Workbook(List)
in
    Source

Finally a query to extract the content:
Name "Flow1 content"
Power Query:
let
    Source = #"2020",
    #"Filtered Rows3" = Table.SelectRows(Source, each Text.Contains([Name], "Flow1")),
    #"Added Index" = Table.AddIndexColumn(#"Filtered Rows3", "Index", 0, 1),
    Content = #"Added Index"[Content],
    Custom1 = List.Transform(Content,Binary.Buffer),
    Custom2 = List.Transform(Custom1,BinaryToTable),
    Custom3 = Table.Combine(Custom2),
    #"Filtered Rows" = Table.SelectRows(Custom3, each ([Kind] = "DefinedName")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Name", "Data"}),
    #"Added Index1" = Table.AddIndexColumn(#"Removed Other Columns", "Index", 1, 1),
    #"Filtered Rows1" = Table.SelectRows(#"Added Index1", each ([Name] = "BOM_PE_com")),
    #"Added Index2" = Table.AddIndexColumn(#"Filtered Rows1", "Index.1", 1, 1),
    #"Merged Queries" = Table.NestedJoin(#"Added Index1", {"Index"}, #"Added Index2", {"Index"}, "Added Index2", JoinKind.LeftOuter),
    #"Expanded Added Index2" = Table.ExpandTableColumn(#"Merged Queries", "Added Index2", {"Index.1"}, {"Index.1"}),
    #"Filled Down" = Table.FillDown(#"Expanded Added Index2",{"Index.1"}),
    #"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Index"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Index.1", "Name", "Data"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Reordered Columns", "Data", {"Column1"}, {"Column1"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Data",{{"Column1", type text}}),
    #"Filtered Rows2" = Table.SelectRows(#"Changed Type", each ([Column1] <> "#(lf)    " and [Column1] <> " ")),
    #"Pivoted Column" = Table.Pivot(#"Filtered Rows2", List.Distinct(#"Filtered Rows2"[Name]), "Name", "Column1")
in
    #"Pivoted Column"

I'm fairly certain there is a neater way to do this using grouping but I couldn't remember the correct technique.
Peter
 
  • Like
Reactions: BVL
Upvote 0
Solution
BVL
It's been 4 days since I posted a potential solution to your problem. I don't mind the couple of hours I spent sorting it out but would appreciate some feedback; even if negative.
Yours Peter
 
Upvote 0
Hi Peter, I've been struggling to get it to work but in the end I was able, with some help from multiple colleagues, to get the data I want through your solution!
So thank you very much for your effort. Much appreciated!!
 
Upvote 0

Forum statistics

Threads
1,223,757
Messages
6,174,322
Members
452,555
Latest member
colc007

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top