Power Query - Dealing with Varying Column Numbers and Cell with Varying/Multiple Values/Multiple Formats

Athopp

New Member
Joined
Jul 12, 2022
Messages
12
Office Version
  1. 365
Platform
  1. Windows
I have a bit of a head scratcher.

My questions:
  1. I have tracking boxes (with varying number of columns) in a word document that I would like to line up into master dataset in excel. I am planning to convert those word documents to PDF and extract info.
  2. I am experiencing difficulty because:
    1. I am trying to combine tracking boxes that sometimes have only three Steps (e.g., Step 1, 3 and 5) with other tracking boxes with all Steps (see examples below). (Note: I am hoping that if a particular PDF file has no Step 6, for example, it simply says N/A).
    2. The tracking boxes have Steps and dates combined into one cell (I cannot change this) and sometimes there are multiple dates included in the cell (see example 3). (again, cannot change this)
    3. Sometimes there would be longer tracking boxes that have two row of data (Step 1,2,2,3,3,4,5,6). I am assuming I would need to 'flatten' the cell out so it all appears as one row (see example 2)
    4. Some steps (Step 5 and 6) could have text (closed field like "immediate") or a date.

Does anyone have suggestions?

Desired outcome
ItemStep 1Step 2Step 3Step 4Step 5Step 6
1Jan 13, 2022N/AJun 31, 2022N/AN/ASep 16, 2022
2Feb 3, 2022Feb 4, 2022
Oct 14, 2022
Nov 28, 2022Dec 1, 2022Jan 9, 2022Feb 11, 2023
Feb 12, 2033
3Jan 3, 2022Feb 12, 2022
Mar 9, 2022
N/A
April 12, 2022

N/A N/A
4Mar 3, 2022
Mar 18, 2022
Apr 12, 2022Apr 18, 2022Jun 3, 2022Immediately


Source Data

Example 1

Step 1

Jan 13, 2022
Step 3
Jun 31, 2022
Step 6
Sep 12, 2022


Example 2
Step 1
Feb 3, 2022
Step 2
Feb 4, 2022
Step 3
September 12, 2022
Step 2
Oct 14, 2022
Step 3
Nov 28, 2022
Step 4
Dec 1, 2022
Step 5
Jan 9,2023
Step 6
Feb 11, 2023
Step 6
Feb 12, 2023

Example 3
Step 1
Jan 3, 2022
Step 2
Feb 12, 2022
Mar 9, 2022
Step 4
April 12, 2022

Example 4
Step 1
Mar 3, 2022
Step 2
Mar 18, 2022
Step 3
April 12, 2022
Step 4
April 18, 2022
Step 5
Jun 3, 2022
Step 6
Immediately
 

Excel Facts

Last used cell?
Press Ctrl+End to move to what Excel thinks is the last used cell.
Rather than converting to pdf, I would consider simply saving the source data as a text file, which will probably have a form similar to this:
Example 1
Step 1

Jan 13, 2022
Step 3
Jun 31, 2022
Step 6
Sep 12, 2022


Example 2
Step 1
Feb 3, 2022
Step 2
Feb 4, 2022
Step 3
September 12, 2022
Step 2
Oct 14, 2022
Step 3
Nov 28, 2022
Step 4
Dec 1, 2022
Step 5
Jan 9,2023
Step 6
Feb 11, 2023
Step 6
Feb 12, 2023




Example 3
Step 1
Jan 3,
~~~truncated to conserve space

And then load that text file into PQ and transform it. I did that with the small example posted using this M Code:
Power Query:
let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\...your path to the text text file \Example 1.txt"), null, null, 1252)}),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Column1] <> "")),
    #"Added Conditional Column2" = Table.AddColumn(#"Filtered Rows", "CustEx", each if Text.Contains([Column1], "Example") then [Column1] else null),
    #"Added Conditional Column" = Table.AddColumn(#"Added Conditional Column2", "CustSt", each if Text.Contains([Column1], "Step") then [Column1] else null),
    #"Added Conditional Column3" = Table.AddColumn(#"Added Conditional Column", "CustVal", each if [CustEx] = null and [CustSt] = null then [Column1] else null),
    #"Filled Down" = Table.FillDown(#"Added Conditional Column3",{"CustEx", "CustSt"}),
    #"Filtered Rows1" = Table.SelectRows(#"Filled Down", each ([CustVal] <> null)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows1",{"Column1"}),
    #"Sorted Rows" = Table.Sort(#"Removed Columns",{{"CustEx", Order.Ascending}, {"CustSt", Order.Ascending}}),
    #"Grouped Rows3" = Table.Group(#"Sorted Rows", {"CustEx", "CustSt"}, {{"Contents", each Text.Combine([CustVal]," / "), type text}}),
    #"Pivoted Column" = Table.Pivot(#"Grouped Rows3", List.Distinct(#"Grouped Rows3"[CustSt]), "CustSt", "Contents"),
    #"Reordered Columns" = Table.ReorderColumns(#"Pivoted Column",{"CustEx", "Step 1", "Step 2", "Step 3", "Step 4", "Step 5", "Step 6"})
in
    #"Reordered Columns"
...and obtained this output:
CustExStep 1Step 2Step 3Step 4Step 5Step 6
Example 1Jan 13, 2022Jun 31, 2022Sep 12, 2022
Example 2Feb 3, 2022Feb 4, 2022 / Oct 14, 2022September 12, 2022 / Nov 28, 2022Dec 1, 2022Jan 9,2023Feb 11, 2023 / Feb 12, 2023
Example 3Jan 3, 2022Feb 12, 2022 / Mar 9, 2022April 12, 2022
Example 4Mar 3, 2022Mar 18, 2022April 12, 2022April 18, 2022Jun 3, 2022Immediately
 
Upvote 0
To follow up, I noticed a few things and have made some minor adjustments:
1. I'm assuming Jun 31 was meant to be Jun 30, and so I changed the input text file directly.
2. Some months are spelled out while others are abbreviated. I created a translation table in Excel, converted it to a table named "replacements" and then added it as a query in Power Query so that it could be used in the main query to replace full month names with abbreviations. The table looks like this:
FindReplace
JanuaryJan
FebruaryFeb
MarchMar
AprilApr
MayMay
JuneJun
JulyJul
AugustAug
SeptemberSep
OctoberOct
NovemberNov
DecemberDec

And the M code associated with the "replacements" table is trivial:
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="replacements"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Find", type text}, {"Replace", type text}})
in
    #"Changed Type"
3. M code improvements were made to make the month name conversions, parse multi-element cells in the final out with line breaks instead of slashes (more on this below), and replace "null" results with "N/A" to mimic the desired output. The M code with these changes is:
Power Query:
let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\...full path to text file...Example 1.txt in my version...\Example 1.txt"), null, null, 1252)}),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Column1] <> "")),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each List.Accumulate(
    List.Numbers(0, Table.RowCount(replacements)), 
    [Column1], 
    (state, current) => 
        Text.Replace(state, 
            replacements[Find]{current},
            replacements[Replace]{current}))),
    #"Added Conditional Column2" = Table.AddColumn(#"Added Custom", "Examples", each if Text.Contains([Custom], "Example") then [Custom] else null),
    #"Added Conditional Column" = Table.AddColumn(#"Added Conditional Column2", "CustSt", each if Text.Contains([Custom], "Step") then [Custom] else null),
    #"Added Conditional Column3" = Table.AddColumn(#"Added Conditional Column", "CustVal", each if [Examples] = null and [CustSt] = null then [Custom] else null),
    #"Filled Down" = Table.FillDown(#"Added Conditional Column3",{"Examples", "CustSt"}),
    #"Filtered Rows1" = Table.SelectRows(#"Filled Down", each ([CustVal] <> null)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows1",{"Column1","Custom"}),
    #"Sorted Rows" = Table.Sort(#"Removed Columns",{{"Examples", Order.Ascending}, {"CustSt", Order.Ascending}}),
    #"Grouped Rows3" = Table.Group(#"Sorted Rows", {"Examples", "CustSt"}, {{"Contents", each Text.Combine([CustVal],"#(lf)"), type text}}),
    #"Pivoted Column" = Table.Pivot(#"Grouped Rows3", List.Distinct(#"Grouped Rows3"[CustSt]), "CustSt", "Contents"),
    #"Reordered Columns" = Table.ReorderColumns(#"Pivoted Column",{"Examples", "Step 1", "Step 2", "Step 3", "Step 4", "Step 5", "Step 6"}),
    #"Replaced Value" = Table.ReplaceValue(#"Reordered Columns",null,"N/A",Replacer.ReplaceValue,{"Step 1", "Step 2", "Step 3", "Step 4", "Step 5", "Step 6"})
in
    #"Replaced Value"
Upon Closing and Loading the PQ results back to a worksheet in Excel, the multi-element cells will seemingly not honor the line feed character that was used as a delimiter to make each date appear on a separate line. To remedy this in Excel, click on one of the problematic cells, then click in the formula bar and then click back into the cell in the worksheet. For some reason, that causes Excel to re-evaluate the contents and the line feeds are honored. Then copy that cell, select the entire body of the table under Steps 1-6 and Paste Special > Format to apply this formatting style . The final results:
ExamplesStep 1Step 2Step 3Step 4Step 5Step 6
Example 1Jan 13, 2022N/AJun 30, 2022N/AN/ASep 12, 2022
Example 2Feb 3, 2022Feb 4, 2022 Oct 14, 2022Sep 12, 2022 Nov 28, 2022Dec 1, 2022Jan 9,2023Feb 11, 2023 Feb 12, 2023
Example 3Jan 3, 2022Feb 12, 2022 Mar 9, 2022Apr 12, 2022N/AN/A
Example 4Mar 3, 2022Mar 18, 2022Apr 12, 2022Apr 18, 2022Jun 3, 2022Immediately
 
Upvote 0
Hi KRice,

Thank you so much for this idea! I am going to try this approach now and see how I fair! I can't tell you how much I appreciate your thoughtfulness and insight.
 
Upvote 0
Good luck...and post back if you run into any issues.
 
Upvote 0

Forum statistics

Threads
1,223,911
Messages
6,175,324
Members
452,635
Latest member
laura12345

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top