Power Query for Excel removing columns question

Mr_Phil

Board Regular
Joined
May 28, 2018
Messages
147
Office Version
  1. 365
I'm really new at PQ and am confused by its behavior.

I have the import done from the folder.
I would like to remove unneeded columns.
The first set to go would be any column with the word "NOPE" in the column header.

I viewed a youTube video here that explained how to do away with columns based on text in the header. I did that with the first set (NOPES). It worked, yay for me. So I set about to do it again with different junk columns but based on different header criteria. It did, it but it brought back the previous NOPES that I'd thought were gone.
Code:
 #"Removed Columns" = Table.RemoveColumns(#"Changed Type",List.Select(Table.ColumnNames(#"Changed Type"),each Text.Contains(_,"NOPE_"))),
    #"Removed Columns1" = Table.RemoveColumns(#"Removed Columns",{"NOPE"}),
    Custom1 = Table.RemoveColumns(#"Changed Type",List.Select(Table.ColumnNames(#"Changed Type"),each Text.Contains("BACK","BACKFILL")))
in
    Custom1

I am seriously confused. There are hundreds of repeating junk columns in about 21 combined files. I thought I was making progress on weeding them out but.... now not so much.
Insight and advice always grateful, thank you for reading.

1694803797430.png
 

Excel Facts

Waterfall charts in Excel?
Office 365 customers have access to Waterfall charts since late 2016. They were added to Excel 2019.
This might be a better solution if the headers might change.
Col1Col2Col3NOPE1NOPE2NOPE3Col4
249681474458245168126
119655731608473283456
706210384824800896495
548311212378883643713
172176182643419218769

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    DemotedHeaders = Table.DemoteHeaders(Source),
    TransposedTable = Table.Transpose(DemotedHeaders),
    FilteredRows = Table.SelectRows(TransposedTable, each not Text.Contains([Column1], "nope", Comparer.OrdinalIgnoreCase)),
    TransposedTableBack = Table.Transpose(FilteredRows),
    PromotedHeaders = Table.PromoteHeaders(TransposedTableBack, [PromoteAllScalars=true])
in
    PromotedHeaders
Col1Col2Col3Col4
249681474126
119655731456
706210384495
548311212713
172176182769
 
Upvote 1
In the Custom1 step, the current code is using the #"Changed Type" identifier. This is the previous step that is not shown in the code you posted.
You need to use the previous identifier (step name) to use the processed data set, which is actually #"Removed Columns1" in this code.

However, it is not the whole story for the provided code. The Text.Contains() function is also not used correctly. You have to check each unwanted text with a separate Text.Contains() function which the first parameter should be _ (the column name in this case) and combine them with "or".

Instead of doing that, if you are looking for removing all headers containing NOPE and BACK, then the following code will do that for you. You can delete the #"Removed Columns" and #"Removed Columns1" lines in your code since they won't be necessary with this change.

Power Query:
    Custom1 = Table.RemoveColumns(#"Changed Type", 
                List.Select(
                    Table.ColumnNames(#"Changed Type"), 
                    each List.AnyTrue(
                            List.Transform({"NOPE", "BACK"}, (searchfor) => Text.Contains(_, searchfor))
                        )
                    )
                )

Note: You don't need to list BACKFILL since BACK will already remove that column as well.
 
Upvote 1
Solution
In the Custom1 step, the current code is using the #"Changed Type" identifier. This is the previous step that is not shown in the code you posted.
You need to use the previous identifier (step name) to use the processed data set, which is actually #"Removed Columns1" in this code.

However, it is not the whole story for the provided code. The Text.Contains() function is also not used correctly. You have to check each unwanted text with a separate Text.Contains() function which the first parameter should be _ (the column name in this case) and combine them with "or".

Instead of doing that, if you are looking for removing all headers containing NOPE and BACK, then the following code will do that for you. You can delete the #"Removed Columns" and #"Removed Columns1" lines in your code since they won't be necessary with this change.

Power Query:
    Custom1 = Table.RemoveColumns(#"Changed Type",
                List.Select(
                    Table.ColumnNames(#"Changed Type"),
                    each List.AnyTrue(
                            List.Transform({"NOPE", "BACK"}, (searchfor) => Text.Contains(_, searchfor))
                        )
                    )
                )

Note: You don't need to list BACKFILL since BACK will already remove that column as well.
Wow. That did a lot. I followed the example in the video but assumed you had to make a discrete statement for each string. And I assumed the next item would reference the preceeding item just like the first one did. I did not realize that the dataset was bound up in that step. Is there a way to keep the first instance of a column and dispose of the duplicate columns? The first column is ITEM, then the techs name (and there can be up to a dozen in each output report that gets combined), then WH OH showing the warehouse qty. Then it repeats so there will be an ITEM_12 followed by however many techs were looked at (up to a dozen) then a WH OH_24 and so forth. I only need the first instance of ITEM. For now I will keep the tech names and the on hand counts for later.

The final desired result is an ITEM column, then all the Tech names, then the OH Columns. Hopefully I can figure out a way to unpivot all that so it will show what, who, how many on hand.

Honestly I do appreciate your help. I ran the modified query and the preview was about 240 columns lighter. If you are ever in Lubbock Texas look me up and Lunch or Dinner is on me.
 
Upvote 0
As I can see, the first instance is the whole word, then the duplicated ones have a suffix after an underscore. I think the following adjustment will remove all duplicates. In this version, of course, you'll have to list all possible strings you want to remove.

Power Query:
List.Transform({"NOPE_", "BACK_"}, (searchfor) => Text.Contains(_, searchfor))

Or, another idea is removing all columns with an underscore. This will be a shorter code since we can now use the Text.Contains() for a single string search.

Power Query:
    Custom1 = Table.RemoveColumns(#"Changed Type", 
                List.Select(
                    Table.ColumnNames(#"Changed Type"), 
                    each Text.Contains(_, "_")
                    )
                )

Let's say you prefer the last version but want to keep columns contains KEEPME string, such as KEEPME_123. Then you can change the condition as shown below:
Power Query:
    Custom1 = Table.RemoveColumns(#"Changed Type", 
                List.Select(
                    Table.ColumnNames(#"Changed Type"), 
                    each Text.Contains(_, "_") and not Text.Contains(_, "KEEPME")
                    )
                )
 
Upvote 1
Honestly I do appreciate your help. I ran the modified query and the preview was about 240 columns lighter. If you are ever in Lubbock Texas look me up and Lunch or Dinner is on me.
You're welcome. Glad to hear it helps.
(Sure will do. :) I visited Navasota many times and lived in Plano years ago for a short time, but I've never been to Lubbock. Texas is really BIG!)
 
Upvote 1

Forum statistics

Threads
1,223,317
Messages
6,171,419
Members
452,402
Latest member
siduslevis

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top