Power Query - remove duplicates in one column, based on values in another column

btwice · Jul 22, 2024

I've looked at other solutions to this problem. The one I implemented adds a custom column that combines 2 columns, sorts on it, then removes duplicates from a different column to get the desired result. However, I found that this worked for some duplicates but not for others. See the data below, followed by an explanation of what I need to happen.

OrdNot_Vendor	Package Type Sort
35557413_121470651_vendor3	1.PEND package uploaded to folder
35557413_121470651_vendor3	0.Ready package uploaded to folder
46280026_129095815_vendor2	0.Ready package uploaded to folder
46280026_129095815_vendor2	1.PEND package uploaded to folder

As you can see, the OrdNot_Vendor column can contain duplicate values for a vendor (same ord_not combination), but I only want to keep the row that has "0.Ready package uploaded to folder" and delete the "1.PEND package uploaded to folder" row. Currently I combine these 2 columns into a new column, sort ascending (so I have 0, 1, 2, etc.), then remove the duplicate values in the "OrdNot_Vendor" column. However, in a few cases this deleted the "0.Ready package uploaded to folder" row instead of the "1.PEND package uploaded to folder" row. I'm not sure why, but maybe it has to do with the way I combined these 2 columns and sorted? Either way, is there a more reliable way to go about this? Something like: if there's a duplicate, check the Package Type Sort value for "Ready", and if that value exists, delete the other duplicates? Any help is appreciated, thanks.

alex78 · Jul 23, 2024

Hi @btwice,

Here's one option using M code:

Power Query:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Grouped Rows" = Table.Combine(Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each
            let x = List.Min(List.Transform([Package Type Sort], each Text.Start(_,1))) in
            Table.SelectRows(_, each Text.Start([Package Type Sort],1) = x)}})[Custom])
in
    #"Grouped Rows"
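
On the "not sure why" part of the question: a likely cause (an assumption, since the original query isn't shown) is that Power Query evaluates steps lazily and does not guarantee that Table.Distinct sees rows in the order produced by an earlier sort. A commonly used workaround is to buffer the sorted table before removing duplicates. A minimal sketch, assuming the same Table1 source and column names as above:

```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Sort so the lowest "Package Type Sort" text comes first for each OrdNot_Vendor
    Sorted = Table.Sort(Source, {{"OrdNot_Vendor", Order.Ascending}, {"Package Type Sort", Order.Ascending}}),
    // Load the sorted rows into memory so the next step works on them in this order
    Buffered = Table.Buffer(Sorted),
    // Remove duplicates on OrdNot_Vendor only; with the buffered input, the first row of each group is the one kept
    Deduped = Table.Distinct(Buffered, {"OrdNot_Vendor"})
in
    Deduped
```

The grouping approach above avoids the ordering question altogether, which is probably why it behaves more predictably.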

Regards,

btwice · Jul 23, 2024

This solution worked great, appreciate it. If you don't mind/have some time could you break down what's happening in this M code? I get what it's doing as a whole, but just haven't used all of these functions before. Thanks again!

alex78 · Jul 23, 2024

Hi @btwice,

Here is the code split into steps, with an explanation of each.

Power Query:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // grouped rows by "OrdNot_Vendor" - "Custom" holds each group's rows as a nested table
    #"Grouped Rows" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each _ }}),
    // grouped rows - "Custom" holds the list of "Package Type Sort" values for each group
    #"Grouped Rows1" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each [Package Type Sort] }}),
    // grouped rows - extract the first character of every item in the previous list
    #"Grouped Rows2" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each List.Transform([Package Type Sort], each Text.Start(_,1))}}),
    // grouped rows - find the minimum of those first characters
    #"Grouped Rows3" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each List.Min(List.Transform([Package Type Sort], each Text.Start(_,1)))}}),
    // grouped rows - from each group's nested table (as in step #"Grouped Rows"), keep only the rows whose first character equals that minimum (x)
    #"Grouped Rows4" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each 
            let x = List.Min(List.Transform([Package Type Sort], each Text.Start(_,1))) in
            Table.SelectRows(_, each Text.Start([Package Type Sort],1) = x)}}),
    // grouped rows - select the "Custom" column, which gives a list of the filtered tables
    #"Grouped Rows5" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each 
            let x = List.Min(List.Transform([Package Type Sort], each Text.Start(_,1))) in
            Table.SelectRows(_, each Text.Start([Package Type Sort],1) = x)}})[Custom],
    // combine the filtered tables into a single table
    Result = Table.Combine(#"Grouped Rows5")
in
    Result
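
Two things worth knowing about the query above: it keeps every row that ties for the minimum prefix within a group, and Text.Start(_,1) looks at the first character only, so it assumes single-digit prefixes (0 to 9). If exactly one row per OrdNot_Vendor is wanted, a possible variant is to take the minimum row of each group with Table.Min. The sketch below is not the method from this thread; it uses the sample rows from the question as inline data so it can be pasted into a blank query, and the step names Grouped and Best are just illustrative:

```powerquery
let
    // Sample data copied from the question
    Source = #table(
        {"OrdNot_Vendor", "Package Type Sort"},
        {
            {"35557413_121470651_vendor3", "1.PEND package uploaded to folder"},
            {"35557413_121470651_vendor3", "0.Ready package uploaded to folder"},
            {"46280026_129095815_vendor2", "0.Ready package uploaded to folder"},
            {"46280026_129095815_vendor2", "1.PEND package uploaded to folder"}
        }
    ),
    // For each OrdNot_Vendor, keep the single row (as a record) with the smallest "Package Type Sort" text
    Grouped = Table.Group(Source, {"OrdNot_Vendor"}, {{"Best", each Table.Min(_, "Package Type Sort"), type record}}),
    // Turn the list of records back into a table
    Result = Table.FromRecords(Grouped[Best])
in
    Result
```

With this sample data, that should leave one "0.Ready package uploaded to folder" row per OrdNot_Vendor. The comparison here is on the whole text value, so a prefix like "10." would sort before "2."; if prefixes can go past 9, the number would need to be parsed out first. Table.FromRecords also drops column types, so they may need to be reapplied afterwards.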

Hopefully this helps.

Regards,

btwice · Jul 23, 2024

Makes sense, thanks a ton!
