Power Query - remove duplicates in one column, based on values in another column

btwice

New Member
Joined
Dec 16, 2014
Messages
22
Office Version
  1. 365
Platform
  1. Windows
So I've looked at other solutions to this problem, and the one I implemented involved making a custom column that would combine 2 columns, sort, then remove duplicates from a different column to get the desired result. However I found that this was working for some duplicates, but not for others. See data below, and then the explanation of what I need to happen.

OrdNot_VendorPackage Type Sort
35557413_121470651_vendor31.PEND package uploaded to folder
35557413_121470651_vendor30.Ready package uploaded to folder
46280026_129095815_vendor20.Ready package uploaded to folder
46280026_129095815_vendor21.PEND package uploaded to folder

So what you can see is the ordnot_vendor column can contain duplicate values for a vendor (same ord_not combination), but I only want to keep the value that has "0.Ready package uploaded to folder" and delete the "1.PEND package uploaded to folder" row. Currently I am combining these 2 columns into a new column, sorting ascending (so I have 0, 1, 2 etc.), then deleting the duplicate values in the "OrdNot_Vendor" column. This however resulted in a few where the "0.Ready package uploaded to folder" row would be deleted instead of the "1.PEND package uploaded to folder" row, not sure why but maybe it has to do with the way I combined these 2 columns and sorted? Either way, is there a better way to go about this that might be more accurate? Some form of if there's a duplicate, check the package type sort value for "Ready", if that value exists then delete the duplicates? Any help is appreciated, thanks.
 

Excel Facts

Copy PDF to Excel
Select data in PDF. Paste to Microsoft Word. Copy from Word and paste to Excel.
Hi @btwice,

an option by writing M code

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Grouped Rows" = Table.Combine(Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each
            let x = List.Min(List.Transform([Package Type Sort], each Text.Start(_,1))) in
            Table.SelectRows(_, each Text.Start([Package Type Sort],1) = x)}})[Custom])
in
    #"Grouped Rows"

Regards,
 
Upvote 0
Solution
This solution worked great, appreciate it. If you don't mind/have some time could you break down what's happening in this M code? I get what it's doing as a whole, but just haven't used all of these functions before. Thanks again!
 
Upvote 0
Hi @btwice,

Please see attempt to split code with explanation step by step.

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // grouped rows by "OrdNot_Vendor" 
    #"Grouped Rows" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each _ }}),
    // grouped rows - list of "OrdNot_Vendor" column
    #"Grouped Rows1" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each [Package Type Sort] }}),
    // grouped rows - Extract first character of previous list
    #"Grouped Rows2" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each List.Transform([Package Type Sort], each Text.Start(_,1))}}),
    // grouped rows - find minimum value
    #"Grouped Rows3" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each List.Min(List.Transform([Package Type Sort], each Text.Start(_,1)))}}),
    // grouped rows - select rows matching previous minimum value from step #"Grouped Rows"
    #"Grouped Rows4" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each 
            let x = List.Min(List.Transform([Package Type Sort], each Text.Start(_,1))) in
            Table.SelectRows(_, each Text.Start([Package Type Sort],1) = x)}}),
    // grouped rows - select column "Custom" with final tables 
    #"Grouped Rows5" = Table.Group(Source, {"OrdNot_Vendor"}, {{"Custom", each 
            let x = List.Min(List.Transform([Package Type Sort], each Text.Start(_,1))) in
            Table.SelectRows(_, each Text.Start([Package Type Sort],1) = x)}})[Custom],
    // combine tables for each rows
    Result = Table.Combine(#"Grouped Rows5")
in
    Result


Hopefully this helps.

Regards,
 
Upvote 0

Forum statistics

Threads
1,223,228
Messages
6,170,871
Members
452,363
Latest member
merico17

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top