Remove All Rows Based on Duplicate Conditions

Randall · Sep 8, 2022

I have a data set that contains financial data. There are two types of records: R and D. R (Regular) records are the total price paid for an order, and D records are the discount applied to the corresponding R record. I'm trying to determine how often we miss the discount offered under our payment terms. To do this, I need to remove all R records that have a corresponding D record. I've attempted this by removing duplicates based on specific columns (Invoice #, Item #, Date, etc.). This works, but it removes only the duplicate row, not both rows.

Is there a way to remove BOTH rows based on the duplicate condition, so that I'm left with only the R records that have discountable terms but no corresponding D record?

SullyWYO · Sep 8, 2022

I think the best strategy here is to Group By the "Num Side B" column and count the entries in each group. Then remove every group that has two entries (an R record plus its D record). This leaves you with all of the records that did not have a corresponding discount line. Then you expand the data back out.

Depending on your dataset, you may need to tweak this a little bit - you may need to additionally group by Open Date, Vendor, etc. to make it work correctly. This is easily done using the Advanced "Group By" feature.

Power Query:

let
    // Load the source table from the current workbook
    Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Num Side B", Int64.Type}, {"Price Paid", type number}, {"Open Date", type datetime}, {"Payment Type", type text}, {"Discount Y/N", type text}, {"Vendor", Int64.Type}, {"Vend Name", type text}, {"CC Terms", Int64.Type}, {"CC Terms Desc", type text}}),
    // Group by "Num Side B", keeping a row count and the original rows of each group
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Num Side B"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"All Data", each _, type table [Num Side B=nullable number, Price Paid=nullable number, Open Date=nullable datetime, Payment Type=nullable text, #"Discount Y/N"=nullable text, Vendor=nullable number, Vend Name=nullable text, CC Terms=nullable number, CC Terms Desc=nullable text]}}),
    // Drop groups with exactly two rows (a record plus its discount line); groups of any other size are kept
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each [Count] <> 2),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Count"}),
    // Expand the nested tables back into regular columns ("Num Side B" is already present as the group key)
    #"Expanded All Data" = Table.ExpandTableColumn(#"Removed Columns", "All Data", {"Price Paid", "Open Date", "Payment Type", "Discount Y/N", "Vendor", "Vend Name", "CC Terms", "CC Terms Desc"}, {"Price Paid", "Open Date", "Payment Type", "Discount Y/N", "Vendor", "Vend Name", "CC Terms", "CC Terms Desc"})
in
    #"Expanded All Data"
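
If counting rows per group proves fragile (for example, when one key has several R rows or a stray D row), a left anti join may be worth trying. This is a different technique from the Group By approach above, not part of the original answer. The sketch below is only an illustration under assumptions: the "Rec Type" column holding "R" or "D" is hypothetical, the sample rows are made up, and the choice of "Num Side B" plus "Open Date" as matching keys is a guess that should be replaced with whatever columns actually identify an R/D pair in your data.

```powerquery
let
    // Made-up sample rows; "Rec Type" is a hypothetical column that marks each row as "R" or "D"
    Source = #table(
        type table [#"Num Side B" = Int64.Type, #"Open Date" = date, #"Rec Type" = text, #"Price Paid" = number],
        {
            {1001, #date(2022, 9, 1), "R", 500.0},
            {1001, #date(2022, 9, 1), "D", -10.0},
            {1002, #date(2022, 9, 2), "R", 250.0},
            {1003, #date(2022, 9, 3), "R", 75.0},
            {1003, #date(2022, 9, 3), "D", -1.5}
        }
    ),
    // Assumed matching keys; add Vendor, Item #, etc. as your data requires
    KeyColumns = {"Num Side B", "Open Date"},
    // Split the data into regular rows and discount rows
    RegularRows = Table.SelectRows(Source, each [Rec Type] = "R"),
    DiscountRows = Table.SelectRows(Source, each [Rec Type] = "D"),
    // A left anti join keeps only the R rows that have no D row with the same keys
    Unmatched = Table.NestedJoin(RegularRows, KeyColumns, DiscountRows, KeyColumns, "Discount", JoinKind.LeftAnti),
    // The nested "Discount" column is empty for every remaining row, so drop it
    Result = Table.RemoveColumns(Unmatched, {"Discount"})
in
    Result
```

With these sample rows, only the R row for 1002 should remain, since 1001 and 1003 each have a matching D row. Unlike the count-based filter, this never returns D rows and does not depend on each pair having exactly two entries.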
