Power Query: Remove Duplicates Based on Sort Order

Qwest336

Board Regular
Joined
Jun 24, 2015
Messages
53
Good morning all!

I'm relatively new to Power Query and I'm having trouble removing the duplicates in one column after sorting rows. I've searched all over and found a few things that seem to indicate that Power Query uses the Source Order to determine which file to keep. Since I am importing multiple text files from a directory, I can't control the sort order and PowerQuery is bringing in the oldest data first. Because of this, regardless of how I sort, the oldest data (See: First source imported) is being kept.

Now, I have seen some references to using the Table Buffer on those sites but when I try to apply it to my table I get an error. Here is the M-code for the table I am importing:

Code:
let
    Source = Folder.Files("C:\1097-1098 Applicant Tracking Tool\Version 3\datasets to load\1098Data"),
    #"Invoke Custom Function1" = Table.AddColumn(Source, "Transform Binary from 1098Data", each #"Transform Binary from 1098Data"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform Binary from 1098Data"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform Binary from 1098Data", Table.ColumnNames(#"Removed Other Columns1"[#"Transform Binary from 1098Data"]{0})),
    #"Split Column by Position" = Table.SplitColumn(#"Expanded Table Column1","Source.Name",Splitter.SplitTextByPositions({0, 10}, false),{"Source.Name.1", "Source.Name.2"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Split Column by Position","Source.Name.2",Splitter.SplitTextByDelimiter(".", QuoteStyle.Csv),{"Source.Name.2.1", "Source.Name.2.2"}),
    #"Renamed Columns" = Table.RenameColumns(#"Split Column by Delimiter",{{"Source.Name.2.1", "FileDate"}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns",{"SEPARATION_DATE"}),
    #"Removed Columns" = Table.RemoveColumns(#"Removed Columns1",{"Source.Name.1", "Source.Name.2.2"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Req-App", each [REQUISITION] & " - " & [APPLICATION]),
    #"Sorted Rows" = Table.Sort(#"Added Custom",{{"Req-App", Order.Ascending}, {"FileDate", Order.Descending}}),
    #"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"Req-App"})
in
    #"Removed Duplicates"


Here are the changes to add the Table Buffer:

Code:
let
    Source = Folder.Files("C:\1097-1098 Applicant Tracking Tool\Version 3\datasets to load\1098Data"),
    #"Invoke Custom Function1" = Table.AddColumn(Source, "Transform Binary from 1098Data", each #"Transform Binary from 1098Data"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform Binary from 1098Data"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform Binary from 1098Data", Table.ColumnNames(#"Removed Other Columns1"[#"Transform Binary from 1098Data"]{0})),
    #"Split Column by Position" = Table.SplitColumn(#"Expanded Table Column1","Source.Name",Splitter.SplitTextByPositions({0, 10}, false),{"Source.Name.1", "Source.Name.2"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Split Column by Position","Source.Name.2",Splitter.SplitTextByDelimiter(".", QuoteStyle.Csv),{"Source.Name.2.1", "Source.Name.2.2"}),
    #"Renamed Columns" = Table.RenameColumns(#"Split Column by Delimiter",{{"Source.Name.2.1", "FileDate"}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns",{"SEPARATION_DATE"}),
    #"Removed Columns" = Table.RemoveColumns(#"Removed Columns1",{"Source.Name.1", "Source.Name.2.2"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Req-App", each [REQUISITION] & " - " & [APPLICATION]),
    #"Sorted Rows" = Table.Sort(#"Added Custom",{{"Req-App", Order.Ascending}, {"FileDate", Order.Descending}}),
    #"Buffered" = Table.Buffer( #”Sorted Rows”),
    #"Removed Duplicates" = Table.Distinct(#"Buffered", {"Req-App"})
in
    #"Removed Duplicates"

Could anyone shed some light on what I am doing incorrectly? All help and insight is appreciated!
 

Excel Facts

How to total the visible cells?
From the first blank cell below a filtered data set, press Alt+=. Instead of SUM, you will get SUBTOTAL(9,)
Maybe you can help us out by pointing to the differences between the 2 code blocks.

If I understand correctly, the problem is in the last few steps?
In this video it is working fine, so please explain what's different at your side.

Edit: never mind. The cause is your double quotes are different. Replace them in the query editor.
 
Last edited:
Upvote 0
Thanks for your response, MarcelBeug!

Sadly, it appears that the issue was with the types of quotations that came back when I pasted the code into a text editor. the quotations used around the Sorted Rows item were like this

Code:
#”Sorted Rows”

instead of like this

Code:
#"Sorted Rows"

Unfortunately, that one change caused me to second guess what I was doing and work through the entire code, post this message, and search all around on other sites.

Thanks for looking into it!

Maybe you can help us out by pointing to the differences between the 2 code blocks.

If I understand correctly, the problem is in the last few steps?
In this video it is working fine, so please explain what's different at your side.

Edit: never mind. The cause is your double quotes are different. Replace them in the query editor.
 
Upvote 0

Forum statistics

Threads
1,223,236
Messages
6,170,917
Members
452,366
Latest member
TePunaBloke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top