# Power Query: Remove Duplicates Based on Sort Order



## Qwest336 (Apr 17, 2017)

Good morning all!

I'm relatively new to Power Query and I'm having trouble removing the duplicates in one column after sorting rows. I've searched all over and found a few things that seem to indicate that Power Query uses the Source Order to determine which file to keep.  Since I am importing multiple text files from a directory, I can't control the sort order and PowerQuery is bringing in the oldest data first.  Because of this, regardless of how I sort, the oldest data (See: First source imported) is being kept.

Now, I have seen some references to using the Table Buffer on those sites but when I try to apply it to my table I get an error.  Here is the M-code for the table I am importing:


```
let
    Source = Folder.Files("C:\1097-1098 Applicant Tracking Tool\Version 3\datasets to load\1098Data"),
    #"Invoke Custom Function1" = Table.AddColumn(Source, "Transform Binary from 1098Data", each #"Transform Binary from 1098Data"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform Binary from 1098Data"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform Binary from 1098Data", Table.ColumnNames(#"Removed Other Columns1"[#"Transform Binary from 1098Data"]{0})),
    #"Split Column by Position" = Table.SplitColumn(#"Expanded Table Column1","Source.Name",Splitter.SplitTextByPositions({0, 10}, false),{"Source.Name.1", "Source.Name.2"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Split Column by Position","Source.Name.2",Splitter.SplitTextByDelimiter(".", QuoteStyle.Csv),{"Source.Name.2.1", "Source.Name.2.2"}),
    #"Renamed Columns" = Table.RenameColumns(#"Split Column by Delimiter",{{"Source.Name.2.1", "FileDate"}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns",{"SEPARATION_DATE"}),
    #"Removed Columns" = Table.RemoveColumns(#"Removed Columns1",{"Source.Name.1", "Source.Name.2.2"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Req-App", each [REQUISITION] & " - " & [APPLICATION]),
    #"Sorted Rows" = Table.Sort(#"Added Custom",{{"Req-App", Order.Ascending}, {"FileDate", Order.Descending}}),
    #"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"Req-App"})
in
    #"Removed Duplicates"
```


Here are the changes to add the Table Buffer:


```
let
    Source = Folder.Files("C:\1097-1098 Applicant Tracking Tool\Version 3\datasets to load\1098Data"),
    #"Invoke Custom Function1" = Table.AddColumn(Source, "Transform Binary from 1098Data", each #"Transform Binary from 1098Data"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform Binary from 1098Data"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform Binary from 1098Data", Table.ColumnNames(#"Removed Other Columns1"[#"Transform Binary from 1098Data"]{0})),
    #"Split Column by Position" = Table.SplitColumn(#"Expanded Table Column1","Source.Name",Splitter.SplitTextByPositions({0, 10}, false),{"Source.Name.1", "Source.Name.2"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Split Column by Position","Source.Name.2",Splitter.SplitTextByDelimiter(".", QuoteStyle.Csv),{"Source.Name.2.1", "Source.Name.2.2"}),
    #"Renamed Columns" = Table.RenameColumns(#"Split Column by Delimiter",{{"Source.Name.2.1", "FileDate"}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns",{"SEPARATION_DATE"}),
    #"Removed Columns" = Table.RemoveColumns(#"Removed Columns1",{"Source.Name.1", "Source.Name.2.2"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Req-App", each [REQUISITION] & " - " & [APPLICATION]),
    #"Sorted Rows" = Table.Sort(#"Added Custom",{{"Req-App", Order.Ascending}, {"FileDate", Order.Descending}}),
    #"Buffered" = Table.Buffer( #”Sorted Rows”),
    #"Removed Duplicates" = Table.Distinct(#"Buffered", {"Req-App"})
in
    #"Removed Duplicates"
```

Could anyone shed some light on what I am doing incorrectly?  All help and insight is appreciated!


----------



## MarcelBeug (Apr 17, 2017)

Maybe you can help us out by pointing to the differences between the 2 code blocks.

If I understand correctly, the problem is in the last few steps?
In this video it is working fine, so please explain what's different at your side.

Edit: never mind. The cause is your double quotes are different. Replace them in the query editor.


----------



## Qwest336 (Apr 17, 2017)

Thanks for your response, MarcelBeug!

Sadly, it appears that the issue was with the types of quotations that came back when I pasted the code into a text editor.  the quotations used around the Sorted Rows item were like this


```
#”Sorted Rows”
```

instead of like this


```
#"Sorted Rows"
```

Unfortunately, that one change caused me to second guess what I was doing and work through the entire code, post this message, and search all around on other sites.

Thanks for looking into it!



MarcelBeug said:


> Maybe you can help us out by pointing to the differences between the 2 code blocks.
> 
> If I understand correctly, the problem is in the last few steps?
> In this video it is working fine, so please explain what's different at your side.
> ...


----------



## MarcelBeug (Apr 17, 2017)

Qwest336 said:


> ... the issue was with the types of quotations ...


That was exactly what I meant with the last line I added to my post.


----------

