Power Query Remove Duplicates with Condition

DaveyD

New Member
Joined
May 20, 2015
Messages
31
I have a query to load files from a folder creating a list of files to work with.
Some of these files end with .prg and some with .lib.
Many of them appear in the list with both extensions.
I would like to remove the .lib version of those instances.
When I run "Remove Duplicates", Power Query removes the .lib version (usually).
I thought maybe it has to do with the list order because PQ removes the first in the list.
So, I tried sorting it by file name ascending and then by extension descending so .prg will appear first causing PQ to remove the second entry, but this didn't do the trick.
Even more confusing, PQ is not consistent regarding which one it removes. Sometimes it removes the .lib version and sometimes it removes the .prg version.

How can I get it to always remove the .lib version if there is a .prg version?


I would provide sample data but the problem is that with sample data it seems to work properly. Maybe it is only with the file system?
Here it is anyway:
Name​
Extension​
1​
.prg​
1b​
.prg​
2​
.prg​
a1​
.lib​
a1​
.prg​
a1addon​
.prg​
a1addon​
.lib​
aac​
.lib​
abc​
.prg​
acc​
.prg​
acc​
.lib​
 

Excel Facts

How to calculate loan payments in Excel?
Use the PMT function: =PMT(5%/12,60,-25000) is for a $25,000 loan, 5% annual interest, 60 month loan.
maybe try sort asc Name then sort asc Extension
lib should be above prg for the same Name
and Remove Duplicates from Name
 
Upvote 0
maybe try sort asc Name then sort asc Extension
and Remove Duplicates from Name
@sandy666 - thanks for the reply
I can try but I don't understand how that would help
But maybe I wasn't clear enough. I sorted Ascending by name and then descending by extension so that the names will be in order and the .prg version will appear first. Then I removed duplicates from the Name column.

Please let me know if we are understanding each other.
Thanks
 
Upvote 0
ok, I did mistake with asc vs desc
so
try this way
  1. set Name to type text
  2. sort Name asc
  3. sort Extension desc
  4. add prefix 0 to the Name
  5. Remove duplicates from Name
  6. Extract text after delimiter 0
before.png

After
rd.png
 
Last edited:
Upvote 0
Ok!, The second way:

1.Import table1 in PQ
2. Duplicate query and rename in Final
3. In first query "Table1" Remove Duplicate from Name
4. After filter by .lib extension
5. In "Final" query filter column Extension by ".prg"
6. Append query "Table1" to "Final"
7. Remove Duplicate from Name
 

Attachments

  • Snap1.png
    Snap1.png
    225.6 KB · Views: 104
Upvote 0

Forum statistics

Threads
1,223,153
Messages
6,170,395
Members
452,324
Latest member
stuart1980

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top