Power Query: Removing duplicate values in Multi-Value Cells

lneidorf

Board Regular
Joined
May 20, 2004
Messages
97
Office Version
  1. 365
Platform
  1. Windows
Hi there.

Long-time Excel user and dabbler in VBA who discovered Power Query for the first time today.

I've used this to combine values spanning multiple rows into single row entries with multi-value fields/cells. An example should be helpful:

Here's a mock-up of my source data.
Column A Column B
Beatles John
Beatles Paul
Beatles George
Beatles Ringo
Beatles John
Stones Mick
Stones Keith
Stones Charlie
Stones Ronnie
Stones Mick

Band name in Col A; Band Member in Col B. Note that we have duplicates, with "Beatles | John" and "Stones | Mick" appearing twice.

I've used Power Query to roll this stuff up nicely, with the result a single row per value in Column A. My current results looks like this:

Column A Column B
Beatles John;Paul;George;Ringo;John
Stones Mick;Keith;Charlie;Ronnie;Mick

Column B contains multi-value field/cells, with individual values separated by semicolons.

The issue is the duplicates. It's important to note that my duplicates are duplicate values within single, multi-value cells. So the question is how to use Power Query to clean those cells and remove any duplicative values in each individual cell. In this cae, the offending values are the duplicate John and Mick values.

My goal is to get to this result:
Column A Column B
Beatles John;Paul;George;Ringo
Stones Mick;Keith;Charlie;Ronnie


I figure there's got to be a way to do this right in Power Query, but my web searches haven't found anything touching upon this.

Any input welcome.

Thanks!
 

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.
try

Code:
[SIZE=1]// Table1
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Group = Table.Group(Source, {"Column1"}, {{"Count", each _, type table}}),
    List = Table.AddColumn(Group, "List", each List.Distinct(Table.Column([Count],"Column2"))),
    Extract = Table.TransformColumns(List, {"List", each Text.Combine(List.Transform(_, Text.From), ";"), type text})
in
    Extract[/SIZE]

[Table="width:, class:head"]
[tr=bgcolor:#FFFFFF][td=bgcolor:#5B9BD5]Column1[/td][td=bgcolor:#5B9BD5]Column2[/td][td][/td][td=bgcolor:#70AD47]Column1[/td][td=bgcolor:#70AD47]List[/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#DDEBF7]Beatles[/td][td=bgcolor:#DDEBF7]John[/td][td][/td][td=bgcolor:#E2EFDA]Beatles[/td][td=bgcolor:#E2EFDA]John;Paul;George;Ringo[/td][/tr]

[tr=bgcolor:#FFFFFF][td]Beatles[/td][td]Paul[/td][td][/td][td]Stones[/td][td]Mick;Keith;Charlie;Ronnie[/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#DDEBF7]Beatles[/td][td=bgcolor:#DDEBF7]George[/td][td][/td][td][/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td]Beatles[/td][td]Ringo[/td][td][/td][td][/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#DDEBF7]Beatles[/td][td=bgcolor:#DDEBF7]John[/td][td][/td][td][/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td]Stones[/td][td]Mick[/td][td][/td][td][/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#DDEBF7]Stones[/td][td=bgcolor:#DDEBF7]Keith[/td][td][/td][td][/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td]Stones[/td][td]Charlie[/td][td][/td][td][/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#DDEBF7]Stones[/td][td=bgcolor:#DDEBF7]Ronnie[/td][td][/td][td][/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td]Stones[/td][td]Mick[/td][td][/td][td][/td][td][/td][/tr]
[/table]
 
Last edited:
Upvote 0
Thanks for this.

I'm afraid I've got a remedial follow up question: how would I execute this code? I've got about a day of experience with Power Query, so I'm afraid I'm a total newbie.

I've tried Add Column\Custom Column, entering the formula there. It executes, generating a new column. I've then expanded my new column, which generates a new column populated with the composite values I'm looking for, but doesn't remove any of my original rows. I've got dupes.

I imagine I'm missing something very basic here.

Lastly, I should ask: do you think this is scalable? I've got a spreadsheet with 45,000 rows I'd like to run this across.

Thanks!
 
Upvote 0
how would I execute this code?
Change source range to table (Ctrl+T), - be sure the name of the table is the same as in the code - in this case : Table1
then use From Table (Get&Transform part on the ribbon)
then open Advanced Editor and replace code there with copied from the post
then Close&Load

do you think this is scalable?

don't ask just try ;) ( if it works with 5 mln rows , it should work with your 45 000)
 
Last edited:
Upvote 0
sandy666,

Holy s$%t! Amazing.

I spent days playing with VBA and came up with a multi-step process that took 20-30 min to completely execute, with a lacking result. And this needs to be run across a dozen columns.

Thank you so much!
--LAN
 
Upvote 0

Forum statistics

Threads
1,223,878
Messages
6,175,141
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top