Power Query: Removing duplicate values in Multi-Value Cells

lneidorf

Board Regular
Joined
May 20, 2004
Messages
97
Office Version
  1. 365
Platform
  1. Windows
Hi there.

Long-time Excel user and dabbler in VBA and Power Query.

I've got a variation on a problem I posted a few years ago in this thread.

There, folks helped me use PowerQuery to collapse data, combining multiple rows into single, multi-value cells while removing duplicates. The wrinkle now is how to accomplish that across multiple columns and sort the contents.

An example should be helpful. Here's a mock-up of my source data:

BAND | MEMBER | INSTRUMENT
Beatles | John | Guitar
Beatles | Paul | Bass
Beatles | George | Guitar
Beatles | Ringo | Drums
Beatles | Stuart | Bass
Beatles | John | Guitar
Beatles | Pete | Drums
Stones | Mick | Vocals
Stones | Keith | Guitar
Stones | Charlie | Drums
Stones | Ronnie | Guitar
Stones | Mick | Vocals
Stones | Brian | Guitar
Stones | Bill | Bass
Stones | Mick | Guitar

Band name in Col A; Band Member in Col B; Instrument in Col C. Note that we have duplicates in both columns B and C, which I've bolded.

I've used Power Query to roll this stuff up nicely, with the result a single row per value in Column A. However, my current results looks like this:

BAND | MEMBER | INSTRUMENT
Beatles | John;Paul;George;Ringo;Stuart;John;Pete | Guitar;Bass;Guitar;Drums;Bass;Guitar;Drums
Stones | Mick;Keith;Charlie;Ronnie;Mick;Brian;Bill;Mick | Vocals;Guitar;Drums;Guitar;Vocals;Guitar;Bass;Guitar

Columns B and C contain multi-value field/cells, with individual values separated by semicolons. Again, duplicate values bolded.

The issue is the duplicates. It's important to note that my duplicates are within single, multi-value cells. So the question is how to use Power Query to do the following:
1) Clean those cells and remove any duplicative values in each individual cell.
2) Sort the individual values alpha-numerically

My goal is to get to this result:
BAND | NAME | INSTRUMENT
Beatles | George;John;Paul;Pete;Ringo;Stuart | Bass;Drums;Guitar
Stones | Bill;Brian;Charlie;Keith;Mick;Ronnie | Bass;Drums;Guitar;Vocals

Any input welcome.

Thanks!
 

Excel Facts

Highlight Duplicates
Home, Conditional Formatting, Highlight Cells, Duplicate records, OK to add pink formatting to any duplicates in selected range.
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
    tbl = Table.Group(Source, {"BAND"}, {{"All", each _}}),
    tbl1 = Table.AddColumn(tbl, "NAME", each Text.Combine(List.Sort(List.Distinct([All][MEMBER])), ";")),
    tbl2 = Table.AddColumn(tbl1, "INSTRUMENT", each Text.Combine(List.Sort(List.Distinct([All][INSTRUMENT])), ";")),
    Result = Table.RemoveColumns(tbl2,{"All"})
in
    Result

Book1
ABCDEFGH
1Table4Query Output
2BANDMEMBERINSTRUMENTBANDNAMEINSTRUMENT
3BeatlesJohnGuitarBeatlesGeorge;John;Paul;Pete;Ringo;StuartBass;Drums;Guitar
4BeatlesPaulBassStonesBill;Brian;Charlie;Keith;Mick;RonnieBass;Drums;Guitar;Vocals
5BeatlesGeorgeGuitar
6BeatlesRingoDrums
7BeatlesStuartBass
8BeatlesJohnGuitar
9BeatlesPeteDrums
10StonesMickVocals
11StonesKeithGuitar
12StonesCharlieDrums
13StonesRonnieGuitar
14StonesMickVocals
15StonesBrianGuitar
16StonesBillBass
17StonesMickGuitar
18
Sheet4
 
Upvote 0
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
    tbl = Table.Group(Source, {"BAND"}, {{"All", each _}}),
    tbl1 = Table.AddColumn(tbl, "NAME", each Text.Combine(List.Sort(List.Distinct([All][MEMBER])), ";")),
    tbl2 = Table.AddColumn(tbl1, "INSTRUMENT", each Text.Combine(List.Sort(List.Distinct([All][INSTRUMENT])), ";")),
    Result = Table.RemoveColumns(tbl2,{"All"})
in
    Result

Book1
ABCDEFGH
1Table4Query Output
2BANDMEMBERINSTRUMENTBANDNAMEINSTRUMENT
3BeatlesJohnGuitarBeatlesGeorge;John;Paul;Pete;Ringo;StuartBass;Drums;Guitar
4BeatlesPaulBassStonesBill;Brian;Charlie;Keith;Mick;RonnieBass;Drums;Guitar;Vocals
5BeatlesGeorgeGuitar
6BeatlesRingoDrums
7BeatlesStuartBass
8BeatlesJohnGuitar
9BeatlesPeteDrums
10StonesMickVocals
11StonesKeithGuitar
12StonesCharlieDrums
13StonesRonnieGuitar
14StonesMickVocals
15StonesBrianGuitar
16StonesBillBass
17StonesMickGuitar
18
Sheet4
Thanks so much for this.

One more wrinkle: how would I deal with one more column that also needs to be collapsed, deduped, and sorted? Let's say it's called "Color".
 
Upvote 0
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
    tbl = Table.Group(Source, {"BAND"}, {{"All", each _}}),
    tbl1 = Table.AddColumn(tbl, "NAME", each Text.Combine(List.Sort(List.Distinct([All][MEMBER])), ";")),
    tbl2 = Table.AddColumn(tbl1, "INSTRUMENT", each Text.Combine(List.Sort(List.Distinct([All][INSTRUMENT])), ";")),
    Result = Table.RemoveColumns(tbl2,{"All"})
in
    Result

Book1
ABCDEFGH
1Table4Query Output
2BANDMEMBERINSTRUMENTBANDNAMEINSTRUMENT
3BeatlesJohnGuitarBeatlesGeorge;John;Paul;Pete;Ringo;StuartBass;Drums;Guitar
4BeatlesPaulBassStonesBill;Brian;Charlie;Keith;Mick;RonnieBass;Drums;Guitar;Vocals
5BeatlesGeorgeGuitar
6BeatlesRingoDrums
7BeatlesStuartBass
8BeatlesJohnGuitar
9BeatlesPeteDrums
10StonesMickVocals
11StonesKeithGuitar
12StonesCharlieDrums
13StonesRonnieGuitar
14StonesMickVocals
15StonesBrianGuitar
16StonesBillBass
17StonesMickGuitar
18
Sheet4
@JGordon11 ,

Thanks so much.

I've figured out how to run this across multiple fields/columns simultaneously. However, I'm seeing an issue with one of the fields I've used this on.

The example I provided was over-simplified and didn't reflect one wrinkle in one of my raw data fields: multi-value cells.

Here's a more representative sample, with multiple values in the "MEMBERS" column:

BAND | MEMBERS | INSTRUMENT
Beatles | McCartney, Paul; McCartney, Paul; Harrison, George; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Guitar
Beatles | Lennon, John; McCartney, Paul; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Bass
Beatles | Lennon, John; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Guitar
Beatles | Lennon, John; McCartney, Paul; Lennon, John; Best, Pete; McCartney, Paul | Drums
Beatles | Lennon, John; Harrison, George; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Bass
Beatles | Lennon, John; McCartney, Paul; Harrison, George | Guitar
Beatles | Sutcliffe, Stuart; Lennon, John; Best, Pete | Drums
Stones | Jagger, Mick; Jagger, Mick; Richards, Keith; Watts, Charlie; Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Vocals
Stones | Jagger, Mick; Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Guitar
Stones | Richards, Keith; Watts, Charlie; Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Drums
Stones | Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Guitar
Stones | Jagger, Mick; Richards, Keith; Watts, Charlie | Vocals
Stones | Jagger, Mick; Richards, Keith; Jagger, Mick; Richards, Keith | Guitar
Stones | Jagger, Mick; Richards, Keith; Watts, Charlie | Bass
Stones | Watts, Charlie; Watts, Charlie; Watts, Charlie | Guitar

Notes:
  • Names are actually in Last, First format, separated by semicolons.
  • There is duplication not just across records, but within each value in the MEMBERS field.

And so I need a way to roll-up these records by the values in the BAND field, with a fully deduplicated (and alpha-numerically sorted) result. End result should look like this:

BAND | MEMBERS | INSTRUMENT
Beatles | Best, Pete; Harrison, George; Lennon, John; McCartney, Paul; Starr, Ringo; Sutcliffe, Stuart | Guitar; Bass; Drums
Stones | Jagger, Mick; Jones, Brian; Richards, Keith; Taylor, Mick; Watts, Charlie; Wood, Ronnie; Wyman, Bill | Bass; Drums; Guitar; Vocals

Thanks again.
 
Upvote 0
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    tbl = Table.SplitColumn(Source, "Column1", Splitter.SplitTextByDelimiter(" | ", QuoteStyle.Csv), {"A".."C"}),
    tbl1 = Table.PromoteHeaders(tbl, [PromoteAllScalars=true]),
    tbl2 = Table.Group(tbl1, {"BAND"}, {{"All", each
            let    
                members = Text.Combine(List.Sort(List.Distinct(List.Combine(List.Transform([MEMBERS], (x)=> Text.Split(x, "; "))))),"; "),
                inst = Text.Combine(List.Sort(List.Distinct(List.Combine(List.Transform([INSTRUMENT], (x)=> Text.Split(x, "; "))))),"; ")
            in 
                [MEMBERS = members, INSTRUMENT = inst]

    }}),
    Result = Table.ExpandRecordColumn(tbl2, "All",{"MEMBERS","INSTRUMENT"})
in
    Result

Book1
ABCD
1Column1
2BAND | MEMBERS | INSTRUMENT
3Beatles | McCartney, Paul; McCartney, Paul; Harrison, George; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Guitar
4Beatles | Lennon, John; McCartney, Paul; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Bass
5Beatles | Lennon, John; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Guitar
6Beatles | Lennon, John; McCartney, Paul; Lennon, John; Best, Pete; McCartney, Paul | Drums
7Beatles | Lennon, John; Harrison, George; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Bass
8Beatles | Lennon, John; McCartney, Paul; Harrison, George | Guitar
9Beatles | Sutcliffe, Stuart; Lennon, John; Best, Pete | Drums
10Stones | Jagger, Mick; Jagger, Mick; Richards, Keith; Watts, Charlie; Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Vocals
11Stones | Jagger, Mick; Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Guitar
12Stones | Richards, Keith; Watts, Charlie; Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Drums
13Stones | Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Guitar
14Stones | Jagger, Mick; Richards, Keith; Watts, Charlie | Vocals
15Stones | Jagger, Mick; Richards, Keith; Jagger, Mick; Richards, Keith | Guitar
16Stones | Jagger, Mick; Richards, Keith; Watts, Charlie | Bass
17Stones | Watts, Charlie; Watts, Charlie; Watts, Charlie | Guitar
18
19BANDMEMBERSINSTRUMENT
20BeatlesBest, Pete; Harrison, George; Lennon, John; McCartney, Paul; Starr, Ringo; Sutcliffe, StuartBass; Drums; Guitar
21StonesJagger, Mick; Jones, Brian; Richards, Keith; Taylor, Mick ; Watts, Charlie; Wood, Ronnie; Wyman, BillBass; Drums; Guitar; Vocals
22
Sheet1
 
Upvote 1
Solution
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    tbl = Table.SplitColumn(Source, "Column1", Splitter.SplitTextByDelimiter(" | ", QuoteStyle.Csv), {"A".."C"}),
    tbl1 = Table.PromoteHeaders(tbl, [PromoteAllScalars=true]),
    tbl2 = Table.Group(tbl1, {"BAND"}, {{"All", each
            let   
                members = Text.Combine(List.Sort(List.Distinct(List.Combine(List.Transform([MEMBERS], (x)=> Text.Split(x, "; "))))),"; "),
                inst = Text.Combine(List.Sort(List.Distinct(List.Combine(List.Transform([INSTRUMENT], (x)=> Text.Split(x, "; "))))),"; ")
            in
                [MEMBERS = members, INSTRUMENT = inst]

    }}),
    Result = Table.ExpandRecordColumn(tbl2, "All",{"MEMBERS","INSTRUMENT"})
in
    Result

Book1
ABCD
1Column1
2BAND | MEMBERS | INSTRUMENT
3Beatles | McCartney, Paul; McCartney, Paul; Harrison, George; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Guitar
4Beatles | Lennon, John; McCartney, Paul; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Bass
5Beatles | Lennon, John; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Guitar
6Beatles | Lennon, John; McCartney, Paul; Lennon, John; Best, Pete; McCartney, Paul | Drums
7Beatles | Lennon, John; Harrison, George; Starr, Ringo; Sutcliffe, Stuart; Lennon, John; Best, Pete | Bass
8Beatles | Lennon, John; McCartney, Paul; Harrison, George | Guitar
9Beatles | Sutcliffe, Stuart; Lennon, John; Best, Pete | Drums
10Stones | Jagger, Mick; Jagger, Mick; Richards, Keith; Watts, Charlie; Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Vocals
11Stones | Jagger, Mick; Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Guitar
12Stones | Richards, Keith; Watts, Charlie; Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Drums
13Stones | Wood, Ronnie; Taylor, Mick ; Jones, Brian; Wyman, Bill; Jagger, Mick | Guitar
14Stones | Jagger, Mick; Richards, Keith; Watts, Charlie | Vocals
15Stones | Jagger, Mick; Richards, Keith; Jagger, Mick; Richards, Keith | Guitar
16Stones | Jagger, Mick; Richards, Keith; Watts, Charlie | Bass
17Stones | Watts, Charlie; Watts, Charlie; Watts, Charlie | Guitar
18
19BANDMEMBERSINSTRUMENT
20BeatlesBest, Pete; Harrison, George; Lennon, John; McCartney, Paul; Starr, Ringo; Sutcliffe, StuartBass; Drums; Guitar
21StonesJagger, Mick; Jones, Brian; Richards, Keith; Taylor, Mick ; Watts, Charlie; Wood, Ronnie; Wyman, BillBass; Drums; Guitar; Vocals
22
Sheet1
This is wonderful.
Many thanks, @JGordon11 !
 
Upvote 0

Forum statistics

Threads
1,223,231
Messages
6,170,884
Members
452,364
Latest member
springate

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top