Viewer Solutions To Podcast 2316 Cleaning Data with Power Query - Episode 2323

If you like this content, please consider visiting the video on YouTube and subscribe to the MrExcel Channel to show your support!
This video has been published on Feb 27, 2020.
Links to the 16 articles: Bill's "How Would You Clean This Data" Challenge
Back in episode 2316, I had a clunky solution to a data shaping problem from Beth. I knew there had to be a better way and I asked you for your solutions. I learned so much from the 27 solutions and many comments. Thanks to Peter Bartholomew, brandon brimberry, Jonathan Cooper, Bohdan Duda, Geert Demulle, Fowmy, ExcelisFun, Roger Govier, Wyn Hopkins, Josh Johnson, Michael Karpfen, Hussein Korish, Kevin Lehrbass, John MacDougall, Ondřej Malinský, Rene Martin, Chris McNeil, Jason M, Christian Neuberger, Ken Puls, Jamie Rogers, Prashanth Sambaraju, Oz Du Soleil, Rico S, Bill Szysx, Frank Tonsen, Charles Williams, MF Wong, and Matthew Wykle.
maxresdefault.jpg


Transcript of the video:
Learn Excel from MrExcel Podcast Episode 2323. The data cleansing solution to podcast 2316.
It was about two weeks ago that I put out video 2316 with an interesting data cleansing solution.
I gave my 10 minute solution.
I said there has to be a better way and invited people on that Sunday - it was 4:00 AM to send in solutions and I want to thank everyone who sent in solutions.
Overwhelmingly, Power Query was the method of choice and some formula solutions.
When I go back and re watch that original video there were a lot of things that I screwed up alright and so.
I went through every single solution, 27 solutions that came in and just an amazing variety of different approaches there and those solutions all became a series of 15 articles at MrExcel.com there's a link down in the YouTube description to go through?
If you want to learn new things about Power Query.
You know the first thing I ran into was that my repeated headings of Q1 Q2 Q3 Q4, were getting a suffix and Wyn Hopkins used Demote Headers.
MF Wong sending a video and Peter Bartholomew as well, where they use the table, but the unchecked my table has header so it created a fake set of headers above. That was beautiful.
And then Jason M, Andre and John all used Delete Promoted Headers, which is the method I'm going to use in my solution. Bohdan had this "How to extract before and after delimiter".
I thought that's beautiful and use a trick from Josh Johnson.
I'm not using, (you know it's funny, depending on which method you use), either you have certain problems or you don't.
A lot of people use an index column with modulo five MF Wong sent in a video with that, and while a lot of people did that MF Wong actually did very cool thing of doing the modulo in place rather than adding an extra column.
Now I'm not heading down this method, But go check that out in the article because it's very cool. Moving the names to a new column.
There were all sorts of formulas sent in and then two solutions without a formula first Fowmy.
Hey, nice job, but - it's not going to work in real data because you're going to people like Monique or Quinton.
But then Jonathan Cooper figured out a way to solve that.
You'll see that in my solution.
Filter to keep only rooms that have Q1 Q2 Q3 Q4: a wild solution completely unlike everyone else.
Chris McNeil pivoted quarters, fills up.
And then keeps only the Rows I deleted, and then Michael Karpfen.
He says, "why delete all those total rows and add them back later as a total column?", Check out his solution in the article.
To get rid of the Department total section: MF Wong and Oz du Soleil just delete top five rows.
Keeping the category sorted: Solutions by Josh Johnson and John McDougall.
But it turns out that in the solution I have it doesn't get screwed up.
Almost everyone universally knew that I didn't have to do equal Q1 plus Q2 Plus Q3 Plus Q4.
Yeah, there's a great way to use Add Column, Standard, Add.
A lot of people said don't pivot the data at all.
Just do it in a pivot table.
Ken Puls sent in a solution for that. OK, and then shout out here.
Everything above is done in the power query interface, but two people, Frank Tonsen and Bill Szysz, sent in things where they actually hand coded the M, right and Bill's solutions are mind-blowing. I don't understand them yet.
Give me another five years and I'm going to get there right?
So check those out in the article and then I love this.
Some people said no, we're not going to use power query.
Hey, Rico S - who's been on the podcast before - single dynamic array formula. Check that out.
Kevin, Peter, Hussein, Prashanth Renee Martin all sent in solutions and then Roger Govier, who has been here a lot.
He sent in solution So all of those are detailed here. So this is me.
After looking at all 27 entries and the various techniques I want to show you basically what I've learned.
This is the composite solution using ideas from everyone.
I'm still going with the named range, you know. I like Custom Views. I like Subtotals.
I still reject tables because those don't work with tables.
So From a Table or Range.
All right, and you see I didn't notice it in the first video, but I was just in trouble from the get-go because of the underscore 1 2 3 4 and so on. Thanks to Jason M., Ondřej Malinský, and Peter Bartholomew said no.
Just get rid of these two and that problem goes away right?
Which is just solves so many issues down the line. So thanks for that.
Alright, then Transform, Transpose the table alright.
So now we have names and quarters going down and the headings going across.
Now at this point will do Home, Use first row as headers. The top section is the Department total.
We don't need that unless you're Michael Karpfen, who had a great solution.
And rather than trying to deal with this later, boy, I just love this.
Remove rows, Remove top rows. How many rows we want to get rid of? Five.
thanks. Thanks to MF Wong for your great video.
By the way, I watched your video without the English subtitles on.
I could just tell from the emotion in your voice exactly what was happening -it was beautiful.
Right now we're at the point where we have to isolate the quarters versus the names, and there were a lot of different formula setting for this.
But the method that I like best started out with Fowmy who said what we're going to do is we're going to take this.
Split the column by delimiter and the delimiter is going to be...
Q.
Click OK.
And it isn't that gorgeous now have names in Column A and quarters in column B.
Except for in real life, we're going to have someone like Quentin. And that is going to cause a problem. Alright?
So in the data set that I gave you for me absolutely that worked. Kudos for that.
But a similar idea came in from Jonathan Cooper, an I realized these next four steps are a little bit annoying, but the method is just so cool to me. It's worth it.
Alright, so right here in category description I'm going to replace values.
And I'm going to find Q1 and replace it with _Q1.
OK. I want to do that three more times. I'll speed up the video.
Alright, there you go. Welcome back.
See now I have underscores before everything, and Quenton doesn't have an underscore before him so.
Choose the Split Column.
By delimiter By an underscore, they figured it out.
Wow this is beautiful because we automatically split the employees and the quarters into two separate columns without a formula at all.
I love split column but I hate the headings that they give us and I always complain about this and have to do 2 rename steps.
But in a video sent in by Josh Johnson I saw this awesome tricks.
I'll get rid of change type and then run up here in the formula bar.
If you can't see the formula bar go to View, Formula Bar.
In the formula bar you see the headings that they assign so we can just click up here in the formula bar. And the first column is going to be called.
Employees or Employee and then the second column will be called Quarter.
Just click away and Bam! We get those. How cool is that?
Alright now we're going to use fill down here, but before we can use filled down, we have to replace those blank cells with nulls. So replace values. Replace nothing with null.
Click OK.
And then Transform, Fill, Fill Down.
Almost universally, everyone used that to get rid of the total rows.
Go to the quarter and take out null.
Click OK.
How beautiful is that will choose these two columns and say unpivot the other columns.
Now, in my original video these got out of sequence somewhere along the way, but with unpivot columns here, they are going to stay in the correct sequence.
That's beautiful.
Alright, and they're calling it attribute and value that should be Category, Description.
Or let's just call it Category.
Click away.
Alright, good were almost home were gonna take the quarter.
Pivot the column.
The values column is going to be Value.
Click OK.
The order stays the same.
We have our employees, including employees whose name is Q. OK, now, here - this step.
This is a step that everyone knew that there was a better way to do this. I need to total these four.
Add Columns, Out here under Standard is something called Add.
And other than the fact that they call it "Addition" which I don't like.
But I just learned this but I'm going to use it all the time now.
How to rename a column on the fly?
There's my sume and then Home, Close and Load.
And our solution.
Alright yeah hey just an alternate solution, Michael Karpfen noticed that we were all getting rid of the totals and then adding the totals back in. He said like why are you doing all of that?
So we'll keep everything from filled down but then get rid of the others here.
And from filled down here, right click. Replace values.
Null. With the word total, click OK.
Beautiful select these two.
Unpivot other columns.
Select quarter. Pivot that column.
And then simply take total and move move to end.
And we're home free.
I learned so many new things along the way in watching the 27 solutions that were sent in.
I really want to thank everyone for doing that. Now, look, hey, my solution here completely used.
The power query interface.
But you need to go check out this article with the solutions in it by. Bill Szysz.
It is just mind blowing written by hand using M my favorite, which he said he would never use as a professional solution, but it was just really kind of showing off the power of M.
Bill in four lines of code manages to solve the entire problem? Writing things out in the M language.
You know this is kind of like if you think about the VBA for all people just using the macro recorder never touched the code really in Power Query.
That's what we're doing, except Power Query is so much better than the VBA macro recorder we can get a long way.
But for those people who have moved beyond the macro recorder to real VBA, you can do all kinds of amazing things and a tip of the cap to Bill Szysz for this awesome solution.
Again, go check out this article or you can copy this code into your workbook. Well I say it every time, but I have to tell you.
Everyone who sent in a video or sent me to your channel, I made sure to subscribe to you.
Thanks for all these awesome ideas.
I love explaining things in the YouTube channel but I love learning from you as well.
And this was a great example of where I learned a lot of things.
Be sure to check out all those articles at MrExcel.
There were a lot of great ideas that send in that got something that weren't in my in my path.
My recommendation is always check out the book.
M is for (Data) Monkey From Ken Puls and Miguel Escobar. That's what got me up to speed with power query.
Well look.
I want to thank everyone who sent in solutions I want to thank YOU for stopping by.
See you next time for another netcast from MrExcel.
 

Forum statistics

Threads
1,221,551
Messages
6,160,460
Members
451,648
Latest member
SuziMacca

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top