Excel 2024: Clean Data with Power Query


November 14, 2024 - by

Excel 2024: Clean Data with Power Query

Power Query is built in to Windows and Mac versions of Microsoft 365, Excel 2016, Excel 2019, Excel 2021 and is available as a free download in Windows versions of Excel 2010 and Excel 2013. The tool is designed to extract, transform, and load data into Excel from a variety of sources. The best part: Power Query remembers your steps and will play them back when you want to refresh the data. This means you can clean data on Day 1 in 80% of the normal time, and you can clean data on Days 2 through 400 by simply clicking Refresh.

I say this about a lot of new Excel features, but this really is the best feature to hit Excel in 20 years.

I tell a story in my live seminars about how Power Query was invented as a crutch for SQL Server Analysis Services customers who were forced to use Excel in order to access Power Pivot. But Power Query kept getting better, and every person using Excel should be taking the time to learn Power Query.


Aha!

Aha!: Power Query finally came to the Mac in 2023, but it is missing several important connectors. Suat Ozgur, the database administrator at MrExcel.com created those connectors and provides them free to the world. Get details at Episode 2597.

Get Power Query

You may already have Power Query. It is in the Get & Transform group on the Data tab.

Note

The "From Selection" icon above has been renamed to announce that you can use the results of an Array formula as the source for Power Query. Previously, this icon said "From Table/Range". It was briefly changed to "From Sheet" and then changed to "From Selection".

But if you are in Excel 2010 or Excel 2013, go to the Internet and search for Download Power Query. Your Power Query commands will appear on a dedicated Power Query tab in the Ribbon.

Clean Data the First Time in Power Query

To give you an example of some of the awesomeness of Power Query, say that you get the file shown below every day. Column A is not filled in. Quarters are going across instead of down the page.

To start, save that workbook to your hard drive. Put it in a predictable place with a name that you will use for that file every day.

Products are in column A, but in an outline view where Applie appears in A2, followed by several blank cells. Banana is in A8 followed by more blank cells. Customers are in column B. Quarters are going across the worksheet in C, D, E, and F.
Products are in column A, but in an outline view where Applie appears in A2, followed by several blank cells. Banana is in A8 followed by more blank cells. Customers are in column B. Quarters are going across the worksheet in C, D, E, and F.


In Excel, select Get Data, From File, From Workbook.

Browse to the workbook. In the Preview pane, click on Sheet1. Instead of clicking Load, click Transform Data. You now see the workbook in a slightly different grid - the Power Query grid.

Now you need to fix all the blank cells in column A. If you were to do this in the Excel user interface, the unwieldy command sequence is Home, Find & Select, Go To Special, Blanks, Equals, Up Arrow, Ctrl+Enter.

The blank cells in column A now say "null" in the Power Query Editor.
The blank cells in column A now say "null" in the Power Query Editor.

In Power Query, select Transform, Fill, Down.

Choose column A. Open the Fill drop-down menu and choose FIll, Down.
Choose column A. Open the Fill drop-down menu and choose FIll, Down.

All of the null values are replaced with the value from above. With Power Query, it takes three clicks instead of seven.

Next problem: The quarters are going across instead of down. In Excel, you can fix this with a Multiple Consolidation Range pivot table. This requires 12 steps and 23+ clicks.

In Power Query select the two columns that are not quarters. Open the Unpivot Columns dropdown on the Transform tab and choose Unpivot Other Columns, as shown below.

Select columns A and B in Power Query. On the Ribbon, choose Unpivot other columns.
Select columns A and B in Power Query. On the Ribbon, choose Unpivot other columns.

Right-click on the newly created Attribute column and rename it Quarter instead of Attribute. Twenty-plus clicks in Excel becomes five clicks in Power Query.

You have four times as many rows. Columns A & B appear the same (except there are four rows for each previous one row). The Quarters that were going across columns C, D, E, and F now go down column C. The revenue from the data set is now in column D.

Now, to be fair, not every cleaning step is shorter in Power Query than in Excel. Removing a column still means right-clicking a column and choosing Remove Column. But to be honest, the story here is not about the time savings on Day 1.

When you are done cleaning the data, click Close & Load as shown below.

On the Home tab in Power Query, choose Close & Load.
In a few seconds, your transformed data appears in Excel. Awesome.

The transformed data is returned to a table in Excel.
The transformed data is returned to a table in Excel.

But Wait: Power Query Remembers All of Your Steps

Look on the right side of the Power Query window. There is a list called Applied Steps. It is an instant audit trail of all of your steps. Click any gear icon to change your choices in that step and have the changes cascade through the future steps. Click on any step for a view of how the data looked before that step.

On the right side of the Power Query Editor, a list of Applied Steps. For this example, you have Source, Navigation, Promoted Headers, Changed Type, Filled Down, Unpivoted Other Columsn, Renamed Columns.
On the right side of the Power Query Editor, a list of Applied Steps. For this example, you have Source, Navigation, Promoted Headers, Changed Type, Filled Down, Unpivoted Other Columsn, Renamed Columns.

The Payoff: Clean Data Tomorrow With One Click

But again, the Power Query story is not about the time savings on Day 1. When you select the data returned by Power Query, a Queries & Connections panel appears on the right side of Excel, and on it is a Refresh button. (We need an Edit button here, but because there isn't one, you have to right-click the original query to view or make changes to the original query).

The Queries & Connections panel lists one query called Sheet1 with 68 rows loaded. If you make the panel wider and hover over Sheet1, a refresh icon appears.

It is fun to clean data on Day 1. I love doing something new. But when my manager sees the resulting report and says Beautiful. Can you do this every day?I quickly grow to hate the tedium of cleaning the same data set every day.

So, to demonstrate Day 400 of cleaning the data, I have completely changed the original file. New products, new customers, smaller numbers, more rows, as shown here. I save this new version of the file in the same path and with the same filename as the original file.

If I open the query workbook and click Refresh, in a few seconds, Power Query reports 92 rows instead of 68 rows.

Click the Refresh icon in the Queries & Connections panel and it reports you have 92 rows loaded.

Change some data in the original worksheet. Add more rows. Type new customers. Change numbers.
Change some data in the original worksheet. Add more rows. Type new customers. Change numbers.

Cleaning the data on Day 2, Day 3, Day, 4,...Day 400,...Day Infinity now takes two clicks.

The new customers appear at the bottom of the table in Excel.
The new customers appear at the bottom of the table in Excel.

This one example only scratches the surface of Power Query. I have several more YouTube videos in a playlist at https://mrx.cl/pqplaylist. You will learn about other features, such as these:

  • Combining all Excel or CSV files from a folder into a single Excel grid
  • Converting a cell with Apple;Banana;Cherry;Dill;Eggplant to five rows in Excel
  • Doing a VLOOKUP to a lookup workbook as you are bringing data into Power Query
  • Making a single query into a function that can be applied to every row in Excel

Bonus Tip: Tame the Sequence of Refresh All

Say that you have a Power Query and then build a pivot table on the results. When you go to the Data Tab and click Refresh All, you need the Power Query refresh to complete before the pivot table refreshes. Excel MVP Celia Alves from SolveAndExcel.ca offers this important tip: Right-click the query in the Queries & Connections pane and choose Properties. Make sure to uncheck the Enable Background Refresh setting. This forces the Power Query to finish refreshing before the pivot table is updated. Otherwise, you will have to click Refresh All twice in order to get the pivot table to update.

Bonus Tip: Data Profiling in Power Query

Some excellent data profiling options appeared in late 2019. They are not obvious to find, as they are on the View tab in Power Query. Use the Column Quality, Column Distribution, and Column Profile boxes.

The View tab in Power Query now offers a Data Preview group with five checkboxes:  Monospaced, Show Whitespace, Column Quality, Column Distribution, and Column Profile.
The View tab in Power Query now offers a Data Preview group with five checkboxes: Monospaced, Show Whitespace, Column Quality, Column Distribution, and Column Profile.

My favorite is the Column Profile. As you would expect, Weekday has 7 distinct values.

Each column has a column chart at the top. For Weekday, the column chart shows 7 distinct values, none have a single value. Two days appear more frequently than the other days. For the Date column, there are 457 distinct and 359 unique.
Each column has a column chart at the top. For Weekday, the column chart shows 7 distinct values, none have a single value. Two days appear more frequently than the other days. For the Date column, there are 457 distinct and 359 unique.

Choose one column and then Column Profile. You will see a window at the bottom with statistics and a frequency distribution.

The Column Profile window has statistics on the left; Count, Error, Empty, Distinct, Unique, Empty Tring, Min, and Max. On the right is a histogram showing the values that occur most frequently.
The Column Profile window has statistics on the left; Count, Error, Empty, Distinct, Unique, Empty Tring, Min, and Max. On the right is a histogram showing the values that occur most frequently.

Column Quality shows if you have any empty or error cells in each column:

The Column Quality shows the percentage of values in that column that are Valid, Error, or Empty.
The Column Quality shows the percentage of values in that column that are Valid, Error, or Empty.

For a complete description of Power Query, check out Master Your Data by Ken Puls and Miguel Escobar.

Cover for the book Maste Your Data With Excel and Power BI - Leveraging Power Query to Get & Transform Your Task Flow.

Thanks to Miguel Escobar, Rob Garcia, Mike Girvin, Ray Hauser, and Colin Michael for nominating Power Query.


This article is an excerpt from MrExcel 2024 Igniting Excel

Title photo by Milos Lopusina on Unsplash