Clean Data with Power Query
September 28, 2017 - by Bill Jelen
Power Query is a new tool from Microsoft to Extract, Transform, and Load data. Today's article is about processing all files in a folder.
Power Query is built in to Excel 2016 and is available as a free download in certain versions of Excel 2010 and Excel 2013. The tool is designed to extract, transform and load data into Excel from a variety of sources. The best part: Power Query remembers your steps and will play them back when you want to refresh the data. As this book goes to press, the Power Query features in Excel 2016 are on the Data tab, in the Get & Transform group, under New Query. It is hard to predict whether Microsoft will retroactively rename Power Query to Get & Transform in Excel 2010 and Excel 2013.
This free add-in is so amazing, there could be a whole book about it. But as one of my top 40 tips, I want to cover something very simple: bringing a list of files into Excel, along with the file creation date and maybe size. This is useful for creating a list of budget workbooks or a list of photos.
In Excel 2016, you select Data, New Query, From File, From Folder. In earlier Excel versions, use Power Query, From File, From Folder. Specify the folder:
While editing the query, right-click any columns that you don’t want and choose Remove.
To get File Size, click this icon in the Attributes column:
A list of extra attributes appears. Choose Size.
A large list of Transform options are available.
hen you are done editing the query, click Close & Load.
The data loads to Excel as a table.
Later, to update the table, select Data, Refresh All. Excel remembers all the steps and updates the table with a current list of files in the folder.
For a complete description of the feature formerly known as Power Query, check out M is for (Data) Monkey by Ken Puls and Miguel Escobar.
Thanks to Miguel Escobar, Rob Garcia, Mike Girvin, Ray Hauser and Colin Michael for nominating Power Query.
Watch Video
- The Power Query tools are on the Data tab in Excel 2016
- Free add-in for 2010 and 2013
- List all files from a folder into the Excel grid using Power Query
- Choose New Query, From File, From Folder
- Not obvious: expand the attribute field to get size
- If your data is in CSV files, you can import all of the files at once into a single grid
- Promote the heading row
- Delete the remaining header rows
- Replace "" with null
- Fill down for the outline view
- Delete the grand total column
- Unpivot the data
- Formula to convert month names into dates
- Complete list of steps - world's greatest Undo
- Next day - refresh the query to re-do all of the steps
Transcript of the Video
- Power Query is built-in to Windows versions of Excel 2016. Look on the Data tab in the Get & Transform group. If you have 2010 or
- 2013 as long as you're running Windows
- and not Mac everything that's here in Get & Transform
- you can download for free from Microsoft. Just search for
- Download Power Query.
- Today, I am interested in using Power Query to get a file list. I
- want to list all the files in a folder.
- Maybe I need to see which files are the
- large files or I need to sort or I need
- you know to get a combination of you
- know the budget files that we sent out
- and then a different folder which ones
- we came came back.
- To start, go to Data, Get & Tranform, From File, From Folder.
- Paste in the folder path or use the Browse button.
- Click OK and they show me this
- preview. Choose Edit.
- A couple of things here you see we have
- the file name the extension the date
- accessed, date modified, date created.
- It's really not obvious that this symbol next to the Attributes heading means Expand. Click that symbol and there's more stuff in
- here and if you click this symbol then I
- can go in and get things like file size
- or if it's read-only and things like
- that so in this case I just want file
- size. Choose File Size. Click ok. They give you a new field with a name of Attributes.Size.
- I can see how many bytes are in
- each file.
- Maybe I don't need everything here maybe
- I don't need the date created so I can
- right click and say that I want to
- remove that column. This
- binary I don't need that will remove
- that column. From the Ribbon, click Close & Load.
- In a few seconds, you will have a sortable view of
- everything in that folder if the folder
- changes I can come in here and I can
- refresh the query and it will go back
- out and pull that data in right this is
- for me this is a problem that we used to
- have all the time we would send out 200
- budget files
- and you get someone back not all them
- back you need to be able to compare so
- now I can essentially do a vlookup
- between folders.
- It is just amazing how
- cool it is but look let's go beyond
- what I have in the book and show you how
- that's just the tip of the iceberg.
- I'm going to create another query. Data, New Query, From File, From Folder.
- I'll copy that folder path here.
- click edit.
- As of October 2016, this trick only works with CSV
- files, but in 2017 it was updated to work with single-sheet Excel files. I
- have a folder a whole bunch of files and
- I want to create one excel grid with all
- of the data from all of these files.
- It's not intuitive at all. Look next to the heading for the Binary column. There is an icon with two arrows pointing down at a horizontal line.
- Click that.
- BAM! it just pulled in every single record from
- every single file in that folder!
- Isn't
- that amazing I mean that was a VBA macro
- before and it takes months to learn VBA
- macros you can learn power query in ten
- minutes.
- We have to select this column and
- go to replace values say that we're
- going to
- replace nothing with the word null click
- okay
- That'll give us Nulls in place of empty cells.
- Those nulls allow us to use this amazing
- featured called Fill Down. Watch that
- column when I choose Fill Down. BAM it
- just pulled in all of that outline view
- and brought the value down.
- I don't need the Grand Total column.
- Right-click and remove.
- Now at this point you say oh yeah hey we could
- pull this in and it'd be awesome. But if
- we wanted to create a pivot table from
- this data having a repeating group going
- across Jan Feb Mar is not a good format
- for a pivot tables.
- Right now we have 47
- rows I need to have 47 times twelve rows
- and to do this in a regular Excel file
- it is horrendous using a Multiple
- Consolidation Range that I learned from
- Mike Alexander at Data Pig Technologies.
- But it is easy in Power Query. Check this out I'm going to choose the
- label columns along the left. These are the things that I don't
- want to change and then on the Transform tab, choose Unpivot Other Columns.
- We go from 47 rows to 564 rows
- that's an amazing step.
- Here you can see
- that these values are text. It is easy enough to
- change it to either currency or a whole
- number. Right click the heading and choose Rename and call it
- revenue
- How about these months? They're
- all text such as Jan, Feb, Mar. Here's an awesome way to fix
- that we go to add column add a custom
- column doesn't matter what the name you use.
- The calculation, in quotes, is " 1, 2016". Click OK.
- Now we have this new custom column I'm going to take the
- attribute column containing Month names and the new custom column. Select both columns
- and say I want to merge those columns
- with a space in between and call it date.
- Click OK. That looks enough
- like a real date that when I go to
- transform and change it to a date it
- converts it to a true Excel date.
- At this point these two temporary
- columns I can right click and remove.
- Now you could be saying to yourself:
- Wait, Bill! we could have done all of this in Excel and that's absolutely true we could have done all of it in Excel it would have been harder to get all the CSV files into one file it would have taken longer to fill in the blanks it definitely would have taken longer to do the unpivot operation but here's the thing look over on the right-hand side we haven't talked about Applied Steps at all. The Applied Steps panel is like the world's greatest undo if you need an audit trail if the auditors come and say well how did you get from all these CSV files to this file that we're building our financial statements on you can go back and show what it looked like at each step along the way. If you screwed something up back here you could change or edit that step. Next, on the Power Query Home tab, choose Close and Load. So here's our data set this is based on all of the files in this folder. Let's build a little pivot table from here insert pivot table existing worksheet right here and I'll put revenue in the values area products down the left hand side you see that we have six million in revenue. Well that's today's data now tomorrow tomorrow let's say that we get a couple of new customers a couple of new files come along and our IT department takes those and dumps them into our folder all I have to do is reopen this file select the query come over here and refresh and then come here analyze refresh the pivot table and we have the new data! Power Query is faster on day one maybe by a factor of 20-30%. On day two is faster by 99% it's an absolutely amazing product Power Query. It's in Excel 2016 but if you're in 2010 or 2013 for Windows you're more than welcome to go out and download it for free. The book that will teach you about Power Query is M is for (Data) Monkey" by Ken Puls and Miguel Escobar.
- This book will teach
- you everything about the power query
- interface it's an amazing book the best
- book on power query everything I learned
- I learned from this book. I got on a flight from
- Orlando to Dallas - I read the whole book
- and my knowledge of power query just
- soared in two hours you can be up to
- speed and replace things that you would
- have had a used to have done with VBA.
Download File
Download the sample file here: Podcast2037.xlsx
Title Photo: jarmoluk / Pixabay