Excel Power Query Import And Clean Fixed Width Text Files 2539

If you like this content, please consider visiting the video on YouTube and subscribe to the MrExcel Channel to show your support!
This video has been published on Jan 23, 2023.
I was doing a seminar recently and many people at the company were getting reports downloaded as fixed width text files.

I encouraged them to start using the Get & Transform tools in Excel for cleaning this date. After the seminar, I created a text file with many of the issues that were in their workbooks.

Table of Contents
(0:00) Fixed Width Text File for Excel
(0:50) Excel Text Import Wizard UI for Marking Column Locations
(1:19) Finding column start locations for Power Query
(1:59) Importing to Excel from Text/CSV
(2:18) Power Query incorrectly guesses comma as delimiter
(2:42) Power Query make column wider
(2:55) Power Query removing top 6 rows
(3:30) Power Query Split by Position
(4:20) Power Query Use First Row as Headers
(4:30) Deleting non-data rows
(6:00) Converting blanks to null and Fill Down
(7:00) Marking date columns in Power Query
(7:46) Refreshing Power Query
(8:45) Inserting blank row at each change in employee
(11:17) Wrap-up
maxresdefault.jpg


Transcript of the video:
This is going to save you so much time, it should be illegal to know this trick.
Learn Excel from MrExcel Podcast, Episode 2359, Importing Fixed Width Text Files.
Yeah, I saw a file just like this file last week.
It was 1,000 pages that were basically printed to a text file. So they had this report .text, and when we look at it in Notepad, you can see that this is just some old, ancient system, writing things out.
We get the headings every so many rows. Look at this, 1,208 pages of data.
And there's all kinds of problems in this data, like a blank row after every single record.
In this one, Marjorie appears on the first row, but not on rows two, three, four.
More blank rows between Marjorie and Jasmine. This is just an ugly data set.
And cleaning this in Excel would probably take an hour or more.
The first thing is, when you go and open this in Excel, and we have fixed width...
I'm actually going to give kudos to Excel because, in step two, being able to draw the lines where they belong is easier than what we're going to do in this video.
So the first step is going to be harder using the Get and Transform tools.
But after that, everything is going to be dramatically easier.
So we have to figure out, for these nine columns going across, where they start.
And, for me, that was open it in Notepad++, click where the start of the cell is, and then look down here in the bottom. And see that's column 12.
And the ID number is in column 47. And the reason is in column 58.
All right, so I just created a nice little list of where each field starts.
One issue with the Get and Transform tools is that they're zero base.
So I needed a simpler formula here to subtract one.
And then a TEXTJOIN to join all of those numbers with a comma space in between.
Just to get that information, I'm going to go out to Notepad, and paste that in Notepad, because I won't be able to get back to Excel while I'm cleaning this data.
All right, so here we go. We have our report, we want to try and clean it.
Let's see how long this takes.
First thing, instead of opening the file, we're going to come to the Data tab. On the Get and Transform data, say From Text/CSV.
Browse through the folder that has all of your files, choose the file, click Import.
Power Query looks through this data and it detected some commas here in row three, the parameters row, and that's not how this data is separated.
So we need to change to something that's not in the file, for example, a tab.
I want all of this data to land in one single column for right now.
All right, so that's the first gotcha.
Next up, Transform Data, and the column's not wide enough for us to see everything.
Drag it as far across as you can, but you still can't see state and zip code. But that's okay, we'll be able to fix that later.
And you see that your headings are appearing in row seven.
All right.
So that means that the first six rows are junk, and we're going to get rid of those rows, so we can move the headings up to the first row.
So here under Home, Remove Rows, Remove Top Rows, and it asks how many rows to remove.
And we say that we want to remove six. Like that.
So now, we have our fields here and we need to split these fields in Excel.
That would be Data, Text to Columns. Here in Power Query, it's Home.
And then split column by positions. Positions.
All right, so we choose positions, and this is where they want to see what the positions are.
Now, it's frustrating.
There's a tool tip there that only appears after you click Advanced Options.
And that tool tip is very helpful because it shows you that you start at character 0.
Remember, what we would call character 1, Power Query calls character 0, and then character 2 and 5 and 10 and 14 with commas in between.
So we're going to come back to Notepad and save this little set of column numbers that I copied out of Excel, Ctrl+C, and Paste, click Okay. All right, now check this out.
Look at this.
So now we have our headings in row one and the data starts in row two.
All right, we're going to take those headings and move them up to be the column name. So use first row as headers.
Beautiful. All right.
Now, we need to get rid of all the blank rows.
And one of the fields that would be great for this, like if the name is blank, that doesn't mean anything because see, all of these are data records, but the reason code and the date seem to be filled in all of the time. And let's just come down here to page two.
You'll see in the date field, Fair Home Scarecrows, amazing system report and parameter.
So anytime that we have FAI or a maze or a paramet...
And, of course, in your data, who knows what kind of junk is up here in these title rows.
And also the word date has been moved up to the header.
So anytime we see the word date here, we can get rid of that.
All right, so we're going to open this filter and get rid of the first part of Fair Home, anything that says amazing, anything that says parameter.
All the way down to the bottom, we're going to get rid of the word date, and then we're also going to get rid of any blank or that's spaces.
So a bunch of spaces there. And click Okay.
All right. Now that was useful.
All right?
That got rid of all of those blank rows all the way through the data.
All right, and now we have this situation where the name appears once and this name applies to all of these blank cells.
And, of course, in Excel it would be very tedious, but you could grab the fill handle and drag.
Power Query has a much easier way to do that.
Unfortunately, because of the way that this file was created, this cell is not blank. It's filled with 25 spaces.
All right, so what we have to do is we have to choose all of these columns that have the blanks, and we're going to do two things to clean those.
The first thing we're going to do is transform, format, and trim.
Trim gets rid of all the leading and trailing spaces in those columns, and then we're going to replace values.
And the value to find, we're going to leave that blank.
But in the replace width, we're going to say null. Null is a weird word.
It's a value in Power Query that says that this cell is completely empty.
See?
And now that we've done those two steps, which if you've never been in Power Query before, you're going to say, well, those seem kind of silly.
But those two steps allow us to do this amazing thing.
I'm about to fill in all of the blanks in all six columns with the value from above.
This is on the Transform tab. It's called Fill, Fill Down.
And right there, that is an amazing step.
All right.
Now, one additional thing to do here in Power Query, these little icons here that mean that they're going to treat this column as text, and you see we have a date field over here that's being imported as text, and we don't want that.
So choose the date column on the Transform tab. Data type, we're going to change that to a date.
And then this ID number might have leading zeros.
I could see where you could leave that as text, but this sequence number definitely appears that it should be a whole number.
All right?
So just take a look through your columns, and if anything shouldn't be text, clean that up and fix it.
All right, so now we're done. We can come here in Home, close and load.
It's going to insert a new sheet to the left of our worksheet.
The data's brought in as a table.
Now the amazing part of using these tools on the Data tab, the Get and Transform data tools, is that it's actually created a query.
And over here on the far right-hand side, you can see it, but on your computer, the queries and connections pane may not open. So on the Data tab, go to Queries and Connections.
And I really think this is kind of buggy.
At first, it opens kind of like this, and you can't see the Refresh icon.
You just have to drag it out to be wider the first time so you can see that Refresh icon.
The amazing thing is that the next time that you get a workbook from the system, that text file, just put it in the same folder with the same name and then open this workbook.
And when you click Refresh, it will go out and clean all of the data in seconds now, instead of hours.
For those of you who found this on YouTube and you're trying to figure out how to import fixed width text files, we're done.
But for the person that I met last week who actually had this workbook, the way that they produce this file is they create a blank row between each set of employees, and that seemed really important to get that blank row.
He was trying to delete thousands of blank rows, but wanted to leave that one blank row there.
I actually got the sense that if he couldn't have the blank row between each employee, he was going to keep doing it in Excel his way, which was taking an hour each time.
I don't know why they need the blank row.
I'm just going to accept that they need the blank row.
I don't need you to argue with me that blank rows between this data are a horribly bad idea.
We're just trying to get Power Query adoption here, and if I can't figure out a way to get the blank row in, they're going to keep doing it in Excel.
So I'm going to insert a new worksheet to the left of this. This is a table.
This is a refreshable query. I want to keep this exactly the way it is.
I don't want to screw it up. I'm going to take all of this data.
I'm going to copy it, Ctrl+C, come here to Sheet 3, and I will Home, Paste as Values.
Now, that's going to cause the date column to not be formatted, so I'm going to fix that as a short date, maybe do a format column width. All right.
Now the goal here is that we want to insert a blank row every time that the employee name changes.
I will say no in that row, and then equal if this name is equal to the previous name, then no, otherwise one. Double click to copy that down.
And what that'll do is that'll put a 1 on the first row of each employee.
All right, now we're going to go to Home, Find and Select, Go to Special.
We're going to say for each cell that's a formula, that is a number, we're going to select just those cells. Like that.
And then Insert, Insert Sheet Rows.
It'll spin the blue circle there, because we're inserting thousands of rows. There we are.
And then delete that column J.
That's a painful set of steps, but it's not as painful as manually deleting thousands of other rows leaving the one that's left.
Maybe someone who watches this will say that there's a way to do that in Power Query.
So pretty typical, but pretty ugly data file coming in.
Power Query lets us clean that in six minutes the first time, and in less than a minute, every time after that. Hey, I want to thank you for stopping by.
We'll see you next time for another netcast from MrExcel.
 

Forum statistics

Threads
1,223,627
Messages
6,173,424
Members
452,515
Latest member
Alicedonald9

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top