Microsoft Excel Tutorial: Using Python in Excel to create Excel-like pivot tables.
I love pivot tables in Excel. In fact, I've written an entire book on Pivot Tables. So when I saw that Python has a function to generate "Excel-like" Pivot Tables in a new data frame, I wanted to try it out.
The Python Pivot Table is missing a few things:
1. Row Fields are called Index
2. Defaults to Average instead of Sum
3. Empty cells show as errors. Use fill_value
4. No Grand Totals by default! Turn on with margins=True
5. When you add Grand Totals, they are called "All" Unless you change them with Margins_Name
6. When you group by dates, you can't have grand totals
7. Does not sort by Custom Lists
8. Odd arrangement of headings when 2 row fields
But Python pivot tables have some advantages over Excel pivot tables:
Advantages over Excel
1. Automatic Recalc without Refresh
2. Date Grouping Offers some amazing options
In this episode, using Python in Excel to build data frames that look like pivot tables. You will see:
• Basic pivot table
• Adding Grand Totals
• Filling empty cells with zero
• Sorting or not sorting
• multiple row fields, column fields
• multiple value fields
• Sum of one field, mean of another
• Grouping dates by Month, Week, Quarter, Semi-Monthly, 3 days, 14 days, 2 weeks
• Crazy Excel formulas to reformat the top 2 rows of the pivot table
Table of Contents
(0:00) Podcast Words of the Day: Craw
(0:32) Syntax of Python Pivot_Table
(1:06) Contrast Python and Excel Pivot Tables
(1:58) Python better than Excel pivot tables
(2:18) Source data
(2:33) First Python Pivot Table
(2:44) #NUM! errors with Fill_Value
(3:03) Change Average to Sum
(3:30) Why blank 2nd row
(3:49) Adding Grand Totals with Margins=True
(4:35) Sorting a python pivot table in Excel
(5:07) Adding second row field
(5:50) Second column field
(6:23) Adding second Values field
(6:48) Sum Revenue and Average Profit
(7:10) Grouping Dates by Month
(8:20) Group by Weeks & More
(9:44) Blank Row 2
(10:19) Fixing the Python Pivot Table Formatting
(10:57) Can't combine Python formula with Excel formula
(11:21) Drunk kid on Christmas
(12:15) Wrap-up
I love pivot tables in Excel. In fact, I've written an entire book on Pivot Tables. So when I saw that Python has a function to generate "Excel-like" Pivot Tables in a new data frame, I wanted to try it out.
The Python Pivot Table is missing a few things:
1. Row Fields are called Index
2. Defaults to Average instead of Sum
3. Empty cells show as errors. Use fill_value
4. No Grand Totals by default! Turn on with margins=True
5. When you add Grand Totals, they are called "All" Unless you change them with Margins_Name
6. When you group by dates, you can't have grand totals
7. Does not sort by Custom Lists
8. Odd arrangement of headings when 2 row fields
But Python pivot tables have some advantages over Excel pivot tables:
Advantages over Excel
1. Automatic Recalc without Refresh
2. Date Grouping Offers some amazing options
In this episode, using Python in Excel to build data frames that look like pivot tables. You will see:
• Basic pivot table
• Adding Grand Totals
• Filling empty cells with zero
• Sorting or not sorting
• multiple row fields, column fields
• multiple value fields
• Sum of one field, mean of another
• Grouping dates by Month, Week, Quarter, Semi-Monthly, 3 days, 14 days, 2 weeks
• Crazy Excel formulas to reformat the top 2 rows of the pivot table
Table of Contents
(0:00) Podcast Words of the Day: Craw
(0:32) Syntax of Python Pivot_Table
(1:06) Contrast Python and Excel Pivot Tables
(1:58) Python better than Excel pivot tables
(2:18) Source data
(2:33) First Python Pivot Table
(2:44) #NUM! errors with Fill_Value
(3:03) Change Average to Sum
(3:30) Why blank 2nd row
(3:49) Adding Grand Totals with Margins=True
(4:35) Sorting a python pivot table in Excel
(5:07) Adding second row field
(5:50) Second column field
(6:23) Adding second Values field
(6:48) Sum Revenue and Average Profit
(7:10) Grouping Dates by Month
(8:20) Group by Weeks & More
(9:44) Blank Row 2
(10:19) Fixing the Python Pivot Table Formatting
(10:57) Can't combine Python formula with Excel formula
(11:21) Drunk kid on Christmas
(12:15) Wrap-up
Transcript of the video:
All right.
Hey today doing pivot tables in Python.
I'm excited I get to update this book, Microsoft Excel Pivot Table Data Crunching.
We'll have to add a chapter on Python pivot tables.
Something new today.
Let's try podcast words of the day.
The word craw, like I have something stuck in my craw.
You ever know what that is?
It's like the neck or a throat of a bird.
Isn't that a weird one?
Curmudgeon, Smart Lookup.
I thought maybe they'd just have a picture of me.
All right, df.pivot_table creates an Excel-like pivot table in a data frame.
The arguments, the weird one here is.
Like columns, is the column fields.
But for the row fields you have to use an argument name of index.
And then the other weird thing is the aggregate function defaults to mean instead of sum.
Empty cells show as errors you have to use this fill value instead and set it to something.
No grand totals by default, you have to turn those on with margins equals true.
And then when you turn them on, the margins, I mean, the grand totals are called ALL.
Unless you change the margin's name.
Can you group by dates?
Leila showed that the grouper function you can.
But then the grand totals have to be turned off or you get an error.
I don't know why I ever thought that this would ever happen.
But pivot tables sort by custom lists.
Of course Python can't see into our registry and know what our custom lists are.
So that's never going to work. And then word of the day is going to come up, which is just an odd arrangement of headings when we have multiple row fields.
Which brings me to the question of, wow, could I wrap this Python formula in the DROP function?
And I think I have to do it as two different fields.
Some advantages over Excel.
The pivot tables that we create in Python will automatically recalc without refresh.
And the date grouping actually offers some really cool options.
I wish that these options in Python grouping would come back to Excel.
It is very flexible.
All right, so the data that I always use for pivot tables, 563 rows, region, product, date sector, customer quantity, revenue, cost of goods sold and profit.
Our Python code is over here.
Let's just scroll over and we'll start with a very basic pivot table.
So values is revenue.
Index that's the row field is customers down the left hand side.
Columns across the top is region like that.
And you see all these #NUM errors.
That just means that ABC stores didn't buy anything in the central region or the east region.
And so we want to fix that.
That's called fill_value is equal to zero.
Control+enter.
All right, good, that's great.
And then by default the ag function is going to give us an average instead of a sum.
All right, good.
So we're back to proper SUM there.
I already...
What's up with these things you're like what do I have that blank row at the top.
And I see that they're putting the word region here.
Where in a regular pivot table would be up in L1.
All right, I'm not going to get stuck on that right now.
All right, so let's look at some of the cool things we can do.
So first off, you see that we have no grand totals here.
That's called margins_equals.
And by the way, True has to be capitalized.
Capital T, if you just put lowercase t, they have no clue what you're talking about.
Control+enter.
So that gets us a grand total at the right side and at the bottom it gives us a grand total row.
Let's change the margin name.
So margins_name equals ‘Total’.
I mean I guess it's nice that we can customize it.
In a pivot table it's always called Grand Total.
So I can just put the word Total there.
That's actually better.
Kind of silly though that we have to customize it.
The sorting option appears to be extremely limited.
Right now this is a sort of pivot table, it's sorting alphabetically.
If we would say sort equals False.
Then the data will come in the order that it originally appeared.
So if I would want to actually sort by something else like the Total.
I haven't figured out how to do that yet.
Maybe you can do it.
But so far I haven't been able to figure it out.
All right, so right now it's been pretty simple.
We have one field for the values.
One field for the columns.
One field for what they call index, and we call rows.
If we want to put multiple fields, we stack them up in square brackets.
So first sector as the outer row field, comma, customer.
Closing square bracket.
Control+enter.
Pretty nice.
Communications, it's alphabetized.
Communications, Energy, Financial.
Within, it's alphabetized.
There's no subtotals that are there.
Yeah, okay, so that's good.
Let's try two things across the top.
So where we have columns equals region.
Again square brackets and then comma, product square bracket control+enter.
So we get central, central, central, A, B, C, D, E, F, X, Y, Z.
All right, yep, that's good, that's good.
I like it.
And let's just go back to something simple here.
So index equals customer, columns equals region.
And then for values, let's put in two things.
So square brackets, we use revenue, profit, closing, square brackets, control enter.
So profit central, profit east, profit west, profit total, revenue, revenue, revenue, revenue.
I can't figure out why the NAs are appearing out there.
All right, here's something that's pretty different.
So here I have revenue and profit as the values.
And then for the aggregate function.
Curly braces?
Haven't seen that yet in Python.
And we say for the field called Revenue, we want to use Sum, but for the field called Profit, we want to use Mean.
So mixing the calculation for each field, that's good.
But next I want to take a look at how to group this by month or something like that.
So we'll go back to just Revenue.
Down the left-hand side, we'll put Date.
All right, so we have daily dates down the left-hand side.
I want to roll those up to months or quarters or years.
So instead of saying index is equal to date, we're going to put in this pandas.grouper.
The key field is the Date field and the frequency is going to be M for month.
Control+enter.
And we get an error.
It takes a long time to get the error too.
And the issue, it's not explained very well here at all.
Is that we cannot have the grand totals if we're grouping by month.
So get rid of those.
All right, there we go.
There's our months, including a leap day there in February of 2020.
When I went to look up the various frequencies that we can use.
It is pretty wild.
So we have an option for week, right?
That's great.
We have an option for quarter.
You can even do QS for quarter start.
Here's a good one.
SM for semi-monthly.
So that's the 15th and end of the month.
I can't figure out, why my data starts on January 1st, 2020 why we're getting that, I can't explain that one there.
And then semi-monthly starts.
Okay.
All right.
So we have all those built in, but we can do other crazy things.
So W is week and I can say I want to break it into two weeks.
So 1/5, 1/19, 2/2.
I can break it into, let's see, three-day periods.
Or for payroll, maybe 14-day periods.
All right, these are cool.
I wish that we had those in Excel.
Of course, I also wish we had grand totals in Python.
So it goes both ways.
“Pivot_table creates an Excel like pivot table in a new data frame”.
I don't know why I'm so irritated with these blank cells right here.
These four blank cells.
And so here's the same pivot table created in Excel.
It's not an “Excel-like pivot table”.
It is a pivot table.
And they move the word Region up above.
And it's like when I look at this pivot table, I don't even see the words Sum of Revenue and Region up there in row 2.
I just see this really good set of headings right here.
And wish that I had not sector and customer here and Central East, West, Total there.
So this Python formula is returning a spillable array there to cell K2.
And thanks to the Excel team and those 14 new functions they gave us last year.
It would be possible to use VSTACK, HSTACK, TAKE, CHOOSEROWS, DROP to take this array and make it into the array that I really want.
That's a complicated formula.
But it's not that complicated.
I guess my problem is, what I'd really like to do.
Is I'd really like to take this Python formula and replace it in here.
Every time I refer to K2.
Three times.
Have a LET function that points to that.
Because right now, I have to return this ugly-looking Python pivot table to the grid.
And then come over here and create the report that I really want.
Even if I came here and said, convert this to a data frame.
Control+Alt+Shift+M, then this can't deal with that.
I'm really aware that I'm like the kid on Christmas.
It's now the afternoon and I'm drunk on all of these toys that I have.
And I'm starting to say, “well, geez, Santa Claus brought me something here that is not as cool as I would've thought it would be”.
And I don't want to be THAT kid on Christmas.
So, hey, this is really good.
We have this great function pivot_table.
It lets us do a lot of cool data frames, take data and create a lot of analysis that are sort of like pivot tables.
Do we have all the pivot table functionality?
No, but it automatically refreshes.
And let's look at the good things.
The grouper has some very interesting date grouping, like every three days or twice a month, those kinds of things.
Those are all great, great things.
Well, hey, I appreciate everyone who's watching this series on Python.
It's great for me to kind of come up with this learning curve and share with you what I've learned along the way.
I want to thank you for stopping by.
We'll see you next time for another net cast from MrExcel.
If you like these videos, please down below, like, subscribe and ring the bell.
Feel free to post any questions or comments down in the comments below.
Hey today doing pivot tables in Python.
I'm excited I get to update this book, Microsoft Excel Pivot Table Data Crunching.
We'll have to add a chapter on Python pivot tables.
Something new today.
Let's try podcast words of the day.
The word craw, like I have something stuck in my craw.
You ever know what that is?
It's like the neck or a throat of a bird.
Isn't that a weird one?
Curmudgeon, Smart Lookup.
I thought maybe they'd just have a picture of me.
All right, df.pivot_table creates an Excel-like pivot table in a data frame.
The arguments, the weird one here is.
Like columns, is the column fields.
But for the row fields you have to use an argument name of index.
And then the other weird thing is the aggregate function defaults to mean instead of sum.
Empty cells show as errors you have to use this fill value instead and set it to something.
No grand totals by default, you have to turn those on with margins equals true.
And then when you turn them on, the margins, I mean, the grand totals are called ALL.
Unless you change the margin's name.
Can you group by dates?
Leila showed that the grouper function you can.
But then the grand totals have to be turned off or you get an error.
I don't know why I ever thought that this would ever happen.
But pivot tables sort by custom lists.
Of course Python can't see into our registry and know what our custom lists are.
So that's never going to work. And then word of the day is going to come up, which is just an odd arrangement of headings when we have multiple row fields.
Which brings me to the question of, wow, could I wrap this Python formula in the DROP function?
And I think I have to do it as two different fields.
Some advantages over Excel.
The pivot tables that we create in Python will automatically recalc without refresh.
And the date grouping actually offers some really cool options.
I wish that these options in Python grouping would come back to Excel.
It is very flexible.
All right, so the data that I always use for pivot tables, 563 rows, region, product, date sector, customer quantity, revenue, cost of goods sold and profit.
Our Python code is over here.
Let's just scroll over and we'll start with a very basic pivot table.
So values is revenue.
Index that's the row field is customers down the left hand side.
Columns across the top is region like that.
And you see all these #NUM errors.
That just means that ABC stores didn't buy anything in the central region or the east region.
And so we want to fix that.
That's called fill_value is equal to zero.
Control+enter.
All right, good, that's great.
And then by default the ag function is going to give us an average instead of a sum.
All right, good.
So we're back to proper SUM there.
I already...
What's up with these things you're like what do I have that blank row at the top.
And I see that they're putting the word region here.
Where in a regular pivot table would be up in L1.
All right, I'm not going to get stuck on that right now.
All right, so let's look at some of the cool things we can do.
So first off, you see that we have no grand totals here.
That's called margins_equals.
And by the way, True has to be capitalized.
Capital T, if you just put lowercase t, they have no clue what you're talking about.
Control+enter.
So that gets us a grand total at the right side and at the bottom it gives us a grand total row.
Let's change the margin name.
So margins_name equals ‘Total’.
I mean I guess it's nice that we can customize it.
In a pivot table it's always called Grand Total.
So I can just put the word Total there.
That's actually better.
Kind of silly though that we have to customize it.
The sorting option appears to be extremely limited.
Right now this is a sort of pivot table, it's sorting alphabetically.
If we would say sort equals False.
Then the data will come in the order that it originally appeared.
So if I would want to actually sort by something else like the Total.
I haven't figured out how to do that yet.
Maybe you can do it.
But so far I haven't been able to figure it out.
All right, so right now it's been pretty simple.
We have one field for the values.
One field for the columns.
One field for what they call index, and we call rows.
If we want to put multiple fields, we stack them up in square brackets.
So first sector as the outer row field, comma, customer.
Closing square bracket.
Control+enter.
Pretty nice.
Communications, it's alphabetized.
Communications, Energy, Financial.
Within, it's alphabetized.
There's no subtotals that are there.
Yeah, okay, so that's good.
Let's try two things across the top.
So where we have columns equals region.
Again square brackets and then comma, product square bracket control+enter.
So we get central, central, central, A, B, C, D, E, F, X, Y, Z.
All right, yep, that's good, that's good.
I like it.
And let's just go back to something simple here.
So index equals customer, columns equals region.
And then for values, let's put in two things.
So square brackets, we use revenue, profit, closing, square brackets, control enter.
So profit central, profit east, profit west, profit total, revenue, revenue, revenue, revenue.
I can't figure out why the NAs are appearing out there.
All right, here's something that's pretty different.
So here I have revenue and profit as the values.
And then for the aggregate function.
Curly braces?
Haven't seen that yet in Python.
And we say for the field called Revenue, we want to use Sum, but for the field called Profit, we want to use Mean.
So mixing the calculation for each field, that's good.
But next I want to take a look at how to group this by month or something like that.
So we'll go back to just Revenue.
Down the left-hand side, we'll put Date.
All right, so we have daily dates down the left-hand side.
I want to roll those up to months or quarters or years.
So instead of saying index is equal to date, we're going to put in this pandas.grouper.
The key field is the Date field and the frequency is going to be M for month.
Control+enter.
And we get an error.
It takes a long time to get the error too.
And the issue, it's not explained very well here at all.
Is that we cannot have the grand totals if we're grouping by month.
So get rid of those.
All right, there we go.
There's our months, including a leap day there in February of 2020.
When I went to look up the various frequencies that we can use.
It is pretty wild.
So we have an option for week, right?
That's great.
We have an option for quarter.
You can even do QS for quarter start.
Here's a good one.
SM for semi-monthly.
So that's the 15th and end of the month.
I can't figure out, why my data starts on January 1st, 2020 why we're getting that, I can't explain that one there.
And then semi-monthly starts.
Okay.
All right.
So we have all those built in, but we can do other crazy things.
So W is week and I can say I want to break it into two weeks.
So 1/5, 1/19, 2/2.
I can break it into, let's see, three-day periods.
Or for payroll, maybe 14-day periods.
All right, these are cool.
I wish that we had those in Excel.
Of course, I also wish we had grand totals in Python.
So it goes both ways.
“Pivot_table creates an Excel like pivot table in a new data frame”.
I don't know why I'm so irritated with these blank cells right here.
These four blank cells.
And so here's the same pivot table created in Excel.
It's not an “Excel-like pivot table”.
It is a pivot table.
And they move the word Region up above.
And it's like when I look at this pivot table, I don't even see the words Sum of Revenue and Region up there in row 2.
I just see this really good set of headings right here.
And wish that I had not sector and customer here and Central East, West, Total there.
So this Python formula is returning a spillable array there to cell K2.
And thanks to the Excel team and those 14 new functions they gave us last year.
It would be possible to use VSTACK, HSTACK, TAKE, CHOOSEROWS, DROP to take this array and make it into the array that I really want.
That's a complicated formula.
But it's not that complicated.
I guess my problem is, what I'd really like to do.
Is I'd really like to take this Python formula and replace it in here.
Every time I refer to K2.
Three times.
Have a LET function that points to that.
Because right now, I have to return this ugly-looking Python pivot table to the grid.
And then come over here and create the report that I really want.
Even if I came here and said, convert this to a data frame.
Control+Alt+Shift+M, then this can't deal with that.
I'm really aware that I'm like the kid on Christmas.
It's now the afternoon and I'm drunk on all of these toys that I have.
And I'm starting to say, “well, geez, Santa Claus brought me something here that is not as cool as I would've thought it would be”.
And I don't want to be THAT kid on Christmas.
So, hey, this is really good.
We have this great function pivot_table.
It lets us do a lot of cool data frames, take data and create a lot of analysis that are sort of like pivot tables.
Do we have all the pivot table functionality?
No, but it automatically refreshes.
And let's look at the good things.
The grouper has some very interesting date grouping, like every three days or twice a month, those kinds of things.
Those are all great, great things.
Well, hey, I appreciate everyone who's watching this series on Python.
It's great for me to kind of come up with this learning curve and share with you what I've learned along the way.
I want to thank you for stopping by.
We'll see you next time for another net cast from MrExcel.
If you like these videos, please down below, like, subscribe and ring the bell.
Feel free to post any questions or comments down in the comments below.