Microsoft Excel Tutorial: Python in Excel
After using Python in Excel for 2 days, I am gaining confidence. Topics today:
• Python Tips from Day 2
• Slow rollout of Python
• Paying for Anaconda
• Python Libraries to explore
• How to write Python results back to Excel grid
• Real-life K-Means clustering with 18K customers, 30K transactions, 300 products
To download the first workbook: Gaining Confidence With Python In Excel - 2615 Sample Files - MrExcel Publishing
The second workbook contains actual customer data and I am not sharing it at this time.
Table of Contents
(0:00) Welcome
(0:45) Keyboard shortcuts for Python
(1:22) Leila tip DF.Customer eliminates square bracket notation
(2:14) Seaborn Library of Charts as shown by Mynda
(2:49) Initialization of 5 Python libraries automatically
(3:25) Partial Calculation Mode in Excel
(3:48) Use .Mean for AVERAGE()
(4:10) Chances you have Python
(4:40) Eventual cost is Anaconda which adds security
(5:15) Python libraries supported
(5:35) InDesign and Ctrl+Alt+Shift shortcuts
(6:37) Writing Python results back to Excel!
(8:21) Python examples start with fake random data
(8:39) Real data example with Python
(8:52) K-Means clustering based on 100's of products
(9:48) Using Power Query with Python in Excel
(11:20) 3D Scatter Chart in Python
(11:52) Choosing Number of Clusters
(12:13) Reset Python after 30 minutes inactivity
(13:00) Loading data frame back to Excel grid
(13:55) Data-Sciencey
(14:15) Tomorrow: Python pivot tables in Excel
After using Python in Excel for 2 days, I am gaining confidence. Topics today:
• Python Tips from Day 2
• Slow rollout of Python
• Paying for Anaconda
• Python Libraries to explore
• How to write Python results back to Excel grid
• Real-life K-Means clustering with 18K customers, 30K transactions, 300 products
To download the first workbook: Gaining Confidence With Python In Excel - 2615 Sample Files - MrExcel Publishing
The second workbook contains actual customer data and I am not sharing it at this time.
Table of Contents
(0:00) Welcome
(0:45) Keyboard shortcuts for Python
(1:22) Leila tip DF.Customer eliminates square bracket notation
(2:14) Seaborn Library of Charts as shown by Mynda
(2:49) Initialization of 5 Python libraries automatically
(3:25) Partial Calculation Mode in Excel
(3:48) Use .Mean for AVERAGE()
(4:10) Chances you have Python
(4:40) Eventual cost is Anaconda which adds security
(5:15) Python libraries supported
(5:35) InDesign and Ctrl+Alt+Shift shortcuts
(6:37) Writing Python results back to Excel!
(8:21) Python examples start with fake random data
(8:39) Real data example with Python
(8:52) K-Means clustering based on 100's of products
(9:48) Using Power Query with Python in Excel
(11:20) 3D Scatter Chart in Python
(11:52) Choosing Number of Clusters
(12:13) Reset Python after 30 minutes inactivity
(13:00) Loading data frame back to Excel grid
(13:55) Data-Sciencey
(14:15) Tomorrow: Python pivot tables in Excel
Transcript of the video:
Python and Excel.
What did we learn on day two?
Oh, what a great day it was August 22nd, 2023, Python debuts in Excel.
We had three great videos, 40 minutes of content.
Mynda was first with her 10 minutes.
Leila, about an hour later, and I think I was two minutes after Leila.
Funny story.
Mrs.
Excel and I watched every one of these videos in primetime on the big screen TV, which is very unusual for Mary Ellen.
She's like, "Oh, I can't wait to try this".
Here's some things that I've learned after 48 hours in using it for the first day for this video, and then really trying to dive in and gain more confidence.
I've become a huge fan of these four shortcut keys, two of which are new: Control, Alt, Shift, P, will get us into Python editor mode.
A lot of people are complaining that equal PY tab or equal PY open param is a pain.
I will never do that again.
Control, Alt, Shift, P.
And then once you're in to toggle between return values to Excel or return to Python, object is Control, Alt, Shift, M.
Making the formula bar larger is Control, Shift, U.
That's not new that's been there forever.
And toggling between in-cell editing and the formula bar, Control, F2.
Trying to get all of these learned.
Shout out to Leila.
The only bold item here, the one that is for me, a huge improvement, Python uses this square bracket so we have a data frame.
You have to square bracket apostrophe customer or Mynda used quotes customer, but if your field names have no spaces or special characters, you can just shorten that to df.customer.
Thanks Leila.
I will never put a space in a field name ever again.
Leila at the end showed using Python with Power Query in a future video, so she teased it.
That is going to be awesome and it should be working, but there is a bug on day one that is preventing it from working, but it will be working soon.
Oh, Mynda.
You have to watch Mynda's video.
She showed how to use this Seaborn library and built some of these amazing charts in her video that just blew me away.
Check out her video right around 6:32.
Also, another great trick from Mynda, you know the chart comes back in a cell, and it's way too small, and everyone makes the cell larger or merges the cells.
I hate to do that, but Mynda showed if the chart's too small, right click, choose picture in-cell, create reference, and that creates a nice large version of that image.
Hey, for me, procrastination helps.
I actually recorded my video on day one and by then they had added, on the formulas tab, the beautiful initialization panel.
That shows us which libraries are automatically loaded by default.
Every code that you copy from ChatGPT or the internet, is going to be loading several different libraries.
Well, all of those, all of these five are automatically loaded, so you don't have to type those.
Makes life a little bit simpler.
Leila, great example of a pivot table at 8:47 to group things up by month or week.
Both Mynda and Leila use df.describe for descriptive statistics.
Here's a cool one, and this just changed since day one.
Under calculation options, we now have automatic, manual, partial.
Partial turns off both data tables, data what if analysis, data table, and also pauses Python.
If you don't want to be updating your Python every single time, use this new partial setting.
That's a great one.
I got tripped up by this .sum is SUM.
That's obvious, but what I try to do .average, no, it's not AVERAGE, it's .mean.
Thanks Leila.
And also, a great example there of using Unpivot and FILTER.
I really recommend you watch both Leila and Mynda's videos because there's lots of great tips there.
The big question is if you're on the beta, do you have it?
And so I tried to pin Microsoft down.
How many people got it on day one?
Their guidance is the more worried that we're going to break Excel with a feature, the less people get it on day one.
What is that 5%, 10%?
A few of you figured out that you could find these icons and add them, but it won't work.
If they're not there, you're not going to get it.
And also, I'm going to ease up on that eventual subscription cost.
There were a few people on Twitter who said, "Oh great, Python to Excel.
That's another attack vector".
Well, no, because we're not running Python locally.
It's running in the cloud, thanks to a company called Anaconda, which provides great security.
The fact that we're eventually going to have to pay for Python really means that we're paying for Anaconda, and so I am going to be okay with that.
I take back my, "They shouldn't charge us extra for this".
Well, no, they're charging for the Anaconda, which brings us lots of good things.
Okay, so these five libraries are imported automatically.
Check out Mynda's using charts from the Seaborn.
That's great, but there's a whole bunch of other libraries that are included.
I use scikit-learn for the k-means clustering in my last video.
It looks like a lot of great things.
Google all of these and see what you can do.
My tip here for Control, Alt, Shift shortcuts.
Excel is my favorite product, but there's a lot of times when we are working on a book and I'm publishing a book that I'm in Adobe InDesign, and every time that there's a book and there's a screenshot, I have to press Control, Alt, Shift, E.
I was initially amused at the fact that Adobe had so many keyboard shortcuts already that they had to go to Control, ALT, Shift, E, but I'm doing it 1,500 times a book, per book.
I've gotten really good at this.
It's completely natural to me.
For me, it's all left hand.
I mash down Control, Alt, Shift in an angry fashion that I have to do this, but having Control, Alt, Shift, E 1,500 times per book in InDesign made it really natural for me to use Control, Alt, Shift, P, to open Python.
And then Control, Alt, Shift, M to switch back and forth between returning a Python object and returning Excel values.
Go back to that example that I did in the first video, where we did k-means clustering.
This was an example in the book that I read 10 years ago and I so wanted to do it with my data.
And while I showed that yesterday, then the question is, "Okay, so what?
Now we have these little clusters.
How do we get that back into our data?" It turns out Control, Shift, U to open this up.
We take the data from a A5 to C256 and load it into a data frame that we call df.
Do the k-means clustering here, create the chart here, but then check this out.
I'm going to write a new column back to my data frame.
All right, so the data frame was originally in this data here, and I say df, new column called cluster is equal to k means.labels.
And that takes these clusters: zero, one, two, three, and four, and adds it right back into the data in memory.
Now, this gives me new power.
I'm not just reading from Excel, but I can also write new data back to the data frame in memory.
Then to surface that, here I just came here and simple little Python, so Control, Alt, Shift, P, said df, and then use Control, Alt, Shift, M, or open this dropdown and choose to return as an Excel value.
And I get my original data, the XY and customer ID, and the cluster.
That gave me superpowers because now I can actually do calculations and write that data back to Excel.
I'm also going to point out, that I'm mildly amused that just about every Python example that I find starts out with a few lines of code that creates some fake random data, a hundred rows of random data.
I did that on day one.
I created fake random data, but the value here is being able to use real data.
Here it is, 33,000 rows of transaction data.
I actually have people's real names and emails in there.
I've hidden them with a white font, and a lot of different products here.
I wanted to do k-means clustering based on these products, hundreds of products.
I created a nice little unique list of all the products and for each one I categorized it.
Is it a straight Excel book?
Is it a Power Query book, a VBA book, or is it Power BI?
I included Rob Collie's original book on Dax formulas as Power BI, although it was really Power Pivot.
When we talk about Power BI here, it's a combination of Power Pivot.
Some books, like Mike Urban's new book.
I said, "Well, there's a lot of Power Query in there and there's a lot of Power BI in there, but most of the book is Excel".
I was able to take a book and categorize it.
Then over here, I have a connection to the products, a connection to transactions, and a huge summary table that then figures out total revenue by customer by these four categories.
Now, according to Microsoft's own documentation, we should be able to use something in Power Query, create a connection only.
Right there, categories, and then use xl("Categories") and refer to that as a new data frame.
Unfortunately, it's not working.
The Excel team says it should be working.
They're aware of the problem, they're working on the problem, and within a couple of days from now it will be working.
Here where I have queries and connections, my summary table was originally connection only, and I should have been able to come here to Python and say, "Data frame equals xl," and then the name of the query.
Unfortunately, it's not working, so today I had to load it over here to the sheet.
Doesn't feel as good as the first day.
Now, 18,000 rows and four categories, so not just X and Y, but XL, Power BI, Power Query, VBA.
Use the k-means clustering.
This is already built in.
It's basically trying to find groups of customers, groups of similar customers, and then adding that back into the data frame so I can actually find the customers that fit into each group.
The thing is we don't really know how many clusters we want, so check this out.
I actually have a little slicer here.
The slicer is driving a value in U11 and it just has the number of clusters to choose.
I'm able to reuse that variable here in the K-means clustering to specify how many clusters to use.
Once I had the data from here, I went out to Matplot.lib, and found a great plot for a 3D scatter plot, and chose to show Excel, Power BI, and Power Query on the three axes here.
Plot.show.
I actually have four of these.
I couldn't figure out a good way to do a four dimensional chart, so I just created four charts with three dimensions in each one.
Okay, so here we go.
We have what, 30,000 transactions summarized down to 18,000 customers.
I want to create clusters from those customers.
I don't know how many are the right numbers, so I choose, let's say, I don't know, six.
And then here under formulas, because I've turned this on to partial, then I have to do a calculate Now.
Calculate Now, we'll send all this out.
Oh, look at that.
Shoot, I waited too long.
After 30 minutes, we have to reset.
Let's reset the connection and calculate now.
All right, that was an insane amount of calculation that happened in the background.
Sending that data out, clustering them into six groups.
Now, let's see how long it takes.
Here, we'll come in here and do four groups and then calculate now.
That's pretty fast.
That's a lot going on.
All right, and then if I actually wanted to get this data back, I have this data frame called df.
I would just come down here to where I want the data to be.
Actually, I need to make sure to get to the right of my little slicer trick there.
And then, we will just Control, Alt, Shift, P to get into Python.
I'll say df and I want to return that as a, or Control, Alt, Shift, M, as an Excel object.
Now unfortunately when I return this, you're going to see some customer information, so I'm going to pause the video and hide that data.
Hang on, press control, enter.
All right, we'll hide those, the customer information there.
And so now I have clusters assigned to the various customers.
If I'm going to market, if I have a new book coming out on Power Query, I know to find the clusters that have a lot of Power Query customers and I can isolate those and send emails to them.
It feels pretty good.
It feels data sciencey, if that's a word.
Could I have figured out some other way to find the customers that buy a lot of Power Query?
Yes, but I've always been jealous sitting on the outside and not being able to use Python, and now this brings Python to the table for all of us Excel people.
I can tell you already today on day three, I've been building pivot tables with Python, looking at all the options that are available there.
Not quite everything that we have in Excel, but it re-calcs automatically.
That'll probably be tomorrow's video, so check that out.
Thanks for watching today.
It seems very exciting to me to be able to finally tap into the power of Python right here from Excel without having to gen up a Python server right on my computer or something like that.
Exciting times.
I want to thank you for stopping by.
We'll see you next time for another net cast from Mr.
Excel.
If you like these videos, please down below, like, subscribe, and ring the bell.
Feel free to post any questions or comments down in the comments below.
What did we learn on day two?
Oh, what a great day it was August 22nd, 2023, Python debuts in Excel.
We had three great videos, 40 minutes of content.
Mynda was first with her 10 minutes.
Leila, about an hour later, and I think I was two minutes after Leila.
Funny story.
Mrs.
Excel and I watched every one of these videos in primetime on the big screen TV, which is very unusual for Mary Ellen.
She's like, "Oh, I can't wait to try this".
Here's some things that I've learned after 48 hours in using it for the first day for this video, and then really trying to dive in and gain more confidence.
I've become a huge fan of these four shortcut keys, two of which are new: Control, Alt, Shift, P, will get us into Python editor mode.
A lot of people are complaining that equal PY tab or equal PY open param is a pain.
I will never do that again.
Control, Alt, Shift, P.
And then once you're in to toggle between return values to Excel or return to Python, object is Control, Alt, Shift, M.
Making the formula bar larger is Control, Shift, U.
That's not new that's been there forever.
And toggling between in-cell editing and the formula bar, Control, F2.
Trying to get all of these learned.
Shout out to Leila.
The only bold item here, the one that is for me, a huge improvement, Python uses this square bracket so we have a data frame.
You have to square bracket apostrophe customer or Mynda used quotes customer, but if your field names have no spaces or special characters, you can just shorten that to df.customer.
Thanks Leila.
I will never put a space in a field name ever again.
Leila at the end showed using Python with Power Query in a future video, so she teased it.
That is going to be awesome and it should be working, but there is a bug on day one that is preventing it from working, but it will be working soon.
Oh, Mynda.
You have to watch Mynda's video.
She showed how to use this Seaborn library and built some of these amazing charts in her video that just blew me away.
Check out her video right around 6:32.
Also, another great trick from Mynda, you know the chart comes back in a cell, and it's way too small, and everyone makes the cell larger or merges the cells.
I hate to do that, but Mynda showed if the chart's too small, right click, choose picture in-cell, create reference, and that creates a nice large version of that image.
Hey, for me, procrastination helps.
I actually recorded my video on day one and by then they had added, on the formulas tab, the beautiful initialization panel.
That shows us which libraries are automatically loaded by default.
Every code that you copy from ChatGPT or the internet, is going to be loading several different libraries.
Well, all of those, all of these five are automatically loaded, so you don't have to type those.
Makes life a little bit simpler.
Leila, great example of a pivot table at 8:47 to group things up by month or week.
Both Mynda and Leila use df.describe for descriptive statistics.
Here's a cool one, and this just changed since day one.
Under calculation options, we now have automatic, manual, partial.
Partial turns off both data tables, data what if analysis, data table, and also pauses Python.
If you don't want to be updating your Python every single time, use this new partial setting.
That's a great one.
I got tripped up by this .sum is SUM.
That's obvious, but what I try to do .average, no, it's not AVERAGE, it's .mean.
Thanks Leila.
And also, a great example there of using Unpivot and FILTER.
I really recommend you watch both Leila and Mynda's videos because there's lots of great tips there.
The big question is if you're on the beta, do you have it?
And so I tried to pin Microsoft down.
How many people got it on day one?
Their guidance is the more worried that we're going to break Excel with a feature, the less people get it on day one.
What is that 5%, 10%?
A few of you figured out that you could find these icons and add them, but it won't work.
If they're not there, you're not going to get it.
And also, I'm going to ease up on that eventual subscription cost.
There were a few people on Twitter who said, "Oh great, Python to Excel.
That's another attack vector".
Well, no, because we're not running Python locally.
It's running in the cloud, thanks to a company called Anaconda, which provides great security.
The fact that we're eventually going to have to pay for Python really means that we're paying for Anaconda, and so I am going to be okay with that.
I take back my, "They shouldn't charge us extra for this".
Well, no, they're charging for the Anaconda, which brings us lots of good things.
Okay, so these five libraries are imported automatically.
Check out Mynda's using charts from the Seaborn.
That's great, but there's a whole bunch of other libraries that are included.
I use scikit-learn for the k-means clustering in my last video.
It looks like a lot of great things.
Google all of these and see what you can do.
My tip here for Control, Alt, Shift shortcuts.
Excel is my favorite product, but there's a lot of times when we are working on a book and I'm publishing a book that I'm in Adobe InDesign, and every time that there's a book and there's a screenshot, I have to press Control, Alt, Shift, E.
I was initially amused at the fact that Adobe had so many keyboard shortcuts already that they had to go to Control, ALT, Shift, E, but I'm doing it 1,500 times a book, per book.
I've gotten really good at this.
It's completely natural to me.
For me, it's all left hand.
I mash down Control, Alt, Shift in an angry fashion that I have to do this, but having Control, Alt, Shift, E 1,500 times per book in InDesign made it really natural for me to use Control, Alt, Shift, P, to open Python.
And then Control, Alt, Shift, M to switch back and forth between returning a Python object and returning Excel values.
Go back to that example that I did in the first video, where we did k-means clustering.
This was an example in the book that I read 10 years ago and I so wanted to do it with my data.
And while I showed that yesterday, then the question is, "Okay, so what?
Now we have these little clusters.
How do we get that back into our data?" It turns out Control, Shift, U to open this up.
We take the data from a A5 to C256 and load it into a data frame that we call df.
Do the k-means clustering here, create the chart here, but then check this out.
I'm going to write a new column back to my data frame.
All right, so the data frame was originally in this data here, and I say df, new column called cluster is equal to k means.labels.
And that takes these clusters: zero, one, two, three, and four, and adds it right back into the data in memory.
Now, this gives me new power.
I'm not just reading from Excel, but I can also write new data back to the data frame in memory.
Then to surface that, here I just came here and simple little Python, so Control, Alt, Shift, P, said df, and then use Control, Alt, Shift, M, or open this dropdown and choose to return as an Excel value.
And I get my original data, the XY and customer ID, and the cluster.
That gave me superpowers because now I can actually do calculations and write that data back to Excel.
I'm also going to point out, that I'm mildly amused that just about every Python example that I find starts out with a few lines of code that creates some fake random data, a hundred rows of random data.
I did that on day one.
I created fake random data, but the value here is being able to use real data.
Here it is, 33,000 rows of transaction data.
I actually have people's real names and emails in there.
I've hidden them with a white font, and a lot of different products here.
I wanted to do k-means clustering based on these products, hundreds of products.
I created a nice little unique list of all the products and for each one I categorized it.
Is it a straight Excel book?
Is it a Power Query book, a VBA book, or is it Power BI?
I included Rob Collie's original book on Dax formulas as Power BI, although it was really Power Pivot.
When we talk about Power BI here, it's a combination of Power Pivot.
Some books, like Mike Urban's new book.
I said, "Well, there's a lot of Power Query in there and there's a lot of Power BI in there, but most of the book is Excel".
I was able to take a book and categorize it.
Then over here, I have a connection to the products, a connection to transactions, and a huge summary table that then figures out total revenue by customer by these four categories.
Now, according to Microsoft's own documentation, we should be able to use something in Power Query, create a connection only.
Right there, categories, and then use xl("Categories") and refer to that as a new data frame.
Unfortunately, it's not working.
The Excel team says it should be working.
They're aware of the problem, they're working on the problem, and within a couple of days from now it will be working.
Here where I have queries and connections, my summary table was originally connection only, and I should have been able to come here to Python and say, "Data frame equals xl," and then the name of the query.
Unfortunately, it's not working, so today I had to load it over here to the sheet.
Doesn't feel as good as the first day.
Now, 18,000 rows and four categories, so not just X and Y, but XL, Power BI, Power Query, VBA.
Use the k-means clustering.
This is already built in.
It's basically trying to find groups of customers, groups of similar customers, and then adding that back into the data frame so I can actually find the customers that fit into each group.
The thing is we don't really know how many clusters we want, so check this out.
I actually have a little slicer here.
The slicer is driving a value in U11 and it just has the number of clusters to choose.
I'm able to reuse that variable here in the K-means clustering to specify how many clusters to use.
Once I had the data from here, I went out to Matplot.lib, and found a great plot for a 3D scatter plot, and chose to show Excel, Power BI, and Power Query on the three axes here.
Plot.show.
I actually have four of these.
I couldn't figure out a good way to do a four dimensional chart, so I just created four charts with three dimensions in each one.
Okay, so here we go.
We have what, 30,000 transactions summarized down to 18,000 customers.
I want to create clusters from those customers.
I don't know how many are the right numbers, so I choose, let's say, I don't know, six.
And then here under formulas, because I've turned this on to partial, then I have to do a calculate Now.
Calculate Now, we'll send all this out.
Oh, look at that.
Shoot, I waited too long.
After 30 minutes, we have to reset.
Let's reset the connection and calculate now.
All right, that was an insane amount of calculation that happened in the background.
Sending that data out, clustering them into six groups.
Now, let's see how long it takes.
Here, we'll come in here and do four groups and then calculate now.
That's pretty fast.
That's a lot going on.
All right, and then if I actually wanted to get this data back, I have this data frame called df.
I would just come down here to where I want the data to be.
Actually, I need to make sure to get to the right of my little slicer trick there.
And then, we will just Control, Alt, Shift, P to get into Python.
I'll say df and I want to return that as a, or Control, Alt, Shift, M, as an Excel object.
Now unfortunately when I return this, you're going to see some customer information, so I'm going to pause the video and hide that data.
Hang on, press control, enter.
All right, we'll hide those, the customer information there.
And so now I have clusters assigned to the various customers.
If I'm going to market, if I have a new book coming out on Power Query, I know to find the clusters that have a lot of Power Query customers and I can isolate those and send emails to them.
It feels pretty good.
It feels data sciencey, if that's a word.
Could I have figured out some other way to find the customers that buy a lot of Power Query?
Yes, but I've always been jealous sitting on the outside and not being able to use Python, and now this brings Python to the table for all of us Excel people.
I can tell you already today on day three, I've been building pivot tables with Python, looking at all the options that are available there.
Not quite everything that we have in Excel, but it re-calcs automatically.
That'll probably be tomorrow's video, so check that out.
Thanks for watching today.
It seems very exciting to me to be able to finally tap into the power of Python right here from Excel without having to gen up a Python server right on my computer or something like that.
Exciting times.
I want to thank you for stopping by.
We'll see you next time for another net cast from Mr.
Excel.
If you like these videos, please down below, like, subscribe, and ring the bell.
Feel free to post any questions or comments down in the comments below.