Microsoft Excel Tutorial: VLOOKUP in Python for Excel
How to do a VLOOKUP or XLOOKUP in Python.
To download the examples in this workbook: Excel Python XLOOKUP 2619 Sample Files - MrExcel Publishing
Along the way, you will see:
101: Doing a VLOOKUP
Not specifying the key field!
What if a customer is missing from lookup table? How to IFERROR()
Limiting which fields are returned.
What if headings don't match?
What if a customer is duplicated?
Lookup on two fields.
Table of Contents
(0:00) Python lookup overview
(0:25) Comment indicator in Python
(1:00) VLOOKUP 101 in Python using pd.merge
(2:36) Leaving off the On field
(2:50) IFERROR when customer missing with .FillNA
(3:23) Limiting lookup table to needed fields
(4:04) Headings don't match left_on and right_on!
(4:50) .drop method to remove a column from Python
(5:12) If duplicate in lookup table
(6:00) drop_duplicates to remove duplicates
(7:06) What is the right way to show duplicates
(7:29) Lookup on two fields
(8:11) Final thoughts on pd.merge
(8:50) Building blocks in Python
(9:28) Nancy Faust
How to do a VLOOKUP or XLOOKUP in Python.
To download the examples in this workbook: Excel Python XLOOKUP 2619 Sample Files - MrExcel Publishing
Along the way, you will see:
101: Doing a VLOOKUP
Not specifying the key field!
What if a customer is missing from lookup table? How to IFERROR()
Limiting which fields are returned.
What if headings don't match?
What if a customer is duplicated?
Lookup on two fields.
Table of Contents
(0:00) Python lookup overview
(0:25) Comment indicator in Python
(1:00) VLOOKUP 101 in Python using pd.merge
(2:36) Leaving off the On field
(2:50) IFERROR when customer missing with .FillNA
(3:23) Limiting lookup table to needed fields
(4:04) Headings don't match left_on and right_on!
(4:50) .drop method to remove a column from Python
(5:12) If duplicate in lookup table
(6:00) drop_duplicates to remove duplicates
(7:06) What is the right way to show duplicates
(7:29) Lookup on two fields
(8:11) Final thoughts on pd.merge
(8:50) Building blocks in Python
(9:28) Nancy Faust
Transcript of the video:
This one, remarkably easy, doing an XLOOKUP or a VLOOKUP in Python.
A brief overview.
We're going to start with just a simple VLOOKUP, and then something crazy, not specifying what the key field is.
What to do with the customer is missing from the lookup table.
Essentially the IFERROR equivalent.
Limiting which fields are returned, which is something we don't have to worry about with VLOOKUP.
What if the headings don't match? What if a customer is duplicated?
This one is going to throw us for a loop. Lookup on two fields, and then some thoughts.
Now, before I get started, let's just talk about the comment indicator.
So, this hash allows you to comment a line, explains what the next line is.
But what I've done here a lot in this video is I put a line of code that demonstrates a problem.
But I've already figured out what the solution is.
So, right here in the middle of the line, I can put another hash and then the solution's out here. That doesn't get run because it's after the hash.
So, that allows me to show the problem.
And then I'll get rid of the hash and pull the solution in, and hopefully it will be faster. Let's go.
So, we have a large data set over here on the left-hand side and a small lookup table.
I want to get the sector field from here and add it to there.
To get into Python Ctrl+Alt+Shift+P.
This is one, you see that right now it's returning a Python object, but I want to see the results so I'm going to do Ctrl+Alt+Shift+M, to convert that to return as Excel values.
First off, define the data frame is equal to our large data set here.
Enter to go to the next line. LT for Lookup Table.
Next line. I'll go with MJ, My Join.
So, we're essentially doing a database join here. It is weird, it's not just merge, it's pd.merge.
And we give it the left table. So DF is the table on the left.
The lookup table is the lookup table. What column is in common between them?
So that's on equals customer.
And then the Join type. This is kind of like Power Query.
Choices are Left, Right, Outer, Inner and Cross. I'll put a graphic up there on the screen.
I'm going to use Left and then Ctrl+Enter. All right, look at that.
So here's our original fields.
Product, Date, Customer, Quantity, Revenue, Profit in the same order. Kept the same sequence.
Added the new field from the lookup table - Sector.
That's beautiful. That's great.
In fact, because this is called Customer and this is called Customer, you don't even have to specify the on, it'll just look for the field in common.
That's great. All right, now some nuances here.
What happens if a customer is missing from the lookup table?
So I added a new customer here, AAAMissing, and you see that we get this #HUM! error.
To get rid of those, so the equivalent of IFERROR or IFNA is this .fillna.
And we're going to fill the NAs with just quote, quote.
So control, enter there and we'll just get blanks.
Or you could put it in the word missing or anything like that. Now, the thing that's kind of weird here.
Is they're going to give me all the fields from the lookup table that aren't in the original dataset.
So like here my lookup table in real life, you got a lookup table that has fields that you don't actually need.
So what happens?
They match it on Customer because that's the field in common.
But then they also give me Sector, Country, and Account Number because those are all in the lookup table.
So what we can do is when we define the lookup table.
I can remove columns right there.
So it's two left square brackets, the columns I want to keep, and then two right square brackets. And that should get us just Sector over here.
Beautiful. All right.
That's great. That works.
Now, what if we have fields that aren't the same heading?
This happens to me all the time.
The data comes down from the mainframe, it's called Customer.
And in my lookup table it's called Account or something like that.
So in the original example, we use the On parameter.
But there's also parameters for a left_ on, there's an underscore.
And right_on, right on!
So you can specify the left field uses Customer and the right field uses Account. But then what's weird to me.
To me I wonder what the heck Python is thinking.
Because I basically told you that Customer and Account are the same thing. And then they added Account in.
So after I get that whole thing. I can use this great method called Drop.
And the axis, if you want to drop rows, that's the axis zero.
To drop columns, axis equals one.
So there we have the Drop to remove a column that we don't want from the result. Control, enter and we get rid of that account.
All right, that's good. This one.
Ohhh, Yeah. Okay.
See, this is our weakness coming at this as Excel people.
Because as Excel people we know that if we do a VLOOKUP and there's a duplicate here.
It's just going to give us the first one.
In XLOOKUP, we can say give us the first one or the last one, but it's never going to duplicate things.
Power Query, of course, will. So here I'm just being obstinate.
I put Access Analytic in there twice. And you see that it shows up twice.
Essentially doubling up the revenue and that's bad.
And I tried left, I tried inner.
I'm sure someone is going to leave me a comment that's going to tell me exactly how to solve this.
But my “Python rookie” way of doing this is saying, all right, let's see.
Is there some way all the way out here that we can drop duplicates?
So there's something called DF, that's the DataFrame. .drop_duplicates.
And I say just look at the customer column and I can either keep the first or the last.
So like right here it's keeping the last.
So imagine if you're using XLOOKUP search from the bottom. Oor you can say first to get the first one.
Which is, of course, is what VLOOKUP has done forever for us.
So then that means, that in our lookup table we're going to essentially make sure that we get rid of any duplicates by pasting. So the lookup table is H3 to I77.
Headers equals true. And then drop duplicates in the customer.
And keep just the first.
What we should see here is then like accept this, Access Analytic will appear just once and it will appear with Consulting like that. Now, is this the right thing?
Well, as an Excel person.
On behalf of the 750 million people using Excel on the planet, yeah, this is the right thing to do.
The Python people who are watching this will tell me all kinds of good reasons why they should have duplicated that revenue.
And overstated the numbers to the Securities and Exchange Commission.
But as an Excel person, this is clearly the right thing.
What if we have two keys? So here, sorry, the dataset changed.
I now have Customer and Region. And here, Fintega Financial Modeling.
(Hey Dan). They have sales in central and east.
And down here it's a different sales rep depending on whether it's central or east.
So I got to do basically a two-way lookup and this is pretty cool.
It works great. The left_on, square brackets.
And then the list of fields, Region and Customer.
And then the right_on Region and Account and it works.
We still get the extra Account there, which I could drop from the whole thing so it works.
So right here, just trying to do the equivalent of a VLOOKUP or an XLOOKUP using Python, this pd.merge is pretty cool.
If the fields happen to be in common, it just kind of figures it out.
This example, that's just really short. As an Excel person, this one makes me nervous.
Because I'm not telling them what the key is.
Someone changes this heading to something else, it's all going to stop working. Yeah, right now this all makes me nervous.
I'm like, oh, what if someone changes this from Customer to Account?
Well, it would've broken my other Python anyway. Maybe I should just get used to this.
I don't know. So there you go.
How to do XLOOKUP or VLOOKUP essentially joining tables in Python.
Someone on the YouTube channel. Actually, several people on the YouTube channel.
For these last several videos was like, well, why would you do this when you could have just done it in Excel?
And I'm hoping that I get to the point where I know enough Python. That I'm going to build this beautiful mansion.
Where I do this and I do that, I do this, I do that.
And somewhere along the line.
I need to do a lookup and I don't want to have to switch back to Excel just to do the lookup. So these are all building blocks.
And hopefully, once we have enough building blocks, we'll be able to put all these together into something amazing. I want to thank you for stopping by.
We'll see you next time for another netcast with MrExcel.
If you like these videos, please down below, Like, Subscribe and Ring the Bell.
Feel free to post any questions or comments down in the comments below.
A brief overview.
We're going to start with just a simple VLOOKUP, and then something crazy, not specifying what the key field is.
What to do with the customer is missing from the lookup table.
Essentially the IFERROR equivalent.
Limiting which fields are returned, which is something we don't have to worry about with VLOOKUP.
What if the headings don't match? What if a customer is duplicated?
This one is going to throw us for a loop. Lookup on two fields, and then some thoughts.
Now, before I get started, let's just talk about the comment indicator.
So, this hash allows you to comment a line, explains what the next line is.
But what I've done here a lot in this video is I put a line of code that demonstrates a problem.
But I've already figured out what the solution is.
So, right here in the middle of the line, I can put another hash and then the solution's out here. That doesn't get run because it's after the hash.
So, that allows me to show the problem.
And then I'll get rid of the hash and pull the solution in, and hopefully it will be faster. Let's go.
So, we have a large data set over here on the left-hand side and a small lookup table.
I want to get the sector field from here and add it to there.
To get into Python Ctrl+Alt+Shift+P.
This is one, you see that right now it's returning a Python object, but I want to see the results so I'm going to do Ctrl+Alt+Shift+M, to convert that to return as Excel values.
First off, define the data frame is equal to our large data set here.
Enter to go to the next line. LT for Lookup Table.
Next line. I'll go with MJ, My Join.
So, we're essentially doing a database join here. It is weird, it's not just merge, it's pd.merge.
And we give it the left table. So DF is the table on the left.
The lookup table is the lookup table. What column is in common between them?
So that's on equals customer.
And then the Join type. This is kind of like Power Query.
Choices are Left, Right, Outer, Inner and Cross. I'll put a graphic up there on the screen.
I'm going to use Left and then Ctrl+Enter. All right, look at that.
So here's our original fields.
Product, Date, Customer, Quantity, Revenue, Profit in the same order. Kept the same sequence.
Added the new field from the lookup table - Sector.
That's beautiful. That's great.
In fact, because this is called Customer and this is called Customer, you don't even have to specify the on, it'll just look for the field in common.
That's great. All right, now some nuances here.
What happens if a customer is missing from the lookup table?
So I added a new customer here, AAAMissing, and you see that we get this #HUM! error.
To get rid of those, so the equivalent of IFERROR or IFNA is this .fillna.
And we're going to fill the NAs with just quote, quote.
So control, enter there and we'll just get blanks.
Or you could put it in the word missing or anything like that. Now, the thing that's kind of weird here.
Is they're going to give me all the fields from the lookup table that aren't in the original dataset.
So like here my lookup table in real life, you got a lookup table that has fields that you don't actually need.
So what happens?
They match it on Customer because that's the field in common.
But then they also give me Sector, Country, and Account Number because those are all in the lookup table.
So what we can do is when we define the lookup table.
I can remove columns right there.
So it's two left square brackets, the columns I want to keep, and then two right square brackets. And that should get us just Sector over here.
Beautiful. All right.
That's great. That works.
Now, what if we have fields that aren't the same heading?
This happens to me all the time.
The data comes down from the mainframe, it's called Customer.
And in my lookup table it's called Account or something like that.
So in the original example, we use the On parameter.
But there's also parameters for a left_ on, there's an underscore.
And right_on, right on!
So you can specify the left field uses Customer and the right field uses Account. But then what's weird to me.
To me I wonder what the heck Python is thinking.
Because I basically told you that Customer and Account are the same thing. And then they added Account in.
So after I get that whole thing. I can use this great method called Drop.
And the axis, if you want to drop rows, that's the axis zero.
To drop columns, axis equals one.
So there we have the Drop to remove a column that we don't want from the result. Control, enter and we get rid of that account.
All right, that's good. This one.
Ohhh, Yeah. Okay.
See, this is our weakness coming at this as Excel people.
Because as Excel people we know that if we do a VLOOKUP and there's a duplicate here.
It's just going to give us the first one.
In XLOOKUP, we can say give us the first one or the last one, but it's never going to duplicate things.
Power Query, of course, will. So here I'm just being obstinate.
I put Access Analytic in there twice. And you see that it shows up twice.
Essentially doubling up the revenue and that's bad.
And I tried left, I tried inner.
I'm sure someone is going to leave me a comment that's going to tell me exactly how to solve this.
But my “Python rookie” way of doing this is saying, all right, let's see.
Is there some way all the way out here that we can drop duplicates?
So there's something called DF, that's the DataFrame. .drop_duplicates.
And I say just look at the customer column and I can either keep the first or the last.
So like right here it's keeping the last.
So imagine if you're using XLOOKUP search from the bottom. Oor you can say first to get the first one.
Which is, of course, is what VLOOKUP has done forever for us.
So then that means, that in our lookup table we're going to essentially make sure that we get rid of any duplicates by pasting. So the lookup table is H3 to I77.
Headers equals true. And then drop duplicates in the customer.
And keep just the first.
What we should see here is then like accept this, Access Analytic will appear just once and it will appear with Consulting like that. Now, is this the right thing?
Well, as an Excel person.
On behalf of the 750 million people using Excel on the planet, yeah, this is the right thing to do.
The Python people who are watching this will tell me all kinds of good reasons why they should have duplicated that revenue.
And overstated the numbers to the Securities and Exchange Commission.
But as an Excel person, this is clearly the right thing.
What if we have two keys? So here, sorry, the dataset changed.
I now have Customer and Region. And here, Fintega Financial Modeling.
(Hey Dan). They have sales in central and east.
And down here it's a different sales rep depending on whether it's central or east.
So I got to do basically a two-way lookup and this is pretty cool.
It works great. The left_on, square brackets.
And then the list of fields, Region and Customer.
And then the right_on Region and Account and it works.
We still get the extra Account there, which I could drop from the whole thing so it works.
So right here, just trying to do the equivalent of a VLOOKUP or an XLOOKUP using Python, this pd.merge is pretty cool.
If the fields happen to be in common, it just kind of figures it out.
This example, that's just really short. As an Excel person, this one makes me nervous.
Because I'm not telling them what the key is.
Someone changes this heading to something else, it's all going to stop working. Yeah, right now this all makes me nervous.
I'm like, oh, what if someone changes this from Customer to Account?
Well, it would've broken my other Python anyway. Maybe I should just get used to this.
I don't know. So there you go.
How to do XLOOKUP or VLOOKUP essentially joining tables in Python.
Someone on the YouTube channel. Actually, several people on the YouTube channel.
For these last several videos was like, well, why would you do this when you could have just done it in Excel?
And I'm hoping that I get to the point where I know enough Python. That I'm going to build this beautiful mansion.
Where I do this and I do that, I do this, I do that.
And somewhere along the line.
I need to do a lookup and I don't want to have to switch back to Excel just to do the lookup. So these are all building blocks.
And hopefully, once we have enough building blocks, we'll be able to put all these together into something amazing. I want to thank you for stopping by.
We'll see you next time for another netcast with MrExcel.
If you like these videos, please down below, Like, Subscribe and Ring the Bell.
Feel free to post any questions or comments down in the comments below.