Excel VLOOKUP Fails At Strange Hyphen - 2600

If you like this content, please consider visiting the video on YouTube and subscribe to the MrExcel Channel to show your support!
This video has been published on May 22, 2023.
Microsoft Excel Tutorial - Deep Dive on diagnosing VLOOKUP errors.

To download the workbook: Excel VLOOKUP Fails At Strange Hyphen - 2600 Sample Files - MrExcel Publishing

This is an advanced look at troubleshooting hard VLOOKUP errors. After using TRIM, CLEAN, and making sure there aren't numbers stored as text, then I use this method to look at the ASCII code, character by character to figure out why the two values do not match.

But today, the vendor part number is reporting that a hyphen is ASCII CODE 63 instead of 45. What is up with this?

I had to modify my workflow to add UNICODE functions in order to discover that the hyphen - which the CODE function reports as ASCII 63 is really Unicode 8208. Why did the vendor decide to use Unicode 8208?

The solution is to SUBSTITUTE( , UNICHAR(8208) with "-".

Also in this video:
Which characters are actually removed by the Excel CLEAN() function.
This might be our 2500th episode.

Table of Contents
(0:00) Welcome
(0:21) VLOOKUP fails
(0:42) Character-by-Character Compare
(1:39) Explaining ASCII
(1:58) Using CODE() or CHAR() in Excel
(3:03) The hyphens are different
(3:34) Explaining UNICODE
(4:05) Klingon is not in Unicode
(4:43) Using =UNICODE() in Excel
(5:03) Unicode 8208 is a hyphen
(5:18) 28 dashes in Unicode
(5:40) Using SUBSTITUTE to replace Unicode 8208 with a Hyphen
(6:20) What does CLEAN() function really remove
(7:35) Open Questions
(7:55) Why does Excel CODE return ASCII 63 instead of a question mark
(8:20) Thanks for watching 2500 episodes!
(9:02) Like, Subscribe, Ring the Bell
maxresdefault.jpg


Transcript of the video:
Well, I get to learn a couple of things today. This video could be called...
When You Get An #N/A from VLOOKUP, How to Triage that Character by Character.
It could also be called, VLOOKUP Fails Due to a Strange Hyphen. Or, When the Excel CODE Function Returns 63, That's a Sign That You Might Need To Use the UNICODE Function Instead. All right, today's question from Michelle.
Now she sent a much bigger file, but none of her VLOOKUPs are working; and Michelle already knew how to use TRIM() and CLEAN(), and it still fails.
All right, so anytime that this happens, I'm like, "Oh, send me the file". All right, and I go through these steps.
I know that I'm trying to figure out why 088619-000 is not matching 088619-000.
And I always go through these steps right here. I want to know the length of both of these.
Is there some hidden character there? So equal LEN on what's above us.
That one has 10. This one has 10.
Okay, so same length. Over here I'm going to put the numbers one to 10.
I'll use the SEQUENCE() function for that.
You could just type the numbers, whatever you want to do.
And we're going to look at this character by character.
So, equal MID of this text, I'm going to press F4 one, two times to lock it down to row 15. Where do I want to start?
I want to start at the one in A17.
This time, I'll press F4 three times to lock the columns down. And I want exactly one character.
Double-click to copy that down. And then I should be able to copy that over here.
And everything of course at this point is going to look the same, but it doesn't matter how it looks.
Because behind the scenes, going back to the early days of computing, there was something called the ASCII character set. Originally it was 127 characters.
I guess zero may have been one of them, probably 128 characters.
And in Excel we can use the CODE and CHARacter function.
So if you have a number like here, let's just take 65 is a capital letter A.
So equal CHAR of 65 will give you the letter A.
Or on the flip side, you can use equal CODE of that A to find out that the A is a character 65.
All right, so here we're going to use equal CODE to figure out what number is actually stored there.
And that says it's a 48, and that makes sense.
I haven't memorized the ASCII character codes, but I generally know that 65 is a capital letter A. Somewhere up in the nineties is a lowercase A.
48 is a zero. And then the punctuation is 32 to 47.
There's some more punctuation here, some more punctuation here.
So when I see that 48, that seems to make sense to me.
Double clicking, copy it down, all of those look good.
And then over here, I'm going to do the exact same thing and look at these codes.
All right, and then I want to just do a little check and see does this code match that code. And copy it down.
Check this out. Okay, so there's our problem.
On the left, the number we're looking up, it has a hyphen that's a ASCII code 45.
And on the right, the lookup table that we got from the vendor has an ASCII code 63.
Now, that doesn't make sense to me.
I'd have to go look it up but the 45 generally seems right. 63 shouldn't be a hyphen.
And when I go look, sure enough, 63 is a question mark.
And that just doesn't match. Why?
What's going on here?
The explanation is in an alternate character set called UNICODE.
While ASCII originally had 127 characters and then later 256 characters. UNICODE today has 149,186 characters.
Every symbol in every language, lots of emojis, formatting codes, all kinds of stuff is here and it's maintained by the UNICODE Consortium.
I remember reading maybe five years ago that a group of people had asked for the Klingon characters to be made part of UNICODE.
And the UNICODE Consortium refused because it wasn't a real language.
So there's some board out here deciding whether or not to add characters.
So while Excel has the CODE() and CHAR() function to deal with ASCII.
They've added the UNICODE() and UNICHAR() function to deal with codes that are UNICODEs.
So this table that I've been using my whole life to try and figure out these VLOOKUP problems.
We now need to add an extra column to this table to check for the UNICODE.
All right, so here I'm going to ask for the equal UNICODE of that character.
And here they all exactly match ASCII. But here, insert a column for UNICODE.
Copy that over.
We find out that this hyphen in the vendor part number is not a character 45 hyphen. It is a character 8208 from UNICODE.
What the hell?
Well, doing a little bit of research here, jkorpella.fi has a list of 28 different dashes in UNICODE.
And sure enough, 8208 is just a regular old hyphen.
Why did the vendor decide they had to use this one instead of a character 45?
Do they have other hyphens in their part numbers that are something different?
That is a great mystery.
So the solution for Michelle is to use equal SUBSTITUTE of in this text, we're going to look for UNICHAR 8208 and replace that with CHAR 45. Ah, that's silly.
We could actually have this on our keyboard.
Let's put in quotes a regular old hyphen like that. Copy that down.
Control-C, come over here and paste as values, Control-Shift-V, and our VLOOKUPs start to work.
Although that one I have the wrong entered from the table. All right, my bad on that.
And really, there should be a fourth title of this video about Why Has the Clean Function Not Kept Up With the Times?
So here's all the ASCII characters from zero to 255, there's the character, and then does CLEAN() get rid of that character?
Ideally, you want to have a length of zero if clean works on it.
So Data, Filter, let's look at all the ones that are zeros.
All right.
So it's cleaning these unprintable characters below 32. It cleans 129, 141, 143, 144, 157.
It doesn't do anything with a non-breaking space at 160, that causes a lot of VLOOKUPs to fail.
There's a lot of strange characters that don't get cleaned.
And I when to do the same thing with UNICODE, so here's a sequence of 149,000, there's the UNICHAR and then the length of the CLEAN.
Let's take a look at this and see if they're cleaning any of those UNICODE characters. Okay, look for the zeros.
So it's cleaning UNICODE one through 31, and then it's smart enough to clean UNICODE 128 to 159. Then it looks like they just gave up after that.
So CLEAN definitely is not handling the majority of the UNICODE.
Ah, it just brings up so many questions.
Why is this vendor using UNICHAR 8208 for the dashes?
Should Excel VLOOKUP see a character 32 and character 160 as a match?
Should it see a character 45 and UNICHAR 8208 as a match? And then the CODE function.
So the code function encounters UNICODE, and the developer who wrote the code for CODE.
Says, "Oh, we're going to return a question mark".
But instead of actually returning a question mark, it returns the ASCII code of a question mark, which is just too freaking subtle for me.
Shouldn't it return a #VALUE! Error.
Or something to let you know, "Hey, you're trying to get the ASCII code for this, and it's not ASCII".
Program note, this might be our 2500th episode.
When I started, I was back in the TechTV days and Episode 101 was Season One, Episode One. So the MrExcel podcast, it debuted at episode 101.
Which makes this episode, 2600, our 2500th episode.
And it's something as boring as a UNICODE hyphen. Ah, that's the way it happens.
But then I started thinking about it, there was at least 12 “emergency” episodes inserted between prerecorded episodes.
And 37 shorts that weren't numbered.
So I don't know, maybe it was episode 2551 was the anniversary. Yeah, hard to say.
Anyway, thanks to all of you who hang around and watch these.
I really appreciate it.
Well anyway, I want to thank Michelle for sending that question in, and I want to thank you for stopping by.
We'll see you next time for another NetCast from MrExcel.
If you like these videos, please down below, Like, Subscribe and Ring the Bell.
Feel free to post any questions or comments down in the comments below.
 

Forum statistics

Threads
1,221,497
Messages
6,160,151
Members
451,625
Latest member
sukhman

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top