Microsoft Excel Tutorial: Learning the Building Blocks for RegEx in Excel
Learn the basic building blocks of RegEx in Excel.
To download the RegEx Cheat Sheet and workbook from today: Excel - RegEx Building Blocks - 2643 Sample Files - MrExcel Publishing
Welcome back to MrExcel's YouTube channel! In this video, we'll dive into the building blocks for mastering Regular Expressions (RegEx) and their integration in Excel. If you find this video helpful, don’t forget to click the Like button below – it helps us reach more viewers. We've got over 2,500 videos on various Excel topics, so there's plenty to explore!
Our focus today is on RegEx, specifically the new functions in Excel: RegExTest, RegExExtract, and RegExReplace. I’ll break down these functions step-by-step, making it easy for you to grasp the concepts. And remember, you can always visit our site at mrx.cl/gpt to type in any questions you have. You'll get a summary of relevant videos where I've covered those topics extensively.
This video is designed to get you comfortable with RegEx, even if you're a complete beginner. I’ll walk you through examples like matching specific sequences of letters, using wildcards, and understanding special characters. By the end, you’ll have a solid foundation in using RegEx within Excel. Don’t forget to download the tip card below, summarizing everything we discuss today.
One of the key points we’ll cover is how different characters and sequences can be matched using RegEx patterns. For instance, how to specify that a cell starts with a certain string or contains a specific character set. These examples will demonstrate the versatility and power of RegEx in data manipulation and search tasks.
Lastly, I’ll share some advanced techniques, like using repetition operators and handling white spaces, including tricky non-breaking spaces. This will prepare you to tackle more complex text processing tasks in Excel. As always, if you have any questions, leave them in the comments, and I'll be sure to address them.
Thanks for watching! If you enjoyed this video, please Like, Subscribe, and Ring the Bell to stay updated with all my latest Excel tips and tutorials. See you next time for another exciting video from MrExcel!
Buy Bill Jelen's latest Excel book: MrExcel 2024 Igniting Excel
Table of Contents
(00:00) Intro to RegEx Building Blocks
(00:10) MrExcel GPT to answer your Excel questions
(00:30) Download RegEx Tip Card
(00:41) Building Blocks of RegEx
(01:13) RegEx Basics
(02:15) RegEx Starts With or Ends With
(02:35) RegEx Two Meanings of Tilde
(03:52) RegEx Wildcards
(05:48) More RegEx Wildcards
(08:05) Repetition in RegEx
(12:07) Whitespace in RegEx
(14:02) Converting Non-Breaking Space to Space Using RegEx
(16:24) Returning part of a match in REGEXEXTRACT
(18:50) Transposing multiple spilled results
(19:30) Download the Workbook
(19:46) Like, Subscribe & Ring the Bell
This video answers these common search terms:
Excel RegEx tutorial
How to use Regular Expressions in Excel
Excel RegEx functions explained
RegExTest Excel function guide
Excel RegExExtract examples
Advanced Excel RegEx techniques
RegExReplace in Excel tutorial
Learning RegEx for Excel beginners
Excel text manipulation with RegEx
MrExcel RegEx basics
Learn the basic building blocks of RegEx in Excel.
To download the RegEx Cheat Sheet and workbook from today: Excel - RegEx Building Blocks - 2643 Sample Files - MrExcel Publishing
Welcome back to MrExcel's YouTube channel! In this video, we'll dive into the building blocks for mastering Regular Expressions (RegEx) and their integration in Excel. If you find this video helpful, don’t forget to click the Like button below – it helps us reach more viewers. We've got over 2,500 videos on various Excel topics, so there's plenty to explore!
Our focus today is on RegEx, specifically the new functions in Excel: RegExTest, RegExExtract, and RegExReplace. I’ll break down these functions step-by-step, making it easy for you to grasp the concepts. And remember, you can always visit our site at mrx.cl/gpt to type in any questions you have. You'll get a summary of relevant videos where I've covered those topics extensively.
This video is designed to get you comfortable with RegEx, even if you're a complete beginner. I’ll walk you through examples like matching specific sequences of letters, using wildcards, and understanding special characters. By the end, you’ll have a solid foundation in using RegEx within Excel. Don’t forget to download the tip card below, summarizing everything we discuss today.
One of the key points we’ll cover is how different characters and sequences can be matched using RegEx patterns. For instance, how to specify that a cell starts with a certain string or contains a specific character set. These examples will demonstrate the versatility and power of RegEx in data manipulation and search tasks.
Lastly, I’ll share some advanced techniques, like using repetition operators and handling white spaces, including tricky non-breaking spaces. This will prepare you to tackle more complex text processing tasks in Excel. As always, if you have any questions, leave them in the comments, and I'll be sure to address them.
Thanks for watching! If you enjoyed this video, please Like, Subscribe, and Ring the Bell to stay updated with all my latest Excel tips and tutorials. See you next time for another exciting video from MrExcel!
Buy Bill Jelen's latest Excel book: MrExcel 2024 Igniting Excel
Table of Contents
(00:00) Intro to RegEx Building Blocks
(00:10) MrExcel GPT to answer your Excel questions
(00:30) Download RegEx Tip Card
(00:41) Building Blocks of RegEx
(01:13) RegEx Basics
(02:15) RegEx Starts With or Ends With
(02:35) RegEx Two Meanings of Tilde
(03:52) RegEx Wildcards
(05:48) More RegEx Wildcards
(08:05) Repetition in RegEx
(12:07) Whitespace in RegEx
(14:02) Converting Non-Breaking Space to Space Using RegEx
(16:24) Returning part of a match in REGEXEXTRACT
(18:50) Transposing multiple spilled results
(19:30) Download the Workbook
(19:46) Like, Subscribe & Ring the Bell
This video answers these common search terms:
Excel RegEx tutorial
How to use Regular Expressions in Excel
Excel RegEx functions explained
RegExTest Excel function guide
Excel RegExExtract examples
Advanced Excel RegEx techniques
RegExReplace in Excel tutorial
Learning RegEx for Excel beginners
Excel text manipulation with RegEx
MrExcel RegEx basics
Transcript of the video:
Just down below the video, if you click Like, that will make sure that YouTube shows this video to more people.
Thanks. The building blocks for Learning RegEx and Excel.
There's over 2,500 videos here at MrExcel. Mrx.cl/gpt come here, type your question.
“Repeat a pivot table for each customer.” It's going to give you a short summary of the videos. And then it's going to give you links to the videos where I've talked about it. Every word that I've spoken on YouTube in over 15 years. MrX.CL slash gpt, give it a try.
After watching this video, you will have all the building blocks you need to put together RegEx.
Make sure to download this tip card, two pages, with a summary of everything we are going to talk about today. Okay, in the previous video you talked about these three new functions that came to Excel, RegExTest, RegExExtract, RegExreplace.
And I had mentioned that I had a RegEx 101, but that I'd do it in another video.
So this is that video. This hopefully will just get you up to speed.
I have to tell you, in all the years up until the point that it was in Excel, I was horrible at RegEx. I would always just go ask a friend how to do it.
But after studying this, it's really not that hard.
It's pretty simple to understand. Okay, so let's just start off, put some letters in the pattern. D, E, F.
That's saying that I want the letters D, E, F in sequence.
Alright? But there might be something where we want either the letter D or E or F. And so that's in square brackets.
This is a letter group and that will match if there's a, D, E, or F.
But sometimes you don't want to have to type the whole thing.
D to H will give us one letter from the letters D through H.
So here's a whole bunch of different examples. We're looking for D, E, F, and the only one that has those three letters exactly is A, B, C, D, E, F.
None of the others match. But if we ask for square bracket, D, E, F, then that's anything that has a D anywhere in it. This one, this one, this all match or a wider variety D through H picks up the G, G, G and the H, H, H.
So square brackets means one of the items in this group and you're allowed to have a dash in the middle. Alright, next up, what if you want to start with a particular bit of text? So the cell has to start with the put a caret before. Caret, t, h, e, we'll start with “the”.
Or if it has to end with something, end dollar sign will end with end.
Right now, just right up front we have to talk about this super annoying thing that the caret, it has two different meanings. This has happened before in Excel, like the F4 key puts dollar signs in if you're in edit mode. Or repeats the last command If you're not in edit mode, it has two jobs. Well, here in RegEx, the caret inside of a letter group means not X, Y, or Z. So in this case, this caret is saying it has to start with something that's not A through M. A little confusing that the caret has two different meanings, but you'll get used to it. So if we're looking for something that starts with the, there's only one. If we're looking at something that ends with end, it's this one or that one. If we're looking for something that contains a character that's not X, y, or Z. Alright?
So this one does not contain a character that's not X, Y, Z, but everything else just about contains characters that aren't X, y, Z. This one is actually far wider.
It means that it contains a character that's A through Z capital, A through W, lowercase, any of the digits, zero to nine, any of the special characters.
Yeah. And then this is one that starts with something that's not A through M. So a lot of these start with something that's not A through M. Even the I, that's a capital I.
And we're asking for lowercase A through M. Alright, so that starts with and ends with, right?
This is all pretty easy so far, right? Except for the wild cards.
Wildcard. A wildcard: a period is a wildcard for any single character, anything. So what if you need to look for an actual period?
Then you put a backslash period. So the backslash is kind of a way to escape, to say, hey, RegEx is using this period to mean something else.
A wild card in this case, put a backslash before to say no, we actually want a real period.
There's a lot of special characters, okay? So in this case, zero to nine is looking for any single digit, but there's a beautiful shortcut for that.
Backslash D is also the shortcut for any single digit.
All right, now let's talk about how to repeat. In this case, the period is looking for any character. And then the three is saying we have three of that character. Or we want at least two, but up to three.
So put a comma in between them. So we're looking for a specific repetition.
Alright? So here any character that period means any character. All of these are going to match except for the blank cell, there's no character there. But if we put a backslash period, well then only the thing that has the period is going to match. Looking for anything with a digit anywhere in it.
So we get those two and all of these have digits, but it's shorter.
Just put backslash D, it'll be the exact same answer here.
These are identical. Identical column F and G.
Alright? This is something that has at least three characters. So A has only one character doesn't match this one. The n has eight characters total, including the space and the period, but as long as it's more than three, we get it.
These two and these two only have two characters. So they don't match here, but they do match here where we're allowing two or three characters. Alright?
So that's how you specify that we're looking for a certain number of things.
Alright? So remember this wild card is for any single character or back slash D is for any single digit. We're going to repeat those here under wild card.
So we start out with those two. Period is a wild card for any single character.
Backslash D is any digit. This is a great way if you want lowercase A to Z or uppercase A to Z, it looks like this. And then they have something for digits, lowercase A to Z, uppercase A to Z, and that's W, lowercase W is the code for any of those.
It seems like that'd be really useful. We use that one a lot.
Back slash capital W means the reverse of that. It's not A to Z or lowercase A to Z or zero to nine. This happens a lot where you take the lowercase and you capitalize it and it makes it into a Not. Okay?
Now we're going to talk about spaces here. Actually two tabs from now backslash s is a space or any kind of a white space. There's all kinds of spaces that you might encounter, like a character 160 space and that'll be handled.
And then not a space, not white space. We'll look for something that's a non-space.
So here, if we're looking for a digit, that's a match.
If we're looking for something that's a letter while the one is not a match, the exclamation point, not a match. And then the only thing here that doesn't fall into the WW is a great catchall, A to Z, uppercase, lowercase, and digits.
Special characters like the exclamation point or anything shifted along the top row of your keyboard will be that. And then not a W, right?
So W is A to Z, upper or lower zero to nine. Well, what contains not a W?
Well the exclamation point, certainly it does. But also here there's a space.
So a space Z or A space capital Z, those fall into the not a W.
Okay, so now check out these two. This one is saying that we have an A followed by a space followed by a Z. So right there, that one matches because it's K sensitive, it's not going to see the capital A, space Z.
And in this case it's with the capitals slash capitals says something that's not a space.
In other words, we start with an A end with a Z. And in the middle we have a letter so that A and BZ there will work. Next up, repetition.
Alright, so we know back slash D is any digits from zero to nine, a single digit from zero to nine. What if we need to look for multiple digits, like 1 23 main or 1234 Peyton place? Alright, so the first thing they have, and I hate this one back D asterisk says there's going to be some number of digits.
We don't know how many digits. There might be two digits, there might be seven digits. But it's also possible with the asterisk that there might be zero digits, which is really weird to me.
Why are we saying, hey go, we need a digit here, but in fact there might be no digits at all.
So back slash D is getting everything here that has a digit, that's great, but back slash d asterisk is also returning things that have zero digits at all.
Right? Now in my RegEx journey, this one seems pretty useless. I'm sure there'll be use cases later, but what we really need I think is D plus that says there's a digit at least one and maybe more digits in a row.
So like a one, two, a, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, A, one Z.
But when there's no digits and then it's not going to show up as true for that.
Alright? The other one that's interesting is there can be a digit or it's okay to not have a digit, but we can't have two, three, or four digits.
So D question mark is specifically zero or one digit.
They say that the question mark means that it's optional.
There could or could not be a digit there. So the way that I built this is we need a letter from A to Z, lowercase, and then backslash the question mark.
And then another letter from A to Z select. Looking down here, this one fits the pattern.
A one Z is lowercase letter A, one and a Z. And then even here, lowercase letter, no digit and another lowercase letter. Those both work.
But here where we have a and then three digits in a row and then the Z, that one does not work. See, and it's also funny, all of these are marked as correct because there's no digit. So we have two lowercase letters and no digit.
Alright? If we know specifically that we need three digits slash D and then curly braces three, or you're allowed to put a range in here, like anywhere from two to four digits. So curly braces two comma four is between two and four digits like that. So here slash D three, it's only the ones that have three digits in a row, but also four digits and five digits.
These have 1, 2, 3, 1, 2, 3. It's definitely there.
So it finds the pattern. Even a 9 9, 9 Z with three digits will fire there between two and four digits. That picks up a1 two.
That wasn't true before. And again, all the others, even things with five digits because it's saying, Hey, is there between two and four digits?
Yes. And then after that there happens to be a five.
All right, this one is interesting to me. So if we have something like sounds like bur it's cold, that could be spelled with three Rs, four Rs, or five Rs, but not BR, and not even BRR in my rule set. So we're looking for B and then anywhere from three to five Rs followed by an optional space. So with that pattern, the BR and BRR are not true, but the ber or ber burr is coming up as true. Now the thing that's interesting to me is that three to five refers to just the one character before it.
What if we wanted to have the word repeat the entire word?
So then we're going to put the whole thing in. So now we need BRR space anywhere from two to three times, and that will be true in that case. So the parentheses creating a group that the repetition direction will refer to that entire group.
I was super interested in how they're going to deal with white space.
Alright, so if you're just looking for a space backs slash D is any digit, the space back slash D. So we're looking for digit space digit.
Now, one hassle with this is, so that matches nine space one, which is what I was thinking of.
But it also matches nine, nine space, nine nine because in the middle of that whole thing is a digit, a space and a digit. So if we want to say it has to start with the digit, then a space and then end with the digit, you need that caret at the beginning that says it starts with dollar sign at the end says it ends with, and that'll make sure that that one matches, but this one does not match. Okay?
And then other kinds of whitespace back slash t is a tab slash N is what you get when you press al to enter. That's also known as a character.
10 VB is a VBF. Some people call it a line feed back slash R is a carriage return, that's a character 13 or a V, BCR and VBA.
And then back slash s is either a space or any of these three get counted as white space.
Basically think of it as some sort of white space. And then backslash, not S is saying that it would not be a space, right? So in my codes across the top here, this is backslash backs slash s, and this has an axial space there.
And I also found when I was looking to see how character one 60 would be handled that backslash H in some implementations of RegEx will give you more spaces.
So here I have word alt enter wrap, and it gets caught by the backslash N, but it also gets caught by the backslash S because remember backslash S is looking for tabs, line feed, carriage return or spaces. Okay?
So non breaking spaces, this is one that just never gets handled well by Excel.
And if we're pulling data off a website, we're just scraping data off a website, there's going to be character one 60 spaces there. So right here I have evilly used the formula to put in a number one and then three character, one sixties, and then the number nine.
So does that get seen as a digit space digit? No.
Does it get seen as starting with a digit, a space and ending with a digit?
No. Does it get seen as any of these?
No. But it does thankfully get seen as slash s and slash h. Now out in Stack Overflow, they actually said that you'd have to use back slash H for this. And it seems like Excel is being better with this.
They're understanding that when we say back slash s, they're treating the character one 60 as a space, it sees it as a space. It doesn't see it when we just type a space, but if you use backslash S, you're going to get all of those spaces.
Alright, so here's the problem. So in this one, and then character 1 63 times nine when I try and use trim, trim in Excel will get rid of the interior repeated spaces.
So that should change it to one Space Nine, but it doesn't see the character one 60 as a space, so it doesn't do anything. So down here I'm going to use RegEx to replace, let's put that formula next to it. So I'm going to replace the backslash S with a real space. So that converts all those evil character one 60 spaces to regular spaces right there. That's worth the price of admission.
That formula. We should make that formula red because that's a formula I'm going to come back to again and again and again.
And then once I've converted the evil character, one 60 spaces, the regular spaces, then Trim does the right thing. It's funny, the character one 60 is the one that we run into all the time because the web is popular, but there's all kinds of weird spaces out there and it very nicely, a space character only matches a character 32.
But in Excel anyway, slash s slash h are both programmed to find all of these weird spaces, M spaces and spaces, all kinds of weird things. We never run into any of these except for character one 60. But maybe if you're, I dunno, I don't know why you would get any of these, but in case you do great news, they're handled.
Alright, now look, there's one last thing here and this is really, really powerful.
Alright, this is matching and then selecting a group from the selection.
So here we're looking for any character plus means any number of any character followed by specifically a period. And then P, D, F and ends there, right?
There's nothing after P, D, F. So this will get all of these files that are P, D, F files, but not the Excel files, right? Good enough.
Let's modify that. Now that I know that it's a pdf d, I don't need to see the PDF anymore. So I modify that pattern to put a parentheses there and that says the only part that I want is all of the characters up to the period.
I don't want the period PD, F get me everything before that.
So check out that pattern with the group. That's called a group.
I'm selecting a group. It will return everything up to the PDF.
Now, we had to modify this formula, the regex extract.
There is an argument out here called return mode, and this is called a capture group of the first match. So we have to change that argument that really was just defaulting to zero to be a two to make that work.
It's pretty cool. So then here I want to find everything that's an Excel file. So either X-L-S-X-X-L-S-M, XLSB, I'm ignoring the templates for right now and notice that I'm allowing either XMB and the question mark.
That question mark is saying, Hey, there could be zero of those or one of those.
So let me capture XLS and it has to end with that. So all of these get us the file name here.
Oh, except for this one. What's up with that?
How? I don't know how I got the file extension to be capitals scatter, other graph template. That must be ancient.
I have no idea how that's there, but it's there so it's not being seen.
So the modification here is to come out and do a case insensitive match.
So the case sensitivity allows me to handle that. That's an important part.
And then finally, all right, check out this awesome little bit of code.
I took this one and I said I want the file name. So that's the parentheses.
And then, hey, since I don't know what the file type is, go ahead.
I don't want the dot, but I want everything else after that.
So give me all of the stuff that's either XLS or x Ls X or x ls M or XLS, B and return that as a second group. All.
Now for this one to work or this don't want to work, and I mentioned this at the end of yesterday's video, I had an example. We're saying that we want the two.
So we're getting all of the groups. And when I do this, it's going to put the first thing here, the file name in this cell, and then it's going to put the extension underneath it. It's coming back as a vertical array.
So I just have to transpose that to make it be a horizontal array.
All right? But there I'm defining multiple groups and I'm getting them all. All right, so there you go.
There's the RegEx 1 0 1, I think with this workbook.
And by the way, you can download this workbook. There's enough examples here to get us through a lot of the types of pattern matching that we might need to do.
So there you go. Hey, thanks for stopping by.
We'll see you next time for another net cast from MrExcel.
If you like These videos, please down below, Like, Subscribe, and Ring the Bell. Feel free to post any questions or comments down in the comments below.
Thanks. The building blocks for Learning RegEx and Excel.
There's over 2,500 videos here at MrExcel. Mrx.cl/gpt come here, type your question.
“Repeat a pivot table for each customer.” It's going to give you a short summary of the videos. And then it's going to give you links to the videos where I've talked about it. Every word that I've spoken on YouTube in over 15 years. MrX.CL slash gpt, give it a try.
After watching this video, you will have all the building blocks you need to put together RegEx.
Make sure to download this tip card, two pages, with a summary of everything we are going to talk about today. Okay, in the previous video you talked about these three new functions that came to Excel, RegExTest, RegExExtract, RegExreplace.
And I had mentioned that I had a RegEx 101, but that I'd do it in another video.
So this is that video. This hopefully will just get you up to speed.
I have to tell you, in all the years up until the point that it was in Excel, I was horrible at RegEx. I would always just go ask a friend how to do it.
But after studying this, it's really not that hard.
It's pretty simple to understand. Okay, so let's just start off, put some letters in the pattern. D, E, F.
That's saying that I want the letters D, E, F in sequence.
Alright? But there might be something where we want either the letter D or E or F. And so that's in square brackets.
This is a letter group and that will match if there's a, D, E, or F.
But sometimes you don't want to have to type the whole thing.
D to H will give us one letter from the letters D through H.
So here's a whole bunch of different examples. We're looking for D, E, F, and the only one that has those three letters exactly is A, B, C, D, E, F.
None of the others match. But if we ask for square bracket, D, E, F, then that's anything that has a D anywhere in it. This one, this one, this all match or a wider variety D through H picks up the G, G, G and the H, H, H.
So square brackets means one of the items in this group and you're allowed to have a dash in the middle. Alright, next up, what if you want to start with a particular bit of text? So the cell has to start with the put a caret before. Caret, t, h, e, we'll start with “the”.
Or if it has to end with something, end dollar sign will end with end.
Right now, just right up front we have to talk about this super annoying thing that the caret, it has two different meanings. This has happened before in Excel, like the F4 key puts dollar signs in if you're in edit mode. Or repeats the last command If you're not in edit mode, it has two jobs. Well, here in RegEx, the caret inside of a letter group means not X, Y, or Z. So in this case, this caret is saying it has to start with something that's not A through M. A little confusing that the caret has two different meanings, but you'll get used to it. So if we're looking for something that starts with the, there's only one. If we're looking at something that ends with end, it's this one or that one. If we're looking for something that contains a character that's not X, y, or Z. Alright?
So this one does not contain a character that's not X, Y, Z, but everything else just about contains characters that aren't X, y, Z. This one is actually far wider.
It means that it contains a character that's A through Z capital, A through W, lowercase, any of the digits, zero to nine, any of the special characters.
Yeah. And then this is one that starts with something that's not A through M. So a lot of these start with something that's not A through M. Even the I, that's a capital I.
And we're asking for lowercase A through M. Alright, so that starts with and ends with, right?
This is all pretty easy so far, right? Except for the wild cards.
Wildcard. A wildcard: a period is a wildcard for any single character, anything. So what if you need to look for an actual period?
Then you put a backslash period. So the backslash is kind of a way to escape, to say, hey, RegEx is using this period to mean something else.
A wild card in this case, put a backslash before to say no, we actually want a real period.
There's a lot of special characters, okay? So in this case, zero to nine is looking for any single digit, but there's a beautiful shortcut for that.
Backslash D is also the shortcut for any single digit.
All right, now let's talk about how to repeat. In this case, the period is looking for any character. And then the three is saying we have three of that character. Or we want at least two, but up to three.
So put a comma in between them. So we're looking for a specific repetition.
Alright? So here any character that period means any character. All of these are going to match except for the blank cell, there's no character there. But if we put a backslash period, well then only the thing that has the period is going to match. Looking for anything with a digit anywhere in it.
So we get those two and all of these have digits, but it's shorter.
Just put backslash D, it'll be the exact same answer here.
These are identical. Identical column F and G.
Alright? This is something that has at least three characters. So A has only one character doesn't match this one. The n has eight characters total, including the space and the period, but as long as it's more than three, we get it.
These two and these two only have two characters. So they don't match here, but they do match here where we're allowing two or three characters. Alright?
So that's how you specify that we're looking for a certain number of things.
Alright? So remember this wild card is for any single character or back slash D is for any single digit. We're going to repeat those here under wild card.
So we start out with those two. Period is a wild card for any single character.
Backslash D is any digit. This is a great way if you want lowercase A to Z or uppercase A to Z, it looks like this. And then they have something for digits, lowercase A to Z, uppercase A to Z, and that's W, lowercase W is the code for any of those.
It seems like that'd be really useful. We use that one a lot.
Back slash capital W means the reverse of that. It's not A to Z or lowercase A to Z or zero to nine. This happens a lot where you take the lowercase and you capitalize it and it makes it into a Not. Okay?
Now we're going to talk about spaces here. Actually two tabs from now backslash s is a space or any kind of a white space. There's all kinds of spaces that you might encounter, like a character 160 space and that'll be handled.
And then not a space, not white space. We'll look for something that's a non-space.
So here, if we're looking for a digit, that's a match.
If we're looking for something that's a letter while the one is not a match, the exclamation point, not a match. And then the only thing here that doesn't fall into the WW is a great catchall, A to Z, uppercase, lowercase, and digits.
Special characters like the exclamation point or anything shifted along the top row of your keyboard will be that. And then not a W, right?
So W is A to Z, upper or lower zero to nine. Well, what contains not a W?
Well the exclamation point, certainly it does. But also here there's a space.
So a space Z or A space capital Z, those fall into the not a W.
Okay, so now check out these two. This one is saying that we have an A followed by a space followed by a Z. So right there, that one matches because it's K sensitive, it's not going to see the capital A, space Z.
And in this case it's with the capitals slash capitals says something that's not a space.
In other words, we start with an A end with a Z. And in the middle we have a letter so that A and BZ there will work. Next up, repetition.
Alright, so we know back slash D is any digits from zero to nine, a single digit from zero to nine. What if we need to look for multiple digits, like 1 23 main or 1234 Peyton place? Alright, so the first thing they have, and I hate this one back D asterisk says there's going to be some number of digits.
We don't know how many digits. There might be two digits, there might be seven digits. But it's also possible with the asterisk that there might be zero digits, which is really weird to me.
Why are we saying, hey go, we need a digit here, but in fact there might be no digits at all.
So back slash D is getting everything here that has a digit, that's great, but back slash d asterisk is also returning things that have zero digits at all.
Right? Now in my RegEx journey, this one seems pretty useless. I'm sure there'll be use cases later, but what we really need I think is D plus that says there's a digit at least one and maybe more digits in a row.
So like a one, two, a, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, A, one Z.
But when there's no digits and then it's not going to show up as true for that.
Alright? The other one that's interesting is there can be a digit or it's okay to not have a digit, but we can't have two, three, or four digits.
So D question mark is specifically zero or one digit.
They say that the question mark means that it's optional.
There could or could not be a digit there. So the way that I built this is we need a letter from A to Z, lowercase, and then backslash the question mark.
And then another letter from A to Z select. Looking down here, this one fits the pattern.
A one Z is lowercase letter A, one and a Z. And then even here, lowercase letter, no digit and another lowercase letter. Those both work.
But here where we have a and then three digits in a row and then the Z, that one does not work. See, and it's also funny, all of these are marked as correct because there's no digit. So we have two lowercase letters and no digit.
Alright? If we know specifically that we need three digits slash D and then curly braces three, or you're allowed to put a range in here, like anywhere from two to four digits. So curly braces two comma four is between two and four digits like that. So here slash D three, it's only the ones that have three digits in a row, but also four digits and five digits.
These have 1, 2, 3, 1, 2, 3. It's definitely there.
So it finds the pattern. Even a 9 9, 9 Z with three digits will fire there between two and four digits. That picks up a1 two.
That wasn't true before. And again, all the others, even things with five digits because it's saying, Hey, is there between two and four digits?
Yes. And then after that there happens to be a five.
All right, this one is interesting to me. So if we have something like sounds like bur it's cold, that could be spelled with three Rs, four Rs, or five Rs, but not BR, and not even BRR in my rule set. So we're looking for B and then anywhere from three to five Rs followed by an optional space. So with that pattern, the BR and BRR are not true, but the ber or ber burr is coming up as true. Now the thing that's interesting to me is that three to five refers to just the one character before it.
What if we wanted to have the word repeat the entire word?
So then we're going to put the whole thing in. So now we need BRR space anywhere from two to three times, and that will be true in that case. So the parentheses creating a group that the repetition direction will refer to that entire group.
I was super interested in how they're going to deal with white space.
Alright, so if you're just looking for a space backs slash D is any digit, the space back slash D. So we're looking for digit space digit.
Now, one hassle with this is, so that matches nine space one, which is what I was thinking of.
But it also matches nine, nine space, nine nine because in the middle of that whole thing is a digit, a space and a digit. So if we want to say it has to start with the digit, then a space and then end with the digit, you need that caret at the beginning that says it starts with dollar sign at the end says it ends with, and that'll make sure that that one matches, but this one does not match. Okay?
And then other kinds of whitespace back slash t is a tab slash N is what you get when you press al to enter. That's also known as a character.
10 VB is a VBF. Some people call it a line feed back slash R is a carriage return, that's a character 13 or a V, BCR and VBA.
And then back slash s is either a space or any of these three get counted as white space.
Basically think of it as some sort of white space. And then backslash, not S is saying that it would not be a space, right? So in my codes across the top here, this is backslash backs slash s, and this has an axial space there.
And I also found when I was looking to see how character one 60 would be handled that backslash H in some implementations of RegEx will give you more spaces.
So here I have word alt enter wrap, and it gets caught by the backslash N, but it also gets caught by the backslash S because remember backslash S is looking for tabs, line feed, carriage return or spaces. Okay?
So non breaking spaces, this is one that just never gets handled well by Excel.
And if we're pulling data off a website, we're just scraping data off a website, there's going to be character one 60 spaces there. So right here I have evilly used the formula to put in a number one and then three character, one sixties, and then the number nine.
So does that get seen as a digit space digit? No.
Does it get seen as starting with a digit, a space and ending with a digit?
No. Does it get seen as any of these?
No. But it does thankfully get seen as slash s and slash h. Now out in Stack Overflow, they actually said that you'd have to use back slash H for this. And it seems like Excel is being better with this.
They're understanding that when we say back slash s, they're treating the character one 60 as a space, it sees it as a space. It doesn't see it when we just type a space, but if you use backslash S, you're going to get all of those spaces.
Alright, so here's the problem. So in this one, and then character 1 63 times nine when I try and use trim, trim in Excel will get rid of the interior repeated spaces.
So that should change it to one Space Nine, but it doesn't see the character one 60 as a space, so it doesn't do anything. So down here I'm going to use RegEx to replace, let's put that formula next to it. So I'm going to replace the backslash S with a real space. So that converts all those evil character one 60 spaces to regular spaces right there. That's worth the price of admission.
That formula. We should make that formula red because that's a formula I'm going to come back to again and again and again.
And then once I've converted the evil character, one 60 spaces, the regular spaces, then Trim does the right thing. It's funny, the character one 60 is the one that we run into all the time because the web is popular, but there's all kinds of weird spaces out there and it very nicely, a space character only matches a character 32.
But in Excel anyway, slash s slash h are both programmed to find all of these weird spaces, M spaces and spaces, all kinds of weird things. We never run into any of these except for character one 60. But maybe if you're, I dunno, I don't know why you would get any of these, but in case you do great news, they're handled.
Alright, now look, there's one last thing here and this is really, really powerful.
Alright, this is matching and then selecting a group from the selection.
So here we're looking for any character plus means any number of any character followed by specifically a period. And then P, D, F and ends there, right?
There's nothing after P, D, F. So this will get all of these files that are P, D, F files, but not the Excel files, right? Good enough.
Let's modify that. Now that I know that it's a pdf d, I don't need to see the PDF anymore. So I modify that pattern to put a parentheses there and that says the only part that I want is all of the characters up to the period.
I don't want the period PD, F get me everything before that.
So check out that pattern with the group. That's called a group.
I'm selecting a group. It will return everything up to the PDF.
Now, we had to modify this formula, the regex extract.
There is an argument out here called return mode, and this is called a capture group of the first match. So we have to change that argument that really was just defaulting to zero to be a two to make that work.
It's pretty cool. So then here I want to find everything that's an Excel file. So either X-L-S-X-X-L-S-M, XLSB, I'm ignoring the templates for right now and notice that I'm allowing either XMB and the question mark.
That question mark is saying, Hey, there could be zero of those or one of those.
So let me capture XLS and it has to end with that. So all of these get us the file name here.
Oh, except for this one. What's up with that?
How? I don't know how I got the file extension to be capitals scatter, other graph template. That must be ancient.
I have no idea how that's there, but it's there so it's not being seen.
So the modification here is to come out and do a case insensitive match.
So the case sensitivity allows me to handle that. That's an important part.
And then finally, all right, check out this awesome little bit of code.
I took this one and I said I want the file name. So that's the parentheses.
And then, hey, since I don't know what the file type is, go ahead.
I don't want the dot, but I want everything else after that.
So give me all of the stuff that's either XLS or x Ls X or x ls M or XLS, B and return that as a second group. All.
Now for this one to work or this don't want to work, and I mentioned this at the end of yesterday's video, I had an example. We're saying that we want the two.
So we're getting all of the groups. And when I do this, it's going to put the first thing here, the file name in this cell, and then it's going to put the extension underneath it. It's coming back as a vertical array.
So I just have to transpose that to make it be a horizontal array.
All right? But there I'm defining multiple groups and I'm getting them all. All right, so there you go.
There's the RegEx 1 0 1, I think with this workbook.
And by the way, you can download this workbook. There's enough examples here to get us through a lot of the types of pattern matching that we might need to do.
So there you go. Hey, thanks for stopping by.
We'll see you next time for another net cast from MrExcel.
If you like These videos, please down below, Like, Subscribe, and Ring the Bell. Feel free to post any questions or comments down in the comments below.