Extract Words using Regex

Ombir

Active Member
Joined
Oct 1, 2015
Messages
433
Hi,

I've a string like "abc.txt def.txt lmn.png jkl.jpeg pvr.txt".

THis pattern "\w*(?=\.txt)" only extract file names of text files i.e abc,def,pvr.

Now I want the opposite of this. I want to extract file names that are not text i.e lmn and jkl

Please suggest the pattern to match this. If possible I would also like to extract extension names along with file name like lmn.png and jkl.jpeg for another scenario. Is it possible with Negative lookahead ?
 

Excel Facts

When did Power Query debut in Excel?
Although it was an add-in in Excel 2010 & Excel 2013, Power Query became a part of Excel in 2016, in Data, Get & Transform Data.
Hi

Not sure I understood correctly, but try:

Code:
Sub test()
Dim regex As Object, regexMatches As Object
Dim s As String, j As Long

s = "abc.txt def.txt lmn.png jkl.jpeg pvr.txt" ' test string

Set regex = CreateObject("VBScript.RegExp")
With regex
    .Pattern = "\w+(?!\.txt)\.\w{3,4}"
    .Global = True
    Set regexMatches = .Execute(s)
End With

' display matches
For j = 0 To regexMatches.Count - 1
    MsgBox regexMatches(j)
Next j

End Sub

Remark: as you see in the pattern I assumed an extension with 3 or 4 characters.
 
Upvote 0
Thanks pgc. You understood it correctly. Its working fine with extensions. What will be the pattern if I don't want the extensions in the final result i.e only lmn and jkl.
 
Upvote 0
I tried this now \w+(?!\.txt)(?=\.\w{3,4}) and it returns file names without extensions. Do you've a better approach or its good to go ?
 
Upvote 0
Yes, that's how I'd do it.

Remark: I should have finished the expression with a (?= |$)

meaning that after the 3 or 4 characters there's a space or the end of the string

Code:
     .Pattern = "\w+(?!\.txt)\.\w{3,4}(?= |$)"

I wanted to specify an extension with 3 or 4 characters.
The way I posted before it will accept things like abc.defgXXX, with more than 4 characters in the extension (1.st error, it would accept the extension as valid) and return only the first 4 (2.nd error, it would seem everything was O.K.)
 
Last edited:
Upvote 0
Hi Pgc,

Need your help again. Javascript doesn't support Negative lookbehind. Therefor I'm using regexr.com website to explain the problem. I'm using PCRE engine on this website as it supports lookbehind.

String: Mr Ombir,Mr Ajay,Miss Sakshi,Miss Suhani

I'm looking for the pattern using Negative lookbehind to extract full name whose title is not Miss. So pattern should match Mr Ombir and Mr Ajay.

If I replace the Miss to Mr in the pattern then it should also match the opposite i.e Miss Sakshi and Miss Suhani

I've come up with this pattern but its not picking up the title.

(? < ! Miss ) \s [ \s \w ] +
<!--Miss)\s[\s\w]+"

Please help?
 
Last edited:
Upvote 0
Just able to solve it using \ w + ( ? < ! Miss ) \ s \ w + but don't understand how it works? Is it correct or can be improved ?
 
Last edited:
Upvote 0
Hi

As you know, vba does not also suppout lookbehinds.

In vba, I'd use:

Code:
Sub test()
Dim regex As Object, regexMatches As Object
Dim s As String, j As Long

s = "Mr Ombir,Mr Ajay,Miss Sakshi,Miss Suhani" ' test string

Set regex = CreateObject("VBScript.RegExp")
With regex
    .Pattern = "(?:^|,)(?!Miss)([^,]+)"
    .Global = True
    Set regexMatches = .Execute(s)
End With

' display matches
For j = 0 To regexMatches.Count - 1
    MsgBox regexMatches(j).submatches(0)
Next j

End Sub
 
Upvote 0

Forum statistics

Threads
1,224,819
Messages
6,181,153
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top