Regular Expression (RegEx) User-Defined Function (UDF) to Extract Filenames from Network Path Strings

lneidorf

Board Regular
Joined
May 20, 2004
Messages
97
Office Version
  1. 365
Platform
  1. Windows
Hi there.

I've got a weekly work task involving the extraction of filenames from 10,000+ unwieldy and inconsistently-formatted network filepaths. Adding complication is the fact that these are multivalue: the same cells contain multiple filepaths\filenames separated by semicolons. I need to extract them all and have them display in the same cell separated by semicolons.

The complexity of the data makes a front-end formula impracticable. And so I've been trying to create a User-Defined Function (UDF) in order to use RegEx to accomplish this. Unfortunately, I've not been successful.

I'm not certain whether the issue is with my RegEx format, VBA, or both. Feedback on both fronts would be appreciated! I had been using code found here (VBA Express : Multiple Apps - UDF wrappers for RegExp Find and Replace methods) that creates a UDF allowing the user to specify the RegEx pattern on the front-end as an argument of the Function itself. This has worked wonderfully for other applications, but not here. So I've tried implementing this as a new UDF, but have not had success.


Ok, so here's what I've got:

SAMPLE DATA (contained in cell A2):
\\ln2-pv-nas1\AX_CD\MILVCMH-148_160\Homedirs\jeff.smith\Replication\Documents\Board meetings\2007 Meetings\Oct 12\sample_filename_1.pdf;\\ln2-pv-nas1\AX_CD\MILVCMH-78_90\Data\Files\Board meetings\2007 Meetings\sample_filename_2.xlsx;\\ln2-pv-nas1\AX_CD\MILVCMH-78_90\Users\jeff.smith\Documents\OJSC\meetings\2007\sample_filename_3.doc

DESIRED OUTPUT:
sample_filename_1.pdf; sample_filename_2.xlsx; sample_filename_3.doc

REGEX PATTERN (very possibly incorrect):
[^\\]*$


Keep in mind that I want to pick up ALL matching patterns in my string, and not just one. I've experimented with both the VBA code ".Global = True" as well as the RegEx "/g" operator to no avail.

Any thoughts would be most appreciated!
 
Thanks everyone for your responses!

I think the best approach for matching a pattern is to identify filenames as follows:
1. appearing after the last backslash in a string, and
2. the actual filename in the format of "name.ext" (i.e a filename--which *could* include spaces--followed by a period and then a three-digit file extension).

Thanks!
 
Upvote 0

Excel Facts

Format cells as currency
Select range and press Ctrl+Shift+4 to format cells as currency. (Shift 4 is the $ sign).
Thanks everyone for your responses!

I think the best approach for matching a pattern is to identify filenames as follows:
1. appearing after the last backslash in a string, and
2. the actual filename in the format of "name.ext" (i.e a filename--which *could* include spaces--followed by a period and then a three-digit file extension).

Thanks!

Does the UDF with regex in post #10 work for you?
 
Upvote 0
István,

Your code does work, though not in all scenarios.

Here's a sample string:
\2007 Meetings\Oct 12\Filename1.doc more text; Meetings\Oct 12\File name2.xls etc

In this sample, it does not extract "Filename1.doc" and it does extract "File name2.xls etc". That's why I say matching a pattern of text appearing after the final backslash followed by a preiod and then a file extension may be the best approach.

In the meantime, this is super helpful as is.

Thanks!
 
Upvote 0
Hi lneidorf

What is still missing is what I asked in post #7.

How does the code know that the filename ended?

We know that it starts with a backshash. But

1 - do you know for sure that there are no other dots in the fileanme other than the one before the extension.
This is an usual example of a link:

....\name.domain.com.part1.rar

does it happen with your files?


We need a clear definition, like for ex.:

1 - start with "\"
2 - followed by characters not "\" and not "."
2 - followed by 1 "." character
3 - followed by 3 letters
4 - followed by a non-letter character or the end of text

If you give us a clear and rigorous definition for the filename like this one, I'm sure we'll have no problem to give you a solution.
 
Upvote 0
István,

Your code does work, though not in all scenarios.

Here's a sample string:
\2007 Meetings\Oct 12\Filename1.doc more text; Meetings\Oct 12\File name2.xls etc
The code I posted in Message #5 won't work with that sample string either. Of course, that sample string does not meet what you told us you have in your original message. Back then you said...

"I've got a weekly work task involving the extraction of filenames from
10,000+ unwieldy and inconsistently-formatted network filepaths."

Network file paths start with a double backslash (\\) which is followed by the name of the network's server name which is followed by a single backslash which, in turn, if followed by the remainder of the path. There are two "file paths" in your sample... the first one is invalid on a Windows computer as it is not preceded by a drive letter and colon nor the a server name (which in turn would be preceded by a double backslash)... the second one is valid, I think, but if so, it refers to the local hard drive. If you, in fact, have such paths in your lists, then I don't think any code can be written that would work 100% of the time... you said there was not always a semi-colon after the filename meaning there would have to be a space separating one file path from the other when the semi-colon was missing, but since directories and file names can contain spaces and dots in them, there would be no reliable way to tell what part of the text strings are independent file paths and what part are sub-directory's to a preceding file path. The only way it would be possible to write a program for you is if you can guarantee that the only dots in your text would be before the file extension and no where else. Can you make that guarantee to us?


Edit Note: PGC's message (#14) was not visible yet when I started writing the above message... his message pretty much says the same thing as mine, but in different words. I'm leaving my message in order to reinforce the problem you have now presented us with.
 
Last edited:
Upvote 0
This does what you have suggested in post #11. However, your suggestion fails to handle \jeff.smith\.

Code:
Function GetFName(s As String) As String
    s = Replace(s, "\", "ß")
    With CreateObject("VBScript.Regexp")
        .Global = True
        .Pattern = "ß([^ß]+?[.].{3})|.+?"
        If .test(s) Then GetFName = Application.Trim(.Replace(s, "$1 "))
        End With
End Function
 
Upvote 0

Forum statistics

Threads
1,223,228
Messages
6,170,875
Members
452,363
Latest member
merico17

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top