Trimming Folder Path of a Url Without Losing First Folder In Google Sheets

jahwise

New Member
Joined
May 1, 2003
Messages
39
Hey guys, I have a major deadline I am working on, and really need your advice on a formula.

In Google Sheets: I would like to trim a list of urls to first folder

for example:
url.com/folder1/xxx/index.html

url.com/folder2/

url.com/folder3/xxx/yyyy/index.html

url.com/folder4/zzz/aaa/bbb/index.html

Output should look like this:

url.com/folder1/

url.com/folder2/

url.com/folder3/

url.com/folder4/

I need to find a formula I can use in Google Sheets, which easily takes care of this all at once for at least 10,000 urls?

I found this tool regex101: build, test, and debug regex However, it does not work when you enter more than 1000 urls.

After I posted the above question, I got the following reply, which was partially correct:


The answer to above question is:

=ArrayFormula(IFERROR(REGEXEXTRACT(A1:A,"^([^\/]*\/[^\/]*)")))

But then there is another problem:

is there a way to automatically remove anything that appears in front of the domain (url.com)? Such as the case when the url is " es-la.url.com/peter.75054 " In the above example, do you see how there is a prefix to the domain? "es-la" In my data set, I need to go through and remove all of those manually, and it takes forever.

For example, if I wanted to remove everything in front of the url.com
url.com/folder1/xxx/index.html

https://www.url.com/folder2/

https://url.com/folder3/xxx/yyyy/index.html

https://es.url.com/folder4/zzz/aaa/bbb/index.html

Output should look like this:

url.com/folder1/

url.com/folder2/

url.com/folder3/

url.com/folder4/

When the case "the case when the url is "es-la.url.com/peter.75054" "
With this formula everything is correct

=ArrayFormula(IFERROR(REGEXEXTRACT(A1:A,"\w+.\w+/\w+")))


YES, GREAT! HOWEVER< THERE IS ONE LAST PROBLEM:

What if the folder has a period in the folder name...such as the case of: "the case when the url is "es-la.url.com/peter.75054"

Do you see how there is a period in between peter "." 75054 ?

This throws the formula off. The output of the formula is not correct, it defers the output to an incorrect folder name like this:

For example, if I wanted to remove everything in front of the url.com

https://es-la.url.com/peter.75054

Becomes:

es-la.url.com/peter

Above url would not be correct.

It would need to be:

es-la.url.com/peter.75054


Do you know how to fix this?

I need someone who understand how to make that final fix, because the person who helped me this far is not available anymore.


This did not work
=ArrayFormula(IFERROR(REGEXEXTRACT(A1:A,"^([^\/]*\/[^\/]*)")))
 

Excel Facts

Round to nearest half hour?
Use =MROUND(A2,"0:30") to round to nearest half hour. Use =CEILING(A2,"0:30") to round to next half hour.
I have moved this thread for you but for the future, please note the forum descriptions.

1616466811052.png

1616466838314.png
 
Upvote 0

Forum statistics

Threads
1,223,573
Messages
6,173,131
Members
452,501
Latest member
musallam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top