Hey guys, I have a major deadline I am working on, and really need your advice on a formula.
In Google Sheets: I would like to trim a list of urls to first folder
for example:
url.com/folder1/xxx/index.html
url.com/folder2/
url.com/folder3/xxx/yyyy/index.html
url.com/folder4/zzz/aaa/bbb/index.html
Output should look like this:
url.com/folder1/
url.com/folder2/
url.com/folder3/
url.com/folder4/
I need to find a formula I can use in Google Sheets, which easily takes care of this all at once for at least 10,000 urls?
I found this tool regex101: build, test, and debug regex However, it does not work when you enter more than 1000 urls.
After I posted the above question, I got the following reply, which was partially correct:
The answer to above question is:
=ArrayFormula(IFERROR(REGEXEXTRACT(A1:A,"^([^\/]*\/[^\/]*)")))
But then there is another problem:
is there a way to automatically remove anything that appears in front of the domain (url.com)? Such as the case when the url is " es-la.url.com/peter.75054 " In the above example, do you see how there is a prefix to the domain? "es-la" In my data set, I need to go through and remove all of those manually, and it takes forever.
For example, if I wanted to remove everything in front of the url.com
url.com/folder1/xxx/index.html
https://www.url.com/folder2/
https://url.com/folder3/xxx/yyyy/index.html
https://es.url.com/folder4/zzz/aaa/bbb/index.html
Output should look like this:
url.com/folder1/
url.com/folder2/
url.com/folder3/
url.com/folder4/
When the case "the case when the url is "es-la.url.com/peter.75054" "
With this formula everything is correct
=ArrayFormula(IFERROR(REGEXEXTRACT(A1:A,"\w+.\w+/\w+")))
YES, GREAT! HOWEVER< THERE IS ONE LAST PROBLEM:
What if the folder has a period in the folder name...such as the case of: "the case when the url is "es-la.url.com/peter.75054"
Do you see how there is a period in between peter "." 75054 ?
This throws the formula off. The output of the formula is not correct, it defers the output to an incorrect folder name like this:
For example, if I wanted to remove everything in front of the url.com
https://es-la.url.com/peter.75054
Becomes:
es-la.url.com/peter
Above url would not be correct.
It would need to be:
es-la.url.com/peter.75054
Do you know how to fix this?
I need someone who understand how to make that final fix, because the person who helped me this far is not available anymore.
This did not work
=ArrayFormula(IFERROR(REGEXEXTRACT(A1:A,"^([^\/]*\/[^\/]*)")))
In Google Sheets: I would like to trim a list of urls to first folder
for example:
url.com/folder1/xxx/index.html
url.com/folder2/
url.com/folder3/xxx/yyyy/index.html
url.com/folder4/zzz/aaa/bbb/index.html
Output should look like this:
url.com/folder1/
url.com/folder2/
url.com/folder3/
url.com/folder4/
I need to find a formula I can use in Google Sheets, which easily takes care of this all at once for at least 10,000 urls?
I found this tool regex101: build, test, and debug regex However, it does not work when you enter more than 1000 urls.
After I posted the above question, I got the following reply, which was partially correct:
The answer to above question is:
=ArrayFormula(IFERROR(REGEXEXTRACT(A1:A,"^([^\/]*\/[^\/]*)")))
But then there is another problem:
is there a way to automatically remove anything that appears in front of the domain (url.com)? Such as the case when the url is " es-la.url.com/peter.75054 " In the above example, do you see how there is a prefix to the domain? "es-la" In my data set, I need to go through and remove all of those manually, and it takes forever.
For example, if I wanted to remove everything in front of the url.com
url.com/folder1/xxx/index.html
https://www.url.com/folder2/
https://url.com/folder3/xxx/yyyy/index.html
https://es.url.com/folder4/zzz/aaa/bbb/index.html
Output should look like this:
url.com/folder1/
url.com/folder2/
url.com/folder3/
url.com/folder4/
When the case "the case when the url is "es-la.url.com/peter.75054" "
With this formula everything is correct
=ArrayFormula(IFERROR(REGEXEXTRACT(A1:A,"\w+.\w+/\w+")))
YES, GREAT! HOWEVER< THERE IS ONE LAST PROBLEM:
What if the folder has a period in the folder name...such as the case of: "the case when the url is "es-la.url.com/peter.75054"
Do you see how there is a period in between peter "." 75054 ?
This throws the formula off. The output of the formula is not correct, it defers the output to an incorrect folder name like this:
For example, if I wanted to remove everything in front of the url.com
https://es-la.url.com/peter.75054
Becomes:
es-la.url.com/peter
Above url would not be correct.
It would need to be:
es-la.url.com/peter.75054
Do you know how to fix this?
I need someone who understand how to make that final fix, because the person who helped me this far is not available anymore.
This did not work
=ArrayFormula(IFERROR(REGEXEXTRACT(A1:A,"^([^\/]*\/[^\/]*)")))