Extracting the Root domain

Atri R

New Member
Joined
Mar 26, 2018
Messages
11
Hello Folks,

I have a requirement to extract the domains from a list of URLs. I know this is quite trending and a petty common topic across generic internet search, however no formulas quite seem to match my specific query.

My file has a list of domains (which include subdomains and multiple TLDs). I only need to extract the parent domain and strip off any subdomain or TLD(s) from the list.

Example:

[TABLE="width: 500"]
<tbody>[TR]
[TD]Domain[/TD]
[TD]Extracted[/TD]
[/TR]
[TR]
[TD]www.sub.domain.com[/TD]
[TD]domain[/TD]
[/TR]
[TR]
[TD]https://sub.domain2.com[/TD]
[TD]domain2[/TD]
[/TR]
[TR]
[TD]domain3.com.ru[/TD]
[TD]domain3[/TD]
[/TR]
[TR]
[TD]www.sub2.domain4.com.fl[/TD]
[TD]domain4[/TD]
[/TR]
[TR]
[TD]sub1.sub2.domain5.org.fl[/TD]
[TD]domain5[/TD]
[/TR]
</tbody>[/TABLE]

Note: there are multiple TLDs and multiple sub-domains. I know its quite troublesome, but would be great if someone can help.

Thanks
 

Excel Facts

Highlight Duplicates
Home, Conditional Formatting, Highlight Cells, Duplicate records, OK to add pink formatting to any duplicates in selected range.
Welcome to the MrExcel board!

Would this trap at least the majority of them?

Excel Workbook
AB
1DomainExtracted
2www.sub.domain.comdomain
3https://sub.domain2.comdomain2
4domain3.com.rudomain3
5www.sub2.domain4.com.fldomain4
6sub1.sub2.domain5.org.fldomain5
Sheet1
 
Upvote 0
[TABLE="width: 668"]
<colgroup><col span="5"><col></colgroup><tbody>[TR]
[TD]A[/TD]
[TD]B[/TD]
[TD]C[/TD]
[TD]D[/TD]
[TD]E[/TD]
[TD]F[/TD]
[/TR]
[TR]
[TD="align: right"]4[/TD]
[TD="colspan: 3"]www.sub.domain.com[/TD]
[TD]domain[/TD]
[TD] =MID(B4,FIND("domain",B4),LEN("domain "))[/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
Welcome to the MrExcel board!

Would this trap at least the majority of them?

Sheet1

AB
DomainExtracted
www.sub.domain.comdomain
https://sub.domain2.comdomain2
domain3.com.rudomain3
www.sub2.domain4.com.fldomain4
sub1.sub2.domain5.org.fldomain5

<colgroup><col style="font-weight:bold; width:30px; "><col style="width:188px;"><col style="width:88px;"></colgroup><tbody>
[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]1[/TD]

[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]2[/TD]

[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]3[/TD]

[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]4[/TD]

[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]5[/TD]

[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]6[/TD]

</tbody>

Spreadsheet Formulas
CellFormula
B2=TRIM(RIGHT(SUBSTITUTE(REPLACE(A2,MIN(SEARCH({".com.",".org.",".net.",".int.",".edu.",".gov.",".mil."},A2&".com.org.net.int.edu.gov.mil.")),99,""),".",REPT(" ",100)),100))

<tbody>
</tbody>

<tbody>
</tbody>


Excel tables to the web >> Excel Jeanie HTML 4

Thanks Peter. This works perfectly. So if I come across any new TLDs which are not covered in the formula, can I just add them in the SEARCH tags?
 
Upvote 0
Thanks Peter. This works perfectly. So if I come across any new TLDs which are not covered in the formula, can I just add them in the SEARCH tags?
You would need to add each one twice. For example

=TRIM(RIGHT(SUBSTITUTE(REPLACE(A2,MIN(SEARCH({".com.",".org.",".net.",".int.",".edu.",".gov.",".mil.",".abc."},A2&".com.org.net.int.edu.gov.mil.abc.")),99,""),".",REPT(" ",100)),100))
 
Last edited:
Upvote 0
I figured that out. Thanks a ton!

One last query, I was checking my file and there seems to be a LOT of TLDs which are not standard. So I have to go through all 60k domains manually and add the ones which are not in the Formula.

Can you help me (if possible) with a VBA script which will hold a repository of all the domains and do the similar function that the formula is doing? Or if you can suggest any alternative way which will simplify the manual check?
 
Upvote 0
Would this user-define function be better? To implement ..
1. Right click the sheet name tab and choose "View Code".
2. In the Visual Basic window use the menu to Insert|Module
3. Copy and Paste the code below into the main right hand pane that opens at step 2.
4. Close the Visual Basic window.
5. Enter the formula as shown in the screen shot below and copy down.
6. Your workbook will need to be saved as a macro-enabled workbook (*.xlsm)

Rich (BB code):
Function Domain(s As String) As String
  Dim Bits As Variant
  Dim i As Long
  
  'Add more TLDs below as required
  Const TLDs As String = ".com.org.net.int.edu.gov.mil." & _
                          "tld8.tld9.tld10.tld11.tld12.tld13.tld14.tld15.tld16.tld17." & _
                          "tld18.tld19.tld20.tld21.tld22.tld23.tld24."
                          
  Domain = "Not found"
  Bits = Split(s, ".")
  For i = UBound(Bits) To 1 Step -1
    If InStr(1, TLDs, "." & Bits(i) & ".", vbTextCompare) > 0 Then
      Domain = Bits(i - 1)
      Exit Function
    End If
  Next i
End Function

Excel Workbook
AB
1DomainExtracted
2www.sub.domain.comdomain
3https://sub.domain2.comdomain2
4domain3.com.rudomain3
5www.sub2.domain4.com.fldomain4
6sub1.sub2.domain5.org.fldomain5
7sub1.sub2.somedomain.tld2000.flNot found
8sub1.sub2.somedomain.tld20.flsomedomain
Sheet1
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,896
Messages
6,175,263
Members
452,627
Latest member
KitkatToby

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top