Extract text after certain characters and use in a graph

prestoncole
New Member | Joined Jul 24, 2023 | Messages: 2 | Office Version: 365 | Platform: Windows
Hi all

I don't know how to go about this.
I have some Internet browsing history in a text file with 500,000 lines.
I need to extract the start of each URL (e.g. https://www.bbc.co.uk/ from https://www.bbc.co.uk/thispage/thatpage/index.htm) and count how many times each one appears in the document.
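One way to do this outside Excel is a short Python script. The sketch below assumes the text file itself holds full addresses on lines that begin with `URL :`, like the bbc.co.uk example above. It keeps the scheme and host, so `https://www.bbc.co.uk/thispage/thatpage/index.htm` becomes `https://www.bbc.co.uk/`, and tallies them with `collections.Counter`. The function names and the UTF-8 encoding are assumptions, not part of the original question.

```python
import csv
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlsplit


def url_root(line: str) -> Optional[str]:
    """Return 'scheme://host/' for a 'URL : ...' line, or None for any other line."""
    key, sep, value = line.partition(":")  # split on the first colon only
    if not sep or key.strip() != "URL":
        return None  # e.g. 'Title :', 'URL Length :', separator rows
    parts = urlsplit(value.strip())
    if not parts.scheme or not parts.netloc:
        return None  # value is not an absolute URL
    return f"{parts.scheme}://{parts.netloc}/"


def count_url_roots(lines: Iterable[str]) -> Counter:
    """Tally how many times each URL root appears, one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        root = url_root(line)
        if root is not None:
            counts[root] += 1
    return counts


def count_file(path: str, encoding: str = "utf-8") -> Counter:
    """Stream the history export line by line; the encoding is an assumption."""
    with open(path, encoding=encoding) as f:
        return count_url_roots(f)


def write_counts_csv(counts: Counter, path: str) -> None:
    """Write 'site,count' rows, most frequent first, ready to chart in Excel."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["site", "count"])
        writer.writerows(counts.most_common())
```

For example, `write_counts_csv(count_file("history.txt"), "site_counts.csv")` (both file names hypothetical) produces a two-column table that can be opened in Excel and plotted as a bar chart. Because the file is read one line at a time, 500,000 lines is not a problem.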
Here are some example lines:

==================================================
URL : Sign in
Title :
Visit Time : 06/02/2023 09:14:40
Visit Count : 2
Visited From :
Visit Type :
Visit Duration :
Web Browser : Internet Explorer 10/11 / Edge
User Profile : USERNAME
Browser Profile :
URL Length : 106
Typed Count :
History File : C:\Users\USERNAME\AppData\Local\Microsoft\Windows\WebCache\WebCacheV01.dat
Record ID : 1
==================================================

==================================================
URL : dominic perks - Google Search
Title : dominic perks - Google Search
Visit Time : 18/04/2023 10:41:40
Visit Count : 2
Visited From :
Visit Type : Generated
Visit Duration : 00:00:00.369
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 170
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7220
==================================================

==================================================
URL : dominic perks - Google Search
Title : dominic perks - Google Search
Visit Time : 18/04/2023 10:41:40
Visit Count : 2
Visited From : dominic perks - Google Search
Visit Type : Link
Visit Duration : 00:00:06.542
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 170
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7221
==================================================

==================================================
URL : Hambro Perks founder Dominic Perks makes a sudden exit.
Title : Hambro Perks founder Dominic Perks makes a sudden exit
Visit Time : 18/04/2023 10:41:46
Visit Count : 1
Visited From : dominic perks - Google Search
Visit Type : Link
Visit Duration : 00:00:13.729
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 183
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7222
==================================================

==================================================
URL : dominic perks - Google Search
Title : dominic perks - Google Search
Visit Time : 18/04/2023 10:42:00
Visit Count : 2
Visited From :
Visit Type : Generated
Visit Duration : 00:00:00.131
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 176
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7223
==================================================

==================================================
URL : dominic perks - Google Search
Title : dominic perks - Google Search
Visit Time : 18/04/2023 10:42:00
Visit Count : 2
Visited From : dominic perks - Google Search
Visit Type : Link
Visit Duration : 00:00:12.704
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 176
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7224
==================================================

==================================================
URL : hambro perks - Google Search
Title : hambro perks - Google Search
Visit Time : 18/04/2023 10:42:13
Visit Count : 2
Visited From : dominic perks - Google Search
Visit Type : Form Submit
Visit Duration : 00:00:00.505
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 1092
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7225
==================================================

==================================================
URL : hambro perks - Google Search
Title : hambro perks - Google Search
Visit Time : 18/04/2023 10:42:14
Visit Count : 2
Visited From : hambro perks - Google Search
Visit Type : Link
Visit Duration : 00:00:03.540
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 1092
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7226
==================================================

==================================================
URL : Hambro Perks
Title : Hambro Perks
Visit Time : 18/04/2023 10:42:17
Visit Count : 2
Visited From : hambro perks - Google Search
Visit Type : Link
Visit Duration : 00:00:14.856
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 27
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7227
==================================================

==================================================
URL : News | Hambro Perks
Title : News | Hambro Perks
Visit Time : 18/04/2023 10:42:32
Visit Count : 2
Visited From : Hambro Perks
Visit Type : Link
Visit Duration : 00:00:24.551
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 31
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7228
==================================================

==================================================
URL : Starting the tampon revolution with Valentina Milanova | News | Hambro Perks
Title : Starting the tampon revolution with Valentina Milanova | News | Hambro Perks
Visit Time : 18/04/2023 10:42:56
Visit Count : 1
Visited From : News | Hambro Perks
Visit Type : Link
Visit Duration : 00:00:14.124
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 94
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7229
==================================================


Each line of text is in its own cell; the data is NOT comma-delimited or split in any other way.
The key line is:
URL : Starting the tampon revolution with Valentina Milanova | News | Hambro Perks
and it needs to come out as www.hambroperks.com
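Since every line is a `Field : value` pair and records are separated by rows of `=`, the file can also be read as a list of records. That keeps the other fields (such as `Visit Count` or `Web Browser`) next to the site if they are needed later. This is a sketch assuming separator rows consist only of `=` characters and field names never contain a colon. `host_of` reduces a URL to the bare host, the `www.hambroperks.com` form asked for above. Both function names are hypothetical.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def parse_records(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Group 'Field : value' lines into one dict per ====-delimited record."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank line between records
        if set(line) == {"="}:
            # Separator row: close the record in progress, if any.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so values such as
        # 'C:\Users\...' or '06/02/2023 09:14:40' stay intact.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        records.append(current)  # file did not end with a separator row
    return records


def host_of(url: str) -> str:
    """Return the network location of an absolute URL, e.g. 'www.hambroperks.com'."""
    return urlsplit(url).netloc
```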

Make sense?

I hope so.

Any idea how I would start this?


Cheers


Preston
 

Looks like pasting the text into the post has simplified the URLs.
This is what it looks like on my screen:
[Screenshot attached]
Perhaps load the text file into Power Query, split the column by colons and forward slashes, then filter to the URL rows. That should give you a list of domain names in one of the columns.
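The splitting idea can be checked on a single line before building the query. In an absolute URL the host sits between the second and third forward slash, so once the `URL :` label is removed, splitting on `/` puts the host in the third piece. The Python below only illustrates that logic (it is not Power Query M code), and `host_by_split` is a hypothetical name.

```python
from typing import Optional


def host_by_split(line: str) -> Optional[str]:
    """Mimic 'split by colon, then by forward slash' on one 'URL : ...' line."""
    label, sep, rest = line.partition(":")  # first colon separates label from URL
    if not sep or label.strip() != "URL":
        return None  # the filter step: keep only URL rows
    # 'https://www.bbc.co.uk/thispage' -> ['https:', '', 'www.bbc.co.uk', 'thispage']
    pieces = rest.strip().split("/")
    if len(pieces) < 3 or pieces[1] != "":
        return None  # not in 'scheme://host/...' form
    return pieces[2]
```

Unlike a proper URL parser, this plain split keeps a query string attached when there is no path (`https://example.com?q=1` yields `example.com?q=1`), so it is best treated as a quick check.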
 