prestoncole (New Member, joined Jul 24, 2023; Office 365 on Windows)
Hi all
I don't know how to go about accomplishing this.
I have some Internet browsing history in a text file. It has 500,000 lines.
I need to extract the start of each URL (e.g. https://www.bbc.co.uk/ from https://www.bbc.co.uk/thispage/thatpage/index.htm) and count the number of times each one appears in the document.
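For the first part (keeping only the start of each URL), Python's standard `urllib.parse.urlsplit` splits a URL into scheme, host and path. A minimal sketch, assuming each value is a full URL that includes a scheme such as `https://`; `url_prefix` is a hypothetical helper name, not something from the original post:

```python
from urllib.parse import urlsplit


def url_prefix(url: str) -> str:
    """Return only the scheme and host of a URL, e.g. 'https://www.bbc.co.uk/'.

    The path, query string and fragment are dropped.
    Assumes the input is a full URL with a scheme (http://, https://, ...).
    """
    parts = urlsplit(url.strip())
    return f"{parts.scheme}://{parts.netloc}/"


print(url_prefix("https://www.bbc.co.uk/thispage/thatpage/index.htm"))
# prints https://www.bbc.co.uk/
```

`netloc` also keeps any port number (for example `example.com:8080`), which is usually what you want when grouping by site.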
Here are some example lines:
==================================================
URL : Sign in
Title :
Visit Time : 06/02/2023 09:14:40
Visit Count : 2
Visited From :
Visit Type :
Visit Duration :
Web Browser : Internet Explorer 10/11 / Edge
User Profile : USERNAME
Browser Profile :
URL Length : 106
Typed Count :
History File : C:\Users\USERNAME\AppData\Local\Microsoft\Windows\WebCache\WebCacheV01.dat
Record ID : 1
==================================================
==================================================
URL : dominic perks - Google Search
Title : dominic perks - Google Search
Visit Time : 18/04/2023 10:41:40
Visit Count : 2
Visited From :
Visit Type : Generated
Visit Duration : 00:00:00.369
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 170
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7220
==================================================
==================================================
URL : dominic perks - Google Search
Title : dominic perks - Google Search
Visit Time : 18/04/2023 10:41:40
Visit Count : 2
Visited From : dominic perks - Google Search
Visit Type : Link
Visit Duration : 00:00:06.542
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 170
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7221
==================================================
==================================================
URL : Hambro Perks founder Dominic Perks makes a sudden exit.
Title : Hambro Perks founder Dominic Perks makes a sudden exit
Visit Time : 18/04/2023 10:41:46
Visit Count : 1
Visited From : dominic perks - Google Search
Visit Type : Link
Visit Duration : 00:00:13.729
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 183
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7222
==================================================
==================================================
URL : dominic perks - Google Search
Title : dominic perks - Google Search
Visit Time : 18/04/2023 10:42:00
Visit Count : 2
Visited From :
Visit Type : Generated
Visit Duration : 00:00:00.131
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 176
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7223
==================================================
==================================================
URL : dominic perks - Google Search
Title : dominic perks - Google Search
Visit Time : 18/04/2023 10:42:00
Visit Count : 2
Visited From : dominic perks - Google Search
Visit Type : Link
Visit Duration : 00:00:12.704
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 176
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7224
==================================================
==================================================
URL : hambro perks - Google Search
Title : hambro perks - Google Search
Visit Time : 18/04/2023 10:42:13
Visit Count : 2
Visited From : dominic perks - Google Search
Visit Type : Form Submit
Visit Duration : 00:00:00.505
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 1092
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7225
==================================================
==================================================
URL : hambro perks - Google Search
Title : hambro perks - Google Search
Visit Time : 18/04/2023 10:42:14
Visit Count : 2
Visited From : hambro perks - Google Search
Visit Type : Link
Visit Duration : 00:00:03.540
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 1092
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7226
==================================================
==================================================
URL : Hambro Perks
Title : Hambro Perks
Visit Time : 18/04/2023 10:42:17
Visit Count : 2
Visited From : hambro perks - Google Search
Visit Type : Link
Visit Duration : 00:00:14.856
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 27
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7227
==================================================
==================================================
URL : News | Hambro Perks
Title : News | Hambro Perks
Visit Time : 18/04/2023 10:42:32
Visit Count : 2
Visited From : Hambro Perks
Visit Type : Link
Visit Duration : 00:00:24.551
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 31
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7228
==================================================
==================================================
URL : Starting the tampon revolution with Valentina Milanova | News | Hambro Perks
Title : Starting the tampon revolution with Valentina Milanova | News | Hambro Perks
Visit Time : 18/04/2023 10:42:56
Visit Count : 1
Visited From : News | Hambro Perks
Visit Type : Link
Visit Duration : 00:00:14.124
Web Browser : Chrome
User Profile : USERNAME
Browser Profile : Default
URL Length : 94
Typed Count : 0
History File : C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\History
Record ID : 7229
==================================================
Each line of text is one cell; it is NOT comma-delimited or anything similar.
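Since every line is a "Key : Value" pair and records are separated by rows of `=` signs, one way to start is to turn each block into a dictionary. A minimal Python sketch, assuming the export always follows that layout; `parse_records` is a hypothetical helper name:

```python
def parse_records(lines):
    """Group 'Key : Value' lines into one dict per record.

    Records are separated by lines made only of '=' characters. The key is
    the text before the FIRST colon, so values that themselves contain
    colons (URLs, times, C:\\ paths) are kept intact.
    """
    records, current = [], {}
    for raw in lines:
        line = raw.strip()
        if line and set(line) == {"="}:
            # Separator line: close the current record if it has any fields.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:  # the file may not end with a separator line
        records.append(current)
    return records
```

Once parsed, `record["URL"]` gives the address for each visit, and an exact key match keeps `URL Length` from being confused with `URL`.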
The key line is:
URL : Starting the tampon revolution with Valentina Milanova | News | Hambro Perks
and it needs to be counted as www.hambroperks.com.
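To go from that line to a host name such as www.hambroperks.com and count every occurrence, both steps can be done in one pass over the file. This is a sketch under several assumptions: the real export holds full addresses in its URL fields (the sample above appears to show page titles instead, which would explain why a "Google Search" entry has a URL Length of 170), the file is UTF-8 encoded, and the names `count_hosts`, `main`, `history.txt` and `host_counts.csv` are hypothetical.

```python
import csv
import sys
from collections import Counter
from urllib.parse import urlsplit


def count_hosts(lines):
    """Count how many 'URL : ...' lines point at each host name."""
    counts = Counter()
    for line in lines:
        key, sep, value = line.partition(":")
        # Exact key match, so 'URL Length' and 'Visited From' are ignored.
        if not sep or key.strip() != "URL":
            continue
        try:
            # Assumption: the value is a full address such as https://...
            host = urlsplit(value.strip()).netloc.lower()
        except ValueError:  # e.g. a malformed bracketed IPv6 address
            continue
        if host:  # values without a scheme/host (like 'Sign in') are skipped
            counts[host] += 1
    return counts


def main(in_path, out_path):
    # Iterating over the file object reads it line by line,
    # so the 500,000-line file is never loaded into memory all at once.
    with open(in_path, encoding="utf-8", errors="replace") as f:
        counts = count_hosts(f)
    # Two-column CSV (host, count), most-visited first, which Excel opens directly.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Host", "Count"])
        writer.writerows(counts.most_common())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python count_hosts.py history.txt host_counts.csv
    main(sys.argv[1], sys.argv[2])
```

This counts how many times each host is listed, as asked; summing the "Visit Count" field instead would give a different total. If the export turns out to be UTF-16 rather than UTF-8, the `encoding` argument in `main` would need changing.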
Make sense?
I hope so.
Any idea how I would start this?
Cheers
Preston