Help with using "find" or something else Arrays?

ShogunStealth

New Member
Joined
Nov 6, 2021
Messages
23
Office Version
  1. 2019
Platform
  1. Windows
I have a large spreadsheet with more that 400K rows of data on Sheet(1) with several columns but one column H contains URLs. I can copy the URLs to a another sheet in the same book (Sheet(2) = "UniqueURL") in column A and only display unique URL in alphabetical order (with heading). I then expect the user to view this sheet and list of URL and place a value in column B like "Keep" next to the URLs they wish to keep. I cannot think of a better ways of capturing the user selections for a list that can be more than 300 options. The selections may be one or several websites. I am struggling to write some vba code that will allow me to then delete all and entire row of data and remove the blanks on Sheet(1) that don't match the "keep" URLs. So that only the rows of data selected remains without any gaps based on the selection on Sheet(2). The matching should be on whole cell not partial. For example;

Sheet(2)
http://fruit.com.au/SummerFruit/ Keep
http://fruit.com.au/SummerFruit/Citrus/Madarin/ Keep
http://fruit.com.au/SummerFruit/Mangoes/
http://fruit.com.au/SummerFruit/Watermelon/ Keep

Selecting http://Fruit.com.au/SummerFruit/ does not automatically include subsites not selected like http://Fruit.com.au/SummerFruit/Citrus/ unless selected (like http://Fruit.com.au/SummerFruit/Citrus/Madarin/ ) or http://Fruit.com.au/SummerFruit/Mangoes/

I have never used arrarys before so I'm a bit lost but vaguely understand indexing a variable with multiple values. I would prefer to perhaps use a "find" somehow. Also, because there are so many rows I need some efficient code that runs quickly. Appreciate any help our thoughts.
 

Excel Facts

Remove leading & trailing spaces
Save as CSV to remove all leading and trailing spaces. It is faster than using TRIM().
If it's just a one of task, personally I would use a simple countifs formula in sheet1 to check if the url has keep next to it in sheet2, then filter the formula results to hide those which should be kept and delete the rows that are still visible.
 
Upvote 0
I have perceived with my attempts and have come up with 3 pieces of code two to delete the rows that I wish to discard based on a cell containing no content "" and another to compare the selected URLs with the ones in the master raw data sheet. There are ugly as they take forever to run can anyone help with more efficient piece of code?
As an example when I select 14 URLs it took more than 60 minutes to Compare Sub to complete.

I completed 2 versions (RemoveBlankRows2 and RemoveBlankRows3) to remove rows that are blank in a particular column that is "" or empty. #2 uses a for loop to go through each row then deleting and the #3 autofilter to filter on blank cells. I suspect the #3 is faster see code below am I right?

Noting I have bigger spreadsheets, even the slightest time saved would help.

This is the Compare code;

VBA Code:
Sub Compare()

Dim LastRowSheet1 As Long
Dim LastRowSheet2 As Long
Dim i As Long
Dim j As Long

    Worksheets(1).Select
    Range("A1").Select

    Columns("H:H").Select
    Selection.Insert Shift:=xlToRight

    LastRowSheet1 = Cells(Rows.Count, 1).End(xlUp).Row ' This value can be well over 400,000
    Debug.Print LastRowSheet1
    LastRowSheet2 = Worksheets(2).Cells(Rows.Count, 1).End(xlUp).Row ' This value can be upto 1,000
    Debug.Print LastRowSheet2
  
    For i = 2 To LastRowSheet1
        For j = 2 To LastRowSheet2
            If Cells(i, 7).Value = Worksheets(2).Cells(j, 1).Value Then
                Cells(i, 8).Value = "Keep"
            End If
        Next j
    Next i
  
End Sub

This is the RemoveBlankRows code;

VBA Code:
Sub RemoveBlankRows2()

Dim LastRow As Long
Dim i As Long

    Worksheets(1).Select
    Range("A1").Select

    LastRow = Cells(Rows.Count, 1).End(xlUp).Row

    For i = LastRow To 2 Step by - 1
        If Cells(i, 8).Value = "" Then
            Rows(i).Delete
        End If
    Next i
  
    Worksheets(1).Select
    Range("A1").Select

End Sub

Sub RemoveBlankRows3()


    Range("H1").Select
    Selection.AutoFilter
    ActiveSheet.Range("$A$1:$L$" & Cells(Rows.Count, 1).End(xlUp).Row).AutoFilter Field:=8, Criteria1:="="
    Range("A2").Select
    Range(Selection, Selection.End(xlDown)).Select
    Selection.EntireRow.Delete
    ActiveSheet.Range("$A$1:$L$" & Cells(Rows.Count, 1).End(xlUp).Row).AutoFilter Field:=8
    Range("H1").Select
    Selection.AutoFilter

End Sub
 
Upvote 0
If you are still interested here is a version of your compare macro that uses the dictionary.
VBA Code:
Sub Compare()

    Dim LastRowSheet1 As Long
    Dim LastRowSheet2 As Long
    Dim i As Long
    Dim j As Long
    Dim wsMain As Worksheet
    Dim wsURL As Worksheet
    Dim arrMain As Variant
    Dim arrURL As Variant   
    Dim dictURL As Object

    Set wsMain = Worksheets(1)
    Set wsURL = Worksheets(2)

    wsMain.Columns("H:H").Insert Shift:=xlToRight

    LastRowSheet1 = wsMain.Cells(Rows.Count, 1).End(xlUp).Row  ' This value can be well over 400,000
    LastRowSheet2 = wsURL.Cells(Rows.Count, 1).End(xlUp).Row ' This value can be upto 1,000

    With wsMain
        arrMain = .Range(.Cells(2, 7), .Cells(LastRowSheet1, 8)).Value
    End With
   
    With wsURL
        arrURL = .Range(.Cells(2, 1), .Cells(LastRowSheet2, 1)).Value
    End With
   
    ' Load URLs into dictionary
    Set dictURL = CreateObject("scripting.dictionary")
    For i = 1 To UBound(arrURL)
        dictURL(arrURL(i, 1)) = i
    Next i
   
    ' Flag if value exists in dict
    For j = 1 To UBound(arrMain)
        If dictURL.exists(arrMain(j, 1)) Then arrMain(j, 2) = "Keep"
    Next j
   
    ' Write out results to sheet
    With wsMain
        .Range(.Cells(2, 8), .Cells(LastRowSheet1, 8)).Value = Application.Index(arrMain, 0, 2)
    End With

End Sub
 
Upvote 0
Thanks Alex very much, this worked a treat, I'm just doing some further testing with large data sets later today but i expect no issues.

Whilst I would of never been able to achieve this code but I am keen to learn to understand it. Am I right in saying an 'object' in this context is a virtual item that only lasts in the virtual memory while the macro is running? So the section "Load URLs into dictionary";
  • reads in each row of the range defined wsURL or specifically arrURL (ie row 2 column 1 until last row column 1)
  • reads it into an indexed array held in virtual memory = ??? hence object
  • so that when the checking occurs in the next for loop "Flag if value exists in dict" this is piece of code it is comparing the key URLs also in memory rather than cycling through a list on another page for each row in main (in my example)?
  • Am I also correct that the matches are also initially committed to virtual memory before they written to the spreadsheet by
    • .Range(.Cells(2, 8), .Cells(LastRowSheet1, 8)).Value = Application.Index(arrMain, 0, 2) ?
  • Do both of these statements ;
    • arrMain = .Range(.Cells(2, 7), .Cells(LastRowSheet1, 8)).Value and
    • arrURL = .Range(.Cells(2, 1), .Cells(LastRowSheet2, 1)).Value - read in each respective (8 and 1) column starting at row 2 on both worksheets?
  • lastly the do the dots "."without a declared variable represent relative or implied links, ie so instead of Worksheet(2).Range("A1") you may write .Range("A1")?
    • I presume you need to have your wits about you to remember which sheet you have focus on, hence the statements "With wsMain and With wsURL"
Thanks again :)
 
Upvote 0
Responding to your question:-

Whilst I would of never been able to achieve this code but I am keen to learn to understand it. Am I right in saying an 'object' in this context is a virtual item that only lasts in the virtual memory while the macro is running? So the section "Load URLs into dictionary";
Yes
  • reads in each row of the range defined wsURL or specifically arrURL (ie row 2 column 1 until last row column 1)
    • Correct reads the range into memory into the array arrURL
  • reads it into an indexed array held in virtual memory = ??? hence object
    • Yes
  • so that when the checking occurs in the next for loop "Flag if value exists in dict" this is piece of code it is comparing the key URLs also in memory rather than cycling through a list on another page for each row in main (in my example)?
    • Yes in your example you are having to cycle through the whole range of URLs each time. Cycling through an array in memory is faster and using a dictionary that has an index so you don’t need to cycle through each time is much faster.
  • Am I also correct that the matches are also initially committed to virtual memory before they written to the spreadsheet by
    • .Range(.Cells(2, 8), .Cells(LastRowSheet1, 8)).Value = Application.Index(arrMain, 0, 2) ?
      • Yes – faster to only write to the worksheet once by updating in memory (array) first.
  • Do both of these statements ;
    • arrMain = .Range(.Cells(2, 7), .Cells(LastRowSheet1, 8)).Value and
    • arrURL = .Range(.Cells(2, 1), .Cells(LastRowSheet2, 1)).Value - read in each respective (8 and 1) column starting at row 2 on both worksheets?
      • arrMain is column 7 & 8
      • arrURL is column 1
  • lastly the do the dots "."without a declared variable represent relative or implied links, ie so instead of Worksheet(2).Range("A1") you may write .Range("A1")?
    • I presume you need to have your wits about you to remember which sheet you have focus on, hence the statements "With wsMain and With wsURL"
      • It is much more efficient to avoid using Select & Activate in your code and access the worksheet/range object directly
      • You don’t have to trace back from a Range command to work out which sheet was active. It is fully specified.
      • Where you are using multiple commands that are a subset of another object, it is both shorter in syntax and more efficient to use a With .. End With to specify the repeating object and to just a full stop “.” to signify that you are referring to a property or method of the specified object.
 
Upvote 0
Hi Alex,

Thank you for the explanations throughout.

I have come up with an error in a test scenario, where I only select one URL - Run-time error 13 Type mismatch pressing debug shows this line

For i = 1 To UBound(arrURL)

I suspect Ubound(arrURL) is = 1

so it is trying to for i = 1 to 1 perhaps?

A workaround was to add a end of list marker to Worksheet(2) - needed to place 2 lines to also capture if user selection is nil

VBA Code:
[/I]
    ' Add an end of URL marker
    Range("A" & Cells(Rows.Count, "a").End(xlUp).Row + 1).Value = "ZZZ End of list Marker - Do not delete"
    Range("B" & Cells(Rows.Count, "a").End(xlUp).Row).Value = "Keep"
    Range("A" & Cells(Rows.Count, "a").End(xlUp).Row + 1).Value = "ZZZ End of list Marker - Do not delete"
    Range("B" & Cells(Rows.Count, "a").End(xlUp).Row).Value = "Keep"
[I]
 
Upvote 0
Glad to hear you are giving it a good testing.
A single value will cause the array to be a string instead of actual array.

Change this:
VBA Code:
        For i = 1 To UBound(arrURL)
            dictURL(arrURL(i, 1)) = i
        Next i

To this:
VBA Code:
    If IsArray(arrURL) Then
        For i = 1 To UBound(arrURL)
            dictURL(arrURL(i, 1)) = i
        Next i
    Else
        dictURL(arrURL) = 1
    End If
 
Upvote 0

Forum statistics

Threads
1,225,138
Messages
6,183,089
Members
453,147
Latest member
Bree2019

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top