Which is better: Array or Collection of Objects?

Amateurhr

I have long used one-dimensional Arrays to store objects (dictionaries, etc.). Recently I've been using Collections instead, as the code seems a little shorter and I don't have to constantly ReDim and iterate by typing:

Code:
For i = LBound(arr) To UBound(arr)

I far prefer to say:

Code:
For Each entry In coll
or
For i = 1 To coll.Count
Any reasons why Collections are NOT as good as Arrays for storing objects? Or are they, generally speaking, far better when you're not doing computations such as Application.Match(var, arr, 0), which require Arrays/Ranges?
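For context on that last point: Application.Match will accept a plain one-dimensional VBA array directly, but not a Collection, so a Collection's items would first have to be copied into an array. A minimal sketch (the values are illustrative):

Code:
Sub MatchDemo()
    Dim arr As Variant, pos As Variant
    arr = Array(10, 20, 30)                 ' a plain 1-D Variant array
    pos = Application.Match(20, arr, 0)     ' works directly on the array
    Debug.Print pos                         ' prints 2
End Sub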
 

Hi Amateurhr,

Like so many things in Excel, the answer to the "which is better?" question is: it depends :)

Here are a few pros and cons. I'm sure others could add more.

Cases in which Arrays are typically better than Collections:
1. Easier and faster when transferring data to and from Worksheet Ranges (see the sketch after this list)
2. When applying worksheet functions that take Arrays as parameters (rephrasing your example)
3. Slightly faster reads and writes of values than Collections when the size of the Array is fixed
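A minimal sketch of point 1 (the sheet reference and range addresses are illustrative): one bulk read of the Range into a Variant array, work on it in memory, then one bulk write back, instead of thousands of individual cell accesses.

Code:
Sub RangeArrayDemo()
    Dim v As Variant, r As Long
    v = ActiveSheet.Range("A1:A1000").Value   ' 2-D array: v(1 To 1000, 1 To 1)
    For r = 1 To UBound(v, 1)
        v(r, 1) = v(r, 1) * 2                 ' modify in memory
    Next r
    ActiveSheet.Range("B1:B1000").Value = v   ' single write-back
End Sub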

Cases in which Collections are typically better than Arrays:
1. When there's a benefit to referencing items by Key strings instead of index numbers
2. Storing only unique items (see the sketch after this list)
3. Faster to .Add a variable number of items to Collection than similar process using Redim Preserve for each added item with an Array
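A minimal sketch of points 1 and 2 (the names and values are illustrative): a Collection keyed by strings both rejects duplicates and lets you look items up by Key. Adding a duplicate Key raises run-time error 457, which the sketch deliberately swallows.

Code:
Sub UniqueKeyDemo()
    Dim col As New Collection, v As Variant
    For Each v In Array("a", "b", "a", "c", "b")
        On Error Resume Next         ' duplicate Key raises error 457
        col.Add v, Key:=CStr(v)
        On Error GoTo 0
    Next v
    Debug.Print col.Count            ' 3 unique items stored
    Debug.Print col("b")             ' referenced by Key string, not index
End Sub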
 
Cases in which Collections are typically better than Arrays:
3. Faster to .Add a variable number of items to Collection than similar process using Redim Preserve for each added item with an Array

Interesting you should mention that. I recently posted the following (modified slightly) to this recent "Get Digital Help" blog article...

How slow is ReDim Preserve? | Get Digital Help - Microsoft Excel resource

The way we used to minimize the number of times ReDim Preserve needed to be called within a loop (way back in my compiled VB days) was to do the redimensioning in "chunks". That is, we guessed at the maximum number of elements we thought the array would have to handle and, only if we were wrong, would we perform a ReDim Preserve, increasing the array's size by the chunk amount. The nice thing about this method is that even if we had no idea how many elements would finally be needed, it still considerably reduced the number of ReDim Preserves that had to be performed. At the end of the procedure, we would perform one more ReDim Preserve to size the array down to the actual number of elements it ended up containing. The following is my recollection of the code "structure" we used to do this (it has been a while since I needed it, and the untested pseudo-code below was made up off the top of my head, but I am pretty sure it is correct)...

Code:
Dim Counter As Long, SomeArray() As String
Const ChunkSize As Long = 1000
....
....
ReDim SomeArray(1 To ChunkSize)    ' start with room for one chunk
Do While Whatever
  If ConditionIsMet Then
    Counter = Counter + 1
    If Counter > UBound(SomeArray) Then
      ' out of room: grow by a whole chunk, not by one element
      ReDim Preserve SomeArray(1 To UBound(SomeArray) + ChunkSize)
    End If
    '
    '  Code to figure out SomeStringValue that is to be assigned to the array
    '
    SomeArray(Counter) = SomeStringValue
  End If
Loop
ReDim Preserve SomeArray(1 To Counter)    ' trim to the actual element count
....
....
Obviously, the code to calculate what is to be assigned to the array will differ with different setups and needs, but the above is my recollection of a general outline for how to minimize the number of times ReDim Preserve needs to be called in a loop.
 
Hi Rick, Thanks for sharing that technique. BTW, I really enjoy your "mini" blog.

One other item came to mind for the Pros/Cons...

Cases in which Arrays are typically better than Collections:
4. For sorting a group of items.
 
Interesting you should mention that. I recently posted the following (modified slightly) to this recent "Get Digital Help" blog article...

How slow is ReDim Preserve? | Get Digital Help - Microsoft Excel resource
Rick, with due respect, the code in the link you quote, if its aim is to show that Redim is faster than Redim Preserve, is methodological rubbish.

Any procedure that assumes a large part of its conclusion in advance, particularly if no evidence for the validity of the assumption is provided (how long would it take to provide such evidence, even if it exists?), can quite obviously be run much faster than a procedure that needs to work its way through the evidence to reach its conclusion.

In the case of Redim, how does the user know in advance that the upper value is indeed (say) 300,000? Is the time to acquire this a priori knowledge taken into account for the comparison? If not, why not? What if the upper limit is actually 300,001? Is the additional time taken to sort out the "subscript out of range" error taken into account? If not, why not?

The only time Redim Preserve needs to be used is when the user doesn't know the upper limit in advance. If the user does have valid a priori knowledge of the upper limit, there's no reason to use Redim Preserve at all. Redim and Redim Preserve have different purposes.

So what if the upper limit in fact turned out to be 3,000 rather than the assumed 300,000? Would you still claim that Redim(300000) would be the faster? If so, on what basis? Or would you continue Redim Preserving up to 300,000 to claim the point has been proven?

The problem with using your code to draw conclusions, and why such conclusions must logically fail, is that the code doesn't use a common basis for comparison. It's just as easy to show that Redim Preserve is faster as it is to show that Redim is faster, merely by an arbitrary (eristic?) choice of starting positions.

Different initial conditions may lead to different experimental outcomes. This was noted years ago (around 1908) by the French physicist and philosopher of science Pierre Duhem. As far as I know, nobody has seriously argued the contrary since. The dangers of assuming one's conclusions remain high.
 
The purpose of the exercise is to demonstrate a method for limiting the use of Redim Preserve. I can easily see situations where the upper bound may be known, but subject to change. Using Redim Preserve in a special loop (as Rick shows) is a way to mitigate the relatively expensive cost of using Redim Preserve.

This isn't a philosophy board. Please post code that shows Redim Preserve being faster than Redim, and spare us the annotations on the assumptions. I can assure you the numbers will speak for themselves.
 
I agree that redimensioning an array in the innermost loop is usually just bad design, and expanding in chunks is good if an array is selected (versus a collection or dictionary) for other reasons.

Cases in which Arrays are typically better than Collections:
4. For sorting a group of items.
Dunno about that, Jerry. You can insertion-sort a collection in N log(N) time.

If the difference in time were nominal (20%, 30%, ?) I'd choose the approach that made the code the most transparent as to intent and operation.
 

Hi shg,

I agree with you that clarity/simplicity/maintainability considerations are typically going to outweigh nominal speed considerations.
In response to the OP's question, my intent was to offer factors in which Array or Collection might be a better fit (all other things being equal).

The basis for my assertion that Arrays are typically better than Collections for sorting a group of items was a bit of a second-hand inference. Since I didn't know of a built-in way to sort Collections, I looked on Chip Pearson's site to see how he handles this.

Sorting A Collection Or Dictionary

In his code library of Collection functions, for his example Sub SortCollection, his approach is to transfer the Collection to an Array, use a sorting algorithm, then transfer it back. While that's not conclusive evidence that Arrays are better for sorting, it's persuasive.

I'd be interested to see an empirical comparison if you could offer some code showing an insertion-sort of a Collection.
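For illustration, here is a minimal sketch (not shg's actual code; the Function name is made up) of the kind of insertion-sort he describes: binary-search the already-sorted output for the insertion point, then use the Add method's Before argument. The comparison count is O(N log N); the real cost of the indexed lookups and positional inserts depends on the Collection's internal implementation.

Code:
Function SortedCopy(col As Collection) As Collection
    ' Build a new Collection by inserting each item at its sorted position.
    Dim result As New Collection
    Dim vItem As Variant
    Dim lo As Long, hi As Long, m As Long
    For Each vItem In col
        lo = 1: hi = result.Count
        Do While lo <= hi                ' binary search for first item >= vItem
            m = (lo + hi) \ 2
            If result(m) < vItem Then lo = m + 1 Else hi = m - 1
        Loop
        If lo > result.Count Then
            result.Add vItem             ' largest so far: append at the end
        Else
            result.Add vItem, Before:=lo ' insert ahead of the first larger item
        End If
    Next vItem
    Set SortedCopy = result
End Function
It would be called as Set colSorted = SortedCopy(colVals) for a Collection of mutually comparable values (all numbers or all strings).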
 
Here are results of some tests intended to compare the relative speed of storing and reading a large group of items in Collections vs Arrays using the methods previously discussed. Further below I've posted the code used so others can repeat or build upon it.

The first test was to Store 1,234,567 items that were the result of a simple multiplication.

On a relative basis, storing items in an Array when one knows the number of items in advance is the fastest.
To shg's point, all methods were nominally very fast; speed alone shouldn't be the deciding factor in choosing a handling method.

The tests supported that the method of ReDimming in chunks that Rick suggested is more efficient than ReDimming as each item is added.

Comparative Results of Store-Only with 1,234,567 items:
Collection: 0.19 seconds
Array_Known_Size: 0.02 seconds
Array_Unknown_Size_Redim_Each : 1.25 seconds
Array_Unknown_Size_Redim_Chunks : 0.16 seconds

The second test was to Store the items and then Read each one to a variable.
The results of this were surprising: the Collection did very poorly. It locked up my computer, and I had to run a follow-up test with a scaled-down number of items to gauge the relative speed.

Comparative Results of Store_then_Read with 1,234,567 items:
Collection: N/A Locked up computer.
Array_Known_Size: 0.08 seconds
Array_Unknown_Size_Redim_Each : 1.25 seconds
Array_Unknown_Size_Redim_Chunks : 0.15 seconds

Results of testing with a smaller sample size:
Comparative Results of Store_then_Read with 123,456 items:
Collection: 31.03 seconds
Array_Known_Size: 0 seconds
Array_Unknown_Size_Redim_Each : 0.03 seconds
Array_Unknown_Size_Redim_Chunks : 0 seconds

So it would appear a Collection is significantly less efficient than an Array at doing a large number of Reads from a large group of stored items; most likely each indexed read (colVals(i)) has to walk the Collection's internal list to reach item i, so the indexed read loop's cost grows roughly quadratically.

Code:
Sub Compare_Times_Store_Only()
    Const lNumberOfItems As Long = 1234567
    Debug.Print vbCr & "Comparative Results of Store-Only with " _
         & Format(lNumberOfItems, "#,###") & " items:"
        
    Call Test_Collection(lNumberOfItems, False)
    Call Test_Array_Known_Size(lNumberOfItems, False)
    Call Test_Array_Unknown_Size_Redim_Each(lNumberOfItems, False)
    Call Test_Array_Unknown_Size_Redim_Chunks(lNumberOfItems, False)
End Sub

Sub Compare_Times_Store_then_Read()
    Const lNumberOfItems As Long = 1234567
    Debug.Print vbCr & "Comparative Results of Store_then_Read with " _
         & Format(lNumberOfItems, "#,###") & " items:"
        
    'Call Test_Collection(lNumberOfItems, True) 'locks up if over ~200K items
    Call Test_Array_Known_Size(lNumberOfItems, True)
    Call Test_Array_Unknown_Size_Redim_Each(lNumberOfItems, True)
    Call Test_Array_Unknown_Size_Redim_Chunks(lNumberOfItems, True)
End Sub

Sub Test_Collection(lNumberOfItems As Long, bRead As Boolean)
    Dim dblStart As Double:   dblStart = Timer
    Dim colVals As New Collection
    Dim i As Long, lReadMe As Long
    
    On Error Resume Next
    For i = 1 To lNumberOfItems
        colVals.Add i * 2
    Next i

    If bRead Then
        For i = 1 To colVals.Count
            lReadMe = colVals(i)   ' indexed read; see the note after this code
        Next i
    End If
    Debug.Print "Collection: " _
      & Round((Timer - dblStart), 2) & " seconds"
End Sub

Sub Test_Array_Known_Size(lNumberOfItems As Long, bRead As Boolean)
    Dim dblStart As Double:   dblStart = Timer
    Dim lArray() As Long, lReadMe As Long, i As Long
    
    ReDim lArray(1 To lNumberOfItems)
    For i = 1 To lNumberOfItems
        lArray(i) = i * 2
    Next i

    If bRead Then
        For i = 1 To UBound(lArray)
            lReadMe = lArray(i)
        Next i
    End If
    Debug.Print "Array_Known_Size: " _
      & Round((Timer - dblStart), 2) & " seconds"

End Sub

Sub Test_Array_Unknown_Size_Redim_Each(lNumberOfItems As Long, bRead As Boolean)
    Dim dblStart As Double:   dblStart = Timer
    Dim i As Long, lReadMe As Long
    Dim lArray() As Long
    
    For i = 1 To lNumberOfItems
        ReDim Preserve lArray(1 To i)
        lArray(i) = i * 2
    Next i

    If bRead Then
        For i = 1 To UBound(lArray)
            lReadMe = lArray(i)
        Next i
    End If
    
    Debug.Print "Array_Unknown_Size_Redim_Each : " _
      & Round((Timer - dblStart), 2) & " seconds"

End Sub

Sub Test_Array_Unknown_Size_Redim_Chunks(lNumberOfItems As Long, bRead As Boolean)
    Dim dblStart As Double:   dblStart = Timer
    Dim lCounter As Long, i As Long, lReadMe As Long
    Dim lArray() As Long
    Const lChunkSize As Long = 10000
    
    ReDim lArray(1 To lChunkSize)
    Do While lCounter <> lNumberOfItems
        lCounter = lCounter + 1
        If lCounter > UBound(lArray) Then
          ReDim Preserve lArray(1 To UBound(lArray) + lChunkSize)
        End If
        lArray(lCounter) = lCounter * 2
    Loop
    ReDim Preserve lArray(1 To lCounter)
    If bRead Then
        For i = 1 To UBound(lArray)
            lReadMe = lArray(i)
        Next i
    End If
    
    Debug.Print "Array_Unknown_Size_Redim_Chunks : " _
      & Round((Timer - dblStart), 2) & " seconds"

End Sub
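One follow-up on the Collection read test: if the lock-up comes from indexed access (each colVals(i) call having to walk the Collection's internal list to reach item i), then a For Each read should scale roughly linearly. A sketch of that variant, untimed here, reusing the structure of Test_Collection above:

Code:
Sub Test_Collection_ForEach(lNumberOfItems As Long)
    Dim dblStart As Double:   dblStart = Timer
    Dim colVals As New Collection
    Dim i As Long, lReadMe As Long
    Dim vItem As Variant

    For i = 1 To lNumberOfItems
        colVals.Add i * 2
    Next i

    ' For Each visits each item once instead of re-walking the
    ' Collection for every index.
    For Each vItem In colVals
        lReadMe = vItem
    Next vItem

    Debug.Print "Collection_ForEach: " _
      & Round((Timer - dblStart), 2) & " seconds"
End Sub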
 
