Collect Spreadsheet Data

Mr. Wizard

Board Regular
Joined
Sep 28, 2005
Messages
67
I need to collect analytical data for my organization about the usage of specific Excel functions.

I would like to write a macro that would go to a specified drive location, and open each folder and sub folder, if there are Excel spreadsheets, the macro would open the spreadsheet, and get a total number of usages of for every Excel function used on every worksheet including nested functions and then close the workbook and continue on to the next workbook collecting data until there are no more workbooks found on the server.

I have seen macros that go through and turn on the function view instead of the results view, but I need to collect data about how frequently people are using the functions, and sifting through formulas manually keeping a tally and counting seems very inefficient.

Would someone be able to provide me with code to get the Pareto data for one workbook? I can program the rest of the process myself.

Eventually what I would like is a list of Excel Functions found with the total count of instances of the function.

Thanks for your assistance,
 

Excel Facts

What is the shortcut key for Format Selection?
Ctrl+1 (the number one) will open the Format dialog for whatever is selected.
This might get you started. It's a very raw version, so you might have to do some work with it, or post back here.

First, I collected a list of all Excel worksheet functions. I got the list from here: Excel functions (alphabetical list) - Excel - Office.com. I made the assumption that any time a function is used, it would have an opening parenthesis next to it. I did some trickery with a =LEFT(A1, FIND(" ", A1)-1)&"(" formula and copy/pasted values into Sheet2. So, my function list looks like this, for the first few rows of Sheet2:

A
ABS(
ACCRINT(
ACCRINTM(
ACOS(
ACOSH(
ADDRESS(

<tbody>
[TD="bgcolor: #cacaca, align: center"]1[/TD]

[TD="bgcolor: #cacaca, align: center"]2[/TD]

[TD="bgcolor: #cacaca, align: center"]3[/TD]

[TD="bgcolor: #cacaca, align: center"]4[/TD]

[TD="bgcolor: #cacaca, align: center"]5[/TD]

[TD="bgcolor: #cacaca, align: center"]6[/TD]

</tbody>


Then, I wrote this code. The first constant is my test file. The rest is information about my sheet where the functions are listed. I hard-coded the range and the number of functions, so change that as appropriate for your purposes.

Code:
Private Const sFILE As String = "C:\Users\Homie\Desktop\440_Budget_Template_v1.xlsx"
Private Const sLOG_SHEET As String = "Sheet2"
Private Const sFUNC_LIST As String = "A1:A337"
Private Const sFUNC_COUNT As Long = 337

Public Sub CountFunctions()
  Dim wkb As Excel.Workbook
  Dim wsh As Excel.Worksheet
  
  Dim rngUsed As Excel.Range
  Dim rngFind As Excel.Range
  
  Dim sFirstFind As String
  
  Dim vFunc As Variant
  Dim vOut As Variant
  
  Dim i As Long
  Dim calcs As Excel.XlCalculation
  
  Dim iCells As Long, iProgress As Long
  
  vFunc = ThisWorkbook.Worksheets(sLOG_SHEET).Range(sFUNC_LIST).Value
  ReDim vOut(1 To sFUNC_COUNT)
  
  Application.ScreenUpdating = False
  calcs = Application.Calculation
  Application.Calculation = xlCalculationManual


  Set wkb = Application.Workbooks.Open(sFILE, UpdateLinks:=False, ReadOnly:=True)
  
  For Each wsh In wkb.Worksheets
    iCells = iCells + wsh.UsedRange.Cells.Count
  Next wsh
  
  For Each wsh In wkb.Worksheets
    Debug.Print Now(), "Processing " & wsh.Name & " in " & wkb.Name
    Set rngUsed = wsh.UsedRange
    For i = 1 To sFUNC_COUNT
      On Error Resume Next
        Set rngFind = rngUsed.Find(what:=vFunc(i, 1), LookIn:=xlFormulas, lookat:=xlPart)
      On Error GoTo 0
      
      If Not (rngFind Is Nothing) Then
        sFirstFind = rngFind.Address
        Do
          If rngFind.HasFormula Then
            vOut(i) = vOut(i) + (Len(rngFind.Formula) - Len(Replace(rngFind.Formula, vFunc(i, 1), ""))) / Len(vFunc(i, 1))
          End If
          Set rngFind = rngUsed.FindNext(rngFind)
          If rngFind Is Nothing Then Set rngFind = wsh.Range(sFirstFind)
        Loop Until StrComp(sFirstFind, rngFind.Address) = 0
      End If
    Next i
    iProgress = iProgress + rngUsed.Cells.Count
    Application.StatusBar = "Counting worksheet functions, percent completed: " & Format(iProgress / iCells, "0.00%")
  Next wsh
  
  ThisWorkbook.Worksheets(sLOG_SHEET).Range(sFUNC_LIST).Offset(0, 1).Value = _
        Application.WorksheetFunction.Transpose(vOut)
        
  Application.ScreenUpdating = True
  Application.Calculation = xlCalculationAutomatic
  Application.StatusBar = False
        
  Debug.Print "Done at " & Now()
  
  If StrComp(ThisWorkbook.Name, wkb.Name) <> 0 Then wkb.Close savechanges:=False
End Sub

The code will populate the column adjacent to the right of the function list.

I then copied the output and filtered it, here is the result I got:

AB
AVERAGE(
COUNTIF(
HLOOKUP(
IF(
IFERROR(
LOOKUP(
MONTH(
OR(
ROUND(
SUM(
SUMIF(
SUMIFS(
VALUE(
VLOOKUP(

<colgroup><col style="font-weight:bold; width:30px; "><col style="width:164px;"><col style="width:64px;"></colgroup><tbody>
[TD="bgcolor: #cacaca, align: center"]1[/TD]
[TD="align: center"]Function[/TD]
[TD="align: center"]Count[/TD]

[TD="bgcolor: #cacaca, align: center"]19[/TD]

[TD="align: right"]2[/TD]

[TD="bgcolor: #cacaca, align: center"]57[/TD]

[TD="align: right"]139[/TD]

[TD="bgcolor: #cacaca, align: center"]143[/TD]

[TD="align: right"]2400[/TD]

[TD="bgcolor: #cacaca, align: center"]147[/TD]

[TD="align: right"]92695[/TD]

[TD="bgcolor: #cacaca, align: center"]148[/TD]

[TD="align: right"]2319[/TD]

[TD="bgcolor: #cacaca, align: center"]189[/TD]

[TD="align: right"]12178[/TD]

[TD="bgcolor: #cacaca, align: center"]206[/TD]

[TD="align: right"]2800[/TD]

[TD="bgcolor: #cacaca, align: center"]231[/TD]

[TD="align: right"]6519[/TD]

[TD="bgcolor: #cacaca, align: center"]262[/TD]

[TD="align: right"]2403[/TD]

[TD="bgcolor: #cacaca, align: center"]290[/TD]

[TD="align: right"]26409[/TD]

[TD="bgcolor: #cacaca, align: center"]291[/TD]

[TD="align: right"]7987[/TD]

[TD="bgcolor: #cacaca, align: center"]292[/TD]

[TD="align: right"]36[/TD]

[TD="bgcolor: #cacaca, align: center"]320[/TD]

[TD="align: right"]14[/TD]

[TD="bgcolor: #cacaca, align: center"]326[/TD]

[TD="align: right"]9778[/TD]

</tbody>


Excel Tabellen im Web darstellen >> Excel Jeanie HTML 4.8



There are a number of ways you can speed up this algorithm. Possibly there are functions you don't care about - exclude them from the list. Also, because it uses UsedRange attribute of the worksheet, if the worksheets in each workbook are cleaned up it will be faster - I know that probably sounds impossible, but with some training the users could do it.

So, with my test - this is a huge workbook, nearly 4MB in size, 50 worksheets, and the number of formulas you see above - this whole operation took about 4 minutes to run, on one file. If you have many large files, you might want to do it overnight. Unless someone else chimes in with significant speed improvements (I implemented everything I could think of off top of my head), this won't be a quick task. I know at my company we have hundreds of thousands of them, and many are of this kind of size and complexity. You should probably consider doing selective sampling, on a few folders, representative of your information needs.

Just some ideas for moving forward with this.

But anyway - hope you find my input helpful, I definitely had fun developing it, and it's an interesting problem for which I now see my own applications! Thanks for sharing, good luck, and post back if you need anything.
 
Upvote 0
Thanks very much for your assistance. After some tweeking I was able to get the code set up and working. I will get the constant filename replaced with a variable so that it will cycle through the worksheets I want to query. At what point does the code write the values into the cells on Sheet 2? I will need to update the code so that it adds the values for each additional worksheet to the data already in the cell.


Thanks!
 
Upvote 0
I suppose I could have written better comments. This is the line that puts the formula counts into the sheet:

Code:
  ThisWorkbook.Worksheets(sLOG_SHEET).Range(sFUNC_LIST).Offset(0, 1).Value = _
        Application.WorksheetFunction.Transpose(vOut)

The way I envisioned it working when I first read your description, is it would be a function that takes a file name, and returns the vOut to the calling procedure. So you would refactor the part where the formula list is acquired, the file is opened, and results are written back to the workbook. Or, you could keep a running vOut (global or passed ByRef) and then populate all the way at the end. In the second approach, you ran the risk of crashing the macro mid-way (on purpose or due to some external issue) and not retaining any intermediate results.

In my own mind, I was taking the concept one step further. I was thinking about selecting a list of folders, representative of each area of the organization, and using that list to pull all those folders. The reason is, the example I ran in my first post has something like 150,000 functions. But a lot of these are repetitive. And there are thousands of these files. So really and truly, I only want to count one of them, because they're identical and majority of the functions don't actually see much use.

So if doing a sample instead, I'd want to pick several dozen files to analyze - different models, from different departments, for different purposes - and keep track of statistics, such as which department uses which functions more frequently.
 
Upvote 0
Thank you for responding. You are right on target with what I am attempting to do. I am analyzing how we use the tools between departments and identifying core functions that one must have to work effectually in the departments so that we can train people coming in.

I will keep working on the process for analyzing the spreadsheet and determining an appropriate sample.

Best regards,
 
Upvote 0
I found a couple of quirks that break the process, and I thought it would be good to post these.

1. Spreadsheets that open to a chart tab break the code.
2. Spreadsheets without any formulas do not report that there are no formulas, and when the code was inserted into a loop to examine a series of spreadsheets, the resulting report generates the same data for a spreadsheet without formulas as the previous report that had formulas.
 
Upvote 0

Forum statistics

Threads
1,223,268
Messages
6,171,100
Members
452,379
Latest member
IainTru

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top