VBA for making a list to extract Unique Cell References from an entire Workbook - REGEXP

actjfc

Active Member
Joined
Jun 28, 2003
Messages
416
The following code extracts all references one cell at a time with a function. Somehow it works fine for my project. However, I need almost the same but applied to ALL cells within a whole workbook. So, the code should add a new sheet and make a list with one row per unique reference found showing the sheet where the formula is located, the cell address, the actual formula, and the cell reference itself. If a formula has several references will be repeated in several rows, but the if a reference is several times within the same formula should only be in the list one time per cell/formula.

For example: if the cell B5 has this formula sum(A1, A1*A2, A3) in the sheet1, should be listed as:

Sheet1 B5 sum(A1,A1*A2,A3) A1
Sheet1 B5 sum(A1,A1*A2,A3) A2
Sheet1 B5 sum(A1,A1*A2,A3) A3

The current formula output is just: A1, A1, A2, A3

Can somebody modify this VBA code to produce this workbook reference list? "No matches" should be not listed. Thanks!

Function ExtractCellRefs(Rg As Range) As String
'Updateby Extendoffice
Dim xRetList As Object
Dim xRegEx As Object
Dim I As Long
Dim xRet As String
Application.Volatile
Set xRegEx = CreateObject("VBSCRIPT.REGEXP")
With xRegEx
.Pattern = "('?[a-zA-Z0-9\s\[\]\.]{1,99})?'?!?\$?[A-Z]{1,3}\$?[0-9]{1,7}(:\$?[A-Z]{1,3}\$?[0-9]{1,7})?"
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Set xRetList = xRegEx.Execute(Rg.Formula)
If xRetList.Count > 0 Then
For I = 0 To xRetList.Count - 1
xRet = xRet & xRetList.Item(I) & ", "
Next
ExtractCellRefs = Left(xRet, Len(xRet) - 2)
Else
ExtractCellRefs = "No Matches"
End If
End Function

Source: 'https://www.extendoffice.com/documents/excel/4555-excel-extract-cell-reference-from-formula.html
 
Last edited:

Excel Facts

Lock one reference in a formula
Need 1 part of a formula to always point to the same range? use $ signs: $V$2:$Z$99 will always point to V2:Z99, even after copying
1. When posting code, please use indented code and Code Tags to preserve that indentation as without that the code is much harder to read & debug. My signature block below explains how to do it.

2. I expect that this reference extraction will be extremely hard to be 100% accurate. Further to this point, I have not attempted to 'proofread' the RegExp pattern and have accepted that you are happy (enough) with it. One example that I have found already is that it seems to struggle with whole column or row references. eg =INDEX(E:H,2,2) or =VLOOKUP(3,A:B,2,1). Another one is that from the formula ="abc" & B2 it extracts " B2". I don't know if that leading space could be an issue for you or not.

3. For the moment at least, I have assumed that you will manually add a worksheet to receive the references list, name it 'References' (or adjust my code) and enter headings in A1:D1. Each time my code is run it will clear everything below those headings and then populate rows 2 and below with the results.

4. I have assumed no empty worksheets in the workbook.

Anyway, you can give this a try in a copy of your workbook after setting up the 'References' sheet.

Code:
Sub Log_References()
  Dim RX As Object, M As Object, d As Object
  Dim ws As Worksheet
  Dim data As Variant, itm As Variant
  Dim results() As String
  Dim i As Long, j As Long, k As Long, rw As Long
  Dim wsName As String
  
  Set RX = CreateObject("VBScript.Regexp")
  Set d = CreateObject("Scripting.Dictionary")
  ReDim results(1 To Rows.Count, 1 To 4) As String
  With RX
    .Global = True
    .MultiLine = True
    .IgnoreCase = False
    .Pattern = "('?[a-zA-Z0-9\s\[\]\.]{1,99})?'?!?\$?[A-Z]{1,3}\$?[0-9]{1,7}(:\$?[A-Z]{1,3}\$?[0-9]{1,7})?"
    For Each ws In Worksheets
      wsName = ws.Name
      If wsName <> "References" Then
        data = ws.Range("A1", ws.UsedRange.SpecialCells(xlLastCell)).Formula
        For i = 1 To UBound(data, 1)
          For j = 1 To UBound(data, 2)
            If Left(data(i, j), 1) = "=" Then
              Set M = .Execute(data(i, j))
              If M.Count > 0 Then
                d.RemoveAll
                For Each itm In M
                  If Not d.exists(itm) Then
                    d(itm) = 1
                    rw = rw + 1
                    results(rw, 1) = wsName
                    results(rw, 2) = Cells(i, j).Address(0, 0)
                    results(rw, 3) = data(i, j)
                    results(rw, 4) = itm
                  End If
                Next itm
              End If
            End If
          Next j
        Next i
      End If
    Next ws
  End With
  With Sheets("References")
    .UsedRange.Offset(1).ClearContents
    .Range("A2:D2").Resize(rw).Value = results
    .Columns("A:D").AutoFit
  End With
End Sub
 
Last edited:
Upvote 0
Amazing! It works perfect for what I need! You are right, it may miss some whole column references, but I am more than happy.

Huge THANKS again!
 
Upvote 0
Peter_SSs,

You were totally right! Now, I revisited the code because I need to identify just the functions that you mentioned type eg =INDEX(E:H,2,2) or =VLOOKUP(3,A:B,2,1). I inherited a model that has the potential to become very slow. It has about 20 sheets and 15000 formulas. I must locate all the formulas referencing entire columns. Just like your example. I tested the Pattern = "[a-zA-Z]{1}:[a-zA-Z]{1}" at http://regexr.com, but I cannot make the above macro to work. Can you help me? Any help is highly appreciated. Thanks!
 
Last edited:
Upvote 0
I must locate all the formulas referencing entire columns.
For whole columns only, try this pattern. It is possible, but highly unlikely, it could identify some false positives but they would be easily identifiable by looking at the 'Formula' column in the References sheet. I haven't tested it extensively to see if there are some column references that it fails to identify.
Code:
.Pattern = "\$?[A-Z]{1,3}:\$?[A-Z]{1,3}"
 
Last edited:
Upvote 0
Thanks Peter_SSs! It works perfect! Great help!
One possible problem for you to look out for with the code Peter posted in Message #2 ... it will misidentify any function name with a number in its name as a cell reference. For example, if the ATAN2 function is in a cell, the code will report ATAN2 as a cell reference... also if the BIN2DEC function is in a cell, the code will return BIN2 as a cell reference.
 
Last edited:
Upvote 0
Thanks for your comment. The workbook that I inherited has 15000 cells with formulas, but fortunately only 200 unique cells using whole column references. Some of them easy to fix by copy and paste ir down a new formula. Now, I am certain I did nor miss any cell. Thanks!
 
Upvote 0

Forum statistics

Threads
1,223,227
Messages
6,170,849
Members
452,361
Latest member
d3ad3y3

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top