Using Excel to create matrices (mention in published journal for the method)

Epyle

New Member
Joined
May 5, 2010
Messages
14
Hello, <?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>
<o:p></o:p>
I am new to this forum but I require some expert help. I am a postgraduate student working on a social network analysis project and need to format some data in excel. The project will (hopefully) be published in a leading journal and anyone who provides me with an answer will get a mention in the acknowledgements.<o:p></o:p>
<o:p></o:p>
I am using the methods of social network analysis to create a visual network of authors of certain studies.<o:p></o:p>
The cleaned data looks like this:<o:p></o:p>

4581796684_4471d10d82_o.jpg


<?xml:namespace prefix = v ns = "urn:schemas-microsoft-com:vml" /><v:shape id=_x0000_i1025 style="WIDTH: 24pt; HEIGHT: 24pt" type="#_x0000_t75" alt=""></v:shape>
<o:p></o:p><o:p></o:p>
<o:p></o:p>
Some authors may have authored on many studies. Some authors may have only authored on one study. <o:p></o:p>
<o:p></o:p>
My proposed data outcome (an adjacency matrix) is shown in the image below.
<o:p></o:p>
<o:p></o:p>
What I want to know is how many times each author has authored <o:p></o:p>
with each other author and present it in a (square) matrix like this (example data):

<o:p>
4581166713_bffd603d18_o.jpg
</o:p>

<o:p></o:p>
<v:shape id=_x0000_i1026 style="WIDTH: 24pt; HEIGHT: 24pt" type="#_x0000_t75" alt=""></v:shape><o:p></o:p>
<o:p></o:p>
<o:p></o:p>
So in this example, Aghadavoudi O has authored with Abdulla A on 5 separate studies but never with Abdelaziz AB. <o:p></o:p>
<o:p></o:p>

I suspect that the incidence matrix may be derived (somehow) from an adjacency matrix which would look like this (again, fictitious data):<o:p></o:p>
<o:p></o:p>
<o:p></o:p>
<o:p>
4581166787_f5d9f8f522_o.jpg
</o:p>

<o:p></o:p>
<o:p></o:p>
<o:p></o:p>
<o:p></o:p>
In this matrix a 1 denotes that two authors have authored on the same paper. An 0 indicates that they have not. <o:p></o:p>
<o:p></o:p>

In total there are about 800 authors creating an 800X800 square matrix. Manually…this is just not workable.<o:p></o:p>
<o:p></o:p>

Any help or advice would be much appreciated…and I promise a published mention (if the project gets published). <o:p></o:p>
<o:p></o:p>

Thanks <o:p></o:p>
<o:p></o:p>
 

Excel Facts

Quick Sum
Select a range of cells. The total appears in bottom right of Excel screen. Right-click total to add Max, Min, Count, Average.
The code to produce the list would be posible, the problem being having the output in 800 colomns. I only have 256 - how many do you have ?

It seems unnecessary anyway considering that most of the cells will contain zero. Why do you want a grid ?

Another way would be to have a simple list - rather like files & folders eg ...
Code:
Person1    Person2 3
           Person4 1
           Person5 1
Person2    Person1 3
           Person6 1
           Person7 1
Person3    Person9 1
Person4    Person1 1
Person5    Person1 1
Person6    Person2 1
Person7    Person2 1
Person8
Person9    Person3 1
 
Person9    Person3 1

Another alternative would be a data table to list the names in pairs with eg. Person1 repeated down column 1 with column 2 containing another name for each co-operation (multiple if necessary). You could then use a pivot table to combine and count to get a result similar to the one above.
 
Upvote 0
Thanks for your respose BrianB.<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>
<o:p></o:p>
I'm using Excel 2007 and I think that it allows for more than 800 columns. The reason I require the data in a matrix form is because I plan to analyse it the software 'Pajek' which requires that data be imported in the form of a matrix. <o:p></o:p>
<o:p></o:p>
The software can import data in the form of of an incidence matrix (1= a connection, 0 no connection) or an adjacency matrix showing the 'strength' of connection an author has to another - where the number of times a specific author has authored with another is considered the 'strength'.

With social network analysis software I should (eventually) be able to visualise the network. Here is an example of author network:

figurenetwork.gif

<o:p></o:p>
If we look consider the top right of this example. In a matrix, the cell at the intersection between 'Rosenberg, NJ' and 'Rosenwig, C' would contain (at least) a 1 but the intersecting cell between 'Rosenberg, NJ' and 'Solecki, WD' would contain a 0.

So in summary, it really needs to be a matrix. Had a collegue have a go all this morning but it still doesnt seem to work. I can provide that data and comments she made (I dont really understand them as I'm not experience in Excel and formulas) if any one is interested- it might go some way to working out the problem.
 
Upvote 0
Hi, I produced this in response to you first thread.
Not sure if this will work for you, but I tried 100 columns x 100 rows and the code takes about 36 secs.(That = 100 names)
Have a go see what happens, I may be well off the mark !!
This code assumes your basic data as per your First picture in the thread is in sheet (1).
Run code from another Blank sheet.
NB:- The code should import all the Names for the Matrix and then the Count.
Code:
[COLOR=navy]Sub[/COLOR] MG06May48
[COLOR=navy]Dim[/COLOR] Rng [COLOR=navy]As[/COLOR] Range, Dn [COLOR=navy]As[/COLOR] Range, n [COLOR=navy]As[/COLOR] [COLOR=navy]Long,[/COLOR] ShtRay, Nm [COLOR=navy]As[/COLOR] [COLOR=navy]Long,[/COLOR] Sht [COLOR=navy]As[/COLOR] [COLOR=navy]Long[/COLOR]
[COLOR=navy]Dim[/COLOR] nRay, RwRay, p [COLOR=navy]As[/COLOR] [COLOR=navy]Integer,[/COLOR] Nmm [COLOR=navy]As[/COLOR] [COLOR=navy]Long,[/COLOR] Rw [COLOR=navy]As[/COLOR] [COLOR=navy]Long,[/COLOR] C [COLOR=navy]As[/COLOR] [COLOR=navy]Integer[/COLOR]
[COLOR=navy]Dim[/COLOR] t, Ac [COLOR=navy]As[/COLOR] [COLOR=navy]Integer[/COLOR]
t = Timer
ActiveSheet.Cells.ClearContents
[COLOR=navy]With[/COLOR] Sheets("Sheet1")
    [COLOR=navy]Set[/COLOR] Rng = .Range(.Range("A1"), .Range("A" & Rows.Count).End(xlUp)).Offset(, 1).Resize(, 250)
[COLOR=navy]End[/COLOR] With
ShtRay = Rng.Value
ReDim RwRay(1 To Rng.Count)
[COLOR=navy]With[/COLOR] CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
 [COLOR=navy]For[/COLOR] [COLOR=navy]Each[/COLOR] Dn [COLOR=navy]In[/COLOR] Rng
    [COLOR=navy]If[/COLOR] Not .Exists(Dn.Value) And Not IsEmpty(Dn.Value) [COLOR=navy]Then[/COLOR] .Add Dn.Value, ""
 [COLOR=navy]Next[/COLOR]
nRay = .keys
  Range("B1").Resize(, .Count) = .keys
    Range("A2").Resize(.Count) = Application.Transpose(.keys)
      ReDim Fray(1 To .Count, 1 To .Count)
[COLOR=navy]End[/COLOR] With
[COLOR=navy]For[/COLOR] Nm = 0 To UBound(nRay)
    [COLOR=navy]For[/COLOR] Sht = 1 To Rng.Rows.Count
      [COLOR=navy]For[/COLOR] Ac = 1 To UBound(ShtRay, 2)
        [COLOR=navy]If[/COLOR] ShtRay(Sht, Ac) = nRay(Nm) [COLOR=navy]Then[/COLOR]
            p = p + 1
            RwRay(p) = Sht
            [COLOR=navy]Exit[/COLOR] For
          [COLOR=navy]End[/COLOR] If
      [COLOR=navy]Next[/COLOR] Ac
   [COLOR=navy]Next[/COLOR] Sht
 
For Nmm = Nm + 1 To UBound(nRay) 
For Rw = 1 To Rng.Rows.Count 
    [COLOR=navy]For[/COLOR] Ac = 1 To UBound(ShtRay, 2)
        [COLOR=navy]If[/COLOR] ShtRay(Rw, Ac) = nRay(Nmm) [COLOR=navy]Then[/COLOR]
           C = C + 1
            [COLOR=navy]Exit[/COLOR] For
          [COLOR=navy]End[/COLOR] If
      [COLOR=navy]Next[/COLOR] Ac
   [COLOR=navy]Next[/COLOR] Rw
Fray(Nm + 1, Nmm + 1) = C
Fray(Nmm + 1, Nm + 1) = C
C = 0
[COLOR=navy]Next[/COLOR] Nmm
[COLOR=navy]Next[/COLOR] Nm
Range("B2").Resize(UBound(nRay) + 1, UBound(nRay) + 1) = Fray
MsgBox Timer - t
[COLOR=navy]End[/COLOR] [COLOR=navy]Sub[/COLOR]
Regards Mick
 
Last edited:
Upvote 0
Hi, The previous code Has some errors, Try this, I did 100 columns by 250 rows in 101 secs.
Code:
[COLOR=navy]Sub[/COLOR] MG06May46
[COLOR=navy]Dim[/COLOR] Rng [COLOR=navy]As[/COLOR] Range, Dn [COLOR=navy]As[/COLOR] Range, n [COLOR=navy]As[/COLOR] [COLOR=navy]Long,[/COLOR] ShtRay, Nm [COLOR=navy]As[/COLOR] [COLOR=navy]Long,[/COLOR] Sht [COLOR=navy]As[/COLOR] [COLOR=navy]Long[/COLOR]
[COLOR=navy]Dim[/COLOR] nRay, RwRay, p [COLOR=navy]As[/COLOR] [COLOR=navy]Integer,[/COLOR] Nmm [COLOR=navy]As[/COLOR] [COLOR=navy]Long,[/COLOR] Rw [COLOR=navy]As[/COLOR] [COLOR=navy]Long,[/COLOR] C [COLOR=navy]As[/COLOR] [COLOR=navy]Integer[/COLOR]
[COLOR=navy]Dim[/COLOR] t, Ac [COLOR=navy]As[/COLOR] [COLOR=navy]Integer[/COLOR]
t = Timer
ActiveSheet.Cells.ClearContents
[COLOR=navy]With[/COLOR] Sheets("Sheet1")
    [COLOR=navy]Set[/COLOR] Rng = .Range(.Range("A1"), .Range("A" & Rows.Count).End(xlUp)).Offset(, 1).Resize(, 250)
[COLOR=navy]End[/COLOR] With
ShtRay = Rng.Value
ReDim RwRay(1 To Rng.Count)
[COLOR=navy]With[/COLOR] CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
 [COLOR=navy]For[/COLOR] [COLOR=navy]Each[/COLOR] Dn [COLOR=navy]In[/COLOR] Rng
    [COLOR=navy]If[/COLOR] Not .Exists(Dn.Value) And Not IsEmpty(Dn.Value) [COLOR=navy]Then[/COLOR] .Add Dn.Value, ""
 [COLOR=navy]Next[/COLOR]
nRay = .keys
  Range("B1").Resize(, .Count) = .keys
    Range("A2").Resize(.Count) = Application.Transpose(.keys)
      ReDim Fray(1 To .Count, 1 To .Count)
[COLOR=navy]End[/COLOR] With
[COLOR=navy]For[/COLOR] Nm = 0 To UBound(nRay)
    ReDim RwRay(1 To Rng.Count)
    [COLOR=navy]For[/COLOR] Sht = 1 To Rng.Rows.Count
      [COLOR=navy]For[/COLOR] Ac = 1 To UBound(ShtRay, 2)
        [COLOR=navy]If[/COLOR] ShtRay(Sht, Ac) = nRay(Nm) [COLOR=navy]Then[/COLOR]
            p = p + 1
            RwRay(p) = Sht
            [COLOR=navy]Exit[/COLOR] For
          [COLOR=navy]End[/COLOR] If
      [COLOR=navy]Next[/COLOR] Ac
   [COLOR=navy]Next[/COLOR] Sht
   ReDim Preserve RwRay(1 To p)
[COLOR=navy]For[/COLOR] Nmm = Nm + 1 To UBound(nRay)
[COLOR=navy]For[/COLOR] Rw = 1 To UBound(RwRay)
    [COLOR=navy]For[/COLOR] Ac = 1 To UBound(ShtRay, 2)
        [COLOR=navy]If[/COLOR] ShtRay(RwRay(Rw), Ac) = nRay(Nmm) [COLOR=navy]Then[/COLOR]
           C = C + 1
            [COLOR=navy]Exit[/COLOR] For
          [COLOR=navy]End[/COLOR] If
      [COLOR=navy]Next[/COLOR] Ac
   [COLOR=navy]Next[/COLOR] Rw
Fray(Nm + 1, Nmm + 1) = C
Fray(Nmm + 1, Nm + 1) = C
C = 0
p = 0
[COLOR=navy]Next[/COLOR] Nmm
[COLOR=navy]Next[/COLOR] Nm
Range("B2").Resize(UBound(nRay) + 1, UBound(nRay) + 1) = Fray
MsgBox Timer - t
[COLOR=navy]End[/COLOR] [COLOR=navy]Sub[/COLOR]
Regards Mick
 
Last edited:
Upvote 0
Here is my method. Although it seems OK I suggest you check with a small sample to make sure. Having 2 methods to cross-check is beneficial.

Experience suggests that there will be typing errors in the data. My code will match upper/lower case differences but will treat differences in spelling and spaces as different names.


Code:
'=============================================================================
'- ADD DATA TO MATRIX
'- Original List of studies column A. Author list starts in column B.
'- Adds a new worksheet with the matrix
'- Method
'- 1. Set up original matrix from column B authors 
'- 2. Iterate column B authors onwards & update matrix (adds new names)
'- A single row A - B - C produces matrix ....
'        A   B   C
'    A       1   1
'    B   1       1
'    C   1   1
'- Brian Baulsom May 2010
'=============================================================================
'- Original List
Dim Fromsheet As Worksheet
Dim MainCol As Integer
Dim FromRow As Long
Dim FromCol As Integer
Dim LastRow As Long         ' initial setup rows
Dim LastCol As Integer      ' initial setup last used column
Dim Author1 As String
Dim Author2 As String
Dim AuthorNum As Integer    ' subroutine
Dim Author1Num As Integer
Dim Author2Num As Integer
'- Matrix Grid
Dim ToSheet As Worksheet
Dim ToRow As Long
Dim ToCol As Integer
Dim AuthorCount As Integer      ' number of authors. limits Find
Dim AuthorList As Range         ' column of names
Dim MatrixCell As Range
'-
Dim FoundCell As Range
Dim msg As String               ' final message
Dim OriginalNames As Integer    ' number of names in original list
Dim GridTotalMatches As Long
Dim rg As String                ' range address
'-----------------------------------------------------------------------------
 
'=============================================================================
'- MAIN ROUTINE
'=============================================================================
Sub MAKE_MATRIX()
    '- INITIALISE VARIABLES
    Set Fromsheet = Worksheets("sheet1")        'ActiveSheet
    FromRow = 1
    With Fromsheet
        LastRow = .Range("A65536").End(xlUp).Row
        '- COUNT NUMBER OF NAMES
        LastCol = .Cells.Find(what:="*", SearchDirection:=xlPrevious, _
            SearchOrder:=xlByColumns).Column
        '-------------------------------------------------------------------------
        rg = Range(Cells(1, "B"), Cells(LastRow, LastCol)).Address
        OriginalNames = _
        Application.WorksheetFunction.CountA(.Range(rg))
    End With
    '------------------------------------------------------------------------
    Application.Calculation = xlCalculationManual
    '------------------------------------------------------------------------
    Set ToSheet = Worksheets.Add
    ToSheet.UsedRange.Cells.ClearContents
    Set AuthorList = ToSheet.Range("A1:A3")
    GridTotalMatches = 0
    '=========================================================================
    '- SET UP MATRIX FROM FIRST COLUMN OF AUTHORS (EXCLUDES DUPLICATES)
    '=========================================================================
    AuthorCount = 0
    For FromRow = 1 To LastRow
        Author1 = Fromsheet.Cells(FromRow, 2).Value
        CHECK_AUTHOR Author1           ' subroutine
    Next
    '=========================================================================
    '- ADD NUMBERS TO THE MATRIX
    '- Check each column
    '=========================================================================
    With Fromsheet
    For MainCol = 2 To LastCol
        For FromRow = 1 To LastRow
            Application.StatusBar = " Checking Column : " & MainCol _
                & "\" & LastCol & "      Row : " & FromRow & "\" & LastRow
            '-----------------------------------------------------------------
            '- AUTHOR 1
            Author1 = CStr(.Cells(FromRow, MainCol))
            If Author1 <> "" Then
                CHECK_AUTHOR Author1
                Author1Num = AuthorNum
                '-------------------------------------------------------------
                '- AUTHOR 2
                For FromCol = MainCol + 1 To LastCol
                    Author2 = CStr(.Cells(FromRow, FromCol))
                    If Author2 <> "" Then
                        CHECK_AUTHOR Author2
                        Author2Num = AuthorNum
                        '-----------------------------------------------------
                        '- UPDATE MATRIX
                        GridTotalMatches = GridTotalMatches + 2
                        '- Author 1
                        Set MatrixCell = ToSheet.Cells(Author1Num + 1, Author2Num)
                        MatrixCell.Value = MatrixCell.Value + 1
                        '- Author 2
                        Set MatrixCell = ToSheet.Cells(Author2Num + 1, Author1Num)
                        MatrixCell.Value = MatrixCell.Value + 1
                        '-----------------------------------------------------
                    End If
                Next FromCol
                '-------------------------------------------------------------
            End If
            '-----------------------------------------------------------------
        Next FromRow
    Next MainCol
    End With
    '=========================================================================
    '- FINISH
    msg = "Done" & vbCr _
        & "Original Names = " & Format(OriginalNames, "##,###") & vbCr _
        & "Total Authors    = " & Format(AuthorCount, "##,###") & vbCr _
        & "Total Matches    = " & Format(GridTotalMatches, "###,###")
    MsgBox (msg)
    Application.Calculation = xlCalculationAutomatic
    Application.StatusBar = False
    '-------------------------------------------------------------------------
End Sub
'=========== end of main routine =============================================
 
'=============================================================================
'- SUBROUTINE : CHECK AUTHOR. ADD NEW TO THE MATRIX GRID
'=============================================================================
Private Sub CHECK_AUTHOR(Author)
    With ToSheet
        '---------------------------------------------------------------------
        '- SET VARIABLES
        Set FoundCell = AuthorList.Cells.Find(what:=Author, MatchCase:=False)
        '--------------------------------------------------------------------
        '- CHECK NEW AUTHOR
        If FoundCell Is Nothing Then
            AuthorCount = AuthorCount + 1
            ToRow = AuthorCount + 2
            ToCol = AuthorCount + 1
            '----------------------------------------------------------------
            '- ADD AUTHOR
            .Cells(ToRow, 1).Value = Author  ' column 1
            .Cells(2, ToCol).Value = Author  ' row 1
            .Cells(ToRow, ToCol).Value = "--------"
            Set AuthorList = .Range("A1:A" & .Range("A65536").End(xlUp).Row)
            '----------------------------------------------------------------
            AuthorNum = ToRow - 1       ' for Author 2 positions
        Else
            AuthorNum = FoundCell.Row - 1
        End If
        '-------------------------------------------------------------------
    End With
End Sub
'=========== end of sub routine =============================================
 
Upvote 0
I want to thank BrianB and particularly MickG (whose method I have chosen to use) for helping me with this project. You will both be acknowedged in my project. Message me with how you would like to be known if you wish to be acknowledged.

I really appreciate responses (both in the thread and messages) about the method.

E
 
Upvote 0
Hi, I would be interested to know if you got the code to run for all that data.
When I tried it on 250 x 250 Columns/rows. My computer froze after about the 6th loop.
NB:- If you change the last few lines of code as shown below, the Sheet Status Bar will be updated for each code loop completed out of the Total Number of Loops in the code.This will Give you some idea of where the code is while its running,
Code:
Next Nmm
Application.StatusBar = "Loop " & Nm & " of " & UBound(nRay)
Next Nm
Range("B2").Resize(UBound(nRay) + 1, UBound(nRay) + 1) = Fray
Application.StatusBar = False
MsgBox Timer - t
End Sub
Regards Mick
 
Upvote 0
I just tried it on my full data - 1004 studies. This included 3000 authors creating a matrix 3000 X 3000. Not sure how long it took as I left it working over lunch, however when i ran the code on a smaller section yesterday, it took about 7 minutes to create a 1075 X 1075 matrix.

I am delighted with the results.

E
 
Upvote 0

Forum statistics

Threads
1,226,564
Messages
6,191,771
Members
453,679
Latest member
aalligood

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top