K-Mean Clustering Code is not working

wklow83

New Member
Joined
Feb 14, 2017
Messages
7
Hi guys,

I got the codes from online resources, when i run it to do k-means clustering, it seems don't work as it should be (it should be a 6 columns x 4 rows table, but it only returns 5 columns x 3 rows table).

I have data in datasheet (worksheet) in 6 columns x 33 rows table.
Then, i have initial 4 centroids coordinates in activesheet G3:L6.

I want to return the results (new calculated centroid coordinates paste at range G7 as i indicated in the worksheet) by calculating from the datasheet data range (as i indicated int the worksheet too). Below are my parameter in the same activesheet to run the k-means clustering

[TABLE="width: 238"]
<colgroup><col><col></colgroup><tbody>[TR]
[TD]MaxIt[/TD]
[TD="align: right"]150[/TD]
[/TR]
[TR]
[TD]Data Sheet[/TD]
[TD]DataSheet[/TD]
[/TR]
[TR]
[TD]Data Range[/TD]
[TD]J6:O38[/TD]
[/TR]
[TR]
[TD]Output Sheet[/TD]
[TD]DataSheet[/TD]
[/TR]
[TR]
[TD]Output Range[/TD]
[TD]P6[/TD]
[/TR]
[TR]
[TD]Initial Centr[/TD]
[TD]G3:L6[/TD]
[/TR]
[TR]
[TD]Output Centr[/TD]
[TD]G7[/TD]
[/TR]
</tbody>[/TABLE]


I am not a originator of the code, but i would like to learn from code and know how to make it works. the codes are as follows:



Rich (BB code):
Rich (BB code):
Sub kmeans()
'x main macro x'


'x get the initial centroid data


Dim InitCentrRange As String:   InitCentrRange = ActiveSheet.Range("C7").Value
Dim InitialCentroids As Variant: InitialCentroids = ActiveSheet.Range(InitCentrRange)


'x get the number of max iterations


Dim MaxIt As Integer:   MaxIt = ActiveSheet.Range("C2").Value


'x get the features data


Dim DataSht As String: DataSht = ActiveSheet.Range("C3").Value
Dim DataRange As String: DataRange = ActiveSheet.Range("C4").Value
Dim X As Variant: X = Worksheets(DataSht).Range(DataRange)


'x assign objects to centroids based on initial centroids


Dim my_idx As Variant: my_idx = FindClosestCentroid(X, InitialCentroids)


Dim J As Integer:   J = UBound(X, 2) 'number of features
Dim K As Integer:   K = UBound(InitialCentroids, 1) 'number of clusters
Dim M As Integer:   M = UBound(X, 1) 'number of objects


Dim centroids As Variant
Dim ii As Integer: ii = 1
For ii = 1 To MaxIt
    centroids = ComputeCentroids(X, my_idx, K)
    my_idx = FindClosestCentroid(X, centroids)
Next ii


'x print the centroids


Dim outputRange As String:  outputRange = ActiveSheet.Range("C8").Value
Range(outputRange).Resize(K, J).Value = centroids


'x print the cluster assignment next to data


Dim ClusterOutputSht As String: ClusterOutputSht = ActiveSheet.Range("C5").Value
Dim ClusterOutputRange As String: ClusterOutputRange = ActiveSheet.Range("C6").Value
Worksheets(ClusterOutputSht).Range(ClusterOutputRange).Resize(M, 1).Value = WorksheetFunction.Transpose(my_idx)




End Sub



Rich (BB code):
Public Function FindClosestCentroid(ByRef X As Variant, ByRef centroids As Variant) As Variant


'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
'x Function loops through all observations and calculate the distance to all centroids  x'
'x Function then returns an array with all the nearest centroids for each observation   x'
'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'


Dim K As Integer:   K = UBound(centroids, 1)
Dim J As Integer:   J = UBound(centroids, 2)
Dim M As Integer:   M = UBound(X, 1)
Dim idx() As Variant:   ReDim idx(M) As Variant
Dim ii As Integer:  ii = 1
Dim cc As Integer:  cc = 1


'For eahc observation find the closest centroid


For ii = 1 To M
    Dim Dist_min As Double: Dist_min = 1000000
    Dim Dist As Double: Dist = 0
    'for each K calculate the distance
    For cc = 1 To K
        Dist = EuclideanDistance(Application.Index(X, ii, 0), Application.Index(centroids, cc, 0), J)
        If Dist < Dist_min Then
            idx(ii) = cc
            Dist_min = Dist
        End If
    Next cc
Next ii


FindClosestCentroid = idx()


End Function


Rich (BB code):
Public Function ComputeCentroids(X As Variant, idx As Variant, K As Variant) As Variant


'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
'x Function that calculates the new centroid mean for each feature                      x'
'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'


Dim M As Integer:   M = UBound(X, 1) 'number of objects
Dim J As Integer:   J = UBound(X, 2) 'number of features
Dim ii As Integer:  ii = 1
Dim cc As Integer:  cc = 1
Dim bb As Integer:  bb = 1
Dim counter As Integer: counter = 0
Dim centroids() As Variant: ReDim centroids(K, J) As Variant
Dim tempSum() As Variant


For ii = 1 To K 'for each centroid
    For bb = 1 To J 'for each feature
        counter = 0 'reset counter
        For cc = 1 To M 'for each observation
            If idx(cc) = ii Then 'for objects that below to this centroid calc the sum
                centroids(ii, bb) = centroids(ii, bb) + X(cc, bb)
                counter = counter + 1
            End If
        Next cc
        If counter > 0 Then     'modified
        centroids(ii, bb) = centroids(ii, bb) / counter 'divide sum by counter to get mean
        Else                    'modified
        centroids(ii, bb) = 0   'modified
        End If                  'modified
    Next bb
Next ii


ComputeCentroids = centroids


End Function


Rich (BB code):
Public Function EuclideanDistance(X As Variant, Y As Variant, num_obs As Integer) As Double


'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
'x Function that calculates the euclidean distance between two vectors of size num_obs each    x'
'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'


Dim ii As Integer: ii = 1
Dim RunningSumSqr As Double: RunningSumSqr = 0
For ii = 1 To num_obs
    RunningSumSqr = RunningSumSqr + (X(ii) - Y(ii)) ^ 2
Next ii


EuclideanDistance = Sqr(RunningSumSqr)


End Function


I appreciate any help that i could have. Thank you in advance!


Thanks in advance,
Wye
 

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.

Forum statistics

Threads
1,225,730
Messages
6,186,701
Members
453,369
Latest member
positivemind

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top