Hi guys,
I got the codes from online resources, when i run it to do k-means clustering, it seems don't work as it should be (it should be a 6 columns x 4 rows table, but it only returns 5 columns x 3 rows table).
I have data in datasheet (worksheet) in 6 columns x 33 rows table.
Then, i have initial 4 centroids coordinates in activesheet G3:L6.
I want to return the results (new calculated centroid coordinates paste at range G7 as i indicated in the worksheet) by calculating from the datasheet data range (as i indicated int the worksheet too). Below are my parameter in the same activesheet to run the k-means clustering
[TABLE="width: 238"]
<colgroup><col><col></colgroup><tbody>[TR]
[TD]MaxIt[/TD]
[TD="align: right"]150[/TD]
[/TR]
[TR]
[TD]Data Sheet[/TD]
[TD]DataSheet[/TD]
[/TR]
[TR]
[TD]Data Range[/TD]
[TD]J6:O38[/TD]
[/TR]
[TR]
[TD]Output Sheet[/TD]
[TD]DataSheet[/TD]
[/TR]
[TR]
[TD]Output Range[/TD]
[TD]P6[/TD]
[/TR]
[TR]
[TD]Initial Centr[/TD]
[TD]G3:L6[/TD]
[/TR]
[TR]
[TD]Output Centr[/TD]
[TD]G7[/TD]
[/TR]
</tbody>[/TABLE]
I am not a originator of the code, but i would like to learn from code and know how to make it works. the codes are as follows:
I appreciate any help that i could have. Thank you in advance!
Thanks in advance,
Wye
I got the codes from online resources, when i run it to do k-means clustering, it seems don't work as it should be (it should be a 6 columns x 4 rows table, but it only returns 5 columns x 3 rows table).
I have data in datasheet (worksheet) in 6 columns x 33 rows table.
Then, i have initial 4 centroids coordinates in activesheet G3:L6.
I want to return the results (new calculated centroid coordinates paste at range G7 as i indicated in the worksheet) by calculating from the datasheet data range (as i indicated int the worksheet too). Below are my parameter in the same activesheet to run the k-means clustering
[TABLE="width: 238"]
<colgroup><col><col></colgroup><tbody>[TR]
[TD]MaxIt[/TD]
[TD="align: right"]150[/TD]
[/TR]
[TR]
[TD]Data Sheet[/TD]
[TD]DataSheet[/TD]
[/TR]
[TR]
[TD]Data Range[/TD]
[TD]J6:O38[/TD]
[/TR]
[TR]
[TD]Output Sheet[/TD]
[TD]DataSheet[/TD]
[/TR]
[TR]
[TD]Output Range[/TD]
[TD]P6[/TD]
[/TR]
[TR]
[TD]Initial Centr[/TD]
[TD]G3:L6[/TD]
[/TR]
[TR]
[TD]Output Centr[/TD]
[TD]G7[/TD]
[/TR]
</tbody>[/TABLE]
I am not a originator of the code, but i would like to learn from code and know how to make it works. the codes are as follows:
Rich (BB code):
Rich (BB code):
Sub kmeans()
'x main macro x'
'x get the initial centroid data
Dim InitCentrRange As String: InitCentrRange = ActiveSheet.Range("C7").Value
Dim InitialCentroids As Variant: InitialCentroids = ActiveSheet.Range(InitCentrRange)
'x get the number of max iterations
Dim MaxIt As Integer: MaxIt = ActiveSheet.Range("C2").Value
'x get the features data
Dim DataSht As String: DataSht = ActiveSheet.Range("C3").Value
Dim DataRange As String: DataRange = ActiveSheet.Range("C4").Value
Dim X As Variant: X = Worksheets(DataSht).Range(DataRange)
'x assign objects to centroids based on initial centroids
Dim my_idx As Variant: my_idx = FindClosestCentroid(X, InitialCentroids)
Dim J As Integer: J = UBound(X, 2) 'number of features
Dim K As Integer: K = UBound(InitialCentroids, 1) 'number of clusters
Dim M As Integer: M = UBound(X, 1) 'number of objects
Dim centroids As Variant
Dim ii As Integer: ii = 1
For ii = 1 To MaxIt
centroids = ComputeCentroids(X, my_idx, K)
my_idx = FindClosestCentroid(X, centroids)
Next ii
'x print the centroids
Dim outputRange As String: outputRange = ActiveSheet.Range("C8").Value
Range(outputRange).Resize(K, J).Value = centroids
'x print the cluster assignment next to data
Dim ClusterOutputSht As String: ClusterOutputSht = ActiveSheet.Range("C5").Value
Dim ClusterOutputRange As String: ClusterOutputRange = ActiveSheet.Range("C6").Value
Worksheets(ClusterOutputSht).Range(ClusterOutputRange).Resize(M, 1).Value = WorksheetFunction.Transpose(my_idx)
End Sub
Rich (BB code):
Public Function FindClosestCentroid(ByRef X As Variant, ByRef centroids As Variant) As Variant
'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
'x Function loops through all observations and calculate the distance to all centroids x'
'x Function then returns an array with all the nearest centroids for each observation x'
'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
Dim K As Integer: K = UBound(centroids, 1)
Dim J As Integer: J = UBound(centroids, 2)
Dim M As Integer: M = UBound(X, 1)
Dim idx() As Variant: ReDim idx(M) As Variant
Dim ii As Integer: ii = 1
Dim cc As Integer: cc = 1
'For eahc observation find the closest centroid
For ii = 1 To M
Dim Dist_min As Double: Dist_min = 1000000
Dim Dist As Double: Dist = 0
'for each K calculate the distance
For cc = 1 To K
Dist = EuclideanDistance(Application.Index(X, ii, 0), Application.Index(centroids, cc, 0), J)
If Dist < Dist_min Then
idx(ii) = cc
Dist_min = Dist
End If
Next cc
Next ii
FindClosestCentroid = idx()
End Function
Rich (BB code):
Public Function ComputeCentroids(X As Variant, idx As Variant, K As Variant) As Variant
'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
'x Function that calculates the new centroid mean for each feature x'
'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
Dim M As Integer: M = UBound(X, 1) 'number of objects
Dim J As Integer: J = UBound(X, 2) 'number of features
Dim ii As Integer: ii = 1
Dim cc As Integer: cc = 1
Dim bb As Integer: bb = 1
Dim counter As Integer: counter = 0
Dim centroids() As Variant: ReDim centroids(K, J) As Variant
Dim tempSum() As Variant
For ii = 1 To K 'for each centroid
For bb = 1 To J 'for each feature
counter = 0 'reset counter
For cc = 1 To M 'for each observation
If idx(cc) = ii Then 'for objects that below to this centroid calc the sum
centroids(ii, bb) = centroids(ii, bb) + X(cc, bb)
counter = counter + 1
End If
Next cc
If counter > 0 Then 'modified
centroids(ii, bb) = centroids(ii, bb) / counter 'divide sum by counter to get mean
Else 'modified
centroids(ii, bb) = 0 'modified
End If 'modified
Next bb
Next ii
ComputeCentroids = centroids
End Function
Rich (BB code):
Public Function EuclideanDistance(X As Variant, Y As Variant, num_obs As Integer) As Double
'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
'x Function that calculates the euclidean distance between two vectors of size num_obs each x'
'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
Dim ii As Integer: ii = 1
Dim RunningSumSqr As Double: RunningSumSqr = 0
For ii = 1 To num_obs
RunningSumSqr = RunningSumSqr + (X(ii) - Y(ii)) ^ 2
Next ii
EuclideanDistance = Sqr(RunningSumSqr)
End Function
I appreciate any help that i could have. Thank you in advance!
Thanks in advance,
Wye