RClusterNumberGap()

Calls the function "clusGap" of a loaded model and returns the goodness of the clustering measure.

Synopsis

int RClusterNumberGap( const dyn_dyn_float data, bool scale, const string& FUN="kmeans", int nstart = 25, int Kmax=10, int B=500, string method="firstSEmax", int userData = 0);

Parameters Description
data Matrix of values.
scale TRUE: clustering is performed on scaled values.
FUN Cluster function to be used. Default: "kmeans". The gap statistic estimates the optimum number of clusters for this cluster function.
nstart Number of random starts the cluster function uses when calculating the "goodness". Default: 25.
Kmax Maximum number of clusters to consider. Default: 10.
B Number of Monte Carlo (“bootstrap”) samples, a statistical resampling method. Default: 500. For more information see https://en.wikipedia.org/wiki/Particle_filter
method Computation method identifier. The default method "firstSEmax" selects the smallest number of clusters whose gap value is not smaller than the first local maximum minus one standard error. For other available methods see: https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/clusGap.html

userData User data of the function call. Set this variable to an integer value to detect errors when calling R functions: if an error occurs during the function call, the specified integer value is returned.

Return Value

The function returns a value < 0 on error and a value >= 0 for the "gap" statistic.

Description

Calls the function "clusGap" of a loaded model and returns the goodness of clustering measure. This measure is computed for an increasing number of clusters by comparing the average within-cluster dispersion of the data with the dispersion expected under a reference distribution.
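For orientation, the gap statistic can be written out as below, following the definition by Tibshirani, Walther and Hastie on which R's clusGap is based. The symbols (W_k, C_r, n_r, d_{ii'}, s_k, k^*) are introduced here for illustration only and are not part of the RClusterNumberGap interface. It is assumed that the parameters Kmax and B are passed on to clusGap unchanged.

```latex
% Pooled within-cluster dispersion for k clusters C_1, ..., C_k,
% where n_r = |C_r| and d_{ii'} is the squared distance between points i and i'
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} \sum_{i, i' \in C_r} d_{ii'}

% Gap statistic: mean log-dispersion of B reference (bootstrap) samples
% minus the log-dispersion of the observed data, for k = 1, ..., K_{max}
\mathrm{Gap}(k) = \frac{1}{B} \sum_{b=1}^{B} \log W^{*}_{kb} - \log W_k

% Standard error of the simulated reference term
s_k = \sqrt{1 + \tfrac{1}{B}} \; \mathrm{sd}_b\!\left( \log W^{*}_{kb} \right)

% "firstSEmax" rule, with k^* the first local maximum of Gap(k)
\hat{k} = \min \left\{ k : \mathrm{Gap}(k) \ge \mathrm{Gap}(k^{*}) - s_{k^{*}} \right\}
```

A larger B reduces the simulation error in s_k at the cost of run time, and Kmax bounds the range of k that is searched.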

Example

The example returns the number of clusters and the goodness of clustering measure.

#uses "CtrlR"
main()
{
  dyn_float df1 = makeDynFloat(31,31,33,32,34,33,32,35,29,34,38,40,37,38,36,36,36,39,38,40,35,32,34,32,34,29,29,28,31,28,30,34,33,28,31,32,33,33,33,35,36,36,40,38,40,37,40,38,40,38);
  dyn_float df2 = makeDynFloat(401,381,382,392,406,372,361,405,392,399,350,342,346,354,304,345,320,317,356,323,386,406,405,396,400,401,365,400,391,398,362,368,363,373,389,370,406,386,402,367,379,380,406,389,374,379,399,406,377,407);
  dyn_float df3 = makeDynFloat(89,85,90,90,99,88,83,102,81,97,95,98,92,96,78,89,83,89,97,93,97,93,99,91,97,83,76,80,87,80,78,90,86,75,86,85,96,91,95,92,98,98,116,106,107,100,114,111,108,111);
  dyn_float df4 = makeDynFloat(63,97,73,73,75,75,80,93,96,77,86,81,85,83,74,68,73,63,86,60,85,93,90,79,79,68,81,66,65,95,96,71,72,81,73,63,84,75,67,77,73,85,100,95,74,71,97,98,67,63);
  dyn_float df5 = makeDynFloat(4,10,10,-4,8,8,-3,3,8,9,6,7,0,3,10,8,3,9,10,8,9,-3,6,9,1,3,6,-4,0,-2,-2,7,6,3,-2,6,8,8,0,3,-3,2,3,10,6,1,4,4,2,8);
  dyn_float df6 = makeDynFloat(35,32,34,34,38,34,33,36,31,30,37,39,39,34,33,39,30,33,37,35,44,40,41,41,43,44,44,40,42,45,37,30,30,33,33,35,36,39,36,34,34,30,32,30,30,32,31,34,35,38);
  string err_desc;
  int context;
  //Add the Matrix of values
  dyn_dyn_float ddf1;
  dynAppend(ddf1,df1);
  dynAppend(ddf1,df2);
  dynAppend(ddf1,df3);
  dynAppend(ddf1,df4);
  dynAppend(ddf1,df5);
  dynAppend(ddf1,df6);
  //***************************************************************************************************
  /* Function call of RClusterNumberGap -> calls the function "clusGap" of a loaded model and returns the goodness of clustering measure. */
  bool scale2 = TRUE; //Clustering is performed on scaled values
  int j; //Return value of the function: < 0 .. error, >= 0 .. "gap" statistic
  string FUN = "kmeans"; //Cluster function for which the gap statistic estimates the optimum number of clusters
  int nStart = 25; //Number of start tries for the calculation of the goodness (default value)
  int kMax = 10; //Maximum number of clusters to consider (default value)
  int B = 500; //Number of Monte Carlo ("bootstrap") samples (default value)
  j = RClusterNumberGap(ddf1, scale2, FUN, nStart, kMax, B); //Function call
  if (j < 0)
    DebugN("RClusterNumberGap failed:", j); //Negative return value indicates an error
  else
    DebugN("Gap statistic:", j);
  //***************************************************************************************************
}

Assignment

R Functions

Availability

R Control Extension

See also

R Control Extension CTRL Functions