RRandomForest_Train()

Trains a RandomForest model and returns an error rate value, a confusion matrix and an importance matrix.

Synopsis

int RRandomForest_Train( const string& rModelVar, const dyn_string& headers, const dyn_dyn_float& vals, const dyn_float& labels, bool classification, int ntree, int mtry, float& err_rate, dyn_dyn_float& confusion, dyn_dyn_float& importance, int userData = 0);

Parameters

Parameter Description
rModelVar Name of R variable containing an RF model
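headers Array of header strings, one for each value array in vals
vals Matrix of values: one value array per header, all arrays of identical size
labels Array of cluster labels, one for each entry of a value array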
classification true: classification, false: regression. Classification identifies which of a set of categories (sub-populations) a new observation belongs to, based on a training set of observations (instances) whose category membership is known.
ntree Tree count. A Random Forest is a set of decision trees; ntree specifies the number of trees to grow. The value should not be too small, so that every input row gets predicted at least a few times.
mtry Number of variables randomly sampled as candidates at each split. A split is a branching in a decision tree.
err_rate Return parameter of error rate
confusion Return parameter of confusion matrix
importance Return parameter of importance matrix
userData User data of the function call (optional, default 0). The user data can be set to an integer value of your choice and used to detect errors when calling R functions: if an error occurs during the call, the specified integer value is returned together with the error (see the use of RGetLastErr in the example).
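The comments in the example on this page state that the number of headers must match the number of value arrays, that all value arrays must have the same size, and that the number of labels must match the number of entries per array. A minimal sketch of a check for these conditions before training could look as follows. The helper name isTrainingInputConsistent is hypothetical and not part of the R Control Extension; it only uses the standard CTRL functions dynlen and DebugN.

```
// Hypothetical helper (an assumption for illustration, not part of the R Control Extension).
// Returns true if headers, vals and labels have matching dimensions.
bool isTrainingInputConsistent(dyn_string headers, dyn_dyn_float vals, dyn_float labels)
{
  // One value array is expected per header
  if (dynlen(vals) != dynlen(headers))
  {
    DebugN("Number of value arrays and headers differ:", dynlen(vals), dynlen(headers));
    return false;
  }
  // Every value array must contain exactly one entry per label
  for (int i = 1; i <= dynlen(vals); i++)
  {
    if (dynlen(vals[i]) != dynlen(labels))
    {
      DebugN("Value array", i, "has", dynlen(vals[i]), "entries, expected", dynlen(labels));
      return false;
    }
  }
  return true;
}
```

Calling such a check directly before RRandomForest_Train would report a dimension mismatch in the CTRL log instead of leaving it to the R side to reject the data.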

Return Value

The function returns 0 if it was successfully executed.
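The return value and the userData parameter can be combined for error handling. The following is a sketch under stated assumptions, not a definitive implementation: the helper name trainWithErrorCheck and the marker value 4711 are hypothetical, and the RGetLastErr call (including its third argument) is copied from the example on this page without further knowledge of its semantics.

```
#uses "CtrlR"

// Hypothetical helper (an assumption for illustration).
// Trains a classification model and reports a failure together with the user data marker.
int trainWithErrorCheck(string rModelVar, dyn_string headers, dyn_dyn_float vals,
                        dyn_float labels, int ntree, int mtry)
{
  int trainCallId = 4711;   // Arbitrary marker passed as userData (assumption)
  float err_rate;
  dyn_dyn_float confusion;
  dyn_dyn_float importance;
  string err_desc;
  int errUserData;

  int ret = RRandomForest_Train(rModelVar, headers, vals, labels, true, ntree, mtry,
                                err_rate, confusion, importance, trainCallId);

  // Same call form as in the example on this page
  int lastErr = RGetLastErr(err_desc, errUserData, true);

  if (ret != 0 || lastErr != 0)
  {
    // If the error stems from the training call, errUserData is expected to hold trainCallId
    DebugN("Training failed:", err_desc, "userData:", errUserData);
    return -1;
  }

  DebugN("Error rate:", err_rate);
  return 0;
}
```

Using a different marker value for each R function call makes it possible to tell from the log which call caused an error.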

Description

Trains a RandomForest model and returns an error rate value, a confusion matrix and an importance matrix. For more information on random forest models, see https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

Example

The example first creates a model via the RRandomForest_Train function. The function REvalExp then evaluates an R expression that saves the model to the output file "D:/Test/myNewModel2.RData". The saved model is then loaded via the RLoadModel function, and RPredict is called to predict a result for one row of the training data.

#uses "CtrlR"
main()
{
  bool classification = TRUE;//Classification
  int ntree = 5;
  int mtry = 5;
  float err_rate;
  dyn_dyn_float confusion; //Return parameter of confusion matrix - see chapter Classification Wizard - Quality
  dyn_dyn_float importance; //Return parameter of importance matrix - see chapter Classification Wizard - Quality
  //Add data 
  dyn_float df1 = makeDynFloat(31,31,33,32,34,35,36); /* The size of the arrays must be identical: 7 items each */
  dyn_float df2 = makeDynFloat(401,381,382,392,406,410,408); /* The number of arrays must correspond to the number of headers */
  dyn_float df3 = makeDynFloat(89,85,90,90,99,98,97);
  dyn_float df4 = makeDynFloat(63,68,71,73,200,300,350);
  dyn_float df5 = makeDynFloat(4,10,10,-4,8,7,8);
  dyn_float labels = makeDynFloat(0,0,1,1,2,3,3); /* The number of labels must correspond to the number of entries per array */
  dyn_string headers = makeDynString("Current", "Voltage", "Load", "T_Increase", "T_Ambient");
  /* The number of headers must correspond to the number of arrays df1, df2, ... */
  string H_LINE = "***************************************************************************";
  dyn_dyn_float ddf1;
  dynAppend(ddf1,df1);
  dynAppend(ddf1,df2);
  dynAppend(ddf1,df3);
  dynAppend(ddf1,df4);
  dynAppend(ddf1,df5);
  string rModelVar = "myModel2"; //r model variable
  string err_desc; //Description variable for error handling
  int UserData; /* See the description of the userdata. This is the return parameter for the RGetLastErr - see below */
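  //Create a model via the function RRandomForest_Train
  int retV = RRandomForest_Train(rModelVar, headers, ddf1, labels, classification, ntree, mtry, err_rate, confusion, importance);
  if (retV != 0) //The function returns 0 if it was successfully executed
  {
    DebugN("RRandomForest_Train failed, return value:", retV);
    return;
  }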
  string out = "D:/Test/myNewModel2.RData"; //The concrete file for the model
  REvalExp(0, "save(%var%, file = %var%)", rModelVar, out);
  /* Evaluates the R expression, which saves the R variable named in rModelVar to the file given in out */
  DebugN("Value of the out parameter:", out);
  string ModelName;
  int h = RLoadModel(out, ModelName); /* Load the model. The function returns the name of the model. ModelName contains the name of the model */
  DebugN("ModelName:",ModelName, "loaded:", h);
  DebugN("Error rate:", err_rate, "Confusion matrix:", confusion, "Importance matrix:", importance);
  //For information on confusion and importance matrices, see chapter Classification Wizard - Quality
  blob at = RGetVarSerialized(rModelVar); //Retrieves the serialized value of the R variable rModelVar.
  if (RGetLastErr(err_desc, UserData, true) != 0) //Error handling
  {
    DebugN("Error occurred: " + err_desc); 
    return;
  } 
  DebugN("Rmodelvar:", at);
  DebugN(H_LINE);
  int row_to_predict = 5;
  int row_len = 5;
  dyn_float values;
  for(int i = 1; i <= row_len; i++)
  {
    int ret = dynAppend(values, ddf1[i][row_to_predict]); /*Add data to "values"-> Prepare values for the prediction function */
    if( ret == -1 )
    {
      DebugN("Error! dynAppend to values failed!");
      return; 
    } 
  }
  int prediction = RPredict(rModelVar, headers, values); /* Calls the function "predict" of a loaded model and returns the prediction result. */
  //error handling
  if (RGetLastErr(err_desc, UserData, true) != 0)
  {
    DebugTN("Error occurred: " + err_desc); 
    return;
  }
  DebugN("prediction=" + prediction);
  DebugN("RPredict_test finished!");
  DebugN(H_LINE);
}

Assignment

R Functions

Availability

R Control Extension

See also

RLoadModel()