Wednesday, 30 April 2014

Use of "seeds" option from train() function in caret






I have a question regarding the seeds option (set through trainControl()) used by the train() function from the caret package. This option is supposed to ensure that the samples used by the different processes in a parallelized cross-validation are consistent across the workers.


Here is an example of how a seeds argument is created:



library(caret)

# Create a list of seeds, with a different set of seeds for each resampling iteration
set.seed(123)
seeds <- vector(mode = "list", length = 11)  # length = (n_repeats * n_resamples) + 1
# 3 is the number of tuning parameter values (mtry for rf; here equal to ncol(iris) - 2)
for (i in 1:10) seeds[[i]] <- sample.int(n = 1000, 3)

seeds[[11]] <- sample.int(1000, 1)  # for the last (final) model

# Control list; returnTrain = TRUE so that index holds the training rows of each fold
myControl <- trainControl(method = "cv", seeds = seeds, index = createFolds(iris$Species, returnTrain = TRUE))
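To make the shape of this list explicit, here is a minimal base R sketch that rebuilds it and checks its layout. The names n_resamples and n_models are introduced here only for illustration; they are not caret arguments. The checks assume the layout described in the trainControl() help page: one integer vector per resample, each as long as the number of candidate models, plus a single integer for the final model.

```r
# Illustrative names (not part of caret): 10 CV folds, 3 candidate mtry values
n_resamples <- 10
n_models <- 3

set.seed(123)
seeds <- vector(mode = "list", length = n_resamples + 1)
for (i in seq_len(n_resamples)) {
  # one seed per candidate model within resample i
  seeds[[i]] <- sample.int(1000, n_models)
}
# a single seed for the final model fit
seeds[[n_resamples + 1]] <- sample.int(1000, 1)

# Check the layout: B + 1 elements, the first B of length M, the last of length 1
stopifnot(length(seeds) == n_resamples + 1)
stopifnot(all(lengths(seeds[seq_len(n_resamples)]) == n_models))
stopifnot(length(seeds[[n_resamples + 1]]) == 1)

length(unlist(seeds))  # 31 integers in total, i.e. 10 * 3 + 1
```

Under that assumption, the total count depends on both the number of resamples and the number of candidate models, which is where the 10*3+1 comes from.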


I don't understand why so many seeds (10*3+1) are needed to set up the folds. As I understand it, the same folds are used by the models that evaluate each parameter value, and I really don't see the need for the extra final seed (the "+1"). How are all these seeds used by the workers? I couldn't find any information in the documentation.


Thank you.



asked 53 secs ago

Alex





