Comparison of multiple linear regression and machine learning methods in predicting cognitive function in older Chinese type 2 diabetes patients

Table 1 Summary of the values of the hyperparameters for the best RF, SGB, NB and XGBoost models

Methods	Hyperparameters	Best Value	Meaning
RF	mtry	8	The number of features randomly sampled as candidates at each split
RF	ntree	500	The number of trees in the forest
XGBoost	nrounds	100	The number of boosting iterations (trees)
	max_depth	3	The maximum depth of a tree
	eta	0.4	Shrinkage coefficient (learning rate) applied to each tree
	gamma	0	The minimum loss reduction required to make a further split
	subsample	0.75	Subsample ratio of training instances (rows) when building each tree
	colsample_bytree	0.8	Subsample ratio of columns when constructing each tree
	rate_drop	0.5	Dropout rate: the fraction of previous trees dropped during a boosting iteration
	skip_drop	0.05	Probability of skipping the dropout procedure during a boosting iteration
	min_child_weight	1	The minimum sum of instance weight needed in a child node
NB	fL	0	Laplace smoothing correction (0 means no smoothing)
	usekernel	TRUE	Whether to use a kernel density estimate, rather than a Gaussian density estimate, for continuous variables
	adjust	1	Bandwidth adjustment for the kernel density estimate
SGB	n.trees	50	The number of boosting iterations (trees)
	interaction.depth	1	The maximum depth of each tree
	shrinkage	0.1	Shrinkage coefficient (learning rate) applied to each tree
	n.minobsinnode	10	The minimum number of observations per terminal (leaf) node

RF Random forest, SGB Stochastic gradient boosting, NB Naïve Bayes classifier, XGBoost eXtreme gradient boosting
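
To make the SGB entries in Table 1 concrete, the following minimal Python sketch shows how the four values interact in a gradient boosting loop with squared-error loss. It is an illustration under stated assumptions, not the implementation used in the study: the toy data, the single feature, and the helper names `fit_stump`, `boost` and `mse` are hypothetical. The random row subsampling that gives SGB its name is omitted because its fraction is not reported in the table.

```python
# Best SGB values as reported in Table 1 (parameter names as given there).
SGB_BEST = {"n.trees": 50, "interaction.depth": 1, "shrinkage": 0.1, "n.minobsinnode": 10}


def fit_stump(x, residuals, min_obs):
    """Fit a depth-1 tree (one split) to the residuals by least squares.

    Only splits that leave at least `min_obs` points in each leaf are tried.
    Returns (threshold, left_leaf_mean, right_leaf_mean).
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(min_obs, len(x) - min_obs + 1):
        low, high = x[order[k - 1]], x[order[k]]
        if low == high:  # cannot split between identical feature values
            continue
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, (low + high) / 2, left_mean, right_mean)
    if best is None:
        raise ValueError("no valid split: too few observations for min_obs")
    return best[1:]


def boost(x, y, n_trees, shrinkage, min_obs):
    """Plain gradient boosting for squared error using depth-1 trees."""
    f0 = sum(y) / len(y)           # initial prediction: the mean of y
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left_mean, right_mean = fit_stump(x, residuals, min_obs)
        stumps.append((threshold, left_mean, right_mean))
        # Each tree's contribution is scaled down by the shrinkage factor.
        pred = [p + shrinkage * (left_mean if xi <= threshold else right_mean)
                for p, xi in zip(pred, x)]
    return f0, stumps, pred


def mse(y, pred):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


# Hypothetical toy data: one feature and a step-shaped outcome.
x = list(range(40))
y = [0.0 if xi < 20 else 1.0 for xi in x]

# This sketch only supports depth-1 trees, which matches Table 1.
assert SGB_BEST["interaction.depth"] == 1

f0, stumps, pred = boost(
    x, y,
    n_trees=SGB_BEST["n.trees"],
    shrinkage=SGB_BEST["shrinkage"],
    min_obs=SGB_BEST["n.minobsinnode"],
)
initial_mse = mse(y, [f0] * len(y))
final_mse = mse(y, pred)
print(f"MSE before boosting: {initial_mse:.4f}, after {len(stumps)} trees: {final_mse:.6f}")
```

In this sketch a smaller `shrinkage` makes each tree correct a smaller share of the remaining error, so more iterations (`n.trees`) are needed to reach the same fit, while `n.minobsinnode` restricts which splits are admissible.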

ISSN: 1471-2377