Better efficacy in differentiating WHO grade II from III oligodendrogliomas with machine-learning than radiologist’s reading from conventional T1 contrast-enhanced and fluid attenuated inversion recovery images

Background Differentiating World Health Organization (WHO) grade II (ODG2) from grade III (ODG3) oligodendrogliomas with medical imaging remains a challenge. We investigated whether combining machine learning with radiomics from conventional T1 contrast-enhanced (T1 CE) and fluid attenuated inversion recovery (FLAIR) magnetic resonance imaging (MRI) offered superior efficacy. Methods Thirty-six patients with histologically confirmed ODGs who underwent T1 CE MR examination before any intervention between January 2015 and July 2017, 33 of whom also underwent FLAIR, were retrospectively recruited. The volume of interest (VOI) covering the whole tumor enhancement was manually drawn slice by slice on the T1 CE and FLAIR images using ITK-SNAP, and a total of 1072 features were extracted from the VOI using 3D Slicer software. A random forest (RF) algorithm was applied to differentiate ODG2 from ODG3, and its efficacy was tested with 5-fold cross validation. The diagnostic efficacy of radiomics-based machine learning was also compared with that of the radiologists' assessment. Results Nineteen ODG2 and 17 ODG3 were included, and ODG3 tended to present with prominent necrosis and nodular/ring-like enhancement (P < 0.05). The AUC, accuracy (ACC), sensitivity, and specificity of radiomics were 0.798, 0.735, 0.672, and 0.789 for T1 CE; 0.774, 0.689, 0.700, and 0.683 for FLAIR; and 0.861, 0.781, 0.778, and 0.783 for the combination, respectively. The AUCs of radiologists 1, 2 and 3 were 0.700, 0.687, and 0.714, respectively. The efficacy of machine learning based on radiomics was superior to the radiologists' assessment. Conclusions Machine learning based on radiomics of T1 CE and FLAIR offered superior efficacy to that of radiologists in differentiating ODG2 from ODG3.

Calcification [4,5] and a cortical-subcortical location [5,6], most commonly in the frontal lobe [4], are regarded as the characteristic features of ODGs. In contrast to other low-grade gliomas (LGG), minimal to moderate enhancement and moderately increased perfusion are commonly seen in ODGs, making the differentiation of ODG2 from ODG3 difficult. Moreover, ODG3 often shares imaging features with ODG2 on conventional MRI, leading to unreliable tumor grade prediction. Edema, haemorrhage, cystic degeneration and contrast enhancement are more commonly seen in ODG3, but may also be seen in ODG2 [4]. Thus, a new imaging-based diagnostic strategy for differentiating ODG2 from ODG3 needs to be developed.
Advanced imaging techniques, including DWI, perfusion imaging, MR spectroscopy and PET, have been employed to obtain more sensitive diagnostic markers, but with unsatisfactory efficacy. Diffusion restriction is seldom observed in ODG2 [6]. Averaged ADC values are reported to be lower in high-grade glioma (HGG) than in LGG; however, the ADC values of ODG3 overlap with those of ODG2, making DWI an unreliable marker to distinguish them [7]. Using a cut-off value of 1.75 for the relative cerebral blood volume (rCBV) ratio, HGG can be differentiated from LGG with a sensitivity of 95% [8]. Unfortunately, these findings may not be suitable for differentiating ODGs, because markedly elevated rCBV can also be observed in ODG2, so a reliable distinction cannot be easily achieved [7,9,10]. This is due to the presence of short capillary segments in ODGs [5], which may contribute to the relatively low specificity (70%) reported by Law et al. [8]. Therefore, focally elevated rCBV does not necessarily indicate ODG3. In addition, the correlation of Ktrans with tumor grade is even poorer than that of rCBV, and Ktrans is more commonly used to assess treatment effects [11]. Taken together, the efficacies of advanced MRI techniques in differentiating ODG2 from ODG3 are limited.
By combining quantitative image features extracted from conventional T1-weighted contrast-enhanced (T1 CE) and fluid attenuated inversion recovery (FLAIR) images with machine learning algorithms, radiomics can provide comprehensive information that is difficult to perceive with visual inspection [12,13], and it is commonly used in tumor diagnosis, staging and prognosis [14][15][16][17][18][19][20]. However, most previous studies focused on advanced MR techniques, and their varied post-processing models, interpretation and evaluation criteria restricted their clinical application. Besides having limited diagnostic power, these advanced MRI techniques are not commonly available in some rural areas. In contrast, T1 CE and FLAIR are used in almost all hospitals as routine imaging sequences for glioma diagnosis and staging. It is thus feasible to combine radiomics with T1 CE and FLAIR to establish a practical and economical imaging solution for differentiating ODG2 from ODG3.
In this study, we aimed to evaluate the diagnostic power of machine-learning based on T1 CE and FLAIR imaging radiomics in comparison with the radiologists' performance in differentiating ODG2 from ODG3.

Patients
This study was approved by our institutional review board and the requirement for informed consent was waived based on its retrospective nature. From January 2015 to July 2017, patients with confirmed ODGs were retrospectively and consecutively recruited. Tumors were classified according to the 2007 WHO classification, or the 2016 WHO guidelines when enough information was available. The inclusion criteria were: (1) the patient underwent a preoperative conventional MRI scan; and (2) the patient underwent gross total or subtotal tumor resection and a confirmative pathological diagnosis was made. Thirty-six patients with T1 CE were included (19 men, 17 women; mean age = 45 years; age range = 9-65 years) and classified into two groups: ODG2 (n = 19; mean age = 46 years, age range = 10-65 years) and ODG3 (n = 17; mean age = 44 years, age range = 9-65 years). Thirty-three of these 36 patients also had FLAIR and were enrolled (18 men, 15 women; mean age = 45 years; age range = 9-65 years) and classified into two groups: ODG2 (n = 17; mean age = 45 years, age range = 10-65 years) and ODG3 (n = 16; mean age = 45 years, age range = 9-65 years). The patient selection is summarized in Fig. 1.

MRI data acquisition
All patients underwent 3-T MR scanning (Discovery MR750, General Electric Medical System, Milwaukee, WI, USA) with an 8-channel head coil (General Electric Medical System). The initial routine scan sequences for each patient included T1-weighted imaging (T1WI) performed before and after contrast enhancement, an axial T2-weighted imaging (T2WI), and a transverse FLAIR to assist with diagnosis.
The parameters of the conventional MRI sequences were as follows: T1WI with gradient echo (

Tumor segmentation or delineation
Two neuroradiologists (S.S.Z, with 8 years of experience, and L.F.Y, with 12 years of experience in neuro-oncology imaging) independently reviewed all images. A third senior neuroradiologist (G.B.C, with 25 years of experience in neuro-oncology imaging) re-examined the images and determined the final imaging diagnoses when inconsistency occurred. The preoperative conventional imaging features of the tumors were retrieved based on the criteria outlined in Additional file 1: Table S1 (online).
The volumes of interest (VOIs) were semi-automatically segmented using ITK-SNAP (version 3.6, http://www.itksnap.org) by two neuroradiologists (S.S.Z and L.F.Y). The VOIs covering the enhanced lesion were drawn slice by slice on the T1 CE images and co-registered to the FLAIR images, avoiding regions of macroscopic necrosis, cyst, edema and non-tumor macrovessels [21].

Feature extraction
The extracted features included 162 first-order statistical features, 216 gray level co-occurrence matrix (GLCM) features, 144 gray level run length matrix (GLRLM) features, 144 gray level size zone matrix (GLSZM) features, 126 gray level difference matrix (GLDM) features, 45 neighborhood gray-tone difference matrix (NGTDM) features and 14 shape features. A total of 1072 features were extracted from the T1 CE and FLAIR images using 3D Slicer software. We used these features because they were found to be relevant for distinguishing ODG2 from ODG3 in our previous MR imaging studies [16].

Feature selection
After centering and scaling, the features were subjected to a two-step selection procedure to remove highly redundant and correlated features. First, highly correlated features were eliminated using Pearson correlation analysis, with an r threshold of 0.75. Then, a random forest (RF) classifier consisting of a number of decision trees was used to rank feature importance. Every node in a decision tree is a condition on a single feature, designed to split the dataset into two so that similar response values end up in the same set. The measure by which the optimal condition is chosen is called impurity. For classification, it is typically either Gini impurity or information gain/entropy. Thus, when training a tree, one can compute how much each feature decreases the weighted impurity in that tree. For the forest, the impurity decrease attributable to each feature is averaged across trees, and the features are ranked according to this measure. In our study, the decrease in Gini impurity was used as the criterion of feature importance.
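The two selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which was run in R): the greedy order of the correlation filter, the function names, and the toy feature values are assumptions made for illustration. The first part drops features whose absolute Pearson r with an already retained feature exceeds 0.75; the second part computes the weighted Gini impurity decrease of a single split, which is the quantity a random forest averages over its trees to rank features.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def drop_correlated(features, threshold=0.75):
    """Greedy filter (an assumed strategy): keep a feature only if its |r|
    with every feature kept so far is at most `threshold`.

    `features` maps a feature name to its list of values (one per patient).
    """
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def gini(labels):
    """Gini impurity, 1 - sum(p_k^2), of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_decrease(values, labels, cut):
    """Weighted impurity decrease when one feature is split at `cut`."""
    left = [l for v, l in zip(values, labels) if v <= cut]
    right = [l for v, l in zip(values, labels) if v > cut]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Toy data: invented values for six hypothetical patients, not study data
toy = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f2": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # almost a multiple of f1, so redundant
    "f3": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
grades = [2, 2, 2, 3, 3, 3]  # WHO grade per patient

selected = drop_correlated(toy)                          # f2 is filtered out
decrease_f1 = gini_decrease(toy["f1"], grades, cut=3.5)  # a perfect split
```

In this toy example the split on `f1` separates the two grades completely, so the impurity falls from 0.5 to 0. A real forest would evaluate many candidate splits per tree and average the decreases per feature.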

Radiomics model building
The 30 most important features were fed into a conditional inference RF classifier to build the model [22]. Five-fold cross validation was employed to tune the number of RF trees, a hyperparameter of the model. The five-fold cross validation, including pre-processing, feature selection and model construction, was performed 3 times to limit bias and overfitting as much as possible, and the final results were the average of the 3 runs. There was no feature selection in the combination of T1 CE and FLAIR throughout the model building. Accuracy, sensitivity and specificity were computed to evaluate the classification performance. The receiver operating characteristic (ROC) curve was also built to provide the area under the ROC curve (AUC). The larger the AUC, the better the classification [23]. The whole procedure of feature extraction and machine learning is illustrated in Fig. 2.
Fig. 2 The main procedure of the radiomic strategy for preoperative ODG grading. Based on T1 CE and FLAIR data (a) and the tumor volume of interest (VOI) manually drawn on resampled T1 CE and FLAIR images (b), a group of parametric images is derived and the corresponding parametric maps of the whole tumor region are extracted (c). Using radiomic feature analysis, a large collection of tumor parameter attributes was acquired for the subsequent machine learning process (d). Feature selection methods were implemented and compared using a random forest (RF) classifier, with additional discussion of model parameters, to construct the optimal ODG grading model (e)
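The evaluation scheme described in this section (5-fold cross validation repeated 3 times, with accuracy, sensitivity and specificity averaged over the runs) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: the stratified fold assignment, the choice of ODG3 as the positive class, and `midpoint_classifier` (a hypothetical stand-in for the RF) are all introduced here for the example. The point it shows is that every fitting step receives only the training folds, so the held-out fold never influences pre-processing or feature selection.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def metrics(y_true, y_pred, positive=3):
    """Accuracy, sensitivity, specificity; ODG3 (label 3) is assumed positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    acc = (tp + tn) / len(pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return acc, sens, spec

def repeated_cv(X, y, fit_predict, k=5, repeats=3):
    """Average the three metrics over `repeats` runs of k-fold cross validation."""
    runs = []
    for r in range(repeats):
        preds = [None] * len(y)
        for fold in stratified_folds(y, k, seed=r):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            # Scaling, feature selection and model fitting see training data only
            out = fit_predict([X[i] for i in train], [y[i] for i in train],
                              [X[i] for i in fold])
            for i, p in zip(fold, out):
                preds[i] = p
        runs.append(metrics(y, preds))
    return tuple(sum(m[j] for m in runs) / repeats for j in range(3))

def midpoint_classifier(X_train, y_train, X_test):
    """Hypothetical stand-in for the RF: threshold the first feature at the
    midpoint between the two class means of the training data."""
    g2 = [x[0] for x, y in zip(X_train, y_train) if y == 2]
    g3 = [x[0] for x, y in zip(X_train, y_train) if y == 3]
    cut = (sum(g2) / len(g2) + sum(g3) / len(g3)) / 2
    return [3 if x[0] > cut else 2 for x in X_test]

# Toy cohort with the study's group sizes (19 ODG2, 17 ODG3); values are invented
y = [2] * 19 + [3] * 17
X = [[0.01 * i] for i in range(19)] + [[1.0 + 0.01 * i] for i in range(17)]
acc, sens, spec = repeated_cv(X, y, midpoint_classifier)
```

Because the invented feature separates the two groups completely, all three metrics equal 1 here. With real radiomic features the same loop would yield the kind of averaged estimates reported in the Results.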

Radiologist's assessment
To compare the efficacies of neuroradiologists and machine learning in differentiating ODG2 from ODG3, the images were also independently assessed by three junior neuroradiologists (X.L.F, G.X and Y.H, with 6, 7 and 7 years of neuroradiology experience, respectively). The neuroradiologists were blinded to the clinical information, but were aware that the tumors were either ODG2 or ODG3, without knowing the exact number of patients with each entity. The three readers assessed only conventional MR images (T1WI, T2WI, FLAIR and T1 CE), and recorded the final diagnosis using a 4-point scale (1 = definite ODG2; 2 = likely ODG2; 3 = likely ODG3; and 4 = definite ODG3) [24].
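An AUC can be derived from such 4-point ratings without fixing a threshold. The Python sketch below uses the rank-based (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen ODG3 case receives a higher score than a randomly chosen ODG2 case, with ties counted as one half. The source does not state how the readers' AUCs were computed, so this method is an assumption, and the scores shown are invented.

```python
def auc_from_ratings(ratings, grades, positive=3):
    """AUC from ordinal reader scores (Mann-Whitney formulation, ties = 0.5)."""
    pos = [r for r, g in zip(ratings, grades) if g == positive]
    neg = [r for r, g in zip(ratings, grades) if g != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented reader scores (1 = definite ODG2 ... 4 = definite ODG3), not study data
grades = [2, 2, 2, 2, 3, 3, 3, 3]
ratings = [1, 2, 2, 3, 2, 3, 4, 4]
reader_auc = auc_from_ratings(ratings, grades)
```

This is equivalent to the area under the empirical ROC curve traced by the three possible cut-points of the 4-point scale, so it can be compared directly with the AUC of a classifier that outputs continuous scores.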

Statistical analysis
The Fisher exact test or the Chi-square test was used for categorical variables, and the unpaired Student t test was used for continuous variables, to compare the ODG2 and ODG3 groups. The statistical analyses of clinical characteristics were performed using SPSS 20.0 software (SPSS Inc., Chicago, IL, USA). The statistical analyses of machine learning were performed using R version 3.4.2 (R Foundation for Statistical Computing). An RF analysis was performed to train the machine-learning classifier. The goal of machine learning was to build a model to differentiate ODG2 from ODG3 based on radiomic features of T1 CE and FLAIR images. The following R packages were used: the randomForest package for feature ranking, and the caret and unbalanced packages for RF classification. Classifier performance was determined using accuracy, sensitivity and specificity. The AUC values were also calculated for the three readers and compared with that of the radiomics classifier. A P value < 0.05 was considered statistically significant.

Patient characteristics
The main clinical characteristics and conventional MRI features of the 36 patients (ODG2 and ODG3) are summarized in Table 1. Tumor necrosis was more frequent in the ODG3 group than in the ODG2 group (P = 0.044), reflecting hypoxia resulting from rapid tumor growth. In addition, ODG3 was associated with nodular/ring-like enhancement patterns (P = 0.002). Furthermore, 10/19 (52.6%) of ODG2 and 10/17 (58.8%) of ODG3 were located in the frontal lobe, indicating no significant group difference. No significant difference in other clinical characteristics (gender, age) or imaging features was observed between ODG2 and ODG3 patients.
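As a worked illustration of the categorical comparisons, the frontal-lobe counts above (10 of 19 ODG2 versus 10 of 17 ODG3) can be checked with a two-sided Fisher exact test. The Python sketch below is a minimal implementation using only the standard library. It assumes the two-sided variant that sums all tables no more probable than the observed one, and the source does not state whether the Fisher or the Chi-square test was applied to this particular variable.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))

# Rows: ODG2, ODG3; columns: frontal, non-frontal (counts from the text above)
p_frontal = fisher_exact_two_sided(10, 9, 10, 7)
```

The resulting p-value is well above 0.05, which is consistent with the statement that tumor location did not differ significantly between the two groups.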

Quantitative MR histogram and texture features analysis
The relative importance of the features for differentiating ODG2 from ODG3, computed using the Gini index, is depicted in Fig. 3. Putting all the high-throughput features into the RF classifier did not significantly improve classification performance because of feature redundancy. The strong correlation among the radiomic features used to differentiate ODG2 from ODG3 is also shown in the radiomic heat map (Fig. 4). The RF-based feature selection strategy improved the performance of the RF classifier. After RF feature selection, 30 optimal features were selected to differentiate ODG2 from ODG3, with efficacy comparable to that of using all features.

Evaluation of principal components
When ODG2 and ODG3 were differentiated using principal components, similar tumor tissue formed characteristic clusters. These clusters, although heterogeneous, defined a specific VOI (e.g., Fig. 5) and were separable from other tumors (clusters). More importantly, the calculated principal components of the VOIs from ODG2 and ODG3 allowed clear separation of these two important regions.

Diagnostic performance of radiomics and radiologists
The performance of radiomics and 3 radiologists in differentiating ODG2 from ODG3 was also compared. Table 2 and Fig. 6 summarized the diagnostic performance of the radiomic features derived by using MR images from T1 CE, FLAIR and their combination to distinguish ODG2 from ODG3. Radiomic features from their combination showed significantly better diagnostic performance than that of FLAIR or T1 CE. Violin plots graphed for the first 9 radiomic features derived from T1 CE, FLAIR and their combination were presented in

Discussion
Radiomics is an emerging field that treats images as data rather than pictures and analyzes a large number of features extracted from an image in relation to clinical variables of interest. A few studies on radiomic analyses of glioma have been published in recent years and have advocated machine learning models for predicting tumor histology and grade [25]. Radiomics has been suggested as a robust strategy to noninvasively classify lesions [14,26]. This work suggests that radiomics from T1 CE and FLAIR can be useful for differentiating ODG2 from ODG3, with efficacy superior to that of radiologists; its clinical application could therefore be justified based on the current study.
From the perspective of experimental design, three aspects of this study are worth noting. First, 'real world' data were used to test our scientific hypothesis. Second, all images analyzed in the current study were taken exclusively from routine clinical diagnostic scans. Third, out of socio-economic considerations, the reported accuracy was based on the radiomics of commonly available T1 CE and FLAIR images, without acquisition of spectroscopy, CBV or perfusion information, all of which would prolong the scanning time and increase the economic burden on patients. As expected, the radiomic strategy outperformed the radiologists.
The reasons for the improved diagnostic performance of radiomics are as follows. First, radiomic methods, given their ability to discern patterns and combine information in a way that humans cannot, have shown substantial promise for the future of radiology and precision medicine [27]. In contrast, the radiologists distinguished ODG2 from ODG3 by visual diagnosis using coarse information from T1 CE and FLAIR. Second, it has been reported that the performance of an SVM classifier can be significantly reduced by the inclusion of redundant features and that this effect is more pronounced for a small training set [28]. In this study, the combination of conventional T1 CE and FLAIR features provided a lower classification error than the features of either sequence alone, which emphasizes the importance of a multiparametric approach. In addition, highly correlated features were eliminated using Pearson correlation analysis, and the remaining features were then ranked using the random forest classifier consisting of a number of decision trees. This indicated that redundant features
The radiomic strategy not only outperformed the radiologists, but could also be used as an auxiliary means to overcome some problems that affect radiologists. First, the frequency of interruptions during a reporting session is associated with an increase of up to 13% in reporting time and an increased potential for errors [29]. Second, fatigue adversely affects the visual system, including worse accommodation, decreased saccadic velocity, and reduced gaze volume and coverage [30]. Finally, a number of cognitive biases may adversely affect the accuracy of a radiologist's report of a glioma [31]. Because it can reduce reporting time and cognitive biases, both of which may lead to reporting and diagnostic errors, radiomics offers a significant advantage [32], particularly for general radiologists who may lack expertise in neuro-oncology.
Nevertheless, the current radiomic strategy involves extensive pre- and post-processing before a suitable machine learning model is established. More studies focusing on the efficacy-cost balance of such a machine learning system should be conducted before its clinical application.
Furthermore, a few limitations of this study should be noted. First, the number of patients is relatively small. Although the results of 5-fold cross validation suggested that the evaluation of diagnostic efficacy was robust despite the relatively small sample size, which did not cause the classifier to be skewed towards a particular class, the classifier should be verified on a larger dataset in the future. Second, this radiomic method incorporated vessel removal, which may fail in cases where non-tumor vessels are intertwined with tumor vessels. Signal intensity curves of prominent vessels could be used as a differentiating feature in such cases. Finally, a continued effort to enlarge the dataset is required in order to test external validity.

Conclusions
In conclusion, this study shows that a machine learning algorithm derived from 'real world' T1 CE and FLAIR images can differentiate ODG2 from ODG3 in newly diagnosed gliomas with efficacy superior to that of radiologists. The RF-selected features can reduce the labor of applying this strategy and, based on our findings, the strategy could be applied in the clinic.