Comparable efficacies in differentiating WHO grade II from III oligodendrogliomas with machine-learning and radiologist’s reading

Abstract Background: Differentiating WHO grade II (ODG2) from grade III (ODG3) oligodendrogliomas (ODGs) remains a challenge. We investigated whether combining machine learning with radiomics from conventional T1 contrast-enhanced (T1CE) magnetic resonance imaging (MRI) can offer improved efficacy. Methods: Thirty-six patients with histologically confirmed ODGs who underwent T1CE MR examination before any intervention between January 2015 and July 2017 were recruited. The volume of interest (VOI) covering the whole tumor enhancement was manually drawn on the T1CE images slice by slice using ITK-SNAP, and a total of 1044 features were extracted from the VOI using Analysis-Kinetics software. A random forest (RF) algorithm with 5-fold cross validation was applied to differentiate ODG2 from ODG3. The diagnostic efficacies of radiomics-based machine learning and radiologists' assessment were also compared. Results: Nineteen ODG2 and 17 ODG3 were included, and ODG3 tended to present with prominent necrosis and nodular/ring-like enhancement (P < 0.05). The RF strategy with radiomics features produced stable diagnostic performance, with an AUC, accuracy (ACC), sensitivity, and specificity of 0.765, 0.763, 82.8% and 70.0%, respectively. The AUCs of radiologists 1, 2 and 3 were 0.700, 0.687, and 0.714, respectively. The efficacy of radiomics-based machine learning was comparable to that of the radiologists. Conclusions: Machine learning based on T1CE radiomics offered efficacy comparable to that of radiologists in differentiating ODG2 from ODG3.

Guang-Bin Cui

... making their differentiation complicated. Besides, ODG3 often has imaging features similar to those of ODG2 on conventional MRI, so tumor grade cannot be reliably predicted. Edema, haemorrhage, cystic degeneration and contrast enhancement are more commonly seen in ODG3, reflecting histopathological findings, but may also be seen in ODG2 [4]. Thus, a new imaging strategy for differentiating ODG2 from ODG3 needs to be developed.
Advanced imaging techniques, including DWI, perfusion imaging, MR spectroscopy and PET, have been employed to obtain more sensitive pathophysiological diagnostic markers, but with unsatisfactory efficacy. Diffusion restriction is seldom observed in ODG2 [6]. Averaged ADC values are reported to be lower in high grade glioma (HGG) than in low grade glioma (LGG); however, their overlap with the ADC values of ODG2 makes DWI an unreliable marker for distinguishing ODG2 from ODG3 [7].
Using a relative cerebral blood volume (rCBV) ratio cut-off value of 1.75, HGG can be differentiated from LGG with 95% sensitivity [8]. Unfortunately, these findings do not seem to apply to ODG and oligoastrocytoma: markedly elevated rCBV can also be observed in ODG2, so a reliable distinction cannot be consistently achieved [7,9,10]. This is due to the presence of short capillary segments in ODGs [5] and may contribute to the relatively low specificity (70%) reported by Law et al. [8]. (Focally) elevated rCBV therefore does not necessarily indicate ODG3. Besides, the correlation of Ktrans with tumor grade is even poorer than that of rCBV, and Ktrans is more commonly used to assess treatment effects [11]. Taken together, the efficacy of advanced MRI techniques in differentiating ODG2 from ODG3 is limited.
Combining quantitative image features extracted from conventional MRI with machine learning algorithms, radiomics can provide comprehensive information that is difficult to perceive by visual inspection [12, 13] and is commonly used in the diagnosis, staging and prognosis of tumors [14-19].
However, most previous studies focused largely on advanced MR techniques, and the varied postprocessing models and the varied interpretation and evaluation criteria have restricted their clinical application.
Besides their limited diagnostic power, these advanced MRI techniques are not commonly available in some rural areas. In contrast, T1-weighted contrast-enhanced (T1CE) imaging is used in almost all hospitals as a routine sequence for glioma diagnosis and staging. It is thus feasible to combine radiomics with T1CE to establish a practical solution for differentiating ODG2 from ODG3.
In this study, we aimed to evaluate the diagnostic power of machine-learning based on T1CE imaging radiomics in differentiating ODG2 from ODG3 in comparison with the performance of radiologists.

Patients
This study was approved by our institutional review board, and the requirement for informed consent was waived because of its retrospective nature. Between January 2015 and July 2017, consecutive patients with surgically confirmed ODG2 and ODG3 were retrospectively recruited. Tumors were classified according to the 2007 WHO classification, or the 2016 WHO guidelines when enough information was available. The inclusion criteria were: (1) the patient underwent a preoperative conventional MRI scan; (2) the patient underwent gross total or subtotal resection of the lesion and a confirmative pathological diagnosis was made. Thirty-six patients were included (19 men, 17 women; mean age = 45 years; age range = 9-65 years) and classified into two groups: ODG2 (n = 19; mean age = 46 years; age range = 10-65 years) and ODG3 (n = 17; mean age = 44 years; age range = 9-65 years). The patient selection process is summarized in Figure 1.

MRI Data Acquisition
All patients underwent 3-T MR scanning (Discovery MR750, General Electric Medical System, Milwaukee, WI, USA) with an 8-channel head coil (General Electric Medical System). The conventional MR protocol included T1-weighted imaging (T1WI) performed before and after contrast enhancement, axial T2-weighted imaging (T2WI), and transverse fluid-attenuated inversion recovery (FLAIR). A total of 1044 features were extracted from the T1CE images using Analysis-Kinetics (A.K., GE Healthcare) software. We used these features because they were found to be relevant for distinguishing ODG2 from ODG3 in our previous MR imaging studies [16].

Tumor Segmentation or Delineation
Feature selection
After being centered and scaled, the highly redundant and correlated features were subjected to a two-step feature selection procedure. First, highly correlated features were eliminated using Pearson correlation analysis, with an r threshold of 0.75. Then, a random forest (RF) classifier consisting of a number of decision trees was used to rank feature importance. Every node in a decision tree is a condition on a single feature, designed to split the dataset into two so that similar response values end up in the same set. The measure by which the (locally) optimal condition is chosen is called impurity. For classification, it is typically either Gini impurity or information gain/entropy. Thus, when training a tree, one can compute how much each feature decreases the weighted impurity in that tree. For the RF, the impurity decrease from each feature is averaged over the trees and the features are ranked according to this measure. In our study, the decrease in Gini impurity was used as the criterion of feature importance.
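The two steps described above can be illustrated with a minimal sketch. It is not the authors' R code: the function names, the toy feature values, and the rule for choosing which member of a correlated pair to drop (the first one encountered is kept) are all assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0  # constant feature: treat r as 0


def drop_correlated(features, threshold=0.75):
    """Step 1: keep a feature only if |r| <= threshold with every kept feature.

    `features` maps a (hypothetical) feature name to its per-patient values.
    Keeping the first feature of a correlated pair is an assumption.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson_r(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def gini(labels):
    """Gini impurity, 1 - sum_k p_k^2, of a list of class labels."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_decrease(values, labels, cut):
    """Step 2 building block: weighted impurity decrease of the split values <= cut."""
    left = [lab for v, lab in zip(values, labels) if v <= cut]
    right = [lab for v, lab in zip(values, labels) if v > cut]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Toy values for four imaginary patients (not study data)
toy = {
    "f_a": [1.0, 2.0, 3.0, 4.0],
    "f_b": [2.1, 3.9, 6.2, 8.0],   # nearly a multiple of f_a
    "f_c": [1.0, -1.0, 1.0, -1.0],
}
grades = ["ODG2", "ODG2", "ODG3", "ODG3"]
print(drop_correlated(toy))
print(gini_decrease(toy["f_a"], grades, cut=2.5))
```

In an RF, the quantity returned by `gini_decrease` would be accumulated per feature over all nodes and averaged over trees to give the importance ranking.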

Radiomics model building
The 30 most important features were fed into a Conditional Inference RF classifier to build the model [21]. Five-fold cross validation was employed to tune the number of RF trees, a hyperparameter. The five-fold cross validation, including pre-processing, feature selection and model construction, was performed 3 times to limit bias and overfitting as much as possible. The final results were the average of the 3 runs. Accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were computed to evaluate the classification performance. The receiver operating characteristic (ROC) curve was also built to provide the area under the ROC curve (AUC). The larger the AUC, the better the classification [22]. The whole procedure of feature extraction and machine learning is depicted in Figure 2.
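The repeated cross-validation scheme can be sketched as follows. This is only an illustration of how 3 repetitions of 5 folds partition the 36 patients (19 ODG2, 17 ODG3); stratifying the folds by grade, the seeds, and the function names are assumptions, since the text does not state how the folds were drawn.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, spreading each class evenly over the folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def repeated_cv_splits(labels, k=5, repeats=3, seed=0):
    """Yield (repeat, train_idx, test_idx) for every fold of every repetition."""
    for r in range(repeats):
        folds = stratified_folds(labels, k, seed + r)
        for f in range(k):
            test_idx = sorted(folds[f])
            train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
            yield r, train_idx, test_idx


# Group sizes taken from the study; the ordering of patients is arbitrary
labels = ["ODG2"] * 19 + ["ODG3"] * 17
for r, train_idx, test_idx in repeated_cv_splits(labels):
    # Scaling, correlation filtering and RF ranking would be fitted on
    # train_idx only, then applied unchanged to test_idx.
    print(r, len(train_idx), len(test_idx))
```

Fitting the pre-processing and feature selection inside each training fold, as the text describes, keeps information from the held-out fold out of the model.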
Radiologist's assessment
To compare the efficacies of neuroradiologists and machine learning in differentiating ODG2 from ODG3, the images were also evaluated by three junior neuroradiologists (X.L.F, G.X and Y.H, with 6, 7 and 7 years of experience in neuroradiology, respectively). The neuroradiologists were blinded to clinical information.

Statistical Analysis
The Fisher exact test or the Chi-square test was used for categorical variables, and the unpaired Student t test was used for continuous variables, to compare the ODG2 and ODG3 groups. The statistical analyses of clinical characteristics were performed using SPSS 20.0 software (SPSS Inc., Chicago, IL, USA).
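For a 2x2 table, the Fisher exact test can be computed directly from the hypergeometric distribution. The sketch below is a plain reimplementation for illustration, not the SPSS procedure used in the study; the function name is hypothetical. It is applied to the frontal-lobe counts reported in the Results (10 of 19 ODG2 and 10 of 17 ODG3).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more probable than the observed one
    # (small relative tolerance guards against floating-point ties)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: ODG2, ODG3; columns: frontal lobe, other location
p_frontal = fisher_exact_two_sided(10, 9, 10, 7)
print(p_frontal)
```

The resulting P value is well above 0.05, in line with the reported absence of a group difference in tumor location.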
The statistical analyses of machine learning were performed using R version 3.4.2 (R Foundation for Statistical Computing). An RF analysis was performed to train the machine-learning classifier. The goal of machine learning was to build a model to differentiate ODG2 from ODG3 based on radiomics features of the T1CE images. The following R packages were used: the randomForest package for feature ranking, and the caret and unbalanced packages for RF classification. Classifier performance was determined using accuracy, sensitivity and specificity. The AUC values were also calculated for the three readers and compared with that of the radiomics classifier. A P value < 0.05 was considered statistically significant.
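The performance measures named above follow from the confusion matrix and from pairwise ranking of classifier scores. The sketch below shows one way to compute them; treating ODG3 as the positive class, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration, not study data.

```python
def confusion_metrics(y_true, y_pred, positive="ODG3"):
    """Accuracy, sensitivity, specificity, PPV and NPV for a binary task.

    Assumes both classes occur in y_true and in y_pred; otherwise a
    denominator below is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc(y_true, scores, positive="ODG3"):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: six imaginary patients with predicted probabilities of ODG3
y_true = ["ODG3", "ODG3", "ODG3", "ODG2", "ODG2", "ODG2"]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = ["ODG3" if s >= 0.5 else "ODG2" for s in scores]
print(confusion_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

For a reader who gives only a binary grade, the same `auc` function can be applied with scores of 1 for ODG3 and 0 for ODG2.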

Patient Characteristics
The main clinical characteristics and conventional MRI features of the 36 patients (ODG2 and ODG3) are summarized in Table 1. Tumor necrosis was significantly more frequent in the ODG3 group than in the ODG2 group (P = 0.044), likely reflecting hypoxia resulting from rapid tumor growth. In addition, ODG3 was associated with nodular/ring-like enhancement patterns (P = 0.002). Furthermore, 10/19 (52.6%) of ODG2 and 10/17 (58.8%) of ODG3 were located in the frontal lobe, indicating no significant group difference. No significant difference in other clinical characteristics (gender, age) or imaging features was observed between ODG2 and ODG3 patients.

Quantitative MR Histogram and Texture Features Analysis
The relative importance of the features for differentiating ODG2 from ODG3, computed using the Gini index as a metric, is depicted in Figure 3. The second-order texture measures long run high grey level emphasis_All Direction_offset5_SD, correlation_All Direction_offset9_SD, and short run high grey level emphasis_All Direction_offset6_SD were among the features that resulted in the largest decrease in the Gini index. Furthermore, these three features were significantly higher in ODG2 patients than in ODG3 patients (Table 2). Finally, non-robust features such as the kurtosis computed from the T1CE image and Gabor edge images, as well as the minimal intensity computed from the T1CE image, were not among the top 10 relevant features.
The relative importance of the features computed using the Gini index is shown in Table 2. Six of the 30 features differed significantly between ODG2 and ODG3 patients. Because of feature redundancy, putting all the high-throughput features into the RF classifier did not significantly improve classification performance.
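The note to the feature table states that P values were adjusted for the false-discovery rate with the Benjamini-Hochberg method. A minimal sketch of that adjustment is given below; the function name and the four example P values are hypothetical and are not taken from Table 2.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four features
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A feature is then called significant when its adjusted value is below the chosen threshold (0.05 in this study).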
The strong relationship among the radiomics features used to differentiate ODG2 from ODG3 is also indicated in the radiomics heat map (Figure 4). The RF-based feature selection strategy improved the performance of the RF classifier. After RF feature selection, 30 optimal features were selected to differentiate ODG2 from ODG3, with efficacy comparable to that obtained using all features.

Diagnostic Performance of Radiomics and Radiologists
The performance of radiomics and the 3 radiologists in differentiating ODG2 from ODG3 was also compared (Table 3).

Discussion
Digital radiological images are routinely acquired for almost every glioma patient, so medical imaging is rapidly becoming a crucial big data source for decoding the tumoral phenotype. Radiomics has been suggested as a robust strategy to noninvasively classify lesions [14,23]. This work suggests that radiomics from T1CE may be useful for differentiating ODG2 from ODG3. However, its efficacy was only comparable to that of radiologists, so its clinical application cannot be justified based on the current study.
In terms of experimental design, there are three novel aspects to this study. First, 'real world' data were used to test our scientific hypothesis. Second, all images analyzed in the current study were taken exclusively from routine clinical diagnostic scans. Finally, for socioeconomic reasons, the levels of accuracy were based on the radiomics of commonly available T1CE images, without acquiring spectroscopy, cerebral blood volume or perfusion information, all of which would increase the scanning time and the economic burden on patients. Somewhat unexpectedly, the radiomics strategy did not outperform the radiologists.
The lack of improvement in diagnostic performance with radiomics might be explained as follows. First, the biology of ODG cannot be fully reflected on T1CE images. T1CE enhancement is based on damage to the blood-brain barrier (BBB) as well as the proliferation of tumor blood vessels. However, glioma invasion of normal tissue involves impaired white matter as well as mild alterations in blood perfusion or diffusion, which can be better revealed with advanced MRI, including DTI, DWI and ASL. Second, the coarse information that T1CE offers for differentiating ODG2 from ODG3 can also be visually inspected by radiologists, because the marked BBB damage and tumoral blood vessel proliferation that T1CE reflects are quite obvious on visual inspection. Little information beyond that available to visual inspection was retrieved and contributed to the classification, which explains the comparable efficacies of radiomics and radiologists.
Even though the radiomics strategy did not outperform the radiologists, there is still potential to apply it as an adjunct to the radiologist to overcome some problems inherent to human reading. First, the frequency of interruptions during a reporting session is associated with up to a 13% increase in reporting time and an increased potential for errors [24]. Second, fatigue adversely affects the visual system, including worse accommodation, decreased saccadic velocity and reduced gaze volume and coverage [25]. Third, a number of cognitive biases may adversely affect the accuracy of a radiologist's report of a glioma [26]. For reducing reporting time and cognitive biases, both of which may lead to reporting and diagnostic errors, radiomics offers a significant advantage [27], particularly for general radiologists who may lack expertise in neuro-oncology. However, the current radiomics strategy involves too much pre- and post-processing before a suitable machine learning model is established. Future studies focusing on the efficacy-cost balance of such a machine learning system should therefore be conducted before its clinical application.
A few limitations of this work must be considered. First, the number of patients is relatively small, and the robustness of the current strategy needs to be confirmed in a larger patient population. Second, the cross-sectional data used did not allow markers of survival to be identified [28]. A future longitudinal study could provide such data. Third, the proposed strategy was validated with 5-fold cross validation instead of an independent external dataset. A continued effort to enlarge the dataset is required to enable external validation.

Conclusions
In conclusion, this study demonstrates the use of a machine learning algorithm, derived from 'real world' T1CE images, which can differentiate ODG2 from ODG3 in newly diagnosed gliomas with efficacy comparable to that of radiologists. The RF-selected features in this study may reduce the labor of applying this strategy; however, the clinical application of the proposed strategy cannot be justified based on our findings.

Ethics approval and consent to participate
This is a retrospective study that does not require the approval of the ethics committee. (Not applicable)

Consent for publication
Our manuscript does not contain any individual person's data. (Not applicable)

Availability of data and material
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Competing interests
The authors declare that they have no competing interests.

Funding
This study received financial support from the National key research and development program of

Note: Feature relevance was assessed by using mean decrease in Gini index-based feature importance averaged over 100 trials. P values are adjusted for false-discovery rate by using the Benjamini-Hochberg method. ODG2 = oligodendroglioma, ODG3 = anaplastic oligodendroglioma, RF = random forest.

Figure 1 Flow diagram of the study design.