Antonio Cassano, Richard L. Marchese Robinson, Anna Palczewska, Tomasz Puzyn, Agnieszka Gajewicz, Lang Tran, Serena Manganelli and Mark T.D. Cronin
investigated, to compare the widely employed CORAL and Random Forest approaches in terms of their usefulness for developing so-called ‘nano-QSAR’ models. ‘External’ leave-one-out cross-validation (LOO) analysis was performed to validate the two approaches. Variable importance measures and signed feature contributions were examined for both algorithms in order to interpret the models developed. CORAL showed a more pronounced difference between the average coefficient of determination (R²) for training and for LOO (0.83 and 0.65, respectively) than Random Forest did (0.87 and 0.78 without bootstrap sampling; 0.90 and 0.78 with bootstrap sampling); the larger gap for CORAL may be due to overfitting. With regard to the physicochemical properties of the nanomaterials, the aspect ratio and zeta potential were found to be the two most important variables for Random Forest, and the average feature contributions calculated for the corresponding descriptors were consistent with the clear trends observed in the data set: less negative zeta potential values and lower aspect ratio values were associated with higher cytotoxicity. In contrast, CORAL failed to capture these trends.
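The comparison of training and LOO coefficients of determination described above can be illustrated with a minimal sketch. It is not the authors' implementation: the model is a one-descriptor least-squares line used as a stand-in for CORAL or Random Forest, the data are invented, and the function names (`r_squared`, `fit_line`, `training_and_loo_r2`) are hypothetical. The LOO value here is computed once from the pooled held-out predictions, which is an assumption; how the study averaged its R² values is not stated in this section.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Stand-in model (assumption): least-squares line on one descriptor."""
    x_mean, y_mean = mean(xs), mean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return lambda x: slope * x + intercept


def training_and_loo_r2(xs, ys, fit=fit_line):
    """Return (training R2, LOO R2) for a model-building function `fit`."""
    # Training R2: fit on every sample, then predict those same samples.
    full_model = fit(xs, ys)
    train_r2 = r_squared(ys, [full_model(x) for x in xs])

    # LOO: rebuild the model from scratch with one sample held out,
    # then predict only that held-out sample.
    loo_predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        loo_predictions.append(fit(train_x, train_y)(xs[i]))
    loo_r2 = r_squared(ys, loo_predictions)
    return train_r2, loo_r2


# Invented toy data: one descriptor value and one response per sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]

train_r2, loo_r2 = training_and_loo_r2(xs, ys)
print(f"training R2 = {train_r2:.3f}, LOO R2 = {loo_r2:.3f}")
```

Because every model-building step sits inside `fit`, the held-out sample never influences the model that predicts it. A large drop from the training value to the LOO value is the kind of gap that can indicate overfitting.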