Antonio Cassano, Richard L. Marchese Robinson, Anna Palczewska, Tomasz Puzyn, Agnieszka Gajewicz, Lang Tran, Serena Manganelli and Mark T.D. Cronin

Cassano SI

Cassano source code

Nanotechnology is one of the most important technological developments of the 21st century. In silico methods to predict toxicity, such as quantitative structure–activity relationships (QSARs), promote the safe-by-design approach for the development of new materials, including nanomaterials. In this study, a set of experimental cytotoxicity data comprising 19 data points for silica nanomaterials was investigated to compare the widely employed CORAL and Random Forest approaches in terms of their usefulness for developing so-called ‘nano-QSAR’ models. ‘External’ leave-one-out cross-validation (LOO) analysis was performed to validate the two approaches. Variable importance measures and signed feature contributions were analysed for both algorithms in order to interpret the models developed. CORAL showed a more pronounced difference between the average coefficient of determination (R²) for training and for LOO (0.83 and 0.65, respectively) than Random Forest (0.87 and 0.78 without bootstrap sampling; 0.90 and 0.78 with bootstrap sampling); this larger gap for CORAL may be due to overfitting. With regard to the physicochemical properties of the nanomaterials, the aspect ratio and zeta potential were found to be the two most important variables for Random Forest, and the average feature contributions calculated for the corresponding descriptors were consistent with the clear trends observed in the data set: less negative zeta potential values and lower aspect ratio values were associated with higher cytotoxicity. In contrast, CORAL failed to capture these trends.
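The comparison of training and LOO R² values can be illustrated with a minimal sketch. This is not the authors' code: the one-descriptor least-squares line is only a stand-in for CORAL or Random Forest, the function names (`fit_line`, `r_squared`, `loo_predictions`, `loo_report`) are hypothetical, and the numbers are invented toy values rather than the study's 19 data points. It also assumes that the training R² is averaged over the LOO folds and that the LOO R² is computed from the pooled held-out predictions, which the abstract does not specify.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; a stand-in for any regression model."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def loo_predictions(xs, ys, fit):
    """Predict each point with a model trained on all the other points."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)  # the held-out point is never seen
        preds.append(model(xs[i]))
    return preds


def loo_report(xs, ys, fit):
    """Return (mean training R² over folds, R² of pooled LOO predictions)."""
    train_scores = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        train_scores.append(r_squared(train_y, [model(x) for x in train_x]))
    loo_r2 = r_squared(ys, loo_predictions(xs, ys, fit))
    return mean(train_scores), loo_r2


# Invented toy values (one descriptor, one response); NOT the study's data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.4, 3.7, 5.3, 5.8]

train_r2, loo_r2 = loo_report(xs, ys, fit_line)
print(f"mean training R2 = {train_r2:.2f}, LOO R2 = {loo_r2:.2f}")
```

A large drop from the training R² to the LOO R² suggests that a model fits its training points better than it predicts unseen ones, which is the pattern the abstract reads as possible overfitting.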
