To model quantitative nanostructure-toxicity relationships (QNTR), we can use either theoretical molecular descriptors or physico-chemical characteristics. The former offer a promising route to interpreting toxicity mechanisms; however, their computation can be very demanding, particularly at the nanoscale. The latter, on the other hand, are directly observable, yet rarely available for all the particles whose toxicity has been assessed. Currently, large initiatives are generating data for QNTR, including both toxicity and physico-chemical features. The resulting data are naturally very heterogeneous because multiple parties are involved in each project. In this study, we investigate whether the data generated by such large projects are sufficient to induce models that generalize well. We used the data generated by MODENA-COST, consisting of toxicity measurements and physico-chemical characteristics of 192 nanoparticles. We built several machine-learning-based models and focused on their statistical validity. Internal evaluation of these models (i.e. a protocol that uses the same data set, such as cross-validation) suggests fairly good validity. We then employed a rigorous validation protocol and an external data set of our own measurements on 10 standardized MeOx nanoparticles. Under this external validation, the results were far less optimistic: the models seem valid only for a well-defined set of experimental conditions.

This research is supported by the Czech Ministry of Education, Youth and Sports (Grants No. LD 14002 and LO1508).

Keywords: Nano-QSAR, machine learning, nanoparticle characteristics
© This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.