VIRTUALMAN | Training strategy for unbalanced small datasets in deep

Training strategy for unbalanced small datasets in deep learning

Research, 21 August 2020

Author(s):

Hsin Liu, Cheng-Wei Lee, Bo-Han Su, and Yufeng J. Tseng

Abstract:

Large datasets have been key to the success of deep learning and neural network approaches over the past few years. Medicinal chemistry, however, rarely has that luxury, unlike image processing, where large, accessible datasets are readily available. Datasets in medicinal chemistry are often small because few experimental results are published, which may reflect complex experimental design, expensive experimentation, or simply technical limitations. In addition, because medicinal chemistry pursues ever more active compounds, almost all published data are unbalanced: they contain few positive data points and mostly negative ones. Being able to train a model on a small, unbalanced dataset would therefore be invaluable in medicinal chemistry, particularly for drug development. In this work, we propose a training strategy for unbalanced small datasets. The strategy covers the choice of sampling ratio, the core deep learning method, fingerprint selection, and the merging of fingerprint descriptors with features extracted automatically by deep learning. We chose the Ames test for mutagenicity as the example in this study because information is available for a validation study, and because the entire dataset can be divided into segments to simulate unbalanced small datasets for training and discussion. Overall, the up-sampling method rebalances the data distribution across categories and shows better performance in both convergence speed and balanced accuracy.
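To make the two central terms of the abstract concrete, here is a minimal sketch of up-sampling and balanced accuracy for a two-class problem. It assumes that up-sampling means randomly duplicating minority-class samples until a chosen minority-to-majority ratio is reached; the abstract does not describe the exact procedure, so this is an illustration and not the authors' implementation. The function names `upsample` and `balanced_accuracy` and the parameters `ratio` and `seed` are hypothetical.

```python
import random
from collections import Counter


def upsample(features, labels, ratio=1.0, seed=0):
    """Randomly duplicate minority-class samples (assumed procedure).

    `ratio` is the desired minority/majority count after resampling;
    1.0 gives a fully balanced training set. Only two classes are handled.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch handles exactly two classes")
    (_, n_major), (minor, n_minor) = counts.most_common(2)

    # Never shrink the minority class; only add duplicates if needed.
    target = max(n_minor, int(round(ratio * n_major)))
    minority_idx = [i for i, y in enumerate(labels) if y == minor]
    extra = [rng.choice(minority_idx) for _ in range(target - n_minor)]

    # Shuffle so duplicates are not clustered at the end of the set.
    order = list(range(len(labels))) + extra
    rng.shuffle(order)
    return [features[i] for i in order], [labels[i] for i in order]


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

For example, with 90 negative and 10 positive labels, `upsample(..., ratio=1.0)` returns 180 samples with 90 in each class. On the original 90/10 labels, a model that always predicts the negative class has a plain accuracy of 0.9 but a balanced accuracy of only 0.5, which is why balanced accuracy is the more informative metric for unbalanced data. Up-sampling should be applied to the training split only, so that duplicated samples do not leak into validation data.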
