Training strategy for unbalanced small datasets in deep learning

Research 21, August, 2020

Author(s):

Hsin Liu, Cheng-Wei Lee, Bo-Han Su, and Yufeng J. Tseng

 

Abstract:

Large datasets have been key to the success of deep learning and neural network approaches in the past few years. However, medicinal chemistry rarely has this luxury, unlike the image processing field, where large, accessible datasets are readily available. Datasets are often smaller because few experimental results are published, which may in turn stem from complex experimental design, expensive experimentation, or simply limitations in techniques. Also, because medicinal chemistry pursues ever more active compounds, almost all published data are unbalanced; that is, they contain few positive data points and mostly negative ones. The ability to train a model on an unbalanced small dataset would be invaluable in medicinal chemistry, and for drug development in particular. In this work, we propose a training strategy for unbalanced small datasets. The strategy covers selection of the sampling ratio, the core deep learning method, and the fingerprint, as well as merging fingerprint descriptors with features extracted automatically by deep learning. We chose the Ames test for mutagenicity as the example in this study because information is available for validation, and because the entire dataset can be divided into segments to simulate unbalanced small datasets for training and discussion. Overall, the up-sampling method rebalances the data distribution across categories and demonstrates better performance in both convergence speed and balanced accuracy.
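To make the two central terms concrete, the following is a minimal sketch of random up-sampling and of balanced accuracy. It is not the authors' implementation: the abstract does not specify the sampling procedure or ratio, so the sketch assumes random duplication of minority-class samples up to a 1:1 ratio, and the function names `upsample` and `balanced_accuracy` and the toy molecule labels are hypothetical.

```python
import random
from collections import Counter


def upsample(samples, labels, seed=0):
    """Randomly duplicate samples of smaller classes until every class
    is as large as the largest one (assumed 1:1 target ratio)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw extra samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall, so each class counts equally."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 2 positive (mutagenic) and 8 negative compounds.
X = [f"mol_{i}" for i in range(10)]
y = [1, 1] + [0] * 8

X_up, y_up = upsample(X, y)
class_counts = Counter(y_up)  # both classes now have 8 samples

# A model that always predicts "negative" is correct on 8 of 10 samples,
# yet its balanced accuracy shows it is no better than chance.
all_negative = [0] * len(y)
score = balanced_accuracy(y, all_negative)
```

In this toy case the always-negative predictor has a plain accuracy of 0.8 but a balanced accuracy of 0.5, which is why balanced accuracy is the more informative metric on unbalanced data. Up-sampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into validation data.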

 

References:

1. David K. Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems (NIPS), pp. 2224–2232, 2015.

 
